module datasets.artificial

Short summary

module pymlbenchmark.datasets.artificial

Artificial datasets.

Functions

function – truncated documentation

  • random_binary_classification – Returns data for a binary classification problem (linear) with N observations and dim features.

  • random_regression – Returns data for a regression problem (linear) with N observations and dim features.

Documentation

pymlbenchmark.datasets.artificial.rand(d0, d1, ..., dn)

Random values in a given shape.

Note

This is a convenience function for users porting code from Matlab, and wraps random_sample. That function takes a tuple to specify the size of the output, which is consistent with other NumPy functions like numpy.zeros and numpy.ones.

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).
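As a quick check of the note above, the following sketch compares rand with random_sample under the same seed. It calls the functions through np.random directly (an assumption made so the snippet runs without pymlbenchmark installed) and uses the legacy np.random.seed only to make the two draws comparable.

```python
import numpy as np

# Draw a 3x2 array with the positional-dimension form.
np.random.seed(0)
a = np.random.rand(3, 2)

# Reset the seed and draw again with the tuple form.
np.random.seed(0)
b = np.random.random_sample((3, 2))

# Both calls should consume the same stream and agree element-wise.
same_values = np.array_equal(a, b)
# Every sample should lie in the half-open interval [0, 1).
in_range = bool(((a >= 0.0) & (a < 1.0)).all())
print(same_values, in_range)
```

The only difference between the two calls is how the shape is passed: separate integers for rand, a single tuple for random_sample.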

Parameters

d0, d1, …, dn : int, optional

The dimensions of the returned array, must be non-negative. If no argument is given a single Python float is returned.

Returns

out : ndarray, shape (d0, d1, ..., dn)

Random values.

See Also

random

Examples

>>> np.random.rand(3, 2)
array([[ 0.14022471,  0.96360618],  # random
       [ 0.37601032,  0.25528411],  # random
       [ 0.49313049,  0.94909878]]) # random

pymlbenchmark.datasets.artificial.randn(d0, d1, ..., dn)

Return a sample (or samples) from the “standard normal” distribution.

Note

This is a convenience function for users porting code from Matlab, and wraps standard_normal. That function takes a tuple to specify the size of the output, which is consistent with other NumPy functions like numpy.zeros and numpy.ones.

Note

New code should use the standard_normal method of a default_rng() instance instead; please see the Quick Start.

If positive int_like arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1. A single float randomly sampled from the distribution is returned if no argument is provided.

Parameters

d0, d1, …, dn : int, optional

The dimensions of the returned array, must be non-negative. If no argument is given a single Python float is returned.

Returns

Z : ndarray or float

A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.

See Also

standard_normal : Similar, but takes a tuple as its argument.

normal : Also accepts mu and sigma arguments.

random.Generator.standard_normal : Should be used for new code.

Notes

For random samples from N(mu, sigma^2), use:

sigma * np.random.randn(...) + mu
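
The scaling rule above can be checked empirically. The sketch below is an illustration rather than part of the original docstring: it uses the standard_normal method of a default_rng() instance, as recommended for new code, and the values of mu, sigma, the seed, and the sample size are arbitrary choices.

```python
import numpy as np

# Target distribution N(mu, sigma^2); values chosen for illustration.
mu, sigma = 3.0, 2.5

# Generator-based API recommended for new code; the seed is arbitrary.
rng = np.random.default_rng(0)

# Shift and scale standard normal samples.
samples = mu + sigma * rng.standard_normal(100_000)

# With this many samples, the empirical moments should be close to mu and sigma.
sample_mean = samples.mean()
sample_std = samples.std()
print(round(float(sample_mean), 2), round(float(sample_std), 2))
```

Note that sigma is the standard deviation, so the variance of the result is sigma squared (6.25 here).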

Examples

>>> np.random.randn()
2.1923875335537315  # random

Two-by-four array of samples from N(3, 6.25):

>>> 3 + 2.5 * np.random.randn(2, 4)
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

pymlbenchmark.datasets.artificial.random_binary_classification(N, dim)

Returns data for a binary classification problem (linear) with N observations and dim features.

Parameters:
  • N – number of observations

  • dim – number of features

Returns:

X, y – features of shape (N, dim) and binary labels of length N

<<<

from pymlbenchmark.datasets import random_binary_classification
X, y = random_binary_classification(3, 6)
print(y)
print(X)

>>>

    [1 0 0]
    [[0.851 0.121 0.711 0.263 0.44  0.297]
     [0.047 0.95  0.434 0.104 0.564 0.943]
     [0.777 0.043 0.647 0.264 0.349 0.635]]
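
The exact formula used by random_binary_classification is not shown on this page. As a rough illustration of what a linear binary classification generator can look like, here is a minimal sketch under stated assumptions: the function name sketch_binary_classification, the weight vector, the centering at 0.5, and the noise scale are all hypothetical and are not taken from pymlbenchmark.

```python
import numpy as np


def sketch_binary_classification(n_obs, n_features, seed=0):
    """Hypothetical generator, NOT the pymlbenchmark implementation."""
    rng = np.random.default_rng(seed)
    # Uniform features in [0, 1), matching the value range in the output above.
    X = rng.random((n_obs, n_features))
    # Assumed random linear decision boundary through the center of the cube.
    w = rng.standard_normal(n_features)
    scores = (X - 0.5) @ w + 0.1 * rng.standard_normal(n_obs)
    # Threshold the noisy linear score to obtain 0/1 labels.
    y = (scores > 0).astype(np.int64)
    return X, y


X, y = sketch_binary_classification(200, 6)
print(X.shape, y.shape)
```

The labels are linearly separable up to the added noise, which is what "linear" is taken to mean in this sketch.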

pymlbenchmark.datasets.artificial.random_regression(N, dim)

Returns data for a regression problem (linear) with N observations and dim features.

Parameters:
  • N – number of observations

  • dim – number of features

Returns:

X, y – features of shape (N, dim) and real-valued targets of length N

<<<

from pymlbenchmark.datasets import random_regression
X, y = random_regression(3, 6)
print(y)
print(X)

>>>

    [1.706 3.282 2.874]
    [[0.359 0.48  0.899 0.103 0.274 0.381]
     [0.329 0.699 0.851 0.45  0.775 0.199]
     [0.634 0.994 0.122 0.658 0.772 0.622]]
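
The exact formula used by random_regression is not shown on this page. The sketch below illustrates one way a linear regression dataset could be generated and then recovered by least squares. The function name sketch_regression, the weights, and the noise scale are hypothetical assumptions and are not taken from pymlbenchmark.

```python
import numpy as np


def sketch_regression(n_obs, n_features, seed=0):
    """Hypothetical generator, NOT the pymlbenchmark implementation."""
    rng = np.random.default_rng(seed)
    # Uniform features in [0, 1), matching the value range in the output above.
    X = rng.random((n_obs, n_features))
    # Assumed true coefficients of the linear model.
    w = rng.random(n_features)
    # Linear target plus a small amount of Gaussian noise.
    y = X @ w + 0.1 * rng.standard_normal(n_obs)
    return X, y, w


X, y, w = sketch_regression(500, 6)

# Ordinary least squares should recover the assumed coefficients closely.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
max_error = float(np.abs(w_hat - w).max())
print(X.shape, y.shape)
```

Because the target is linear in the features, a plain least-squares fit is enough to estimate the coefficients; max_error measures how far the estimate is from the assumed weights.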
