module datasets.duration

Short summary

module papierstat.datasets.duration

Artificial dataset related to duration prediction.


Functions

function

truncated documentation

duration_selling

Builds an artificial dataset that simulates parcels prepared by a shop. Each parcel is prepared as soon as …

Documentation


papierstat.datasets.duration.duration_selling(date_begin=None, date_end=None, mean_per_day=10, sigma_per_day=5, week_pattern=None, hour_begin=9, hour_end=19, gamma_k=6.0, gamma_theta=0.25)

Builds an artificial dataset that simulates parcels prepared by a shop. Each parcel is prepared as soon as an order is received, at a precise time, and is then stored until a customer comes to pick it up.

Parameters
  • date_begin – first date

  • date_end – last date

  • hour_begin – opening hour of the shop

  • hour_end – closing hour of the shop

  • week_pattern – array of 7 values, or None for a uniform distribution over the days of the week

  • mean_per_day – mean number of parcels per day (the daily count follows a Gaussian distribution)

  • sigma_per_day – standard deviation of the Gaussian distribution

  • gamma_k – parameter k (shape) of a gamma distribution

  • gamma_theta – parameter \theta (scale) of a gamma distribution

Returns

dataset
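
To make the role of the parameters more concrete, here is a minimal sketch of how one day of data could be simulated. It is an illustration only, not papierstat's implementation: the function name `simulate_day`, the rounding and flooring of the Gaussian count, and the use of clipping for the 10-hour cap mentioned in this documentation are all assumptions, and `week_pattern` and the calendar logic are left out.

```python
import numpy as np


def simulate_day(rng, mean_per_day=10, sigma_per_day=5,
                 hour_begin=9, hour_end=19,
                 gamma_k=6.0, gamma_theta=0.25, max_duration=10.0):
    """Hypothetical one-day simulation; not papierstat's actual code."""
    # Number of parcels: Gaussian draw, rounded and floored at zero (assumption).
    n = max(0, int(round(rng.normal(mean_per_day, sigma_per_day))))
    # Order times in hours, uniform over the opening hours.
    order_hours = rng.uniform(hour_begin, hour_end, size=n)
    # Durations in hours, Gamma-distributed and capped by clipping (assumption).
    durations = np.minimum(rng.gamma(gamma_k, gamma_theta, size=n), max_duration)
    return order_hours, durations


rng = np.random.default_rng(0)
order_hours, durations = simulate_day(rng)
print(len(order_hours), "parcels simulated")
```

The sketch uses NumPy's `default_rng` generator for reproducibility; the module itself appears to rely on the legacy `gamma`, `rand` and `randn` functions documented further down.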

<<<

from papierstat.datasets.duration import duration_selling
print(duration_selling().head())

>>>

                        commande                  reception  true_duration
    0 2018-05-15 09:23:04.085856 2018-05-15 11:34:51.341110       2.196460
    1 2018-05-16 15:39:33.745830 2018-05-16 16:40:30.298305       1.015709
    2 2018-05-16 12:53:11.815556 2018-05-16 13:58:24.302246       1.086802
    3 2018-05-16 15:03:18.440118 2018-05-16 16:53:05.537821       1.829749
    4 2018-05-16 13:40:17.097781 2018-05-16 15:04:16.855638       1.399933

Orders are spread uniformly over the day, even though this is unlikely in practice. The duration follows a \Gamma distribution. This duration is added to the time at which the order was placed; night hours and weekends are not counted. The duration cannot exceed 10 hours.
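
The rule of adding a duration to the order time while skipping night hours and weekends can be sketched as follows. The helper `add_business_hours` is hypothetical and is not taken from papierstat; its defaults of 9 and 19 mirror `hour_begin` and `hour_end`, and it assumes that weekends are Saturday and Sunday.

```python
from datetime import datetime, timedelta


def add_business_hours(start, hours, hour_begin=9, hour_end=19):
    """Add `hours` of opening time to `start`, skipping nights and weekends.

    Hypothetical helper written for illustration; not papierstat's code.
    """
    current = start
    remaining = timedelta(hours=hours)
    while True:
        # Shop closed (weekend or after closing time): jump to the next day's opening.
        if current.weekday() >= 5 or current.hour >= hour_end:
            current = (current + timedelta(days=1)).replace(
                hour=hour_begin, minute=0, second=0, microsecond=0)
            continue
        # Before opening time on a working day: wait until the shop opens.
        if current.hour < hour_begin:
            current = current.replace(
                hour=hour_begin, minute=0, second=0, microsecond=0)
        closing = current.replace(hour=hour_end, minute=0, second=0, microsecond=0)
        available = closing - current
        if remaining <= available:
            return current + remaining
        # Use up the rest of today and carry the remainder over.
        remaining -= available
        current = closing


# First row of the sample output above: order time and true_duration in hours.
order = datetime.fromisoformat("2018-05-15 09:23:04.085856")
pickup = add_business_hours(order, 2.196460)
print(pickup)

# An order placed late on a Friday is carried over to Monday morning.
friday = datetime(2018, 5, 18, 18, 0)
print(add_business_hours(friday, 2.0))
```

For the first row of the sample output, the order and the pickup fall on the same working day, so the result should agree with the `reception` column up to rounding of `true_duration`.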


papierstat.datasets.duration.gamma(shape, scale=1.0, size=None)

Draw samples from a Gamma distribution.

Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated "k") and scale (sometimes designated "theta"), where both parameters are > 0.

Parameters
  • shape (float or array_like of floats) – The shape of the gamma distribution. Should be greater than zero.

  • scale (float or array_like of floats, optional) – The scale of the gamma distribution. Should be greater than zero. Default is equal to 1.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape and scale are both scalars. Otherwise, np.broadcast(shape, scale).size samples are drawn.

Returns

out – Drawn samples from the parameterized gamma distribution.

Return type

ndarray or scalar

See also

scipy.stats.gamma()

probability density function, distribution or cumulative density function, etc.

Notes

The probability density for the Gamma distribution is

p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},

where k is the shape and \theta the scale, and \Gamma is the Gamma function.

The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
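
As a quick worked example of this parameterization, a Gamma distribution with shape k and scale \theta has mean k\theta and standard deviation \sqrt{k}\,\theta. The sketch below applies this to the default values `gamma_k=6.0` and `gamma_theta=0.25` of `duration_selling`, then compares the theoretical values with an empirical sample. The seed and sample size are arbitrary choices made for the illustration.

```python
import numpy as np

k, theta = 6.0, 0.25           # defaults of gamma_k and gamma_theta
mean = k * theta               # theoretical mean: 1.5 hours
std = np.sqrt(k) * theta       # theoretical standard deviation, about 0.61 hours

np.random.seed(0)              # arbitrary seed, for reproducibility only
samples = np.random.gamma(k, theta, 200000)

print("theoretical mean/std:", mean, std)
print("empirical mean/std:  ", samples.mean(), samples.std())
```

With these defaults, a typical simulated duration is therefore around an hour and a half, well below the 10-hour cap.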

References

[1] Weisstein, Eric W. "Gamma Distribution." From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/GammaDistribution.html

[2] Wikipedia, "Gamma distribution", https://en.wikipedia.org/wiki/Gamma_distribution

Examples

Draw samples from the distribution:

>>> import numpy as np
>>> shape, scale = 2., 2.  # mean=4, std=2*sqrt(2)
>>> s = np.random.gamma(shape, scale, 1000)

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, ignored = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1)*(np.exp(-bins/scale) /
...                      (sps.gamma(shape)*scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()

papierstat.datasets.duration.rand(d0, d1, ..., dn)

Random values in a given shape.

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

Parameters

d0, d1, ..., dn (int, optional) – The dimensions of the returned array, should all be positive. If no argument is given a single Python float is returned.

Returns

out – Random values.

Return type

ndarray, shape (d0, d1, ..., dn)

See also

random()

Notes

This is a convenience function. If you want an interface that takes a shape-tuple as the first argument, refer to np.random.random_sample.
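
The difference between the two interfaces can be illustrated with a short sketch: `rand` takes the dimensions as separate arguments, while `random_sample` takes a single shape tuple. Re-seeding the legacy generator before each call (the seed value is arbitrary) makes the two results directly comparable.

```python
import numpy as np

np.random.seed(0)
a = np.random.rand(3, 2)               # dimensions as separate arguments

np.random.seed(0)
b = np.random.random_sample((3, 2))    # same shape, given as one tuple

print(a.shape, b.shape)
print(np.array_equal(a, b))            # compares the two draws element by element
```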

Examples

>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random

papierstat.datasets.duration.randn(d0, d1, ..., dn)

Return a sample (or samples) from the "standard normal" distribution.

If positive, int_like or int-convertible arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate "normal" (Gaussian) distribution of mean 0 and variance 1 (if any of the d_i are floats, they are first converted to integers by truncation). A single float randomly sampled from the distribution is returned if no argument is provided.

This is a convenience function. If you want an interface that takes a tuple as the first argument, use numpy.random.standard_normal instead.

Parameters

d0, d1, ..., dn (int, optional) – The dimensions of the returned array, should be all positive. If no argument is given a single Python float is returned.

Returns

Z – A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.

Return type

ndarray or float

See also

standard_normal()

Similar, but takes a tuple as its argument.

Notes

For random samples from N(\mu, \sigma^2), use:

sigma * np.random.randn(...) + mu
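
As a rough check of this formula, the sketch below draws a large sample with mu = 3 and sigma = 2.5 (so the variance is 6.25) and prints the empirical mean and standard deviation, which should be close to the requested values. The seed and sample size are arbitrary choices made for the illustration.

```python
import numpy as np

mu, sigma = 3.0, 2.5

np.random.seed(0)                          # arbitrary seed, for reproducibility only
x = sigma * np.random.randn(100000) + mu   # samples from N(mu, sigma^2)

print("empirical mean:", x.mean())         # expected to be close to mu
print("empirical std: ", x.std())          # expected to be close to sigma
```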

Examples

>>> np.random.randn()
2.1923875335537315 #random

Two-by-four array of samples from N(3, 6.25):

>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],  #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random