.. _bayesianwithpythonrst:

===================================
2A.ml - Bayesian models with Python
===================================

.. only:: html

    **Links:** :download:`notebook `, :downloadlink:`html `,
    :download:`python `, :downloadlink:`slides `,
    :githublink:`GitHub|_doc/notebooks/2a/bayesian_with_python.ipynb|*`

Mixture models. Bayesian statistics. *bayespy*, *scikit-learn*.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

You can read `Probabilistic Programming and Bayesian Methods for Hackers `__.
Results may differ between the examples below: the dataset is the same,
but each module optimises with different default parameters. We try
different Python modules on the same Bayesian problem, a Gaussian
mixture, built from the following data.

.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    import matplotlib.pyplot as plt

.. code:: ipython3

    import numpy as np

    y0 = np.random.multivariate_normal([0, 0], [[2, 0], [0, 0.1]], size=50)
    y1 = np.random.multivariate_normal([0, 0], [[0.1, 0], [0, 2]], size=50)
    y2 = np.random.multivariate_normal([5, 2], [[2, -1.5], [-1.5, 2]], size=50)
    y3 = np.random.multivariate_normal([-2, -2], [[0.5, 0], [0, 0.5]], size=50)
    y = np.vstack([y0, y1, y2, y3])
    X = y

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    ax.plot(y[:, 0], y[:, 1], "o");

.. image:: bayesian_with_python_6_0.png
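
Because the sample is drawn at random, every run produces a slightly
different cloud and slightly different fits below. A minimal sketch to
make a run reproducible, not part of the original notebook and assuming
a fixed seed is acceptable (the value 0 is arbitrary); it would have to
be executed before the sampling cell above:

.. code:: ipython3

    # hypothetical reproducibility helper: freeze numpy's global seed
    # before drawing y0..y3 so that every library below sees the same data
    np.random.seed(0)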

bayespy .. raw:: html


The module `bayespy `__ makes it possible to build and estimate simple
Bayesian models. I simply replicate the example on the `Gaussian
mixture model `__. We define the model:

.. code:: ipython3

    from bayespy import __version__ as v1
    from numpy import __version__ as v2
    v1, v2

.. parsed-literal::

    ('0.5.22', '1.22.1')

If this does not work with numpy 1.14, you should try version 1.13
(most likely, this `exception `__ is the problem).

.. code:: ipython3

    N = 200  # number of data vectors
    D = 2    # dimension
    K = 10   # maximum number of clusters

.. code:: ipython3

    from bayespy.nodes import Dirichlet, Categorical, Gaussian, Wishart, Mixture

    alpha = Dirichlet(1e-5*np.ones(K), name='alpha')
    Z = Categorical(alpha, plates=(N,), name='z')
    mu = Gaussian(np.zeros(D), 1e-5*np.identity(D), plates=(K,), name='mu')
    sigma = Wishart(D, 1e-5*np.identity(D), plates=(K,), name='Lambda')
    Y = Mixture(Z, Gaussian, mu, sigma, name='Y')

.. code:: ipython3

    Z.initialize_from_random()

.. code:: ipython3

    from bayespy.inference import VB
    Q = VB(Y, mu, sigma, Z, alpha)

.. code:: ipython3

    Y.observe(y)

.. code:: ipython3

    import time
    if not hasattr(time, 'clock'):
        # bayespy still uses time.clock, which was removed in Python 3.8
        setattr(time, 'clock', time.perf_counter)

.. code:: ipython3

    Q.update(repeat=1000)

.. parsed-literal::

    Iteration 1: loglike=-1.497701e+03 (0.012 seconds)
    Iteration 2: loglike=-1.359319e+03 (0.009 seconds)
    Iteration 3: loglike=-1.345008e+03 (0.007 seconds)
    Iteration 4: loglike=-1.333128e+03 (0.007 seconds)
    Iteration 5: loglike=-1.321888e+03 (0.008 seconds)
    Iteration 6: loglike=-1.308002e+03 (0.007 seconds)
    Iteration 7: loglike=-1.290391e+03 (0.007 seconds)
    Iteration 8: loglike=-1.275647e+03 (0.006 seconds)
    Iteration 9: loglike=-1.264646e+03 (0.008 seconds)
    Iteration 10: loglike=-1.251051e+03 (0.006 seconds)
    Iteration 11: loglike=-1.229172e+03 (0.006 seconds)
    Iteration 12: loglike=-1.182531e+03 (0.006 seconds)
    Iteration 13: loglike=-1.162046e+03 (0.007 seconds)
    Iteration 14: loglike=-1.129390e+03 (0.008 seconds)
    Iteration 15: loglike=-1.114487e+03 (0.007 seconds)
    Iteration 16: loglike=-1.089347e+03 (0.008 seconds)
    Iteration 17: loglike=-1.086340e+03 (0.007 seconds)
    Iteration 18: loglike=-1.083643e+03 (0.008 seconds)
    Iteration 19: loglike=-1.080921e+03 (0.007 seconds)
    Iteration 20: loglike=-1.077116e+03 (0.008 seconds)
    Iteration 21: loglike=-1.064775e+03 (0.007 seconds)
    Iteration 22: loglike=-1.052069e+03 (0.006 seconds)
    Iteration 23: loglike=-1.031870e+03 (0.006 seconds)
    Iteration 24: loglike=-1.020058e+03 (0.006 seconds)
    Iteration 25: loglike=-1.008205e+03 (0.011 seconds)
    Iteration 26: loglike=-9.924005e+02 (0.011 seconds)
    Iteration 27: loglike=-9.920968e+02 (0.016 seconds)
    Iteration 28: loglike=-9.917380e+02 (0.018 seconds)
    Iteration 29: loglike=-9.912025e+02 (0.008 seconds)
    Iteration 30: loglike=-9.903257e+02 (0.009 seconds)
    Iteration 31: loglike=-9.892435e+02 (0.008 seconds)
    Iteration 32: loglike=-9.877160e+02 (0.007 seconds)
    Iteration 33: loglike=-9.859192e+02 (0.006 seconds)
    Iteration 34: loglike=-9.836574e+02 (0.009 seconds)
    Iteration 35: loglike=-9.814675e+02 (0.007 seconds)
    Iteration 36: loglike=-9.798386e+02 (0.006 seconds)
    Iteration 37: loglike=-9.793723e+02 (0.006 seconds)
    Iteration 38: loglike=-9.792857e+02 (0.006 seconds)
    Iteration 39: loglike=-9.792329e+02 (0.008 seconds)
    Iteration 40: loglike=-9.791737e+02 (0.006 seconds)
    Iteration 41: loglike=-9.791036e+02 (0.008 seconds)
    Iteration 42: loglike=-9.790199e+02 (0.007 seconds)
    Iteration 43: loglike=-9.789201e+02 (0.006 seconds)
    Iteration 44: loglike=-9.788011e+02 (0.006 seconds)
    Iteration 45: loglike=-9.786598e+02 (0.006 seconds)
    Iteration 46: loglike=-9.784926e+02 (0.007 seconds)
    Iteration 47: loglike=-9.782935e+02 (0.007 seconds)
    Iteration 48: loglike=-9.780498e+02 (0.007 seconds)
    Iteration 49: loglike=-9.777315e+02 (0.006 seconds)
    Iteration 50: loglike=-9.772782e+02 (0.005 seconds)
    Iteration 51: loglike=-9.766198e+02 (0.006 seconds)
    Iteration 52: loglike=-9.757762e+02 (0.010 seconds)
    Iteration 53: loglike=-9.748480e+02 (0.011 seconds)
    Iteration 54: loglike=-9.738679e+02 (0.011 seconds)
    Iteration 55: loglike=-9.728080e+02 (0.007 seconds)
    Iteration 56: loglike=-9.716088e+02 (0.006 seconds)
    Iteration 57: loglike=-9.701531e+02 (0.005 seconds)
    Iteration 58: loglike=-9.684097e+02 (0.006 seconds)
    Iteration 59: loglike=-9.668369e+02 (0.007 seconds)
    Iteration 60: loglike=-9.658912e+02 (0.006 seconds)
    Iteration 61: loglike=-9.654709e+02 (0.004 seconds)
    Iteration 62: loglike=-9.652997e+02 (0.000 seconds)
    Iteration 63: loglike=-9.652227e+02 (0.000 seconds)
    Iteration 64: loglike=-9.651824e+02 (0.023 seconds)
    Iteration 65: loglike=-9.651584e+02 (0.006 seconds)
    Iteration 66: loglike=-9.651426e+02 (0.008 seconds)
    Iteration 67: loglike=-9.651314e+02 (0.007 seconds)
    Iteration 68: loglike=-9.651230e+02 (0.007 seconds)
    Converged at iteration 68.

.. code:: ipython3

    import bayespy.plot as bpplt

    fig, ax = plt.subplots(1, 1, figsize=(10, 7))
    bpplt.gaussian_mixture_2d(Y, alpha=alpha, scale=2, color="black",
                              fill=True, axes=ax)

.. image:: bayesian_with_python_19_0.png

We get the result of the optimization:

.. code:: ipython3

    from bayespy.inference.vmp.nodes.gaussian import GaussianWishartMoments
    parent = Y.parents[1]
    (mu, _, sigma, _) = parent.get_moments()
    mu, sigma

.. parsed-literal::

    (array([[-1.36058471e-01, -5.54785591e-01],
            [ 0.00000000e+00,  0.00000000e+00],
            [ 0.00000000e+00,  0.00000000e+00],
            [ 7.32146904e+00,  4.25483005e+00],
            [ 0.00000000e+00,  0.00000000e+00],
            [ 8.75857358e-02,  9.11669540e-03],
            [ 3.72250304e+04,  1.40121275e+04],
            [ 0.00000000e+00,  0.00000000e+00],
            [ 0.00000000e+00,  0.00000000e+00],
            [-7.71374280e+00, -7.35395097e+00]]),
     array([[[ 6.52316512e-01, -1.63828746e-01],
             [-1.63828746e-01,  8.96234523e+00]],
            [[ 2.00000000e+05,  0.00000000e+00],
             [ 0.00000000e+00,  2.00000000e+05]],
            [[ 2.00000000e+05,  0.00000000e+00],
             [ 0.00000000e+00,  2.00000000e+05]],
            [[ 1.28998747e+00,  6.13731151e-01],
             [ 6.13731151e-01,  6.90986759e-01]],
            [[ 2.00000000e+05,  0.00000000e+00],
             [ 0.00000000e+00,  2.00000000e+05]],
            [[ 8.65297348e+00,  9.29871380e-03],
             [ 9.29871380e-03,  3.55806853e-01]],
            [[ 4.64596327e+03,  1.74893727e+03],
             [ 1.74893727e+03,  6.60113981e+02]],
            [[ 2.00000000e+05,  0.00000000e+00],
             [ 0.00000000e+00,  2.00000000e+05]],
            [[ 2.00000000e+05,  0.00000000e+00],
             [ 0.00000000e+00,  2.00000000e+05]],
            [[ 2.88152054e+00,  9.46475241e-01],
             [ 9.46475241e-01,  2.54402981e+00]]]))

For a Gaussian-Wishart parent, these moments are
:math:`\langle \Lambda \mu \rangle` and :math:`\langle \Lambda \rangle`
(the variable named ``sigma`` therefore holds expected *precision*
matrices, not covariances), so the posterior means are recovered by
solving a linear system:

.. code:: ipython3

    import numpy as np
    mu2 = np.linalg.solve(sigma, mu)
    mu2

.. parsed-literal::

    array([[-0.22515766, -0.06601764],
           [ 0.        ,  0.        ],
           [ 0.        ,  0.        ],
           [ 4.75563304,  1.93368381],
           [ 0.        ,  0.        ],
           [ 0.01009479,  0.02535878],
           [ 8.21785907, -0.54595493],
           [ 0.        ,  0.        ],
           [ 0.        ,  0.        ],
           [-1.96797779, -2.158508  ]])

The way you build the model is quite nice, but the module still needs
some development. *scikit-learn* proposes a better interface.
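
For completeness, a small sketch, not part of the original notebook, of
how one might read the inferred labels out of the *bayespy* posterior
before switching libraries. It assumes that the first moment returned
by a ``Categorical`` node is the matrix of posterior cluster
probabilities, one row per observation:

.. code:: ipython3

    # assumption: Z.get_moments()[0] has shape (N, K) and row n holds the
    # posterior probability that point n belongs to each of the K clusters
    proba = Z.get_moments()[0]
    labels = proba.argmax(axis=1)
    sorted(set(labels))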

scikit-learn .. raw:: html


We try to solve the same problem with another module:
`scikit-learn `__.

.. code:: ipython3

    from sklearn import mixture

    gmm = mixture.GaussianMixture(n_components=10, covariance_type='full')
    gmm.fit(X)

.. parsed-literal::

    GaussianMixture(n_components=10)

.. code:: ipython3

    dpgmm = mixture.BayesianGaussianMixture(n_components=10, covariance_type='full')
    dpgmm.fit(X)

.. parsed-literal::

    BayesianGaussianMixture(n_components=10)

.. code:: ipython3

    import itertools
    import matplotlib as mpl

    color_iter = itertools.cycle(['r', 'g', 'b', 'c', 'm'])
    f, axarr = plt.subplots(1, 2, figsize=(14, 7))

    for i, (clf, title) in enumerate([(gmm, 'GMM'),
                                      (dpgmm, 'Dirichlet Process GMM')]):
        splot = axarr[i]
        Y_ = clf.predict(X)
        for j, (mean, covar, color) in enumerate(zip(
                clf.means_, clf.covariances_, color_iter)):
            v, w = np.linalg.eigh(covar)
            u = w[0] / np.linalg.norm(w[0])
            # as the DP will not use every component it has access to
            # unless it needs it, we shouldn't plot the redundant
            # components.
            if not np.any(Y_ == j):
                continue
            splot.scatter(X[Y_ == j, 0], X[Y_ == j, 1], 0.8, color=color)

            # Plot an ellipse to show the Gaussian component
            angle = np.arctan(u[1] / u[0])
            angle = 180 * angle / np.pi  # convert to degrees
            ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180 + angle,
                                      color=color)
            ell.set_clip_box(splot.bbox)
            ell.set_alpha(0.5)
            splot.add_artist(ell)
        splot.set_title(title)

.. image:: bayesian_with_python_28_0.png
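
The Dirichlet process prior shrinks the weights of unneeded components
toward zero, which is why the right-hand plot shows fewer ellipses. A
quick way to check, sketched here and not part of the original notebook
(the 0.01 threshold is arbitrary), is to look at the fitted mixture
weights:

.. code:: ipython3

    # both estimators expose the mixture weights after fitting
    print(gmm.weights_.round(3))
    print(dpgmm.weights_.round(3))
    # count the components that carry a non-negligible share of the data
    print("effective components:", int((dpgmm.weights_ > 0.01).sum()))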