{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 1A.data - Decorrelation of random variables\n", "\n", "We build correlated Gaussian variables, then use matrix computations to derive decorrelated variables from them."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This tutorial applies matrix computations to vectors of [correlated](http://fr.wikipedia.org/wiki/Covariance) normal variables, which also relates to [singular value decomposition](https://fr.wikipedia.org/wiki/D%C3%A9composition_en_valeurs_singuli%C3%A8res)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Creating a dataset"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q1\n", "\n", "The first step is to build correlated normal random variables in an $N \\times 3$ matrix. We want this matrix in [numpy](http://www.numpy.org/) format. The following program is one way to build such a sample using linear combinations. Complete the lines containing ``....``."]}, {"cell_type": "code", "execution_count": 2, "metadata": {"collapsed": true}, "outputs": [], "source": ["import random\n", "import numpy as np\n", "\n", "def combinaison():\n", "    x = random.gauss(0,1)  # draws a random number\n", "    y = random.gauss(0,1)  # from a normal distribution\n", "    z = random.gauss(0,1)  # with mean zero and variance 1\n", "    x2 = x\n", "    y2 = 3*x + y\n", "    z2 = -2*x + y + 0.2*z\n", "    return [x2, y2, z2]\n", "\n", "# mat = [ ............. ]\n", "# npm = np.matrix ( mat )"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q2\n", "\n", "From the matrix ``npm``, we want to build the correlation matrix."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": true}, "outputs": [], "source": ["npm = ... 
# see the previous question\n", "t = npm.transpose()\n", "a = t * npm\n", "a /= npm.shape[0]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["What does the matrix ``a`` correspond to?"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Correlation of matrices"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q3\n", "\n", "Build the correlation matrix from the matrix ``a``. If needed, the [copy](https://docs.python.org/3/library/copy.html) module can be used."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["import copy\n", "b = copy.copy(a)  # replace this line with b = a\n", "b[0,0] = 44444444\n", "print(b)  # and compare the result here"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q4\n", "\n", "Write a function that takes the matrix ``npm`` as its argument and returns the correlation matrix. This function will be used later to check that the decorrelation succeeded."]}, {"cell_type": "code", "execution_count": 5, "metadata": {"collapsed": true}, "outputs": [], "source": ["def correlation(npm):\n", "    # ..........\n", "    return \".....\""]}, {"cell_type": "markdown", "metadata": {}, "source": ["## A bit of mathematics\n", "\n", "Next, a bit of mathematics. Let $M$ denote the matrix ``npm``. Since the variables have mean zero, $V=\\frac{1}{n}M'M$ is the *covariance* matrix, and it is necessarily symmetric. It is diagonal if and only if the normal variables are independent. Like any real symmetric matrix, it is diagonalizable. We can write:\n", "\n", "$$\\frac{1}{n}M'M = P \\Lambda P'$$\n", "\n", "$P$ satisfies $P'P= PP' = I$. 
The matrix $\\Lambda$ is diagonal, and one can show that all its eigenvalues are nonnegative ($\\Lambda = \\frac{1}{n}P'M'MP = \\frac{1}{n}(MP)'(MP)$).\n", " \n", "The square root of the matrix $\\Lambda$ is then defined as:\n", "\n", "$$\\begin{array}{rcl} \\Lambda &=& diag(\\lambda_1,\\lambda_2,\\lambda_3) \\\\ \\Lambda^{\\frac{1}{2}} &=& diag\\left(\\sqrt{\\lambda_1},\\sqrt{\\lambda_2},\\sqrt{\\lambda_3}\\right)\\end{array}$$\n", "\n", "Next, the square root of the matrix $V$ is defined as:\n", "\n", "$$V^{\\frac{1}{2}} = P \\Lambda^{\\frac{1}{2}} P'$$\n", "\n", "We check that $\\left(V^{\\frac{1}{2}}\\right)^2 = P \\Lambda^{\\frac{1}{2}} P' P \\Lambda^{\\frac{1}{2}} P' = P \\Lambda^{\\frac{1}{2}}\\Lambda^{\\frac{1}{2}} P' = P \\Lambda P' = V$."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Computing the square root"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q6\n", "\n", "The [numpy](http://www.numpy.org/) module provides a function that returns the matrix $P$ and the vector of eigenvalues $L$:\n", "\n", "```\n", "L,P = np.linalg.eig(a)\n", "```\n", "\n", "Check that $P'P=I$. Is it exactly equal to the identity matrix?"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q7\n", "\n", "What does the following instruction do: ``np.diag(L)``?"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q8\n", "\n", "Write a function that computes the square root of the matrix $\\frac{1}{n}M'M$ (recall that $M$ is the matrix ``npm``). 
See also [Square root of a matrix](https://fr.wikipedia.org/wiki/Racine_carr%C3%A9e_d%27une_matrice)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Decorrelation\n", "\n", "``np.linalg.inv(a)`` returns the inverse of the matrix ``a``."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q9\n", "\n", "Each row of the matrix $M$ is a vector of three correlated variables. The covariance matrix is $V=\\frac{1}{n}M'M$. Compute (mathematically) the covariance matrix of the matrix $N=M V^{-\\frac{1}{2}}$.\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q10\n", "\n", "Check this numerically."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Simulation of correlated variables"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q11\n", "\n", "Using the previous result, propose a method to simulate a vector of correlated variables with covariance matrix $V$ from a vector of independent normal variables."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q12\n", "\n", "Write a function that creates this sample:"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": ["def simulation(N, cov):\n", "    # simulates a sample of correlated variables\n", "    # N : number of observations to draw\n", "    # cov : covariance matrix\n", "    # ...\n", "    return M"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q13\n", "\n", "Check that your sample's correlation matrix is close to the one chosen to simulate the sample."]}, {"cell_type": "code", "execution_count": 7, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": 
{"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}