{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 1A.1 - Simuler une loi multinomiale\n", "\n", "On part d'une loi uniforme et on simule une loi multinomiale."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Populating the interactive namespace from numpy and matplotlib\n"]}], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["Plan\n", "
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Une variable qui suit une [loi multinomiale](http://fr.wikipedia.org/wiki/Loi_multinomiale) est une variable \u00e0 valeurs enti\u00e8res qui prend ses valeurs dans un ensemble fini, et chacune de ces valeurs a une probabilit\u00e9 diff\u00e9rente."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": [""]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}, {"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAegAAAEACAYAAAB4R+XjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGUNJREFUeJzt3VFsU+f9xvHnIFuqCpQSmrFgRzLFUWw2Eqw6SWkbyWWQ\nFDYswpDmirXdllUREqqYdlHtqqEXU6Otk1qyiwi1SIgq5AIJow08loI1lpW5o1QgMbbAmtW4Swcr\niBC2hXjnf0HrP2mSY6c4+K35fqRIPj7v7+jnF+Qn7/E5jmXbti0AAGCUOaVuAAAATEZAAwBgIAIa\nAAADEdAAABiIgAYAwEAENAAABsob0IlEQoFAQDU1Nerq6pp23LvvviuXy6X9+/fnnvP5fKqrq1Mo\nFFJjY2NxOgYA4B7gctqZzWa1bds29ff3y+PxqKGhQdFoVMFgcNK4F198UU899dSE5y3LUjKZVEVF\nRfE7BwCgjDmuoFOplPx+v3w+n9xut2KxmOLx+KRxO3fu1ObNm1VZWTlpH9+DAgDAzDkGdCaTUXV1\ndW7b6/Uqk8lMGhOPx7V161ZJt1bNn7EsS2vWrFE4HNauXbuK2TcAAGXN8RT37WE7ne3bt+uVV16R\nZVmybXvCinlgYEBVVVW6dOmS1q5dq0AgoObm5jvvGgCAMucY0B6PR+l0OredTqfl9XonjDl58qRi\nsZgk6fLlyzp8+LDcbrei0aiqqqokSZWVlWpra1MqlZoU0H6/XxcuXCjKiwEA4Mtg2bJlOn/+vPMg\n28HNmzfthx9+2P7ggw/s//73v3Z9fb199uzZacd/73vfs/fv32/btm2Pjo7a165ds23btq9fv24/\n9thj9m9+85tJNXlaQBG89NJLpW7hnsA8zz7mePYxx3dHIdnnuIJ2uVzq7u5Wa2urstms2tvbFQwG\n1dPTI0nq6OiYtnZ4eFibNm2SJI2Pj2vLli1qaWmZ0W8YAADcqxwDWpLWrVundevWTXhuumDevXt3\n7vHDDz+s999//w7bAwDg3sQ3id0DIpFIqVu4JzDPs485nn3MsTmsT8+Fl66BT6/+BgDgXlFI9rGC\nBgDAQAQ0AAAGIqABADAQAQ0AgIEIaAAADERAAwBgIAIaAAADEdAAABiIgAYAwEAENAAABiKgAQAw\nEAENAICBCGgAAAxEQAMAYKC8AZ1IJBQIBFRTU6Ourq5px7377rtyuVzav3//jGsBAMBEjgGdzWa1\nbds2JRIJnT17Vr29vfrzn/885bgXX3xRTz311IxrAQDAZI4BnUql5Pf75fP55Ha7FYvFFI/HJ43b\nuXOnNm/erMrKyhnXAgCAyRwDOpPJqLq6Orft9XqVyWQmjYnH49q6daskybKsgmsBAMDUXE47Pwtb\nJ9u3b9crr7wiy7Jk27Zs2y649jOdnZ25x5FIRJFIpOBaAEDxPPBAhUZGrpS6DWPNn79Q1659MuO6\nZDKpZDI5oxrHgPZ4PEqn07ntdDotr9c7YczJkycVi8UkSZcvX9bhw4fldrsLqv3M7QENACidW+Fs\nl7oNY42MFL74vN3nF587duzIW+MY0OFwWIODgxoaGtKSJUvU19en3t7eCWP+9re/5R5///vf14YN\nGxSNRjU+Pp63FgAATM0xoF0ul7q7u9Xa2qpsNqv29nYFg0H19PRIkjo6OmZcCwAA8rPszz40LlUD\nn352DQAovVvXD/GePL3iZFYh2cc3iQEAYCACGgAAAxHQAAAYiIAGAMBABDQAAAYioAEAMBABDQCA\ngQhoAAAMREADAGAgAhoAAAMR0AAAGIiABgDAQAQ0AAAGIqABADAQAQ0AgIEIaAAADJQ3oBOJhAKB\ngGpqatTV1TVpfzweV319vUKhkB555BEdPXo0t8/n86murk6hUEiNjY3F7RwAgDJm2bZtT7czm82q\ntrZW/f398ng8amhoUG9vr4LBYG7M6Oio5s6dK0k6c+aM2tradP78eUnS0qVLdfLkSVVUVEzfgGXJ\noQUAwF1kWZYk3pOnV5zMKiT7HFfQqVRKfr9fPp9PbrdbsVhM8Xh8wpjPwlmSrl+/roceemjCfsIX\nAICZcwzoTCaj6urq3LbX61Umk5k07sCBAwoGg1q3bp1ef/313POWZWnNmjUKh8PatWtXEdsGAKC8\nuZx23jrVkd/GjRu1ceNGHT9+XM8884z+8pe/SJIGBgZUVVWlS5cuae3atQoEAmpubp5U39nZmXsc\niUQUiUQKfwUAABgumUwqmUzOqMYxoD0ej9LpdG47nU7L6/VOO765uVnj4+P617/+pUWLFqmqqkqS\nVFlZqba2NqVSqbwBDQBAufn84nPHjh15axxPcYfDYQ0ODmpoaEhjY2Pq6+tTNBqdMObChQu5z5nf\ne+89SdKiRYt048YNjYyMSLp1IdmRI0e0YsWKGb0gAADuVY4raJfLpe7ubrW2tiqbzaq9vV3BYFA9\nPT2SpI6ODu3fv1979uyR2+3WvHnztG/fPknS8PCwNm3aJEkaHx/Xli1b1NLSMssvBwCA8uB4m9Vd\naYDbrADAGNxmlY8ht1kBAIDSIKABADAQAQ0AgIEIaAAADERAAwBgIAIaAAADEdAAABiIgAYAwEAE\nNAAABiKgAQAwEAENAICBCGgAAAxEQAMAYCACGgAAAxHQAAAYiIAGAMBAeQM6kUgoEAiopqZGXV1d\nk/bH43HV19crFArpkUce0dGjRwuuBQAAU7Ns27an25nNZlVbW6v+/n55PB41NDSot7dXwWAwN2Z0\ndFRz586VJJ05c0ZtbW06f/58QbWSZFmWHFoAANxFlmVJ4j15esXJrEKyz3EFnUql5Pf75fP55Ha7\nFYvFFI/HJ4z5LJwl6fr163rooYcKrgUAAFNzDOhMJqPq6urcttfrVSaTmTTuwIEDCgaDWrdunV5/\n/fUZ1QIAgMlcTjtvnerIb+PGjdq4caOOHz+uZ555RufOnZtRE52dnbnHkUhEkUhkRvUAAJgsmUwq\nmUzOqMYxoD0ej9LpdG47nU7L6/VOO765uVnj4+P65JNP5PV6C669PaABACg3n1987tixI2+N4ynu\ncDiswcFBDQ0NaWxsTH19fYpGoxPGXLhwIfdB93vvvSdJWrRoUUG1AABgao4raJfLpe7ubrW2tiqb\nzaq9vV3BYFA9PT2SpI6ODu3fv1979uyR2+3WvHnztG/fPsdaAACQn+NtVnelAW6zAgBjcJtVPobc\nZgUAAEqDgAYAwEAENAAABiKgAQAwEAENAICBCGgAAAxEQAMAYCACGgAAAxHQAAAYiIAGAMBABDQA\nAAYioAEAMBABDQCAgQhoAAAMREADAGCgvAGdSCQUCARUU1Ojrq6uSfvfeust1dfXq66uTo8//rhO\nnz6d2+fz+VRXV6dQKKTGxsbidg4AQBmzbIe/GJ3NZlVbW6v+/n55PB41NDSot7dXwWAwN+add97R\n8uXLtWDBAiUSCXV2durEiROSpKVLl+rkyZOqqKiYvoEC/mg1AODusCxLEu/J0ytOZhWSfY4r6FQq\nJb/fL5/PJ7fbrVgspng8PmHMqlWrtGDBAklSU1OTLl68OGE/4QsAwMw5BnQmk1F1dXVu2+v1KpPJ\nTDv+jTfe0Pr163PblmVpzZo1CofD2rVrVxHaBQDg3uBy2nnrVEdhjh07pjfffFMDAwO55wYGBlRV\nVaVLly5p7dq1CgQCam5u/uLdAgBwj3AMaI/Ho3Q6ndtOp9Pyer2Txp0+fVrPP/+8EomEFi5cmHu+\nqqpKklRZWam2tjalUqkpA7qzszP3OBKJKBKJzPR1AABgrGQyqWQyOaMax4vExsfHVVtbq7fffltL\nlixRY2PjpIvEPvzwQ61evVp79+7Vo48+mnv+xo0bymazmj9/vkZHR9XS0qKXXnpJLS0tExvgIjEA\nMAYXieVz9y4Sc1xBu1wudXd3q7W1VdlsVu3t7QoGg+rp6ZEkdXR06OWXX9aVK1e0detWSZLb7VYq\nldLw8LA2bdok6VbQb9myZVI4AwCAqTmuoO9KA6ygAcAYrKDzMeQ2KwAAUBoENAAABiKgAQAwEAEN\nAICBCGgAAAxEQAMAYCACGgAAAxHQAAAYiIAGAMBABDQAAAYioAEAMBABDQCAgQhoAAAMREADAGAg\nAhoAAAMR0AAAGChvQCcSCQUCAdXU1Kirq2vS/rfeekv19fWqq6vT448/rtOnTxdcCwAApmbZtm1P\ntzObzaq2tlb9/f3yeDxqaGhQb2+vgsFgbsw777yj5cuXa8GCBUokEurs7NSJEycKqpUky7Lk0AIA\n4C6yLEsS78nTK05mFZJ9jivoVColv98vn88nt9utWCymeDw+YcyqVau0YMECSVJTU5MuXrxYcC0A\nAJiaY0BnMhlVV1fntr1erzKZzLTj33jjDa1fv/4L1QIAgP/nctp561RHYY4dO6Y333xTAwMDM64F\nAAATOQa0x+NROp3ObafTaXm93knjTp8+reeff16JREILFy6cUa1EmDuZP3+hrl37pNRtAADuQDKZ\nVDKZnFGN40Vi4+Pjqq2t1dtvv60lS5aosbFx0oVeH374oVavXq29e/fq0UcfnVGtxAUJ+XERHYC7\nh/fkfO7eRWKOK2iXy6Xu7m61trYqm82qvb1dwWBQPT09kqSOjg69/PLLunLlirZu3SpJcrvdSqVS\n09YCAID8HFfQd6UBflvLgxU0gLuH9+R8DLnNCgAAlAYBDQCAgQhoAAAMREADAGAgAhoAAAMR0AAA\nGIiABgDAQAQ0AAAGIqABADAQAQ0AgIEIaAAADERAAwBgIAIaAAADEdAAABiIgAYAwEAENAAABsob\n0IlEQoFAQDU1Nerq6pq0/9y5c1q1apXuu+8+vfrqqxP2+Xw+1dXVKRQKqbGxsXhdAwBQ5lxOO7PZ\nrLZt26b+/n55PB41NDQoGo0qGAzmxixatEg7d+7UgQMHJtVblqVkMqmKioridw4AQBlzXEGnUin5\n/X75fD653W7FYjHF4/EJYyorKxUOh+V2u6c8hm3bxesWAIB7hGNAZzIZVVdX57a9Xq8ymUzBB7cs\nS2vWrFE4HNauXbu+eJcAANxjHE9xW5Z1RwcfGBhQVVWVLl26pLVr1yoQCKi5uXmKkZ23PY58+oNi\neeCBCo2MXCl1G8aaP3+hrl375I6PwzxPr1hzDHxZJZNJJZPJGdU4BrTH41E6nc5tp9Npeb3egg9e\nVVUl6dZp8La2NqVSqQICGsV2KzT4qGE6IyN39ovo/x+HeZ5OseYY+LKKRCKKRCK57R07duStcTzF\nHQ6HNTg4qKGhIY2Njamvr0/RaHTKsZ//rPnGjRsaGRmRJI2OjurIkSNasWJF3oYAAECeFbTL5VJ3\nd7daW1uVzWbV3t6uYDConp4eSVJHR4eGh4fV0NCga9euac6cOXrttdd09uxZ/fOf/9SmTZskSePj\n49qyZYtaWlpm/xUBAFAGLLvEl1nf+pyb04LTs+74SnjmOJ87n2OJeXZWnDnG7OP/cT7Fe7/Idxy+\nSQwAAAMR0AAAGIiABgDAQAQ0AAAGIqABADAQAQ0AgIEIaAAADERAAwBgIAIaAAADEdAAABiIgAYA\nwEAENAAABiKgAQAwEAENAICBCGgAAAxEQAMAYKC8AZ1IJBQIBFRTU6Ourq5J+8+dO6dVq1bpvvvu\n06uvvjqjWgAAMDXLtm17up3ZbFa1tbXq7++Xx+NRQ0ODent7FQwGc2MuXbqkv//97zpw4IAWLlyo\nH//4xwXXSpJlWZKmbQGy5PBPVNgRmOM87nyOJebZWXHmGLOP/8f5FO/9It9xHFfQqVRKfr9fPp9P\nbrdbsVhM8Xh8wpjKykqFw2G53e4Z1wIAgKk5BnQmk1F1dXVu2+v1KpPJFHTgO6kFAOBe53LaeetU\nxxczs9rO2x5HPv0BgIkeeKBCIyNXSt2GkebPX6hr1z4pdRuYRjKZVDKZnFGNY0B7PB6l0+ncdjqd\nltfrLejAM6vtLOiYAO5tt8KZz0enMjLyxRdUmH2RSESRSCS3vWPHjrw1jqe4w+GwBgcHNTQ0pLGx\nMfX19SkajU459vMfds+kFgAATOS4gna5XOru7lZra6uy2aza29sVDAbV09MjSero6NDw8LAaGhp0\n7do1zZkzR6+99prOnj2refPmTVkLAADyc7zN6q40wCX9eXCb1ezjNqvZxxzPPub47jDkNisAAFAa\nBDQAAAYioAEAMBABDQCAgQhoAAAMREADAGAgAhoAAAMR0AAAGIiABgDAQAQ0AAAGIqABADAQAQ0A\ngIEIaAAADERAAwBgIAIaAAADEdAAABgob0AnEgkFAgHV1NSoq6tryjEvvPCCampqVF9fr1OnTuWe\n9/l8qqurUygUUmNjY/G6BgCgzLmcdmazWW3btk39/f3yeDxqaGhQNBpVMBjMjTl06JDOnz+vwcFB\n/fGPf9TWrVt14sQJSZJlWUomk6qoqJjdVwEAQJlxXEGnUin5/X75fD653W7FYjHF4/EJYw4ePKjn\nnntOktTU1KSrV6/q448/zu23bXsW2gYAoLw5BnQmk1F1dXVu2+v1KpPJFDzGsiytWbNG4XBYu3bt\nKmbfAACUNcdT3JZlFXSQ6VbJv//977VkyRJdunRJa9euVSAQUHNz8xQjO297HPn0BwCA8pBMJpVM\nJmdU4xjQHo9H6XQ6t51Op+X1eh3HXLx4UR6PR5K0ZMkSSVJlZaXa2tqUSqUKCGgAAMpLJBJRJBLJ\nbe/YsSNvjeMp7nA4rMHBQQ0NDWlsbEx9fX2KRqMTxkSjUe3Zs0eSdOLECT344INavHixbty4oZGR\nEUnS6Oiojhw5ohUrVsz0NQEAcE9yXEG7XC51d3ertbVV2WxW7e3tCgaD6unpkSR1dHRo/fr1OnTo\nkPx+v+bOnavdu3dLkoaHh7Vp0yZJ0vj4uLZs2aKWlpZZfjkAAJQHyy7xZda3PufmSu/pWXd8JTxz\nnM+dz7HEPDtjjmcfc3x3FG+e8x2HbxIDAMBABDQAAAYioAEAMBABDQCAgQhoAAAMREADAGAgAhoA\nAAMR0AAAGIiABgDAQAQ0AAAGIqABADAQAQ0AgIEIaAAADERAAwBgIAIaAAAD5Q3oRCKhQCCgmpoa\ndXV1TTnmhRdeUE1Njerr63Xq1KkZ1QIAgCnYDsbHx+1ly5bZH3zwgT02NmbX19fbZ8+enTDm17/+\ntb1u3Trbtm37xIkTdlNTU8G19q2/Vm1LNj/T/jj+ExWEOZ79OWaemePS/zDHX7Z5zsdxBZ1KpeT3\n++Xz+eR2uxWLxRSPxyeMOXjwoJ577jlJUlNTk65evarh4eGCagEAwNQcAzqTyai6ujq37fV6lclk\nChrz0Ucf5a0FAABTcwxoy7IKOsit1ToAACgWl9NOj8ejdDqd206n0/J6vY5jLl68KK/Xq5s3b+at\nlaRly5bpwoXCfhG4VxX6i1KeoxThGOWrOHMsMc/TY45nH3N8dxRjnpctW5Z3jGNAh8NhDQ4Oamho\nSEuWLFFfX596e3snjIlGo+ru7lYsFtOJEyf04IMPavHixVq0aFHeWkk6f/78DF8WAADlzzGgXS6X\nuru71draqmw2q/b2dgWDQfX09EiSOjo6tH79eh06dEh+v19z587V7t27HWsBAEB+ls0HyAAAGKek\n3yTGF5nMrh/84AdavHixVqxYUepWylY6ndaTTz6pr33ta/r617+u119/vdQtlaX//Oc/ampq0sqV\nK7V8+XL95Cc/KXVLZSubzSoUCmnDhg2lbqUs+Xw+1dXVKRQKqbGx0XFsyVbQ2WxWtbW16u/vl8fj\nUUNDg3p7ezkNXkTHjx/XvHnz9Oyzz+rMmTOlbqcsDQ8Pa3h4WCtXrtT169f1yCOP6MCBA/w/ngU3\nbtzQ/fffr/HxcT3xxBP6+c9/rieeeKLUbZWdX/ziFzp58qRGRkZ08ODBUrdTdpYuXaqTJ0+qoqIi\n79iSraD5IpPZ19zcrIULF5a6jbL21a9+VStXrpQkzZs3T8FgUB999FGJuypP999/vyRpbGxM2Wy2\noDc4zMzFixd16NAh/fCHP+T22VlU6NyWLKAL+RIU4MtkaGhIp06dUlNTU6lbKUv/+9//tHLlSi1e\nvFhPPvmkli9fXuqWys6PfvQj/exnP9OcOfwdpdliWZbWrFmjcDisXbt2OY4t2b9C8e7XA0rv+vXr\n2rx5s1577TXNmzev1O2UpTlz5uj999/XxYsX9bvf/U7JZLLULZWVX/3qV/rKV76iUCjE6nkWDQwM\n6NSpUzp8+LB++ctf6vjx49OOLVlAF/IlKMCXwc2bN/Xtb39b3/3ud7Vx48ZSt1P2FixYoG9+85v6\n05/+VOpWysof/vAHHTx4UEuXLtXTTz+to0eP6tlnny11W2WnqqpKklRZWam2tjalUqlpx5YsoG//\nEpSxsTH19fUpGo2Wqh3gC7FtW+3t7Vq+fLm2b99e6nbK1uXLl3X16lVJ0r///W/99re/VSgUKnFX\n5eWnP/2p0um0PvjgA+3bt0+rV6/Wnj17St1WWblx44ZGRkYkSaOjozpy5IjjXTYlC+jbv8hk+fLl\n+s53vsOVr0X29NNP67HHHtNf//pXVVdX575EBsUzMDCgvXv36tixYwqFQgqFQkokEqVuq+z84x//\n0OrVq7Vy5Uo1NTVpw4YN+sY3vlHqtsoaH0MW38cff6zm5ubc/+NvfetbamlpmXY8X1QCAICBuFQP\nAAADEdAAABiIgAYAwEAENAAABiKgAQAwEAENAICBCGgAAAxEQAMAYKD/AzisU8dtDuyrAAAAAElF\nTkSuQmCC\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["import matplotlib.pyplot as plt\n", "poids = [ 0.2, 0.15, 0.15, 0.1, 0.4 ]\n", "valeur = [ 0,1,2,3,4 ]\n", "plt.figure(figsize=(8,4))\n", "plt.bar(valeur,poids)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Lorsqu'on simule une telle loi, chaque valeur a une probabilit\u00e9 de sortir proportionnelle \u00e0 chaque poids. La fonction [numpy.random.multinomial](http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multinomial.html#numpy.random.multinomial) permet de calculer cela."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([ 0.211, 0.148, 0.15 , 0.089, 0.402])"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy.random as rnd\n", "draw = rnd.multinomial(1000, poids)\n", "draw / sum(draw)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Pour avoir 1000 tirages plut\u00f4t que l'aggr\u00e9gation des 1000 tirages :"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[0, 0, 0, 1, 0],\n", " [0, 0, 1, 0, 0],\n", " [0, 1, 0, 0, 0],\n", " ..., \n", " [0, 0, 0, 0, 1],\n", " [0, 0, 1, 0, 0],\n", " [0, 1, 0, 0, 0]])"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["draw = rnd.multinomial(1, poids, 1000)\n", "draw"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Algorithme de simulation\n", "\n", "Tout d'abord, on calcule la distribution cumul\u00e9e (ou [fonction de r\u00e9partition](http://fr.wikipedia.org/wiki/Fonction_de_r%C3%A9partition)). Le calcule propos\u00e9 utilise la fonction [numpy.cumsum](http://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html)."]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["[ 0.2 0.35 0.5 0.6 1. ]\n"]}, {"data": {"text/plain": [""]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}, {"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAEACAYAAACuzv3DAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEKVJREFUeJzt3V1sU/Ufx/FPSXfBHxCG4GRtk+Fat87BtmRQkWAKPoyg\nWxRIHMZIYJJlCSF4ZdQLhxfoNF6o82IYxeDDnBcmI2YUBakQCE5lwsUQN8JiV8mSicsmKNvK+V/w\n/+/BsXZP5fAb71eyhKa/0/PNz+XNoetxDsuyLAEAjDXD7gEAAJNDyAHAcIQcAAxHyAHAcIQcAAxH\nyAHAcAlDvnXrVqWlpWnJkiWjrtmxY4d8Pp/y8vLU1NQ0pQMCAOJLGPItW7YoFAqN+nxDQ4NaW1vV\n0tKiPXv2qKKiYkoHBADElzDkq1atUmpq6qjP79+/X5s3b5YkBQIBdXV1qaOjY+omBADENen3yKPR\nqDwez8Bjt9ut9vb2yb4sAGCMpuSHnf++y9/hcEzFywIAxsA52RdwuVyKRCIDj9vb2+VyuUas83q9\nOn/+/GRPBwC3lczMTLW2tsZdM+kr8pKSEu3bt0+SdPLkSc2bN09paWkj1p0/f16WZfFlWXrllVds\nn+FW+WIv2IsbfV1nTbMvTWgvxnIBnPCKfNOmTfruu+/U2dkpj8ejXbt2qa+vT5JUXl6udevWqaGh\nQV6vV7NmzdLevXsTnhQAMHUShry2tjbhi1RXV0/JMACA8ePOThsEg0G7R7hlsBeD2AtMlMMafEMq\nuSdyOHSTTgXAcNc/+TbdejGxBo6lnVyRA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkA\nGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6Q\nA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4Dh\nCDkAGC5hyEOhkLKzs+Xz+VRVVTXi+c7OTq1du1b5+fnKzc3VRx99lIw5AQCjcFiWZY32ZCwWU1ZW\nlg4dOiSXy6Vly5aptrZWfr9/YE1lZaWuXr2q1157TZ2dncrKylJHR4ecTufwEzkcinMqABjgcDgk\nTbdeTKyBY2ln3CvyxsZGeb1eZWRkKCUlRaWlpaqvrx+2ZtGiReru7pYkdXd368477xwRcQBA8sQt\nbjQalcfjGXjsdrv1/fffD1uzbds2rVmzRunp6erp6dEXX3yRnEkBADcUN+TX/3kT3+7du5Wfn69w\nOKzz58/rkUce0enTpzVnzpwRaysrKwf+HAwGFQwGxz0wAExn4XBY4XB4XMfEDbnL5VIkEhl4HIlE\n5Ha7h605ceKEXn75ZUlSZmamFi9erHPnzqmwsHDE6w0NOQBgpH9f5O7atSvhMXHfIy8sLFRLS4va\n2trU29ururo6lZSUDFuTnZ2tQ4cOSZI6Ojp07tw53XPPPRMYHwAwEXGvyJ1Op6qrq1VUVKRYLKay\nsjL5/X7V1NRIksrLy/XSSy9py5YtysvL07Vr1/TGG29o/vz5N2V4AECCjx9O6Yn4+CGAMeLjh0OO\nmuzHDwEAtz5CDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCG\nI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QA\nYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYDhCDgCGI+QAYLiEIQ+FQsrOzpbP\n51NVVdUN14TDYRUUFCg3N1fBYHCqZwQAxOGwLMsa7clYLKasrCwdOnRILpdLy5YtU21trfx+/8Ca\nrq4urVy5UgcPHpTb7VZnZ6cWLFgw8kQOh+KcCgAGOBwOSdOtFxNr4FjaGfeKvLGxUV6vVxkZGUpJ\nSVFpaanq6+uHrfnss8+0YcMGud1uSbphxAEAyRM35NFoVB6PZ+Cx2+1WNBodtqalpUWXLl3S6tWr\nVVhYqI8//jg5kwIAbsgZ78nr/7yJr6+vT6dOndLhw4d15coVrVixQvfff798Pt+UDQkAGF3ckLtc\nLkUikYHHkUhk4C2U//N4PFqwYIFmzpypmTNn6sEHH9Tp06dvGPLKysqBPweDQX4wCgxxxx3z1dPz\np91jTKk5c1LV3X3J7jGMEg6HFQ6Hx3VM3B929vf3KysrS4cPH1Z6erqWL18+4oedv/zyi7Zv366D\nBw/q6tWrCgQCqqurU05OzvAT8cNOIC5+wDfkKPZi8KgxtDPuFbnT6VR1dbWKiooUi8VUVlYmv9+v\nmpoaSVJ5ebmys7O1du1aLV26VDNmzNC2bdtGRBwAkDxxr8in9ERckQNxcRU65Cj2YvCoyX78EABw\n6yPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPkAGA4Qg4AhiPk\nAGA4Qg4AhiPkAGA4Qg4AhiPkAGC4uL+zE0g2fnM8MHn8zk7Yit/NOOQo9mLwKPZi8Ch+ZycATH+E\nHAAMR8gBwHCEHAAMR8gBwHCEHAAMR8gBwHCEHAAMR8gBwHCEHAAMR8gBwHCEHAAMR8gBwHCEHAAM\nR8gBwHCEHAAMlzDkoVBI2dnZ8vl8qqqqGnXdDz/8IKfTqS+//HJKBwQAxBc35LFYTNu3b1coFFJz\nc7Nqa2t19uzZG6574YUXtHbtWn4LEADcZHFD3tjYKK/Xq4yMDKWkpKi0tFT19fUj1r377rvauHGj\nFi5cmLRBAQA3Fjfk0WhUHo9n4LHb7VY0Gh2xpr6+XhUVFZL+/7v2AAA3S9yQjyXKO3fu1Ouvvz7w\nC0J5awUAbi5nvCddLpcikcjA40gkIrfbPWzNTz/9pNLSUklSZ2enDhw4oJSUFJWUlIx4vcrKyoE/\nB4NBBYPBSYwOANNPOBxWOBwe1zEOK84ldH9/v7KysnT48GGlp6dr+fLlqq2tld/vv+H6LVu2qLi4\nWOvXrx95ov9dsQNDXf9X33T7vpjY9zp7MeQo9mLwqDG0M+4VudPpVHV1tYqKihSLxVRWVia/36+a\nmhpJUnl5+biHAgBMrbhX5FN6Iq7IcQNceQ05ir0YPIq9GDxqDO3kzk4AMBwhBwDDEXIAMBwhBwDD\nEXIAMBwhBwDDEXIAMFzcG4KQHHfcMV89PX/aPcaUmjMnVd3dl+weA7gtcUOQDbjZYchR7MXgUezF\n4FHsxeBR3BAEANMfIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAc\nIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcAwxFyADAcIQcA\nwxFyADAcIQcAwxFyADDcmEIeCoWUnZ0tn8+nqqqqEc9/+umnysvL09KlS7Vy5UqdOXNmygcFAIzC\nSqC/v9/KzMy0Lly4YPX29lp5eXlWc3PzsDUnTpywurq6LMuyrAMHDliBQGDE64zhVLcNSZZkTbOv\nif33ZS/YC/Yi8V4kkvCKvLGxUV6vVxkZGUpJSVFpaanq6+uHrVmxYoXmzp0rSQoEAmpvb5/Kv2sA\nAHEkDHk0GpXH4xl47Ha7FY1GR13/wQcfaN26dVMzHQAgIWeiBQ6HY8wvduTIEX344Yc6fvz4pF/L\nFHPmpKq7+5LdYwCYJsLhsMLh8LiOSRhyl8ulSCQy8DgSicjtdo9Yd+bMGW3btk2hUEipqamjvJo1\nruFM0NMz/f5yAmCfYDCoYDA48HjXrl0Jj0n41kphYaFaWlrU1tam3t5e1dXVqaSkZNia3377TevX\nr9cnn3wir9c7/skBABOW8Irc6XSqurpaRUVFisViKisrk9/vV01NjSSpvLxcr776qv78809VVFRI\nklJSUtTY2JjcyQEAkiTH/z7ekvwTORyajm+tSA6Ndwun516Mfx8k9mLYUezF4FHsxeBRjsTHcWcn\nABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiO\nkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA\n4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABiOkAOA4Qg5ABguYchDoZCys7Pl8/lUVVV1\nwzU7duyQz+dTXl6empqapnxIAMDo4oY8Fotp+/btCoVCam5uVm1trc6ePTtsTUNDg1pbW9XS0qI9\ne/aooqIiqQMDAIaLG/LGxkZ5vV5lZGQoJSVFpaWlqq+vH7Zm//792rx5syQpEAioq6tLHR0dyZsY\nADBM3JBHo1F5PJ6Bx263W9FoNOGa9vb2KR4TADCauCF3OBxjehHLsiZ0HABg8pzxnnS5XIpEIgOP\nI5GI3G533DXt7e1yuVwjXiszM1Pnz0/PwE/sL67ptxcT/wucvRhy5JTOcStgLwZNZC8yMzMTrokb\n8sLCQrW0tKitrU3p6emqq6tTbW3tsDUlJSWqrq5WaWmpTp48qXnz5iktLW3Ea7W2to5zfADAWMQN\nudPpVHV1tYqKihSLxVRWVia/36+amhpJUnl5udatW6eGhgZ5vV7NmjVLe/fuvSmDAwCuc1j/foMb\nAGCUpN/ZOZYbim4XW7duVVpampYsWWL3KLaLRCJavXq17rvvPuXm5uqdd96xeyRb/PPPPwoEAsrP\nz1dOTo5efPFFu0eyXSwWU0FBgYqLi+0exVYZGRlaunSpCgoKtHz58viLrSTq7++3MjMzrQsXLli9\nvb1WXl6e1dzcnMxT3tKOHj1qnTp1ysrNzbV7FNtdvHjRampqsizLsnp6eqx77733tv3euHz5smVZ\nltXX12cFAgHr2LFjNk9kr7feest6+umnreLiYrtHsVVGRob1xx9/jGltUq/Ix3JD0e1k1apVSk1N\ntXuMW8Ldd9+t/Px8SdLs2bPl9/v1+++/2zyVPf7zn/9Iknp7exWLxTR//nybJ7JPe3u7Ghoa9Nxz\nz434WPPtaKx7kNSQj+WGIqCtrU1NTU0KBAJ2j2KLa9euKT8/X2lpaVq9erVycnLsHsk2zz//vN58\n803NmMH/z8/hcOjhhx9WYWGh3n///bhrk7pb3BiERP766y9t3LhRb7/9tmbPnm33OLaYMWOGfv75\nZ7W3t+vo0aMKh8N2j2SLr776SnfddZcKCgq4Gpd0/PhxNTU16cCBA3rvvfd07NixUdcmNeRjuaEI\nt6++vj5t2LBBzzzzjJ544gm7x7Hd3Llz9dhjj+nHH3+0exRbnDhxQvv379fixYu1adMmffvtt3r2\n2WftHss2ixYtkiQtXLhQTz75pBobG0ddm9SQD72hqLe3V3V1dSopKUnmKWEIy7JUVlamnJwc7dy5\n0+5xbNPZ2amuri5J0t9//61vvvlGBQUFNk9lj927dysSiejChQv6/PPPtWbNGu3bt8/usWxx5coV\n9fT0SJIuX76sr7/+Ou6n3ZIa8qE3FOXk5Oipp56S3+9P5ilvaZs2bdIDDzygX3/9VR6P57a+eer4\n8eP65JNPdOTIERUUFKigoEChUMjusW66ixcvas2aNcrPz1cgEFBxcbEeeughu8e6JdzOb812dHRo\n1apVA98Xjz/+uB599NFR13NDEAAYjh8NA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkAGI6QA4DhCDkA\nGO6/kaF28V45J9sAAAAASUVORK5CYII=\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["import numpy\n", "cum = numpy.cumsum( poids ) # voir http://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html\n", "print(cum)\n", "plt.bar( valeur, cum)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Cette fonction de r\u00e9partition $(x,F(x))$ est croissante. On d\u00e9finit les cinq intervalles : $A_i=]F(i),F(i+1)]$. Pour simuler une loi multinomiale, il suffit de tirer un nombre al\u00e9atoire dans $[0,1]$ puis de trouver l'intervalle $A_i$ auquel il appartient. $i$ est le r\u00e9sultat cherch\u00e9. Le calcul de la distribution cumul\u00e9e utilise une autre alternative : [functools.reduce](https://docs.python.org/3.4/library/functools.html#functools.reduce)."]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/plain": ["[0, 4, 2, 4, 2, 3, 4, 4, 4, 4]"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["import functools, random\n", "def simulation_multinomiale(poids):\n", " # calcule la fonction de r\u00e9partition\n", " # voir https://docs.python.org/3.4/library/functools.html#functools.reduce\n", " def agg(x,y): \n", " x.append(y)\n", " return x\n", " cum = functools.reduce(agg, poids, []) \n", " \n", " x = random.random()\n", " s = 0\n", " i = 0\n", " while s <= x and i < len(cum):\n", " s += cum[i]\n", " i += 1\n", " return i-1\n", "\n", "alea = [ simulation_multinomiale(poids) for i in range(0,1000) ]\n", "alea[:10]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On v\u00e9rifie que les probabilit\u00e9s d'apparition de chaque nombre sont celles attendues."]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"data": {"text/plain": ["Counter({4: 405, 0: 185, 2: 153, 1: 150, 3: 107})"]}, "execution_count": 9, "metadata": {}, "output_type": "execute_result"}], "source": ["import collections\n", "c = collections.Counter(alea)\n", "c"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Une premi\u00e8re optimisation : tri des poids\n", "\n", "L'algorithme fait intervenir le calcul de la distribution cumul\u00e9e. Lorsqu'on tire un grand nombre de variable al\u00e9atoire, il est int\u00e9ressant de ne faire ce calcul qu'une seule fois puisqu'il ne change jamais. De m\u00eame, il est plus int\u00e9ressant de mettre les valeurs de plus grand poids en premier. La boucle de la fonction ``simulation_multinomiale`` s'arr\u00eatera plus t\u00f4t."]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["[0.2, 0.15, 0.15, 0.1, 0.4] [0.4, 0.2, 0.15, 0.15, 0.1]\n", "passage 0\n", " non tri\u00e9 0.052738262983552886\n", " tri\u00e9 0.026375759856477998\n", "passage 1\n", " non tri\u00e9 0.04119122404517839\n", " tri\u00e9 0.01982565262665048\n", "passage 2\n", " non tri\u00e9 0.025675719017215215\n", " tri\u00e9 0.020590266567126037\n"]}], "source": ["def simulation_multinomiale(pc):\n", " x = random.random()\n", " s = 0\n", " i = 0\n", " while s <= x and i < len(pc):\n", " s += pc[i]\n", " i += 1\n", " return i-1\n", "\n", "def agg(x,y): \n", " x.append(y)\n", " return x\n", "poids_cumule = functools.reduce(agg, poids, []) \n", "poids_cumule_trie = functools.reduce(agg, sorted(poids, reverse=True), []) \n", "print(poids_cumule, poids_cumule_trie)\n", "\n", "import time\n", "for p in range(0,3):\n", " print(\"passage\",p)\n", " a = time.perf_counter()\n", " alea = [ simulation_multinomiale(poids_cumule) for i in range(0,10000) ]\n", " b = time.perf_counter()\n", " print(\" non tri\u00e9\", b-a)\n", " a = time.perf_counter()\n", " alea = [ simulation_multinomiale(poids_cumule_trie) for i in range(0,10000) ]\n", " b = time.perf_counter()\n", " print(\" tri\u00e9\", b-a)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["La seconde version est plus rapide.Son int\u00e9r\u00eat d\u00e9pend du nombre d'observations al\u00e9atoire \u00e0 tirer. En effet, si $K$ est le nombre de valeurs distinctes, les co\u00fbts fixes des deux m\u00e9thodes sont les suivants :\n", "\n", "- non tri\u00e9 : $O(K)$ (distribution cumulative)\n", "- tri\u00e9 : $O(K + K\\ln K)$ ((distribution cumulative + tri)\n", "\n", "Qu'en est-il pour la fonction [numpy.random.multinomial](http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.multinomial.html#numpy.random.multinomial) ?"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["passage 0\n", " non tri\u00e9 0.00011246838175793528\n", " tri\u00e9 5.6020372539933305e-05\n", "passage 1\n", " non tri\u00e9 4.96058262342558e-05\n", " tri\u00e9 4.704000753008586e-05\n", "passage 2\n", " non tri\u00e9 7.483637568839185e-05\n", " tri\u00e9 4.8750553105492145e-05\n"]}], "source": ["poids_trie = list(sorted(poids, reverse=True))\n", "\n", "for p in range(0,3):\n", " print(\"passage\",p)\n", " a = time.perf_counter()\n", " rnd.multinomial(10000, poids)\n", " b = time.perf_counter()\n", " print(\" non tri\u00e9\", b-a)\n", " a = time.perf_counter()\n", " rnd.multinomial(10000, poids_trie)\n", " b = time.perf_counter()\n", " print(\" tri\u00e9\", b-a)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["C'est plus rapide aussi. On voit aussi que cette fa\u00e7on de faire est beaucoup plus rapide que la version impl\u00e9ment\u00e9e en Python pur. Cela vient du faire que le module [numpy](http://www.numpy.org/) est optimis\u00e9 pour le calcul num\u00e9rique et surtout impl\u00e9ment\u00e9 en langages [C++](http://fr.wikipedia.org/wiki/C%2B%2B) et [Fortran](http://fr.wikipedia.org/wiki/Fortran)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Optimisation bootstrap\n", "\n", "C'est une technique inspir\u00e9 du [bootstrap](http://fr.wikipedia.org/wiki/Bootstrap_%28statistiques%29) qui est un peu moins pr\u00e9cise que la version pr\u00e9c\u00e9dente mais qui peut suffire dans bien des cas. Elle est int\u00e9ressante si on tire un grand nomrbre d'observations al\u00e9atoire de la m\u00eame loi et si $K$ est grand ($K$ = nombre de valeurs distinctes). L'id\u00e9e consiste \u00e0 g\u00e9n\u00e9rer un premier grand vecteur de nombres al\u00e9atoires puis \u00e0 tirer des nombres al\u00e9atoire dans ce vecteur."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["passage 0\n", " simulation_multinomiale 0.20075264012029947\n", " bootstrap 0.03310761256989281\n", "passage 1\n", " simulation_multinomiale 0.20289253282658137\n", " bootstrap 0.033825186502781435\n", "passage 2\n", " simulation_multinomiale 0.22474218868592288\n", " bootstrap 0.04704770498210564\n"]}], "source": ["K = 100\n", "poids = [ 1/K for i in range(0,K) ]\n", "poids_cumule = functools.reduce(agg, poids, []) \n", "vecteur = [ simulation_multinomiale(poids_cumule) for i in range(0,100000) ]\n", "N = len(vecteur)-1\n", "\n", "for p in range(0,3):\n", " print(\"passage\",p)\n", " a = time.perf_counter()\n", " alea = [ simulation_multinomiale(poids_cumule) for i in range(0,10000) ]\n", " b = time.perf_counter()\n", " print(\" simulation_multinomiale\", b-a)\n", " a = time.perf_counter()\n", " alea = [ vecteur [ random.randint(0,N) ] for i in range(0,10000) ]\n", " b = time.perf_counter()\n", " print(\" bootstrap\", b-a)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Cette fa\u00e7on de faire implique le stockage d'un grand nombre de variables al\u00e9atoires dans ``vecteur``. Ce proc\u00e9d\u00e9 est plus rapide car tirer un nombre al\u00e9atoire entier est plus rapide que de simuler une variable de loi multinomial."]}, {"cell_type": "code", "execution_count": 12, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}