{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 1A.1 - Calculer un chi 2 sur un tableau de contingence\n", "\n", "$\\chi_2$ et tableau de contingence, avec *numpy*, avec *scipy* ou sans."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["<div id=\"my_id_menu_nb\">run previous cell, wait for 2 seconds</div>\n", "<script>\n", "function repeat_indent_string(n){\n", "    var a = \"\" ;\n", "    for ( ; n > 0 ; --n) {\n", "        a += \"    \";\n", "    }\n", "    return a;\n", "}\n", "var update_menu_string = function(begin, lfirst, llast, sformat, send, keep_item) {\n", "    var anchors = document.getElementsByClassName(\"section\");\n", "    if (anchors.length == 0) {\n", "        anchors = document.getElementsByClassName(\"text_cell_render rendered_html\");\n", "    }\n", "    var i,t;\n", "    var text_menu = begin;\n", "    var text_memo = \"<pre>\\nlength:\" + anchors.length + \"\\n\";\n", "    var ind = \"\";\n", "    var memo_level = 1;\n", "    var href;\n", "    var tags = [];\n", "    var main_item = 0;\n", "    for (i = 0; i <= llast; i++) {\n", "        tags.push(\"h\" + i);\n", "    }\n", "\n", "    for (i = 0; i < anchors.length; i++) {\n", "        text_memo += \"**\" + anchors[i].id + \"--\\n\";\n", "\n", "        var child = null;\n", "        for(t = 0; t < tags.length; t++) {\n", "            var r = anchors[i].getElementsByTagName(tags[t]);\n", "            if (r.length > 0) {\n", "child = r[0];\n", "break;\n", "            }\n", "        }\n", "        if (child == null){\n", "            text_memo += \"null\\n\";\n", "            continue;\n", "        }\n", "        if (anchors[i].hasAttribute(\"id\")) {\n", "            // when converted in RST\n", "            href = anchors[i].id;\n", "            text_memo += \"#1-\" + href;\n", "            // passer \u00e0 child suivant (le chercher)\n", "        }\n", "        else if (child.hasAttribute(\"id\")) {\n", "            // in a notebook\n", "            href = child.id;\n", "            text_memo += \"#2-\" + href;\n", "        }\n", "        else {\n", "            text_memo += \"#3-\" + \"*\" + \"\\n\";\n", "            continue;\n", "        }\n", "        var title = child.textContent;\n", "        var level = parseInt(child.tagName.substring(1,2));\n", "\n", "        text_memo += \"--\" + level + \"?\" + lfirst + \"--\" + title + \"\\n\";\n", "\n", "        if ((level < lfirst) || (level > llast)) {\n", "            continue ;\n", "        }\n", "        if (title.endsWith('\u00b6')) {\n", "            title = title.substring(0,title.length-1).replace(\"<\", \"&lt;\").replace(\">\", \"&gt;\").replace(\"&\", \"&amp;\")\n", "        }\n", "\n", "        if (title.length == 0) {\n", "            continue;\n", "        }\n", "\n", "        while (level < memo_level) {\n", "            text_menu += \"</ul>\\n\";\n", "            memo_level -= 1;\n", "        }\n", "        if (level == lfirst) {\n", "            main_item += 1;\n", "        }\n", "        if (keep_item != -1 && main_item != keep_item + 1) {\n", "            // alert(main_item + \" - \" + level + \" - \" + keep_item);\n", "            continue;\n", "        }\n", "        while (level > memo_level) {\n", "            text_menu += \"<ul>\\n\";\n", "            memo_level += 1;\n", "        }\n", "        text_menu += repeat_indent_string(level-2) + sformat.replace(\"__HREF__\", href).replace(\"__TITLE__\", title);\n", "    }\n", "    while (1 < memo_level) {\n", "        text_menu += \"</ul>\\n\";\n", "        memo_level -= 1;\n", "    }\n", "    text_menu += send;\n", "    //text_menu += \"\\n\" + text_memo;\n", "    return text_menu;\n", "};\n", "var update_menu = function() {\n", "    var sbegin = \"\";\n", "    var sformat = '<li><a href=\"#__HREF__\">__TITLE__</a></li>';\n", "    var send = \"\";\n", "    var keep_item = -1;\n", "    var text_menu = update_menu_string(sbegin, 2, 4, sformat, send, keep_item);\n", "    var menu = document.getElementById(\"my_id_menu_nb\");\n", "    menu.innerHTML=text_menu;\n", "};\n", "window.setTimeout(update_menu,2000);\n", "            </script>"], "text/plain": ["<IPython.core.display.HTML object>"]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## formule\n", "\n", "Le test du $\\chi_2$ ([wikipedia](https://fr.wikipedia.org/wiki/Test_du_%CF%87%C2%B2)) sert \u00e0 comparer deux distributions. Il peut \u00eatre appliqu\u00e9 sur un [tableau de contingence](https://fr.wikipedia.org/wiki/Test_du_%CF%87%C2%B2#Test_du_.CF.87.C2.B2_d.27ind.C3.A9pendance) pour comparer la distributions observ\u00e9e avec la distribution qu'on observerait si les deux facteurs du tableau \u00e9taient ind\u00e9pendants. On note $M=(m_{ij})$ une matrice de dimension $I \\times J$. Le test du $\\chi_2$ se calcule comme suit :\n", "\n", "* $M = \\sum_{ij} m_{ij}$\n", "* $\\forall i, \\; m_{i \\bullet} = \\sum_j m_{ij}$\n", "* $\\forall j, \\; m_{\\bullet j} = \\sum_i m_{ij}$\n", "* $\\forall i,j \\; n_{ij} = \\frac{m_{i \\bullet} m_{\\bullet j}}{N}$\n", "\n", "Avec ces notations :\n", "\n", "$$T = \\sum_{ij} \\frac{ (m_{ij} - n_{ij})^2}{n_{ij}}$$\n", "\n", "La variable al\u00e9atoire $T$ suit asymptotiquement une loi du $\\chi_2$ \u00e0 $(I-1)(J-1)$ degr\u00e9s de libert\u00e9 ([table](http://www.apprendre-en-ligne.net/random/tablekhi2.html)). Comment le calculer avec [numpy](http://www.numpy.org/) ?"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## tableau au hasard\n", "\n", "On prend un petit tableau qu'on choisit au hasard, de pr\u00e9f\u00e9rence non carr\u00e9 pour d\u00e9tecter des erreurs de calculs."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 4,  5,  2,  1],\n", "       [ 6,  3,  1,  7],\n", "       [10, 14,  6,  9]])"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "M = numpy.array([[4, 5, 2, 1],\n", "                 [6, 3, 1, 7],\n", "                 [10, 14, 6, 9]])\n", "M"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## calcul avec scipy\n", "\n", "Evidemment, il existe une fonction en python qui permet de calculer la statistique $T$ : [chi2_contingency](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.chi2_contingency.html)."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": ["(6.1685985038926212, 6, 0.40457120905808314)"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["from scipy.stats import chi2_contingency\n", "chi2, pvalue, degrees, expected = chi2_contingency(M)\n", "chi2, degrees, pvalue"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## calcul avec numpy"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/plain": ["(array([12, 17, 39]), array([20, 22,  9, 17]), 68)"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["N = M.sum()\n", "ni = numpy.array( [M[i,:].sum() for i in range(M.shape[0])] )\n", "nj = numpy.array( [M[:,j].sum() for j in range(M.shape[1])] )\n", "ni, nj, N"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Et comme c'est un usage courant, [numpy](http://www.numpy.org/) propose une fa\u00e7on de faire sans \u00e9crire une boucle avec la fonction [sum](https://docs.scipy.org/doc/numpy-1.11.0/reference/generated/numpy.sum.html) :"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["(array([12, 17, 39]), array([20, 22,  9, 17]), 68)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["ni = M.sum(axis=1)\n", "nj = M.sum(axis=0)\n", "ni, nj, N"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[  3.52941176,   3.88235294,   1.58823529,   3.        ],\n", "       [  5.        ,   5.5       ,   2.25      ,   4.25      ],\n", "       [ 11.47058824,  12.61764706,   5.16176471,   9.75      ]])"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["nij = ni.reshape(M.shape[0], 1) * nj / N\n", "nij"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/plain": ["6.1685985038926212"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["d = (M - nij) ** 2 / nij\n", "d.sum()"]}, {"cell_type": "code", "execution_count": 8, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}