{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.ml - Text and machine learning\n", "\n", "A review of statistical [word embedding](https://en.wikipedia.org/wiki/Word_embedding) methods (~ [NLP](https://en.wikipedia.org/wiki/Natural_language_processing)), or how to turn textual information into vectors in a vector space (*features*). Two exercises are added at the end."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Data\n", "\n", "We will work with Twitter data collected with the keyword macron: [tweets_macron_sijetaispresident_201609.zip](https://github.com/sdpython/ensae_teaching_cs/raw/master/src/ensae_teaching_cs/data/data_web/tweets_macron_sijetaispresident_201609.zip)."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["<div>\n", "<style scoped>\n", "    .dataframe tbody tr th:only-of-type {\n", "        vertical-align: middle;\n", "    }\n", "\n", "    .dataframe tbody tr th {\n", "        vertical-align: top;\n", "    }\n", "\n", "    .dataframe thead th {\n", "        text-align: right;\n", "    }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", 
"      <th>0</th>\n", "      <th>1</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>index</th>\n", "      <td>776066992054861825</td>\n", "      <td>776067660979245056</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_user_mentions</th>\n", "      <td>0</td>\n", "      <td>0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_extended_entities</th>\n", "      <td>0</td>\n", "      <td>0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_hashtags</th>\n", "      <td>1</td>\n", "      <td>1</td>\n", "    </tr>\n", "    <tr>\n", "      <th>geo</th>\n", "      <td>NaN</td>\n", "      <td>NaN</td>\n", "    </tr>\n", "    <tr>\n", "      <th>text_hashtags</th>\n", "      <td>, SiJ\u00e9taisPr\u00e9sident</td>\n", "      <td>, SiJ\u00e9taisPr\u00e9sident</td>\n", "    </tr>\n", "    <tr>\n", "      <th>annee</th>\n", "      <td>2016.0</td>\n", "      <td>2016.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>delimit_mention</th>\n", "      <td>NaN</td>\n", "      <td>NaN</td>\n", "    </tr>\n", "    <tr>\n", "      <th>lang</th>\n", "      <td>fr</td>\n", "      <td>fr</td>\n", "    </tr>\n", "    <tr>\n", "      <th>id_str</th>\n", "      <td>776066992054861824.0</td>\n", "      <td>776067660979245056.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>text_mention</th>\n", "      <td>NaN</td>\n", "      <td>NaN</td>\n", "    </tr>\n", "    <tr>\n", "      <th>retweet_count</th>\n", "      <td>4.0</td>\n", "      <td>5.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>favorite_count</th>\n", "      <td>3.0</td>\n", "      <td>8.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>type_extended_entities</th>\n", "      <td>[]</td>\n", "      <td>[]</td>\n", "    </tr>\n", "    <tr>\n", "      <th>text</th>\n", "      <td>#SiJ\u00e9taisPr\u00e9sident se serait la fin du monde.....</td>\n", "      <td>#SiJ\u00e9taisPr\u00e9sident je donnerai plus de vacance...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_user_photos</th>\n", "      
<td>0.0</td>\n", "      <td>0.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_urls</th>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>nb_symbols</th>\n", "      <td>0.0</td>\n", "      <td>0.0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>created_at</th>\n", "      <td>Wed Sep 14 14:36:04 +0000 2016</td>\n", "      <td>Wed Sep 14 14:38:43 +0000 2016</td>\n", "    </tr>\n", "    <tr>\n", "      <th>delimit_hash</th>\n", "      <td>, 0, 18</td>\n", "      <td>, 0, 18</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>"], "text/plain": ["                                                                        0  \\\n", "index                                                  776066992054861825   \n", "nb_user_mentions                                                        0   \n", "nb_extended_entities                                                    0   \n", "nb_hashtags                                                             1   \n", "geo                                                                   NaN   \n", "text_hashtags                                         , SiJ\u00e9taisPr\u00e9sident   \n", "annee                                                              2016.0   \n", "delimit_mention                                                       NaN   \n", "lang                                                                   fr   \n", "id_str                                               776066992054861824.0   \n", "text_mention                                                          NaN   \n", "retweet_count                                                         4.0   \n", "favorite_count                                                        3.0   \n", "type_extended_entities                                                 []   \n", "text                    #SiJ\u00e9taisPr\u00e9sident se serait la fin du monde.....   
\n", "nb_user_photos                                                        0.0   \n", "nb_urls                                                               0.0   \n", "nb_symbols                                                            0.0   \n", "created_at                                 Wed Sep 14 14:36:04 +0000 2016   \n", "delimit_hash                                                      , 0, 18   \n", "\n", "                                                                        1  \n", "index                                                  776067660979245056  \n", "nb_user_mentions                                                        0  \n", "nb_extended_entities                                                    0  \n", "nb_hashtags                                                             1  \n", "geo                                                                   NaN  \n", "text_hashtags                                         , SiJ\u00e9taisPr\u00e9sident  \n", "annee                                                              2016.0  \n", "delimit_mention                                                       NaN  \n", "lang                                                                   fr  \n", "id_str                                               776067660979245056.0  \n", "text_mention                                                          NaN  \n", "retweet_count                                                         5.0  \n", "favorite_count                                                        8.0  \n", "type_extended_entities                                                 []  \n", "text                    #SiJ\u00e9taisPr\u00e9sident je donnerai plus de vacance...  
\n", "nb_user_photos                                                        0.0  \n", "nb_urls                                                               0.0  \n", "nb_symbols                                                            0.0  \n", "created_at                                 Wed Sep 14 14:38:43 +0000 2016  \n", "delimit_hash                                                      , 0, 18  "]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from ensae_teaching_cs.data import twitter_zip\n", "df = twitter_zip(as_df=True)\n", "df.head(n=2).T"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5088, 20)"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["df.shape"]}, {"cell_type": "markdown", "metadata": {}, "source": ["5,000 tweets are not enough to draw conclusions, but they give an idea. We drop the missing values."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5087, 2)"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["data = df[[\"retweet_count\", \"text\"]].dropna()\n", "data.shape"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Building a weighting\n", "\n", "Text is always tricky to process. It is not always easy to go beyond binary information: is a word present or not? Words have no numerical meaning. 
A list of tweets has little structure of its own, other than sorting it by another column: the number of retweets, for example."]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/html": ["<div>\n", "<style scoped>\n", "    .dataframe tbody tr th:only-of-type {\n", "        vertical-align: middle;\n", "    }\n", "\n", "    .dataframe tbody tr th {\n", "        vertical-align: top;\n", "    }\n", "\n", "    .dataframe thead th {\n", "        text-align: right;\n", "    }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>retweet_count</th>\n", "      <th>text</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>2038</th>\n", "      <td>842.0</td>\n", "      <td>#SiJetaisPresident travailler moins pour gagne...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2453</th>\n", "      <td>816.0</td>\n", "      <td>#SiJetaisPresident je ferais revenir l'\u00e9t\u00e9 ave...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2627</th>\n", "      <td>529.0</td>\n", "      <td>#SiJetaisPresident le mcdo livrerai \u00e0 domicile</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1402</th>\n", "      <td>289.0</td>\n", "      <td>#SiJetaisPresident les devoirs \u00e7a serait de re...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2198</th>\n", "      <td>276.0</td>\n", "      <td>#SiJetaisPresident ? 
Pr\u00e9sident c'est pour les...</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>"], "text/plain": ["      retweet_count                                               text\n", "2038          842.0  #SiJetaisPresident travailler moins pour gagne...\n", "2453          816.0  #SiJetaisPresident je ferais revenir l'\u00e9t\u00e9 ave...\n", "2627          529.0     #SiJetaisPresident le mcdo livrerai \u00e0 domicile\n", "1402          289.0  #SiJetaisPresident les devoirs \u00e7a serait de re...\n", "2198          276.0   #SiJetaisPresident ? Pr\u00e9sident c'est pour les..."]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["data.sort_values(\"retweet_count\", ascending=False).head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Without this column measuring popularity, we need another way to extract information. We split the text into words and build a language model: [n-grams](https://fr.wikipedia.org/wiki/N-gramme). If a tweet is the sequence of words $(w_1, w_2, ..., w_k)$, we define its probability as:\n", "\n", "$$P(tweet) = P(w_1, w_2) P(w_3 | w_2, w_1) P(w_4 | w_3, w_2) ... P(w_k | w_{k-1}, w_{k-2})$$\n", "\n", "Here $n=3$ because we assume that the probability of a word depends only on the two preceding words. Each n-gram probability is estimated as follows:\n", "\n", "$$P(c | a, b) = \\frac{ \\# (a, b, c)}{ \\# (a, b)}$$\n", "\n", "This is the number of times the sequence $(a,b,c)$ is observed divided by the number of times the sequence $(a,b)$ is observed."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Tokenization\n", "\n", "Splitting into words looks simple, ``tweet.split()``, but text always holds surprises: hyphens, capital letters, extra spaces. 
We use a dedicated *tokenizer*: [TweetTokenizer](http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.casual.TweetTokenizer) or a tokenizer that takes the language into account."]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/plain": ["['#sij\u00e9taispr\u00e9sident',\n", " 'se',\n", " 'serait',\n", " 'la',\n", " 'fin',\n", " 'du',\n", " 'monde',\n", " '...',\n", " 'mdr',\n", " '\ud83d\ude02']"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["from nltk.tokenize import TweetTokenizer\n", "tknzr = TweetTokenizer(preserve_case=False)\n", "tokens = tknzr.tokenize(data.loc[0, \"text\"])\n", "tokens"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### n-grams\n", "\n", "* [N-Gram-Based Text Categorization: Categorizing Text With Python](http://blog.alejandronolla.com/2013/05/20/n-gram-based-text-categorization-categorizing-text-with-python/)"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/plain": ["[(None, None, None, '#sij\u00e9taispr\u00e9sident'),\n", " (None, None, '#sij\u00e9taispr\u00e9sident', 'se'),\n", " (None, '#sij\u00e9taispr\u00e9sident', 'se', 'serait'),\n", " ('#sij\u00e9taispr\u00e9sident', 'se', 'serait', 'la'),\n", " ('se', 'serait', 'la', 'fin'),\n", " ('serait', 'la', 'fin', 'du'),\n", " ('la', 'fin', 'du', 'monde'),\n", " ('fin', 'du', 'monde', '...'),\n", " ('du', 'monde', '...', 'mdr'),\n", " ('monde', '...', 'mdr', '\ud83d\ude02'),\n", " ('...', 'mdr', '\ud83d\ude02', None),\n", " ('mdr', '\ud83d\ude02', None, None),\n", " ('\ud83d\ude02', None, None, None)]"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["from nltk.util import ngrams\n", "generated_ngrams = ngrams(tokens, 4, pad_left=True, pad_right=True)\n", "list(generated_ngrams)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Exercise 1: compute n-grams on the tweets"]}, {"cell_type": 
"code", "execution_count": 8, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Cleaning\n", "\n", "Most models are more stable without stop words, that is, the words that occur in almost every document and carry little meaning (\u00e0, de, le, la, ...). Accents and punctuation are often removed as well. Less variability means more reliable statistics."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Exercise 2: clean the tweets\n", "\n", "See [stem](http://www.nltk.org/api/nltk.stem.html#module-nltk.stem)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Graph structure\n", "\n", "This time, we want to build coordinates for each tweet."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Adjacency matrix"]}, {"cell_type": "markdown", "metadata": {}, "source": ["A common option is to split each expression into words and then build an *expression x word* matrix in which each cell tells whether a word occurs in an expression (``CountVectorizer`` stores the number of occurrences)."]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5087, 11924)"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.feature_extraction.text import CountVectorizer\n", "count_vect = CountVectorizer()\n", "counts = count_vect.fit_transform(data[\"text\"])\n", "counts.shape"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We end up with a sparse matrix in which each expression is represented by a vector, and each nonzero entry marks a word that belongs to the expression."]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/plain": ["scipy.sparse.csr.csr_matrix"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["type(counts)"]}, 
{"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[0, 0, 0, 0, 0],\n", "       [0, 0, 0, 0, 0],\n", "       [0, 0, 0, 0, 0],\n", "       [0, 0, 0, 0, 0],\n", "       [0, 0, 0, 0, 0]], dtype=int64)"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["counts[:5,:5].toarray()"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/plain": ["'#SiJ\u00e9taisPr\u00e9sident se serait la fin du monde... mdr \ud83d\ude02'"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["data.loc[0,\"text\"]"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/plain": ["8"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["counts[0,:].sum()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### tf-idf"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This kind of technique produces very high-dimensional matrices that need to be reduced. Rare words or very frequent words can be removed. [tf-idf](https://fr.wikipedia.org/wiki/TF-IDF) is a technique that comes from search engines. 
It builds the same kind of matrix (same dimensions) but assigns to each (document, word) pair a weight that depends on the frequency of the word in the document and on the number of documents containing that word.\n", "\n", "$$idf(t) = \\log \\frac{\\# D}{\\#\\{d \\; | \\; t \\in d \\}}$$\n", "\n", "where:\n", "\n", "* $\\#D$ is the number of tweets\n", "* $\\#\\{d \\; | \\; t \\in d \\}$ is the number of tweets containing the word $t$"]}, {"cell_type": "markdown", "metadata": {}, "source": ["$f(t,d)$ is the number of occurrences of a word $t$ in a document $d$.\n", "\n", "$$tf(t,d) = \\frac{1}{2} + \\frac{1}{2} \\frac{f(t,d)}{\\max_{t' \\in d} f(t',d)}$$"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We then build the number $tfidf(t,d)$:\n", "\n", "$$tfidf(t,d) = tf(t,d) \\, idf(t)$$\n", "\n", "The term $idf(t)$ favors words found in few documents, while the term $tf(t,d)$ favors terms repeated many times within the same document. We apply it to the previous matrix. Note that scikit-learn's ``TfidfTransformer`` uses a slightly different variant by default (raw counts for $tf$, a smoothed $idf$ and an L2 normalization of each row)."]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5087, 11924)"]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.feature_extraction.text import TfidfTransformer\n", "tfidf = TfidfTransformer()\n", "res = tfidf.fit_transform(counts)\n", "res.shape"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"text/plain": ["2.6988143126521047"]}, "execution_count": 16, "metadata": {}, "output_type": "execute_result"}], "source": ["res[0,:].sum()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Exercise 3: tf-idf without the keywords\n", "\n", "The resulting matrix is high-dimensional. 
Find a way to reduce it with [TfidfVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html)."]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### word2vec\n", "\n", "* [word2vec From theory to practice](http://hen-drik.de/pub/Heuer%20-%20word2vec%20-%20From%20theory%20to%20practice.pdf)\n", "* [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/abs/1301.3781)\n", "* [word2vec](https://radimrehurek.com/gensim/models/word2vec.html)\n", "\n", "This algorithm starts from a representation of words as vectors in a space of dimension N = the number of distinct words. A word is represented by $(0,0, ..., 0, 1, 0, ..., 0)$. The trick is to reduce the number of dimensions by compressing this representation, as a PCA or a nonlinear neural network would. word2vec itself trains a shallow neural network to predict a word from its context (or the context from the word)."]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [{"data": {"text/plain": ["['#sij\u00e9taispr\u00e9sident',\n", " 'se',\n", " 'serait',\n", " 'la',\n", " 'fin',\n", " 'du',\n", " 'monde',\n", " '...',\n", " 'mdr',\n", " '\ud83d\ude02']"]}, "execution_count": 18, "metadata": {}, "output_type": "execute_result"}], "source": ["sentences = [tknzr.tokenize(_) for _ in data[\"text\"]]\n", "sentences[0]"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["2022-02-12 18:46:39,284 : INFO : collecting all words and their counts\n", "2022-02-12 18:46:39,284 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types\n", "2022-02-12 18:46:39,331 : INFO : collected 13279 word types from a corpus of 76421 raw words and 5087 sentences\n", "2022-02-12 18:46:39,332 : INFO : Creating a fresh vocabulary\n", "2022-02-12 
18:46:39,400 : INFO : Word2Vec lifecycle event {'msg': 
'effective_min_count=1 retains 13279 unique words (100.0%% of original 13279, drops 0)', 'datetime': '2022-02-12T18:46:39.388519', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'prepare_vocab'}\n", "2022-02-12 18:46:39,402 : INFO : Word2Vec lifecycle event {'msg': 'effective_min_count=1 leaves 76421 word corpus (100.0%% of original 76421, drops 0)', 'datetime': '2022-02-12T18:46:39.401509', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'prepare_vocab'}\n", "2022-02-12 18:46:39,498 : INFO : deleting the raw counts dictionary of 13279 items\n", "2022-02-12 18:46:39,498 : INFO : sample=0.001 downsamples 46 most-common words\n", "2022-02-12 18:46:39,498 : INFO : Word2Vec lifecycle event {'msg': 'downsampling leaves estimated 56028.0861159631 word corpus (73.3%% of prior 76421)', 'datetime': '2022-02-12T18:46:39.498380', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'prepare_vocab'}\n", "2022-02-12 18:46:39,663 : INFO : estimated required memory for 13279 words and 100 dimensions: 17262700 bytes\n", "2022-02-12 18:46:39,663 : INFO : resetting layer weights\n", "2022-02-12 18:46:39,679 : INFO : Word2Vec lifecycle event {'update': False, 'trim_rule': 'None', 'datetime': '2022-02-12T18:46:39.678678', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'build_vocab'}\n", "2022-02-12 18:46:39,680 : INFO : Word2Vec lifecycle event {'msg': 'training model with 3 workers on 13279 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=5 shrink_windows=True', 'datetime': '2022-02-12T18:46:39.680669', 'gensim': 
'4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'train'}\n", "2022-02-12 18:46:39,747 : INFO : worker thread finished; awaiting finish of 2 more threads\n", "2022-02-12 18:46:39,755 : INFO : worker thread finished; awaiting finish of 1 more threads\n", "2022-02-12 18:46:39,755 : INFO : worker thread finished; awaiting finish of 0 more threads\n", "2022-02-12 18:46:39,755 : INFO : EPOCH - 1 : training on 76421 raw words (56059 effective words) took 0.1s, 847131 effective words/s\n", "2022-02-12 18:46:39,813 : INFO : worker thread finished; awaiting finish of 2 more threads\n", "2022-02-12 18:46:39,819 : INFO : worker thread finished; awaiting finish of 1 more threads\n", "2022-02-12 18:46:39,823 : INFO : worker thread finished; awaiting finish of 0 more threads\n", "2022-02-12 18:46:39,824 : INFO : EPOCH - 2 : training on 76421 raw words (56030 effective words) took 0.1s, 935688 effective words/s\n", "2022-02-12 18:46:39,881 : INFO : worker thread finished; awaiting finish of 2 more threads\n", "2022-02-12 18:46:39,890 : INFO : worker thread finished; awaiting finish of 1 more threads\n", "2022-02-12 18:46:39,890 : INFO : worker thread finished; awaiting finish of 0 more threads\n", "2022-02-12 18:46:39,890 : INFO : EPOCH - 3 : training on 76421 raw words (55944 effective words) took 0.1s, 905191 effective words/s\n", "2022-02-12 18:46:39,952 : INFO : worker thread finished; awaiting finish of 2 more threads\n", "2022-02-12 18:46:39,963 : INFO : worker thread finished; awaiting finish of 1 more threads\n", "2022-02-12 18:46:39,971 : INFO : worker thread finished; awaiting finish of 0 more threads\n", "2022-02-12 18:46:39,972 : INFO : EPOCH - 4 : training on 76421 raw words (56072 effective words) took 0.1s, 774904 effective words/s\n", "2022-02-12 18:46:40,033 : INFO : worker thread finished; awaiting finish of 2 more threads\n", "2022-02-12 18:46:40,039 : INFO : 
worker thread finished; awaiting finish of 1 more threads\n", "2022-02-12 18:46:40,042 : INFO : worker thread finished; awaiting finish of 0 more threads\n", "2022-02-12 18:46:40,042 : INFO : EPOCH - 5 : training on 76421 raw words (56047 effective words) took 0.1s, 906799 effective words/s\n", "2022-02-12 18:46:40,043 : INFO : Word2Vec lifecycle event {'msg': 'training on 382105 raw words (280152 effective words) took 0.4s, 776815 effective words/s', 'datetime': '2022-02-12T18:46:40.043431', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'train'}\n", "2022-02-12 18:46:40,044 : INFO : Word2Vec lifecycle event {'params': 'Word2Vec(vocab=13279, vector_size=100, alpha=0.025)', 'datetime': '2022-02-12T18:46:40.044429', 'gensim': '4.1.2', 'python': '3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]', 'platform': 'Windows-10-10.0.19043-SP0', 'event': 'created'}\n"]}], "source": ["import gensim, logging\n", "logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)\n", "\n", "model = gensim.models.Word2Vec(sentences, min_count=1)"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"data": {"text/plain": ["[('mon', 0.9989398121833801),\n", " ('pays', 0.9989068508148193),\n", " ('ma', 0.9988953471183777),\n", " ('toutes', 0.9988815784454346),\n", " ('leur', 0.9987949132919312),\n", " ('tout', 0.9987940192222595),\n", " ('ses', 0.9987934231758118),\n", " ('mes', 0.998781144618988),\n", " ('france', 0.9987801909446716),\n", " ('au', 0.9987511038780212)]"]}, "execution_count": 20, "metadata": {}, "output_type": "execute_result"}], "source": ["model.wv.similar_by_word(\"fin\")"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [{"data": {"text/plain": ["(100,)"]}, "execution_count": 21, "metadata": {}, "output_type": "execute_result"}], "source": 
["model.wv[\"fin\"].shape"]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([-0.09920651,  0.15360324,  0.10844447,  0.12709534,  0.15020044,\n", "       -0.21826063,  0.07867183,  0.2793031 , -0.1988279 , -0.135458  ,\n", "       -0.08442771, -0.27579817,  0.05431064,  0.13231573,  0.06987454,\n", "       -0.18821737, -0.0537038 , -0.10661628, -0.04758533, -0.3020647 ,\n", "        0.1704731 ,  0.0394745 ,  0.12408937, -0.05706318, -0.05796036,\n", "        0.03647643, -0.18711708, -0.10510068, -0.10040793, -0.08600791,\n", "        0.13921241, -0.0547129 ,  0.09572571, -0.10740169, -0.00452373,\n", "        0.28817332, -0.01231772,  0.06307271,  0.02313815, -0.22305253,\n", "        0.12906754, -0.20111138, -0.12507376,  0.06637593,  0.06323538,\n", "       -0.2289281 , -0.18086989,  0.05065202,  0.04751947,  0.0070283 ,\n", "        0.20169634, -0.15028226,  0.04512867, -0.08974832, -0.08562531,\n", "        0.23815149,  0.11708703, -0.08336464, -0.00898065,  0.00677549,\n", "       -0.08762765, -0.06554074,  0.1182849 ,  0.01473513, -0.11507029,\n", "        0.25605434, -0.05245751,  0.22131208, -0.27702177,  0.17844225,\n", "       -0.28551322,  0.09160851,  0.19049928,  0.09809981,  0.18412267,\n", "       -0.01433086, -0.06096153, -0.00965379, -0.04718976,  0.04390529,\n", "       -0.2812708 , -0.00393267, -0.14382981,  0.09499372, -0.10859697,\n", "       -0.07420573,  0.13133654,  0.06538489,  0.24226172,  0.03639907,\n", "        0.28915352,  0.05038366,  0.05872998, -0.0310102 ,  0.30720538,\n", "        0.09244314,  0.20608151,  0.00660289,  0.07621165,  0.0461465 ],\n", "      dtype=float32)"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["model.wv[\"fin\"]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Tagging\n", "\n", "The goal is to tag words, for instance to determine whether a word is a verb, an adjective, ..."]}, {"cell_type": 
"markdown", "metadata": {}, "source": ["### grammar\n", "\n", "See [nltk.grammar](http://www.nltk.org/api/nltk.html#module-nltk.grammar)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### CRF\n", "\n", "See [CRF](http://www.nltk.org/api/nltk.tag.html)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### HMM\n", "\n", "See [HMM](http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.hmm)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Clustering\n", "\n", "Once we have coordinates, many standard techniques become available."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### LDA\n", "\n", "* [Latent Dirichlet Allocation](http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)\n", "* [LatentDirichletAllocation](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html)"]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": ["from sklearn.feature_extraction.text import TfidfVectorizer\n", "\n", "tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2,\n", "                                   max_features=1000)\n", "tfidf = tfidf_vectorizer.fit_transform(data[\"text\"])"]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5087, 1000)"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["tfidf.shape"]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": ["from sklearn.decomposition import NMF, LatentDirichletAllocation\n", "lda = LatentDirichletAllocation(n_components=10, max_iter=5,\n", "                                learning_method='online',\n", "                                learning_offset=50.,\n", "                                random_state=0)"]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [{"data": {"text/plain": ["LatentDirichletAllocation(learning_method='online', 
learning_offset=50.0,\n", "                          max_iter=5, random_state=0)"]}, "execution_count": 26, "metadata": {}, "output_type": "execute_result"}], "source": ["lda.fit(tfidf)"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [{"data": {"text/plain": ["['avoir', 'bac', 'bah']"]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": ["# get_feature_names() was removed in scikit-learn 1.2;\n", "# get_feature_names_out() returns an array, converted here to a list.\n", "tf_feature_names = tfidf_vectorizer.get_feature_names_out().tolist()\n", "tf_feature_names[100:103]"]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": ["def print_top_words(model, feature_names, n_top_words):\n", "    # Print the n_top_words highest-weighted terms of each topic.\n", "    for topic_idx, topic in enumerate(model.components_):\n", "        print(\"Topic #%d:\" % topic_idx)\n", "        # argsort is ascending: keep the last n_top_words indices, largest first.\n", "        print(\" \".join([feature_names[i]\n", "                        for i in topic.argsort()[-n_top_words:][::-1]]))\n", "    print()"]}, {"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Topic #0:\n", "gratuit mcdo supprimerai \u00e9cole soir kebab macdo kfc domicile cc\n", "Topic #1:\n", "macron co https de la est le il et hollande\n", "Topic #2:\n", "sijetaispresident je les de la et le des en pour\n", "Topic #3:\n", "notaires eu organiserais mets carte nouveaux journ\u00e9es installation cache cr\u00e9er\n", "Topic #4:\n", "sijetaispresident interdirais les je ballerines la serait serais bah de\n", "Topic #5:\n", "ministre de sijetaispresident la je premier mort et nommerais pr\u00e9sident\n", "Topic #6:\n", "cours le supprimerais jour 
sijetaispresident lundi samedi semaine je vendredi\n", "Topic #7:\n", "port interdirait d\u00e9missionnerais promesses heure rendrai ballerine mes changement christineboutin\n", "Topic #8:\n", "seraient sijetaispresident gratuits aux les nos putain \u00e9ducation nationale bonne\n", "Topic #9:\n", "bordel seront l\u00e9galiserai putes gratuites pizza mot virerais vitesse dutreil\n", "\n"]}], "source": ["print_top_words(lda, tf_feature_names, 10)"]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[0.02703569, 0.02703991, 0.75666556, 0.02703569, 0.02704012,\n", "        0.02703837, 0.02703696, 0.02703608, 0.02703592, 0.02703569],\n", "       [0.02276328, 0.02277087, 0.79511841, 0.02276199, 0.02276289,\n", "        0.02276525, 0.02277065, 0.02276215, 0.02276251, 0.02276199],\n", "       [0.02318042, 0.79137016, 0.02318268, 0.02318042, 0.02318137,\n", "        0.02318192, 0.0231807 , 0.02318045, 0.02318146, 0.02318042],\n", "       [0.0294858 , 0.73460096, 0.02949239, 0.0294858 , 0.02949433,\n", "        0.0294906 , 0.0294873 , 0.02948597, 0.02948989, 0.02948696],\n", "       [0.0260542 , 0.66003211, 0.02607499, 0.0260542 , 0.02605546,\n", "        0.13151004, 0.02605456, 0.0260542 , 0.02605602, 0.0260542 ]])"]}, "execution_count": 30, "metadata": {}, "output_type": "execute_result"}], "source": ["# Each row is the topic distribution of one document (rows sum to 1).\n", "tr = lda.transform(tfidf)\n", "tr[:5]"]}, {"cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [{"data": {"text/plain": ["(5087, 10)"]}, "execution_count": 31, "metadata": {}, "output_type": "execute_result"}], "source": ["tr.shape"]}, {"cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": ["import pyLDAvis\n", "import pyLDAvis.sklearn\n", "pyLDAvis.enable_notebook()"]}, {"cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": 
["C:\\Python395_x64\\lib\\site-packages\\ipykernel\\ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n", "  and should_run_async(code)\n", "C:\\Python395_x64\\lib\\site-packages\\sklearn\\utils\\deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.\n", "  warnings.warn(msg, category=FutureWarning)\n", "C:\\Python395_x64\\lib\\site-packages\\pyLDAvis\\_prepare.py:246: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only.\n", "  default_term_info = default_term_info.sort_values(\n"]}, {"data": {"text/html": ["\n", "<link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdn.jsdelivr.net/gh/bmabey/pyLDAvis@3.3.1/pyLDAvis/js/ldavis.v1.0.0.css\">\n", "\n", "\n", "<div id=\"ldavis_el5588813375534963207284704117\"></div>\n", "<script type=\"text/javascript\">\n", "\n", "var ldavis_el5588813375534963207284704117_data = {\"mdsDat\": {\"x\": [0.13217175579686602, 0.11523680626642205, 0.17422069193133857, 0.1570257194912535, -0.021095288791457473, 0.005103332843388231, -0.17192863433525704, -0.1577332366836731, -0.10122288133062536, -0.13177826518825553], \"y\": [0.04967795806828093, 0.15847271860312784, 0.09558140544454603, -0.1906491850421613, -0.16205812575652817, -0.06201956932251803, -0.01202222629452041, 0.04282972502029823, 0.019968555569748855, 0.06021874370972662], \"topics\": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], \"cluster\": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], \"Freq\": [36.88185661912426, 26.106388659913076, 9.645160404360361, 5.727111011024793, 4.590569976902032, 4.34113749809889, 3.4765269242895416, 3.247125567176256, 3.1181812496560504, 
2.8659420894547307]}, \"tinfo\": {\"Term\": [\"sijetaispresident\", \"les\", \"je\", \"interdirais\", \"gratuit\", \"seraient\", \"le\", \"la\", \"macron\", \"ministre\", \"port\", \"de\", \"ballerines\", \"bordel\", \"interdirait\", \"cours\", \"seront\", \"serait\", \"mcdo\", \"aux\", \"serais\", \"supprimerai\", \"mes\", \"l\\u00e9galiserai\", \"https\", \"co\", \"\\u00e9cole\", \"jour\", \"et\", \"supprimerais\", \"ferais\", \"merde\", \"ferai\", \"interdirai\", \"serai\", \"sorte\", \"moins\", \"monde\", \"toutes\", \"toute\", \"prison\", \"tout\", \"place\", \"pays\", \"seriez\", \"salaire\", \"week\", \"temps\", \"t\\u00e9l\\u00e9\", \"fromage\", \"donnerai\", \"rendrais\", \"travail\", \"end\", \"parce\", \"mettrais\", \"gros\", \"politiques\", \"sport\", \"plein\", \"me\", \"mieux\", \"gens\", \"vous\", \"tous\", \"des\", \"je\", \"que\", \"comme\", \"fran\\u00e7ais\", \"dans\", \"en\", \"les\", \"sijetaispresident\", \"pour\", \"\\u00e7a\", \"et\", \"pas\", \"de\", \"qui\", \"le\", \"la\", \"serait\", \"un\", \"du\", \"au\", \"une\", \"sur\", \"ne\", \"plus\", \"est\", \"co\", \"gauche\", \"via\", \"emmanuel\", \"droite\", \"emmanuelmacron\", \"enmarche\", \"candidat\", \"bayrou\", \"2017\", \"gt\", \"sondage\", \"macron\", \"tr\\u00e8s\", \"montebourg\", \"valls\", \"presidentielle2017\", \"cohn\", \"bendit\", \"bus\", \"comment\", \"fran\\u00e7ois\", \"direct\", \"pourquoi\", \"politique\", \"marche\", \"lemissionpolitique\", \"pr\\u00e9sidentielle\", \"cohnbendit\", \"\\u00e9conomique\", \"ps\", \"hollande\", \"https\", \"co\", \"son\", \"sarkozy\", \"est\", \"il\", \"vacances\", \"de\", \"mois\", \"le\", \"on\", \"la\", \"un\", \"avec\", \"pas\", \"en\", \"pour\", \"et\", \"une\", \"se\", \"qui\", \"des\", \"du\", \"les\", \"sijetaispresident\", \"que\", \"par\", \"dans\", \"au\", \"premier\", \"ministre\", \"nommerais\", \"enfant\", \"l\\u00e9galiserais\", \"mickey\", \"pens\\u00e9e\", \"triste\", \"nommerai\", \"peine\", \"justice\", \"chirac\", 
\"porte\", \"famille\", \"mandat\", \"poste\", \"drogue\", \"m\\u00e8re\", \"sant\\u00e9\", \"es\", \"1er\", \"lettre\", \"l\\u00e9gal\", \"coupe\", \"mort\", \"demanderai\", \"bande\", \"ta\", \"affaires\", \"violeurs\", \"r\\u00e9publique\", \"culture\", \"jamais\", \"pr\\u00e9sident\", \"aurai\", \"jul\", \"aurait\", \"de\", \"la\", \"sijetaispresident\", \"france\", \"plus\", \"je\", \"sarko\", \"et\", \"serait\", \"bien\", \"le\", \"pour\", \"mon\", \"un\", \"des\", \"les\", \"https\", \"co\", \"interdirais\", \"ballerines\", \"bah\", \"interdit\", \"hymne\", \"sieste\", \"an\", \"camembert\", \"imposerai\", \"instaurerais\", \"maths\", \"leggings\", \"elysee\", \"interdite\", \"national\", \"d\\u00e9missionnerai\", \"demanderais\", \"nouvel\", \"interdiction\", \"constitution\", \"secret\", \"raciste\", \"blanc\", \"premi\\u00e8re\", \"envers\", \"baiser\", \"suppression\", \"lyc\\u00e9e\", \"sortie\", \"haine\", \"arr\\u00eaterai\", \"obligatoire\", \"serais\", \"sijetaispresident\", \"les\", \"serait\", \"je\", \"la\", \"de\", \"le\", \"et\", \"supprimerais\", \"lundi\", \"samedi\", \"vendredi\", \"f\\u00e9ri\\u00e9\", \"hashtag\", \"49\", \"jour\", \"cantine\", \"dimanche\", \"20\", \"lenorman\", \"12\", \"10h\", \"2016\", \"mercredi\", \"septembre\", \"scolaires\", \"jeudi\", \"scolaire\", \"pote\", \"s\\u00e9nat\", \"self\", \"\\u00e9couter\", \"23\", \"aura\", \"obama\", \"burkini\", \"shopping\", \"semaine\", \"g\\u00e9rard\", \"cannabis\", \"cours\", \"le\", \"sijetaispresident\", \"je\", \"par\", \"supprimerai\", \"de\", \"la\", \"et\", \"gratuits\", \"\\u00e9ducation\", \"putain\", \"bonne\", \"nos\", \"grosse\", \"nationale\", \"police\", \"banques\", \"permis\", \"ss10\", \"seraient\", \"poudlard\", \"f\\u00eate\", \"r\\u00e9tablirais\", \"internet\", \"organiserai\", \"marre\", \"belle\", \"meufs\", \"fronti\\u00e8res\", \"finances\", \"18h\", \"8h\", \"question\", \"\\u00e9conomie\", \"f\\u00e9ri\\u00e9s\", \"tintin\", \"picsou\", 
\"minist\\u00e8re\", \"aux\", \"aurais\", \"sijetaispresident\", \"les\", \"une\", \"de\", \"au\", \"et\", \"je\", \"mcdo\", \"soir\", \"kebab\", \"kfc\", \"domicile\", \"macdo\", \"cc\", \"volont\\u00e9\", \"autoriserai\", \"obligerai\", \"tacos\", \"gratuit\", \"jeux\", \"cantines\", \"12h\", \"\\u00e9cole\", \"supprimerai\", \"cyrilhanouna\", \"13\", \"vos\", \"chez\", \"partir\", \"tu\", \"humour\", \"chocolat\", \"devoirs\", \"chaque\", \"etc\", \"0000\", \"sep\", \"le\", \"sijetaispresident\", \"je\", \"bordel\", \"l\\u00e9galiserai\", \"putes\", \"gratuites\", \"pizza\", \"mot\", \"virerais\", \"dutreil\", \"vitesse\", \"vivre\", \"vire\", \"ancien\", \"fachos\", \"seront\", \"renaud\", \"possible\", \"comptes\", \"perp\\u00e9tuit\\u00e9\", \"chocolatine\", \"beau\", \"netflix\", \"entretien\", \"s\\u00e9ries\", \"0000\", \"sep\", \"soutien\", \"medef\", \"h\\u00e9ritage\", \"\\u00e9lus\", \"2007\", \"ministre\", \"toutes\", \"les\", \"sijetaispresident\", \"mlp_officiel\", \"me\", \"de\", \"port\", \"interdirait\", \"d\\u00e9missionnerais\", \"heure\", \"rendrai\", \"promesses\", \"ballerine\", \"changement\", \"tiendrais\", \"ait\", \"christineboutin\", \"mes\", \"tweeter\", \"nan\", \"0000\", \"sep\", \"haute\", \"medef\", \"h\\u00e9ritage\", \"\\u00e9lus\", \"2007\", \"lrps\", \"quotidien\", \"le_parisien\", \"lrpsfn\", \"effets\", \"pro\", \"espace\", \"important\", \"d\\u00e9put\\u00e9\", \"edito\", \"le\", \"je\", \"du\", \"sijetaispresident\", \"des\", \"de\", \"soutiens\", \"faites\", \"vont\", \"chaine\", \"vs\", \"co\", \"jean\", \"https\", \"interdirais\", \"alstom\", \"gt\", \"fait\", \"ill\\u00e9gal\", \"macron\", \"il\", \"ou\", \"organiserais\", \"notaires\", \"eu\", \"mets\", \"carte\", \"nouveaux\", \"installation\", \"cache\", \"journ\\u00e9es\", \"cr\\u00e9er\", \"sep\", \"0000\", \"medef\", \"h\\u00e9ritage\", \"\\u00e9lus\", \"2007\", \"lrps\", \"quotidien\", \"le_parisien\", \"lrpsfn\", \"effets\", \"pro\", \"espace\", 
\"d\\u00e9put\\u00e9\", \"homologues\", \"important\", \"edito\", \"lib\\u00e9ral\", \"activit\\u00e9\", \"r\\u00e9f\\u00e9rendum\", \"boulot\", \"r\\u00e9seaux\", \"cars\", \"servent\", \"11\", \"libre\", \"le1hebdo\", \"rocard\", \"gt\", \"millions\", \"lelab_e1\", \"21\", \"sijetaispresident\", \"votez\", \"15\", \"in\", \"macron\", \"mois\", \"vacances\", \"de\", \"sorte\", \"elys\\u00e9e\", \"plein\", \"minist\\u00e8re\", \"co\", \"https\", \"les\", \"des\", \"la\", \"pas\", \"et\", \"du\"], \"Freq\": [423.0, 276.0, 308.0, 58.0, 49.0, 47.0, 218.0, 246.0, 161.0, 61.0, 32.0, 303.0, 39.0, 30.0, 29.0, 40.0, 30.0, 94.0, 28.0, 48.0, 50.0, 32.0, 33.0, 23.0, 188.0, 189.0, 24.0, 24.0, 182.0, 23.0, 54.981355718324636, 29.160471545487287, 29.527617235572745, 23.351125688465952, 30.022866808217472, 15.07657544902835, 14.770447793308264, 33.262147962624745, 23.311662958919563, 21.433287130671477, 13.226475025413766, 59.248339929866646, 19.080279483070267, 25.8906699550249, 10.617143776681251, 10.351759136644372, 9.762359474664596, 9.013044946363253, 8.997050514861062, 8.922746098434695, 8.786708070806192, 8.699627797776754, 12.39303897537948, 8.56700044955873, 11.034943655556159, 24.725980373831124, 8.383980977361846, 8.257667229639242, 8.132032841055906, 8.118158814815288, 32.874563156816194, 14.971220192550684, 18.81610997031631, 38.17053149704094, 48.59265341015673, 89.6480817117078, 179.5704870773422, 59.972116299439634, 37.42811527842562, 16.679832680931952, 52.375057296322595, 81.00971247675996, 144.3768764816205, 197.15881921708652, 72.23669417333939, 30.930513319420058, 92.45150951905048, 53.8031838263728, 121.24906248809587, 45.50565163337854, 91.99614201441533, 96.15977227918373, 49.23790179762185, 54.205115506517785, 46.54733909412809, 41.91000570203377, 41.6401895326332, 34.79664993795782, 31.667971567580196, 32.53449952498103, 32.736710084673625, 32.710505199941835, 32.799356370833486, 25.330958282413896, 18.323722472082874, 18.663842665981665, 
15.153052435863197, 14.319021508035796, 14.17455447113122, 13.696791154431315, 14.00589535615754, 13.619306772725343, 13.148328440332268, 143.74826045464602, 11.73293348825853, 11.499533066120257, 20.105154426195096, 10.227954846245892, 10.059730868798978, 9.865721459718582, 9.661763954124293, 10.172172813379605, 9.457353870947067, 8.970209374711743, 8.620321281588337, 24.136655023216218, 8.068097322174397, 7.944881200292253, 7.517622241209234, 7.500192830077671, 7.251919545538066, 12.78557154021152, 43.15473647147819, 137.04332612888075, 137.12086556116986, 21.01341332503706, 21.347144373982182, 64.70900935512523, 47.18290922477739, 20.203418174503074, 95.23469263670113, 20.114488218187663, 62.968408034222605, 28.60766416958229, 66.87824115626911, 42.43996623224533, 28.990946514015963, 34.87134712102697, 41.200819697825395, 39.412834541750804, 44.48002339036471, 31.06067782988721, 20.472384299412433, 26.820602868241487, 31.66450093494864, 27.09258090204937, 36.30859455007037, 34.600579614092965, 24.975507437404612, 22.881314709008933, 23.738221129512812, 22.981968243706156, 27.128900861903045, 56.96482276012276, 18.17904751152302, 12.760527071917052, 10.087866065451301, 8.778886563215169, 8.308715125977933, 7.749449226429081, 12.516419972817634, 14.923379634555092, 7.318715902291474, 6.333539055053071, 6.327080361670323, 6.037129997395191, 5.909757905040152, 5.880117304664589, 5.857642823933987, 5.799384660662509, 5.683853935090845, 5.6427700698241665, 8.99594356207086, 5.257714081628877, 4.895868279464305, 4.602624415619055, 19.570111388222706, 4.801536730474992, 4.494921356442227, 4.444857403487598, 4.426718393274243, 4.362942416299696, 15.729299047079687, 8.380847136917264, 15.073320344060141, 18.03174615310471, 13.509588306555429, 13.470566484710893, 15.909823686414615, 48.40497047876778, 40.798564900909554, 48.30676729130964, 17.427355276044292, 17.455848227347378, 37.09345192506244, 8.867315268250216, 18.745999695904402, 13.744987856027883, 
10.122030653110388, 15.035197054210844, 12.380464005539112, 9.986719103833037, 12.012237705974732, 10.450755680361029, 10.76907661014303, 10.519942931896326, 10.495047991908578, 56.96539946814006, 38.2424250083825, 18.32262106810589, 15.272547867020851, 13.125214408668803, 11.074316849926484, 10.659021008120058, 9.832150553572708, 8.879318992298009, 8.080499067745572, 7.40227530536168, 7.17669568121767, 6.980005997368172, 6.603089664660915, 12.897934252203052, 6.209455402294146, 5.946816036224446, 5.1743639499813225, 4.977927990862887, 4.576364125230047, 4.63290543271609, 4.42714988796919, 4.329077893335545, 7.160128026102931, 4.251675931811438, 3.9796690442687495, 3.807177670241878, 9.692588868298461, 4.623318839525872, 3.385566553790628, 5.942066054478761, 12.076017744347025, 19.209986141239, 79.08237311753828, 55.23981350570674, 22.170573947424714, 41.36094999283153, 29.22325561039375, 17.214566455408672, 12.55692015025724, 10.762016234433602, 21.67890710532923, 15.15053832845554, 15.033458538254461, 11.106744484240505, 10.034757904887494, 10.009955023784071, 9.908997372479856, 21.473347299216748, 9.21913794934776, 10.240123538523193, 8.75984780199486, 8.02050926443744, 7.9237329728377555, 7.884479434521408, 9.756356261626697, 7.316905900038822, 6.568733202014281, 6.427816277286464, 6.176172822932605, 6.843577705683779, 5.875948507309921, 5.598777218402202, 5.59547393464478, 6.26786653834806, 5.290613752105975, 7.573180831339394, 7.207451988874838, 4.44521281935141, 4.218247695585212, 14.791632538983674, 7.795240089111525, 8.799611932070357, 23.831965321615176, 23.486936252701383, 20.976386093089694, 12.172453408267735, 8.789327317203336, 7.617194531572033, 7.786507266900649, 7.410276333757114, 6.953401881841698, 18.175734390151828, 13.8829679680412, 14.098060433613938, 12.338059579851217, 15.950612835631082, 10.010786189800355, 13.439558671483605, 8.866580606543138, 9.122992019510656, 8.098806368884995, 7.9096609797451025, 40.35869546229199, 7.8268980335924665, 
8.299367004362983, 7.168013534638959, 7.138906291732176, 8.468853700550133, 6.999605869603662, 6.549298448380922, 6.426661636016777, 5.845129046561045, 5.796829744294362, 5.712669284458151, 5.615428891148134, 5.789339683356412, 5.675226378778914, 5.441487065868169, 5.325673532402518, 4.972571301659004, 7.6800931105048456, 17.59726930950342, 12.043939032991547, 22.084577521474465, 17.39018859759427, 9.144282090619448, 10.759132388486032, 7.249909660470258, 7.751482967706769, 6.601323005931536, 27.15617790546986, 14.689530683312018, 13.927899394112274, 13.498892530160573, 13.32354395832861, 13.749965929927269, 12.720947935565409, 12.067008375361924, 10.217411536306395, 9.795816657283268, 9.082990183184998, 42.24742440226741, 7.928085376265827, 6.190060947766177, 5.134186689614663, 17.76276621165419, 22.592414349611758, 3.6219930936647726, 3.525295661461261, 6.238309066076734, 6.157446759993402, 1.7938176793269065, 6.768892120628941, 1.393320883174782, 1.3227171878806103, 1.0382264810877015, 3.3896899160093548, 1.5986555910460656, 0.18044085088916664, 0.18042819397470578, 5.629817478616745, 9.216773432467452, 2.895343749850835, 29.07064512412052, 22.457484611062153, 16.2319911257131, 15.157216385357195, 14.544182007786679, 13.167671935936175, 11.063232924447508, 10.113154607927505, 10.159577765586462, 9.908026435054627, 8.070335940941872, 8.99600483111304, 6.954749140520647, 25.306859053538975, 5.631820130888545, 7.767783001724947, 5.007640236106504, 3.4043813484350474, 3.5169817822712095, 1.559062805533414, 1.1749705317258545, 1.000242620182058, 1.3344534091930953, 0.20958328751200123, 0.20958029172310588, 0.6926579953883346, 0.20955197666404643, 0.20958234352014485, 0.20946834000232717, 0.2096004599014084, 2.818230291750903, 1.231894828283916, 6.244461437433068, 3.1964701833956557, 0.2306734065032936, 0.30061646209515314, 0.3306255284478012, 30.79737023742191, 28.498921700088502, 17.65018963359892, 13.673980635972839, 12.429598949469488, 15.062966849233268, 
11.417595591059087, 11.159101713639416, 10.049004338588675, 9.205888851389023, 10.345282328810104, 11.342617319548392, 1.3493461143126457, 1.3170050660758625, 0.2231611811951851, 0.22288828696985244, 0.3301296909997835, 0.2225111562215783, 0.2230345006272364, 0.2234933098123423, 0.2226860047376072, 0.222701269617381, 0.22259963749153208, 0.22295669333454715, 0.22303461479184694, 0.22293849609641456, 0.2223818156276963, 0.2226148624315841, 0.22273325675580755, 0.22270209770730598, 0.22317670938721948, 6.079867724381109, 7.128049539198986, 2.758672228489123, 8.314304549675853, 2.806246742548188, 2.1292292603435325, 0.22542938076486904, 0.22588880810633089, 0.22609407542960372, 0.22566163139220033, 0.2327799055316602, 0.2634405582861428, 0.22773737833091753, 0.2630478884269113, 0.24600899813734808, 0.2286082506441489, 0.23147628170723, 0.23076724009289207, 0.22727028123219628, 0.23249832965742437, 0.23109541938299075, 0.22967913642625196, 14.409618739152826, 16.16252058538582, 14.81237032398875, 13.56650589882429, 12.499971320877098, 11.151319865765442, 8.877775071020578, 7.943170297307868, 9.33371253051712, 1.3221990563845556, 0.28977542307731347, 0.28927751463958423, 0.2897509250152163, 0.2901222052406193, 0.28941224154798056, 0.2896228752468017, 0.2896529743401956, 0.28860709287197805, 0.289380209469987, 0.2895555254563074, 0.2895489280906105, 0.2893259837361215, 0.28948729325410355, 0.28951394725881674, 0.28971279414547974, 0.28885357970662995, 0.289522026593877, 0.28923963381640994, 0.28938038025380647, 0.29009283470404357, 0.31566526292802705, 0.2974642666084235, 0.3175216049458227, 0.29256740154360783, 0.2966301014831744, 0.29600927727206716, 0.29280773795789905, 0.2928649461553626, 0.30814845204404223, 0.2976103673542526, 0.29373372542301573, 0.29292323304809115, 0.34216327608210007, 0.30156306343801703, 0.2946385531143965, 0.2933467580853959, 0.3217797219519592, 0.30540659693235933, 0.30409982575169037, 0.3090172518350447, 0.29957220399361684, 
0.2979758878564289, 0.2971806658488168, 0.29815053603083064, 0.3059487583371101, 0.3056306883327947, 0.30652983133374306, 0.30270441100333867, 0.30150046272887454, 0.3012458713019114, 0.30023885465447026, 0.2982778207884218], \"Total\": [423.0, 276.0, 308.0, 58.0, 49.0, 47.0, 218.0, 246.0, 161.0, 61.0, 32.0, 303.0, 39.0, 30.0, 29.0, 40.0, 30.0, 94.0, 28.0, 48.0, 50.0, 32.0, 33.0, 23.0, 188.0, 189.0, 24.0, 24.0, 182.0, 23.0, 57.45938376728785, 30.6496892137215, 31.20841920294803, 24.985841051703794, 32.33623799708369, 16.57934049554667, 16.265383616571423, 36.67169427005589, 25.81598757704128, 23.83272642474223, 14.719380944070986, 66.40933658745527, 21.560836457355457, 29.324485719732575, 12.09452276141895, 11.83029885326162, 11.239113662824058, 10.493594523417853, 10.475068472581013, 10.40015342312211, 10.26831021988244, 10.176071695747122, 14.529335170270237, 10.044068769920619, 12.944482674199978, 29.069916011689962, 9.865637060648996, 9.73355719677469, 9.6083053634358, 9.601106074208742, 39.09310709904545, 17.860747872921642, 23.13753738110172, 49.18782885229712, 63.74434408206156, 139.49273139597074, 308.57989235671636, 89.34015817459301, 53.0459005712214, 21.22319525209966, 79.69771189105467, 135.32982879265163, 276.6393174737369, 423.27921429621256, 126.6751808025445, 44.632561900699095, 182.0975544952382, 97.72360102514841, 303.61186352099264, 79.03826709730355, 218.75464543311762, 246.70585061238262, 94.39044294034507, 119.25896215162933, 91.42486880897707, 81.32304635000419, 92.63911966460354, 66.73462945786194, 53.32647959031066, 65.80377928886489, 105.94032016020729, 189.0807452877207, 34.384798083158124, 26.80995436719102, 19.79220079567944, 20.177796338237037, 16.673214597255164, 15.793195213462253, 15.660075022009007, 15.164903063222168, 15.51560352488505, 15.113279490010328, 14.61291513557541, 161.24019440162786, 13.203998511506555, 12.961778458793614, 22.709024923811242, 11.695605957245402, 11.524671028424937, 11.32913990685907, 11.123643406378843, 
11.731772252663038, 10.92100834012278, 10.4308219204266, 10.102146914672371, 28.542997631035522, 9.543680001585665, 9.406883816701518, 8.981122350151063, 8.97794477659828, 8.73171354570446, 15.490714286990007, 53.76328297599422, 188.83414648975156, 189.0807452877207, 26.721741506868536, 28.106456299641145, 105.94032016020729, 75.3170766323103, 29.198861177955443, 303.61186352099264, 33.20061715006644, 218.75464543311762, 61.4899031677422, 246.70585061238262, 119.25896215162933, 64.44375422540801, 97.72360102514841, 135.32982879265163, 126.6751808025445, 182.0975544952382, 92.63911966460354, 36.658348662141236, 79.03826709730355, 139.49273139597074, 91.42486880897707, 276.6393174737369, 423.27921429621256, 89.34015817459301, 58.675302557181595, 79.69771189105467, 81.32304635000419, 28.595328289163103, 61.36154461794063, 19.64282116707331, 14.229111366769311, 11.551112584489216, 10.237749690252592, 9.76707654644382, 9.20824321156508, 15.104325709297772, 18.046297055863363, 8.884437833204633, 7.794174908306151, 7.789567885464032, 7.498072061703175, 7.366361850409982, 7.340495096569618, 7.315238548336143, 7.262381368052485, 7.152043540577037, 7.105762965900631, 11.390277890756096, 6.717161155456738, 6.352731209280177, 6.060086615253234, 25.83670378865929, 6.341583709299847, 5.954449834828482, 5.902751241077146, 5.896060308929613, 5.819109851272704, 21.57188739206543, 11.594094510042797, 25.013515623194266, 33.88391243659234, 24.701073790904694, 29.392852075842786, 41.151759881010264, 303.61186352099264, 246.70585061238262, 423.27921429621256, 60.33207433427432, 65.80377928886489, 308.57989235671636, 19.71609303402654, 182.0975544952382, 94.39044294034507, 35.45329393307013, 218.75464543311762, 126.6751808025445, 39.371423256974076, 119.25896215162933, 139.49273139597074, 276.6393174737369, 188.83414648975156, 189.0807452877207, 58.55790169994123, 39.693895736913234, 19.769259279744507, 16.783551626754974, 14.567079144867018, 12.514245875250086, 12.10945687459684, 
11.274288362974296, 10.323081304488461, 9.52326554148089, 8.856991207980403, 8.61900029864819, 8.421669640024271, 8.045136192368469, 15.79530963701924, 7.650540974556978, 7.3898320803396444, 6.63761262610303, 6.426643548401627, 6.018890837813238, 6.099254953187194, 5.868876957254799, 5.7716982140573725, 9.588578516157613, 5.697106311070666, 5.4248252008892415, 5.248656733215314, 13.61041099223091, 6.698851637993608, 4.912154356881982, 9.67915212450808, 26.224149893067032, 50.55312989114495, 423.27921429621256, 276.6393174737369, 94.39044294034507, 308.57989235671636, 246.70585061238262, 303.61186352099264, 218.75464543311762, 182.0975544952382, 23.09635824449745, 16.569185075693028, 16.44918749015235, 12.58989242459313, 11.4524999415175, 11.42982054466092, 11.32464197907851, 24.6696141338584, 10.651830656317088, 11.838211416093397, 10.234497730326401, 9.436041415531665, 9.34348480644978, 9.300447151453238, 11.548816493951627, 8.731104611560532, 7.989117567304872, 7.842877520115344, 7.594969604868399, 8.419493594806315, 7.296333424075325, 7.015877632822544, 7.0120800304176845, 7.85946330646825, 6.708813133952483, 9.612551389612527, 9.247107204566268, 5.890216383625168, 5.645112925123977, 20.282930949782525, 10.715621969272219, 12.739447377133962, 40.69345425766343, 218.75464543311762, 423.27921429621256, 308.57989235671636, 58.675302557181595, 32.460929142941886, 303.61186352099264, 246.70585061238262, 182.0975544952382, 19.593601957431236, 15.294280150303878, 15.552326469405223, 13.759197262604463, 18.212718402664596, 11.522789943513617, 15.534547627912055, 10.279805640268565, 10.638308670402514, 9.509631505902135, 9.322378826931617, 47.62244589040776, 9.236583434551607, 9.90160334453699, 8.57832469292835, 8.551268071851373, 10.174159425270963, 8.409953893949183, 7.978914231156258, 7.836203857374626, 7.254124765822488, 7.207133449834548, 7.122673563807289, 7.026255583352716, 7.256505682238536, 7.128555491125856, 6.851572367936442, 6.736713014267887, 
6.386504959030058, 13.878313167338018, 48.90191915574695, 31.776019595652656, 423.27921429621256, 276.6393174737369, 92.63911966460354, 303.61186352099264, 81.32304635000419, 182.0975544952382, 308.57989235671636, 28.53071486384636, 16.067552268939483, 15.297878458180463, 14.874141942490143, 14.694513597603683, 15.177715883592386, 14.092426197519966, 13.47359762089662, 11.589436762658508, 11.167605030090506, 10.453370400377365, 49.3819172483611, 9.300053053802326, 7.56130954176932, 6.559196289676172, 24.718195768647796, 32.460929142941886, 5.980420543405518, 6.378333392611751, 16.50593499230667, 16.42174270704548, 4.842913221729854, 20.72291213484678, 6.356329711476949, 8.852042804957302, 7.134287494862465, 24.41056024263181, 12.86073608649054, 2.388646509512376, 2.388783155240786, 218.75464543311762, 423.27921429621256, 308.57989235671636, 30.43244772674053, 23.799621528139458, 17.572932871256995, 16.50514597861596, 15.885540105608822, 14.516556398508097, 12.405019007545194, 11.456549357935229, 11.511806243336189, 11.255862999926883, 9.496606161090241, 10.71207377525688, 8.342445966958811, 30.74753548523304, 6.97292234031702, 11.542051584314411, 7.496009126226312, 6.257618384806251, 10.217875415314193, 6.32226263746762, 4.95008678839123, 5.98425597299953, 12.676730675114975, 2.388646509512376, 2.388783155240786, 7.938434561077524, 3.297724910897427, 3.3199659321513737, 3.426596111539934, 3.481994243150039, 61.36154461794063, 25.81598757704128, 276.6393174737369, 423.27921429621256, 4.482204686529235, 39.09310709904545, 303.61186352099264, 32.12606859222197, 29.977542730316866, 18.978039315795804, 15.00697530436135, 13.764599856625951, 16.68157862790969, 12.745148536717188, 12.487225038983588, 11.376347798960989, 10.537618466635214, 11.843489688753376, 33.19763696482478, 5.343176185694906, 5.594885669454578, 2.388646509512376, 2.388783155240786, 4.615776397342079, 3.297724910897427, 3.3199659321513737, 3.426596111539934, 3.481994243150039, 3.540835269301688, 
3.5459903889565516, 3.556484770384115, 3.562321180060039, 3.580805960360218, 3.600948640866241, 3.618284480849923, 3.6386925027871673, 3.642025204920648, 3.6499420593434815, 218.75464543311762, 308.57989235671636, 91.42486880897707, 423.27921429621256, 139.49273139597074, 303.61186352099264, 3.700836665441673, 4.061490805460836, 4.1128563475445095, 4.066915516888579, 6.058988752009855, 189.0807452877207, 5.083747672483573, 188.83414648975156, 58.55790169994123, 6.68017707265354, 15.113279490010328, 25.80791652617566, 5.650290927006553, 161.24019440162786, 75.3170766323103, 25.795532977951186, 15.671234176675421, 17.60045250578675, 16.183685243176935, 14.84729479766305, 13.768425986605255, 12.419065874911054, 10.164658217264158, 9.205330154035135, 12.171690396466925, 7.038954162736939, 2.388783155240786, 2.388646509512376, 3.297724910897427, 3.3199659321513737, 3.426596111539934, 3.481994243150039, 3.540835269301688, 3.5459903889565516, 3.556484770384115, 3.562321180060039, 3.580805960360218, 3.600948640866241, 3.618284480849923, 3.642025204920648, 3.6469225436391106, 3.6386925027871673, 3.6499420593434815, 3.6502155321323144, 3.6873086851842833, 3.6989273470122166, 4.0774553222979, 4.234574097966984, 6.8050493620757555, 3.999021557901032, 4.6634188333203355, 5.225719449344607, 4.3596818164340965, 4.510277942330022, 15.113279490010328, 6.549436835642081, 4.936676801847355, 4.6395271943861855, 423.27921429621256, 9.756924193408077, 5.802255831436477, 5.18495808846932, 161.24019440162786, 33.20061715006644, 29.198861177955443, 303.61186352099264, 16.57934049554667, 12.202340593600331, 9.601106074208742, 13.878313167338018, 189.0807452877207, 188.83414648975156, 276.6393174737369, 139.49273139597074, 246.70585061238262, 97.72360102514841, 182.0975544952382, 91.42486880897707], \"Category\": [\"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", 
\"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Default\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic1\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic2\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", 
\"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic3\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic4\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic5\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic6\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", 
\"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic7\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic8\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic9\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", 
\"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\", \"Topic10\"], \"logprob\": [30.0, 29.0, 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, -4.5326, -5.1668, -5.1543, -5.389, -5.1376, -5.8265, -5.847, -5.0352, -5.3906, -5.4747, -5.9574, -4.4579, -5.5909, -5.2857, -6.1771, -6.2024, -6.2611, -6.3409, -6.3427, -6.351, -6.3664, -6.3763, -6.0225, -6.3917, -6.1385, -5.3317, -6.4133, -6.4285, -6.4438, -6.4455, -5.0469, -5.8335, -5.6049, -4.8975, -4.6561, -4.0437, -3.349, -4.4457, -4.9172, -5.7254, -4.5812, -4.145, -3.5672, -3.2556, -4.2597, -5.1079, -4.0129, -4.5543, -3.7418, -4.7218, -4.0179, -3.9736, -4.6429, -4.5468, -4.6991, -4.8041, -4.8105, -4.9901, -5.0843, -5.0573, -5.0511, -5.0519, -4.7037, -4.962, -5.2859, -5.2675, -5.4759, -5.5325, -5.5426, -5.5769, -5.5546, -5.5826, -5.6178, -3.226, -5.7317, -5.7518, -5.1931, -5.8689, -5.8855, -5.905, -5.9259, -5.8744, -5.9473, -6.0002, -6.0399, -5.0103, -6.1061, -6.1215, -6.1768, -6.1791, -6.2128, -5.6457, -4.4293, -3.2738, -3.2732, -5.1489, -5.1331, -4.0242, -4.34, -5.1882, -3.6377, -5.1926, -4.0514, -4.8404, -3.9912, -4.446, -4.8271, -4.6424, -4.4756, -4.52, -4.399, -4.7581, -5.175, -4.9049, -4.7389, -4.8948, -4.602, -4.6502, -4.9762, -5.0637, -5.027, -5.0594, -3.8977, -3.1559, -4.2981, -4.652, -4.887, -5.026, -5.081, -5.1507, -4.6713, -4.4954, -5.2079, -5.3525, -5.3535, -5.4004, -5.4217, -5.4268, -5.4306, -5.4406, -5.4607, -5.468, -5.0016, -5.5386, -5.6099, -5.6717, -4.2243, -5.6294, -5.6954, -5.7066, -5.7107, -5.7252, -4.4428, -5.0724, -4.4854, -4.3062, -4.5949, -4.5978, -4.4314, -3.3187, -3.4897, -3.3208, -4.3403, -4.3387, -3.5849, -5.016, -4.2674, -4.5777, -4.8836, -4.4879, -4.6822, -4.8971, -4.7124, -4.8517, -4.8217, -4.8451, -4.8474, -2.6346, -3.0331, -3.769, -3.951, -4.1026, -4.2725, -4.3107, -4.3914, -4.4934, -4.5876, -4.6753, -4.7063, -4.734, -4.7896, -4.12, -4.851, -4.8942, 
-5.0334, -5.0721, -5.1562, -5.1439, -5.1893, -5.2117, -4.7086, -5.2298, -5.2959, -5.3402, -4.4057, -5.146, -5.4576, -4.895, -4.1859, -3.7217, -2.3066, -2.6654, -3.5783, -2.9548, -3.3021, -3.8313, -4.1468, -4.3011, -3.3795, -3.7378, -3.7456, -4.0483, -4.1498, -4.1523, -4.1624, -3.3891, -4.2346, -4.1296, -4.2857, -4.3739, -4.386, -4.391, -4.178, -4.4657, -4.5736, -4.5953, -4.6352, -4.5326, -4.685, -4.7333, -4.7339, -4.6204, -4.79, -4.4313, -4.4808, -4.9641, -5.0165, -3.7618, -4.4024, -4.2812, -3.2849, -3.2994, -3.4125, -3.9567, -4.2823, -4.4255, -4.4035, -4.453, -4.5167, -3.4999, -3.7694, -3.754, -3.8873, -3.6305, -4.0964, -3.8018, -4.2177, -4.1892, -4.3083, -4.3319, -2.7022, -4.3425, -4.2838, -4.4304, -4.4345, -4.2636, -4.4542, -4.5207, -4.5396, -4.6344, -4.6427, -4.6573, -4.6745, -4.644, -4.6639, -4.706, -4.7275, -4.7961, -4.3614, -3.5323, -3.9115, -3.3051, -3.5441, -4.1869, -4.0243, -4.419, -4.3521, -4.5127, -2.8763, -3.4908, -3.544, -3.5753, -3.5884, -3.5569, -3.6347, -3.6874, -3.8538, -3.896, -3.9715, -2.4344, -4.1075, -4.355, -4.542, -3.3008, -3.0603, -4.8909, -4.918, -4.3472, -4.3603, -5.5936, -4.2656, -5.8462, -5.8982, -6.1404, -4.9572, -5.7088, -7.8903, -7.8903, -4.4498, -3.9569, -5.1148, -2.7399, -2.998, -3.3227, -3.3912, -3.4325, -3.5319, -3.706, -3.7958, -3.7912, -3.8163, -4.0215, -3.9129, -4.1702, -2.8786, -4.3812, -4.0597, -4.4987, -4.8846, -4.852, -5.6656, -5.9484, -6.1094, -5.8211, -7.6723, -7.6723, -6.4769, -7.6724, -7.6723, -7.6728, -7.6722, -5.0735, -5.9011, -4.278, -4.9476, -7.5764, -7.3116, -7.2164, -2.6417, -2.7193, -3.1984, -3.4536, -3.5491, -3.3569, -3.634, -3.6569, -3.7617, -3.8493, -3.7326, -3.6406, -5.7695, -5.7938, -7.569, -7.5702, -7.1774, -7.5719, -7.5696, -7.5675, -7.5711, -7.5711, -7.5715, -7.5699, -7.5696, -7.57, -7.5725, -7.5714, -7.5709, -7.5711, -7.5689, -4.2641, -4.1051, -5.0544, -3.9512, -5.0373, -5.3134, -7.5589, -7.5568, -7.5559, -7.5579, -7.5268, -7.4031, -7.5487, -7.4046, -7.4715, -7.5449, -7.5324, -7.5355, -7.5507, -7.528, 
-7.5341, -7.5402, -3.3169, -3.2021, -3.2893, -3.3772, -3.4591, -3.5732, -3.8012, -3.9125, -3.7511, -5.7055, -7.2234, -7.2251, -7.2235, -7.2222, -7.2247, -7.224, -7.2239, -7.2275, -7.2248, -7.2242, -7.2242, -7.225, -7.2244, -7.2243, -7.2236, -7.2266, -7.2243, -7.2253, -7.2248, -7.2223, -7.1379, -7.1972, -7.132, -7.2138, -7.2, -7.2021, -7.213, -7.2128, -7.162, -7.1967, -7.2099, -7.2126, -7.0572, -7.1836, -7.2068, -7.2112, -7.1187, -7.1709, -7.1752, -7.1591, -7.1902, -7.1955, -7.1982, -7.1949, -7.1691, -7.1702, -7.1672, -7.1798, -7.1838, -7.1846, -7.188, -7.1945], \"loglift\": [30.0, 29.0, 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.9534, 0.9476, 0.9421, 0.9298, 0.9232, 0.9024, 0.901, 0.8999, 0.8954, 0.8913, 0.8905, 0.8834, 0.8752, 0.8729, 0.8672, 0.8639, 0.8566, 0.8454, 0.8453, 0.8442, 0.8416, 0.8407, 0.8384, 0.8384, 0.8378, 0.8356, 0.8347, 0.833, 0.8306, 0.8297, 0.8242, 0.821, 0.7907, 0.7439, 0.726, 0.5553, 0.456, 0.5989, 0.6487, 0.7566, 0.5776, 0.4843, 0.3472, 0.2334, 0.4358, 0.6307, 0.3196, 0.4006, 0.0795, 0.4454, 0.1312, 0.0553, 0.3467, 0.2089, 0.3224, 0.3345, 0.1978, 0.3462, 0.4763, 0.2931, -0.1769, -0.757, 1.2958, 1.2862, 1.2659, 1.265, 1.2474, 1.245, 1.2433, 1.2412, 1.2406, 1.2389, 1.2374, 1.2282, 1.2249, 1.2233, 1.2212, 1.2089, 1.207, 1.2047, 1.2021, 1.2003, 1.1991, 1.1921, 1.1844, 1.1753, 1.175, 1.1741, 1.1651, 1.1631, 1.1573, 1.1511, 1.1232, 1.0224, 1.0217, 1.1027, 1.0679, 0.85, 0.8753, 0.9747, 0.1836, 0.8419, 0.0977, 0.5778, 0.0377, 0.3098, 0.5442, 0.3125, 0.1537, 0.1755, -0.0665, 0.2502, 0.7604, 0.2622, -0.1398, 0.1267, -0.6877, -1.1612, 0.0684, 0.4013, 0.1318, 0.0793, 2.2861, 2.2644, 2.2613, 2.2298, 2.2033, 2.185, 2.177, 2.1662, 2.1508, 2.1487, 2.1448, 2.1312, 2.1308, 2.122, 2.1184, 2.1169, 2.1165, 2.1138, 2.1089, 2.1082, 2.1027, 2.0937, 2.0782, 2.0636, 2.0609, 2.0605, 2.0575, 2.055, 2.0521, 2.0507, 2.0228, 2.0142, 1.8322, 1.7079, 1.7353, 
1.5585, 1.3884, 0.5026, 0.5392, 0.1683, 1.0969, 1.0117, 0.2202, 1.5397, 0.0652, 0.4119, 1.0852, -0.3388, 0.0132, 0.9669, 0.0433, -0.2526, -0.9073, -0.5489, -0.5526, 2.8324, 2.8227, 2.784, 2.7656, 2.7557, 2.7377, 2.7324, 2.7231, 2.7093, 2.6957, 2.6805, 2.6768, 2.6722, 2.6624, 2.6573, 2.6513, 2.6427, 2.6109, 2.6045, 2.586, 2.585, 2.5781, 2.5723, 2.5679, 2.5673, 2.5502, 2.5389, 2.5205, 2.4891, 2.4878, 2.372, 2.0845, 1.8924, 1.1824, 1.2489, 1.4113, 0.8503, 0.7267, -0.01, 0.0023, 0.0314, 3.0178, 2.9917, 2.9912, 2.9558, 2.949, 2.9485, 2.9476, 2.9424, 2.9367, 2.9361, 2.9256, 2.9186, 2.9163, 2.916, 2.9125, 2.9045, 2.8854, 2.8822, 2.8744, 2.8739, 2.8647, 2.8555, 2.8555, 2.8549, 2.8437, 2.8427, 2.832, 2.7997, 2.7898, 2.7654, 2.763, 2.7112, 2.5461, 0.8497, 0.0765, -0.1516, 1.1827, 1.6315, -0.5822, -0.4242, -0.1841, 3.0619, 3.0402, 3.0389, 3.028, 3.0044, 2.9964, 2.9922, 2.9891, 2.9834, 2.9764, 2.9727, 2.9715, 2.9714, 2.9605, 2.9574, 2.9565, 2.9536, 2.9535, 2.9396, 2.9387, 2.9211, 2.9193, 2.9164, 2.9129, 2.9112, 2.909, 2.9066, 2.902, 2.8868, 2.5453, 2.115, 2.1669, 0.1839, 0.3702, 0.8215, -0.203, 0.7196, -0.0196, -0.7077, 3.3098, 3.2695, 3.2653, 3.2621, 3.2612, 3.2603, 3.2567, 3.2489, 3.2331, 3.2281, 3.2186, 3.2031, 3.1995, 3.159, 3.1142, 3.0287, 2.9967, 2.8577, 2.7662, 2.3861, 2.3782, 2.366, 2.2402, 1.8414, 1.4582, 1.4317, 1.3849, 1.2741, 0.7761, 0.7759, -0.3007, -0.4679, -1.3097, 3.3816, 3.3694, 3.348, 3.3422, 3.3392, 3.3299, 3.3129, 3.3027, 3.3024, 3.2999, 3.2647, 3.2528, 3.2455, 3.2327, 3.2138, 3.0314, 3.024, 2.8187, 2.3609, 2.0274, 1.9892, 1.6385, 1.1762, 0.994, 0.994, 0.9885, 0.6714, 0.6648, 0.6326, 0.6172, 0.3467, 0.385, -0.3636, -1.4586, 0.4605, -1.4405, -3.3951, 3.4257, 3.4173, 3.3954, 3.3749, 3.3659, 3.3659, 3.3579, 3.3555, 3.3439, 3.3328, 3.3327, 2.394, 2.0917, 2.0214, 1.0973, 1.0961, 0.8302, 0.7719, 0.7675, 0.738, 0.7183, 0.7016, 0.6997, 0.6984, 0.6971, 0.6915, 0.6834, 0.6796, 0.6745, 0.6735, 0.6734, -0.115, -0.3, -0.0328, -0.4621, -0.4382, -1.4921, 0.6696, 0.5787, 
0.567, 0.5763, 0.2087, -3.1082, 0.3623, -3.1084, -2.0045, 0.093, -0.7109, -1.2491, 0.2546, -3.0738, -2.3187, -1.2534, 3.4683, 3.467, 3.4637, 3.4621, 3.4556, 3.4446, 3.4169, 3.4048, 3.2868, 1.8801, 1.4428, 1.4412, 1.1203, 1.1149, 1.0808, 1.0655, 1.0488, 1.0438, 1.0435, 1.0425, 1.0373, 1.0309, 1.0266, 1.0202, 1.0195, 1.0188, 1.018, 1.017, 1.0074, 1.0067, 0.9937, 0.8965, 0.4874, 0.9372, 0.7973, 0.6813, 0.8516, 0.8179, -0.3405, 0.4609, 0.7305, 0.7898, -3.5682, 0.0755, 0.572, 0.6801, -2.6645, -1.1364, -1.0123, -3.3378, -0.4613, -0.1601, 0.077, -0.2882, -2.8742, -2.874, -3.2529, -2.5807, -3.1549, -2.2297, -2.8554, -2.173]}, \"token.table\": {\"Topic\": [5, 5, 1, 2, 5, 7, 2, 7, 2, 6, 2, 3, 5, 2, 5, 2, 5, 5, 5, 6, 2, 3, 9, 2, 4, 8, 1, 4, 1, 2, 3, 4, 5, 6, 3, 5, 1, 3, 1, 3, 6, 1, 2, 3, 7, 1, 2, 4, 6, 1, 2, 3, 5, 4, 4, 9, 4, 3, 6, 2, 1, 2, 8, 6, 2, 1, 2, 3, 4, 6, 8, 1, 4, 5, 2, 10, 4, 2, 3, 5, 5, 7, 2, 4, 10, 7, 3, 9, 1, 2, 5, 7, 1, 2, 3, 7, 3, 1, 7, 1, 8, 9, 1, 2, 3, 4, 6, 2, 2, 1, 2, 3, 6, 2, 6, 8, 4, 3, 1, 5, 3, 10, 3, 6, 3, 7, 1, 2, 3, 1, 2, 3, 4, 5, 6, 9, 3, 4, 1, 2, 3, 6, 9, 1, 7, 5, 2, 7, 1, 3, 2, 1, 2, 3, 4, 5, 9, 8, 4, 9, 2, 2, 2, 4, 1, 2, 6, 2, 2, 1, 2, 3, 4, 1, 3, 2, 2, 5, 8, 4, 3, 2, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2, 3, 7, 10, 8, 1, 2, 3, 1, 3, 1, 1, 3, 6, 1, 2, 3, 1, 2, 3, 2, 1, 6, 5, 6, 6, 2, 1, 2, 1, 7, 8, 6, 1, 6, 2, 2, 5, 4, 5, 2, 9, 1, 2, 3, 2, 1, 2, 3, 4, 6, 2, 7, 4, 2, 1, 2, 3, 1, 1, 4, 1, 10, 4, 4, 1, 4, 9, 4, 4, 6, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 9, 2, 5, 7, 1, 2, 5, 6, 10, 1, 3, 3, 7, 7, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 7, 9, 2, 2, 4, 2, 2, 5, 1, 2, 3, 4, 5, 6, 8, 3, 2, 2, 3, 3, 5, 1, 4, 3, 8, 3, 7, 1, 2, 3, 3, 2, 6, 4, 7, 1, 2, 6, 1, 5, 1, 1, 2, 9, 10, 1, 2, 6, 6, 3, 1, 2, 2, 3, 8, 2, 6, 3, 4, 1, 1, 2, 5, 1, 2, 3, 1, 2, 2, 1, 2, 3, 8, 3, 1, 9, 2, 4, 1, 6, 1, 2, 3, 1, 8, 3, 6, 3, 2, 6, 10, 10, 4, 2, 5, 1, 4, 5, 7, 1, 2, 3, 6, 10, 1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 2, 7, 1, 2, 3, 1, 2, 2, 3, 3, 6, 3, 8, 6, 8, 1, 2, 1, 1, 2, 3, 6, 1, 2, 1, 9, 3, 1, 8, 3, 5, 6, 
1, 2, 3, 4, 2, 3, 1, 4, 2, 1, 3, 9, 1, 3, 2, 2, 3, 6, 8, 1, 2, 3, 6, 1, 2, 3, 1, 2, 4, 8, 9, 1, 2, 1, 1, 3, 2, 6, 1, 5, 3, 2, 3, 1, 2, 3, 5, 5, 1, 2, 3, 4, 5, 1, 5, 5, 5, 1, 5, 1, 6, 1, 4, 1, 2, 3, 4, 5, 1, 1, 8, 1, 5, 4, 1, 2, 3, 4, 5, 6, 7, 8, 9, 7, 1, 2, 3, 2, 1, 3, 4, 2, 8, 2, 1, 6, 4, 5, 6, 7, 5, 1, 2, 3, 5, 1, 8, 3, 7, 1, 9, 6, 1, 2, 3, 4, 1, 2, 3, 1, 3, 1, 8, 1, 2, 3, 2, 1, 2, 3, 7, 1, 9, 1, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2, 2, 3, 5, 2, 3, 8, 8, 8, 8, 7, 1, 1, 7, 1, 2, 1, 2, 3, 1, 2, 1, 1, 2, 3, 4, 1, 7, 6, 2, 5, 6, 1], \"Freq\": [0.41864712757524863, 0.8601736959227776, 0.21443495335545576, 0.4288699067109115, 0.8562115918974497, 0.7622885151142276, 0.3135615335373768, 0.6271230670747536, 0.6893870446608197, 0.8423803149547703, 0.08779417057169087, 0.7901475351452178, 0.8793787674925763, 0.5743834884088348, 0.8658895918241686, 0.9023174623884779, 0.6466176130253081, 0.7452883095961655, 0.8830301230250197, 0.8539399013915434, 0.5424010221970458, 0.6784191121556169, 0.8540829247610638, 0.7484831533086698, 0.9083809549770764, 0.8401734518285818, 0.2066296690322599, 0.6198890070967797, 0.516458763967563, 0.28282265645842736, 0.07377982342393757, 0.024593274474645856, 0.012296637237322928, 0.0860764606612605, 0.10403065320208489, 0.8322452256166791, 0.4048406998274767, 0.5667769797584674, 0.3461729990091312, 0.2202919084603562, 0.37764327164632494, 0.2916035677379979, 0.2916035677379979, 0.3888047569839972, 0.86285470163833, 0.40898190388606864, 0.1840418567487309, 0.02044909519430343, 0.3680837134974618, 0.4189700045338886, 0.45000481968454703, 0.09310444545197526, 0.015517407575329208, 0.9105045234771502, 0.7373509471502081, 0.8630735034833347, 0.9573260395467306, 0.6717665126009446, 0.845999141295782, 0.9231842723711645, 0.3163424417308135, 0.15817122086540675, 0.3163424417308135, 0.8773123506787716, 0.8826795398603595, 0.3102673624844607, 0.36667961020890805, 0.282061238622237, 0.6930369280669807, 0.8721439027997868, 0.9529302493311486, 
0.4905020023304328, 0.2452510011652164, 0.6790921995871019, 0.8989860277492813, 0.8690617138260075, 0.8869739426606148, 0.8939931628886898, 0.23548902171256667, 0.7064670651377, 0.8449251861380779, 0.7935133414199601, 0.7347485277424706, 0.14694970554849413, 0.8715593207004428, 0.922481325627789, 0.7376597786558325, 0.8809002773361853, 0.4096587665585612, 0.1638635066234245, 0.28676113659099284, 0.12289762996756837, 0.30447438430848345, 0.24357950744678675, 0.06089487686169669, 0.3653692611701801, 0.7698056652033659, 0.6778096459994402, 0.11296827433324003, 0.4893385167435273, 0.3914708133948218, 0.8443457344752061, 0.17452861183609417, 0.7245581764104516, 0.05288745813214975, 0.021154983252859902, 0.015866237439644926, 0.8677037266691237, 0.891072533755457, 0.697509130801209, 0.18851598129762406, 0.03770319625952481, 0.03770319625952481, 0.8523861343907407, 0.1334043199735839, 0.6670215998679195, 0.830717840667232, 0.8250707155595769, 0.36860965169048493, 0.5897754427047759, 0.7103327972313237, 0.14206655944626473, 0.6900064505313809, 0.17250161263284522, 0.1672123210637216, 0.6688492842548864, 0.6524654066741974, 0.30113788000347574, 0.025094823333622975, 0.3985351514158922, 0.3128994990455352, 0.1580965889914283, 0.05599254193446419, 0.026349431498571385, 0.036230468310535655, 0.006587357874642846, 0.7884465819898533, 0.8119264328025481, 0.6451949080022077, 0.2294026339563405, 0.07168832311135641, 0.028675329244542563, 0.02150649693340692, 0.7008408342950286, 0.14016816685900574, 0.8447222007207567, 0.8628274999475705, 0.8846839273482304, 0.8764830636469647, 0.8202056515798388, 0.9416290897928676, 0.5140833190387366, 0.2953244598733168, 0.05468971479135496, 0.06562765774962595, 0.04375177183308397, 0.032813828874812975, 0.8728631708878059, 0.7842582661741047, 0.9484646807016696, 0.5491450189026289, 0.5479539037832677, 0.5585334760219195, 0.8311890989801193, 0.3278059622510332, 0.2458544716882749, 0.3278059622510332, 0.909449140387123, 0.8996465506099455, 
0.5985376669921442, 0.3029635104528137, 0.059114831307866086, 0.022168061740449784, 0.8960512125278021, 0.9136199489140425, 0.8864577313694116, 0.3342103026715159, 0.3342103026715159, 0.16710515133575796, 0.7021108228623303, 0.8443850475723715, 0.5527481353622605, 0.3114961324460418, 0.6135529881512944, 0.06607493718552401, 0.5052236986653534, 0.2416287254486473, 0.10433967689827951, 0.060407181362161826, 0.03844093359410298, 0.04393249553611769, 0.38878023515716265, 0.23326814109429758, 0.15551209406286506, 0.15551209406286506, 0.9268593509209543, 0.8390824498863141, 0.4262258051262394, 0.4262258051262394, 0.07749560093204352, 0.738645030530755, 0.8002056996284869, 0.9612790639894417, 0.9571978742889339, 0.017403597714344254, 0.8325085197552072, 0.3977983562611527, 0.2983487671958645, 0.28177383568498315, 0.8010103944323911, 0.09423651699204602, 0.04711825849602301, 0.8240997277636744, 0.8653718492258756, 0.8271156333385323, 0.8731718010098467, 0.729759496287113, 0.8079499573585563, 0.9597264442324468, 0.8211764150630319, 0.12965943395732082, 0.12150196538185504, 0.8505137576729852, 0.9088074724957886, 0.9186672281649146, 0.8108954293392313, 0.867845378508282, 0.9263376627987201, 0.18664338903846528, 0.7465735561538611, 0.6107299938156396, 0.8749043750009867, 0.6499448287242645, 0.9328995161290963, 0.05580016386535633, 0.799802348736774, 0.11160032773071266, 0.5484075891571539, 0.1747565290146874, 0.7255043780306719, 0.058252176338229135, 0.021182609577537866, 0.0158869571831534, 0.629293976487347, 0.15732349412183674, 0.8924232422105561, 0.6024158201840277, 0.2655440292463528, 0.624028468728929, 0.07966320877390583, 0.7079281494836488, 0.549647984397703, 0.8718327149168933, 0.7714623593381567, 0.8854208186472965, 0.8400479819820272, 0.7780110974481466, 0.92052134456493, 0.9733955340831005, 0.9340325273453137, 0.8937321690653496, 0.8700909260728397, 0.8185920428623027, 0.3198270935006771, 0.03997838668758464, 0.5996758003137695, 0.583317333560805, 
0.06805368891542725, 0.11990411856527657, 0.1328667259777389, 0.038887822237386994, 0.02268456297180908, 0.009721955559346749, 0.02268456297180908, 0.7868211126312398, 0.7899965782817593, 0.8602101465140782, 0.04053569685257161, 0.04053569685257161, 0.8512496339040038, 0.16431571415754542, 0.7394207137089545, 0.5103281560188611, 0.4422844018830129, 0.7878945332746042, 0.9151595783867383, 0.8739999961183385, 0.38912737481379206, 0.2715784803387924, 0.1661898163267237, 0.11754889447499968, 0.028373871080172336, 0.020267050771551668, 0.42056249739449864, 0.2879938840853632, 0.06856997240127695, 0.05942730941444002, 0.10514062434862466, 0.02742798896051078, 0.02742798896051078, 0.6881236122992531, 0.562353033718739, 0.8121591550586075, 0.6076962540625243, 0.8504410340219515, 0.8478131504204771, 0.520533383739536, 0.130133345934884, 0.03976296681343677, 0.19881483406718384, 0.02168889098914733, 0.0614518578025841, 0.02168889098914733, 0.7443620726500229, 0.7654448423368131, 0.5479128512807784, 0.5648384767683455, 0.5614316898753896, 0.9052949756717353, 0.2204195010505163, 0.7347316701683877, 0.7870630497786394, 0.9243844476261239, 0.8657174732612299, 0.9224049328222348, 0.049615420210131135, 0.8930775637823605, 0.049615420210131135, 0.8145133407566809, 0.8382510728220995, 0.8323470126318266, 0.7903361125268815, 0.946348527502686, 0.8441385821902546, 0.10231982814427328, 0.02557995703606832, 0.6064787251935243, 0.8017313171040877, 0.9461759888585443, 0.5422072667121536, 0.09036787778535893, 0.3313488852129828, 0.9429327154064178, 0.8599956047326275, 0.0687996483786102, 0.0343998241893051, 0.7656768646151823, 0.879099438089304, 0.8398304542858045, 0.11197739390477393, 0.763424417316305, 0.9289205536611375, 0.04889055545584934, 0.36027433159292543, 0.5764389305486807, 0.22310449208296718, 0.44620898416593435, 0.9222038873228768, 0.24095937626219668, 0.6023984406554918, 0.12047968813109834, 0.6603774476299755, 0.05079826520230581, 0.2539913260115291, 0.8998766121080478, 
0.054537976491396835, 0.84864897475063, 0.07740925531212213, 0.11611388296818319, 0.7740925531212213, 0.8955291904722006, 0.8261752854778802, 0.5362039865047783, 0.17873466216825945, 0.12661986665412553, 0.8230291332518159, 0.0643726501699494, 0.8368444522093422, 0.6000771145188131, 0.28128614743069363, 0.09376204914356455, 0.6060499801812556, 0.20201666006041852, 0.8606805924476051, 0.06620619941904655, 0.916365314681624, 0.05490668542119972, 0.8785069667391955, 0.9090675364590459, 0.8857348942984637, 0.753282886732051, 0.10814192783513908, 0.7569934948459736, 0.4194606896640759, 0.45759347963353736, 0.0762655799389229, 0.8954471413571256, 0.45535931197707413, 0.4716221445476839, 0.04878849771182937, 0.7863057443477145, 0.8933565692507592, 0.5814960292862061, 0.23259841171448248, 0.038766401952413744, 0.07753280390482749, 0.4260736444543499, 0.3919877528980019, 0.017042945778173998, 0.15338651200356598, 0.8497828980005835, 0.07725299072732576, 0.2064872844537171, 0.4129745689074342, 0.5525789004245097, 0.35815299101588594, 0.08186354080363106, 0.8866310648545999, 0.06820238960419998, 0.055413029991939165, 0.8311954498790874, 0.8190782535550815, 0.8412523655658808, 0.3196104135810008, 0.4794156203715012, 0.78290082479782, 0.9442549576708343, 0.8812274068113981, 0.04638038983217885, 0.8332373310081678, 0.501490953204023, 0.2279504332745559, 0.25834382437783004, 0.8755029340968037, 0.10510458777945678, 0.8408367022356542, 0.8218989048167178, 0.9649484471157918, 0.7702609551932257, 0.17327941964130447, 0.6931176785652179, 0.8173835580659863, 0.8223308408853847, 0.8661211211576484, 0.5683828477200307, 0.3078740425150166, 0.09473047462000511, 0.007894206218333758, 0.890899734088047, 0.9442101775146365, 0.10429074531901789, 0.7300352172331251, 0.8550219660747909, 0.8831893168194985, 0.5554091989267811, 0.8991954739165832, 0.4131754273122529, 0.531225549401468, 0.8907572670876085, 0.83921243134141, 0.06455480241087769, 0.9001868644887964, 0.9104911580337425, 
0.6715904832264231, 0.27982936801100966, 0.03357952416132116, 0.8268442502134278, 0.5819965655796789, 0.34160667979676806, 0.06326049625866074, 0.2820086605745882, 0.2820086605745882, 0.6815614007813554, 0.8604713643960091, 0.8718015870416665, 0.884427730964333, 0.6651474783503456, 0.5406972920448103, 0.18542651958545267, 0.7417060783418107, 0.7084537737668347, 0.8160101477355525, 0.8452871836997587, 0.911899144500606, 0.8389210672389044, 0.45647989104472014, 0.45647989104472014, 0.10673704176781272, 0.747159292374689, 0.10673704176781272, 0.8314039224779564, 0.765025334720739, 0.35462590308727404, 0.5455783124419601, 0.054557831244196005, 0.819772257165152, 0.855666218008439, 0.19721015714658774, 0.739538089299704, 0.418623179674591, 0.8761918874053383, 0.9277517070076491, 0.03092505690025497, 0.12599100881562525, 0.8399400587708349, 0.5934350665250283, 0.3758422087991846, 0.519120352374743, 0.042377171622427996, 0.148320100678498, 0.23307444392335397, 0.042377171622427996, 0.909502608493951, 0.13009172725147547, 0.8130732953217217, 0.5001223351868452, 0.7085774993442054, 0.8789982320672739, 0.4654138293267067, 0.08268773617479562, 0.11340032389686255, 0.18663803308025295, 0.04961264170487737, 0.05197514845272867, 0.02126256073066173, 0.00708752024355391, 0.018900053982810427, 0.9335585003197289, 0.11226813189660122, 0.7858769232762085, 0.0374227106322004, 0.8896239990028589, 0.9047404511674699, 0.14927931741738243, 0.7463965870869121, 0.7558165219901478, 0.1259694203316913, 0.5404183380141993, 0.8326130048326553, 0.8581500653983971, 0.7620997530066346, 0.24645012361697827, 0.030806265452122283, 0.7085441053988125, 0.9525311205822394, 0.5244653380760876, 0.3146792028456526, 0.13486251550527967, 0.8552030571243232, 0.7888469240441052, 0.07888469240441051, 0.6776501052871867, 0.8609663348076809, 0.8576660723754194, 0.8790167263445751, 0.7422017220282873, 0.7686956498747503, 0.12550133059179597, 0.06275066529589798, 0.04706299897192348, 0.8884292937078535, 
0.060232494488668034, 0.030116247244334017, 0.8811413191148202, 0.041959110434039056, 0.890920788188417, 0.038735686442974655, 0.8259152851366707, 0.06882627376138922, 0.8687867833413013, 0.9088156129026115, 0.28953459634230905, 0.09651153211410302, 0.28953459634230905, 0.33779036239936056, 0.5614637990099957, 0.18715459966999853, 0.8591829278785075, 0.4527961590957233, 0.3521747904077848, 0.10062136868793851, 0.033540456229312836, 0.041925570286641047, 0.45337218393330403, 0.33463185004601015, 0.06476745484761487, 0.032383727423807435, 0.010794575807935811, 0.09715118227142229, 0.2739832882948134, 0.6849582207370334, 0.880707122701216, 0.0440353561350608, 0.8737167585731369, 0.9324894648307954, 0.6873903573284762, 0.8424062095759874, 0.8867378593542977, 0.8686734113327069, 0.8884258808111789, 0.890630723704323, 0.7294200785279272, 0.5452584179081557, 0.3635056119387704, 0.7174392115016436, 0.10249131592880623, 0.7725488375205112, 0.1423116279643047, 0.06099069769898772, 0.49513212893898445, 0.330088085959323, 0.8897498770812586, 0.6945601749003442, 0.22405166932269166, 0.04481033386453833, 0.022405166932269165, 0.24273616311471705, 0.7282084893441512, 0.8416852485007988, 0.8016754057906113, 0.7634109055591707, 0.915374889332195, 0.5836696053160427], \"Term\": [\"0000\", \"10h\", \"11\", \"11\", \"12\", \"12h\", \"13\", \"13\", \"15\", \"18h\", \"1er\", \"1er\", \"20\", \"2007\", \"2016\", \"2017\", \"21\", \"23\", \"49\", \"8h\", \"activit\\u00e9\", \"affaires\", \"ait\", \"alstom\", \"an\", \"ancien\", \"arr\\u00eaterai\", \"arr\\u00eaterai\", \"au\", \"au\", \"au\", \"au\", \"au\", \"au\", \"aura\", \"aura\", \"aurai\", \"aurai\", \"aurais\", \"aurais\", \"aurais\", \"aurait\", \"aurait\", \"aurait\", \"autoriserai\", \"aux\", \"aux\", \"aux\", \"aux\", \"avec\", \"avec\", \"avec\", \"avec\", \"bah\", \"baiser\", \"ballerine\", \"ballerines\", \"bande\", \"banques\", \"bayrou\", \"beau\", \"beau\", \"beau\", \"belle\", \"bendit\", \"bien\", \"bien\", \"bien\", 
\"blanc\", \"bonne\", \"bordel\", \"boulot\", \"boulot\", \"burkini\", \"bus\", \"cache\", \"camembert\", \"candidat\", \"cannabis\", \"cannabis\", \"cantine\", \"cantines\", \"cars\", \"cars\", \"carte\", \"cc\", \"chaine\", \"changement\", \"chaque\", \"chaque\", \"chaque\", \"chaque\", \"chez\", \"chez\", \"chez\", \"chez\", \"chirac\", \"chocolat\", \"chocolat\", \"chocolatine\", \"chocolatine\", \"christineboutin\", \"co\", \"co\", \"co\", \"co\", \"co\", \"cohn\", \"cohnbendit\", \"comme\", \"comme\", \"comme\", \"comme\", \"comment\", \"comptes\", \"comptes\", \"constitution\", \"coupe\", \"cours\", \"cours\", \"cr\\u00e9er\", \"cr\\u00e9er\", \"culture\", \"culture\", \"cyrilhanouna\", \"cyrilhanouna\", \"dans\", \"dans\", \"dans\", \"de\", \"de\", \"de\", \"de\", \"de\", \"de\", \"de\", \"demanderai\", \"demanderais\", \"des\", \"des\", \"des\", \"des\", \"des\", \"devoirs\", \"devoirs\", \"dimanche\", \"direct\", \"domicile\", \"donnerai\", \"drogue\", \"droite\", \"du\", \"du\", \"du\", \"du\", \"du\", \"du\", \"dutreil\", \"d\\u00e9missionnerai\", \"d\\u00e9missionnerais\", \"d\\u00e9put\\u00e9\", \"edito\", \"effets\", \"elysee\", \"elys\\u00e9e\", \"elys\\u00e9e\", \"elys\\u00e9e\", \"emmanuel\", \"emmanuelmacron\", \"en\", \"en\", \"en\", \"en\", \"end\", \"enfant\", \"enmarche\", \"entretien\", \"entretien\", \"entretien\", \"envers\", \"es\", \"espace\", \"est\", \"est\", \"est\", \"et\", \"et\", \"et\", \"et\", \"et\", \"et\", \"etc\", \"etc\", \"etc\", \"etc\", \"eu\", \"fachos\", \"fait\", \"fait\", \"fait\", \"faites\", \"famille\", \"ferai\", \"ferais\", \"ferais\", \"finances\", \"france\", \"france\", \"france\", \"fran\\u00e7ais\", \"fran\\u00e7ais\", \"fran\\u00e7ais\", \"fran\\u00e7ois\", \"fromage\", \"fronti\\u00e8res\", \"f\\u00e9ri\\u00e9\", \"f\\u00e9ri\\u00e9s\", \"f\\u00eate\", \"gauche\", \"gens\", \"gens\", \"gratuit\", \"gratuit\", \"gratuites\", \"gratuits\", \"gros\", \"grosse\", \"gt\", \"g\\u00e9rard\", \"g\\u00e9rard\", 
\"haine\", \"hashtag\", \"haute\", \"heure\", \"hollande\", \"hollande\", \"hollande\", \"homologues\", \"https\", \"https\", \"https\", \"https\", \"https\", \"humour\", \"humour\", \"hymne\", \"h\\u00e9ritage\", \"il\", \"il\", \"il\", \"ill\\u00e9gal\", \"important\", \"imposerai\", \"in\", \"installation\", \"instaurerais\", \"interdiction\", \"interdirai\", \"interdirais\", \"interdirait\", \"interdit\", \"interdite\", \"internet\", \"jamais\", \"jamais\", \"jamais\", \"je\", \"je\", \"je\", \"je\", \"je\", \"je\", \"je\", \"je\", \"jean\", \"jeudi\", \"jeux\", \"jour\", \"jour\", \"jour\", \"journ\\u00e9es\", \"journ\\u00e9es\", \"jul\", \"jul\", \"justice\", \"kebab\", \"kfc\", \"la\", \"la\", \"la\", \"la\", \"la\", \"la\", \"le\", \"le\", \"le\", \"le\", \"le\", \"le\", \"le\", \"le1hebdo\", \"le_parisien\", \"leggings\", \"lelab_e1\", \"lemissionpolitique\", \"lenorman\", \"les\", \"les\", \"les\", \"les\", \"les\", \"les\", \"les\", \"lettre\", \"libre\", \"lib\\u00e9ral\", \"lrps\", \"lrpsfn\", \"lundi\", \"lyc\\u00e9e\", \"lyc\\u00e9e\", \"l\\u00e9gal\", \"l\\u00e9galiserai\", \"l\\u00e9galiserais\", \"macdo\", \"macron\", \"macron\", \"macron\", \"mandat\", \"marche\", \"marre\", \"maths\", \"mcdo\", \"me\", \"me\", \"me\", \"medef\", \"mercredi\", \"merde\", \"mes\", \"mes\", \"mes\", \"mets\", \"mettrais\", \"mettrais\", \"mettrais\", \"meufs\", \"mickey\", \"mieux\", \"mieux\", \"millions\", \"ministre\", \"ministre\", \"minist\\u00e8re\", \"minist\\u00e8re\", \"mlp_officiel\", \"mlp_officiel\", \"moins\", \"mois\", \"mois\", \"mois\", \"mon\", \"mon\", \"mon\", \"monde\", \"monde\", \"montebourg\", \"mort\", \"mort\", \"mort\", \"mot\", \"m\\u00e8re\", \"nan\", \"nan\", \"national\", \"national\", \"nationale\", \"nationale\", \"ne\", \"ne\", \"ne\", \"netflix\", \"netflix\", \"nommerai\", \"nommerai\", \"nommerais\", \"nos\", \"nos\", \"notaires\", \"nouveaux\", \"nouvel\", \"obama\", \"obama\", \"obligatoire\", \"obligatoire\", \"obligatoire\", 
\"obligerai\", \"on\", \"on\", \"on\", \"organiserai\", \"organiserais\", \"ou\", \"ou\", \"ou\", \"ou\", \"par\", \"par\", \"par\", \"par\", \"parce\", \"parce\", \"partir\", \"partir\", \"pas\", \"pas\", \"pas\", \"pays\", \"pays\", \"peine\", \"peine\", \"pens\\u00e9e\", \"permis\", \"perp\\u00e9tuit\\u00e9\", \"perp\\u00e9tuit\\u00e9\", \"picsou\", \"pizza\", \"place\", \"place\", \"plein\", \"plus\", \"plus\", \"plus\", \"police\", \"politique\", \"politique\", \"politiques\", \"port\", \"porte\", \"possible\", \"possible\", \"poste\", \"pote\", \"poudlard\", \"pour\", \"pour\", \"pour\", \"pour\", \"pourquoi\", \"premier\", \"premi\\u00e8re\", \"premi\\u00e8re\", \"presidentielle2017\", \"prison\", \"pro\", \"promesses\", \"pr\\u00e9sident\", \"pr\\u00e9sident\", \"pr\\u00e9sidentielle\", \"ps\", \"ps\", \"putain\", \"putes\", \"que\", \"que\", \"que\", \"question\", \"qui\", \"qui\", \"qui\", \"quotidien\", \"quotidien\", \"raciste\", \"renaud\", \"rendrai\", \"rendrais\", \"rocard\", \"r\\u00e9f\\u00e9rendum\", \"r\\u00e9publique\", \"r\\u00e9publique\", \"r\\u00e9seaux\", \"r\\u00e9tablirais\", \"salaire\", \"samedi\", \"sant\\u00e9\", \"sarko\", \"sarko\", \"sarkozy\", \"sarkozy\", \"sarkozy\", \"scolaire\", \"scolaires\", \"se\", \"se\", \"se\", \"secret\", \"self\", \"semaine\", \"semaine\", \"sep\", \"septembre\", \"serai\", \"serai\", \"seraient\", \"seraient\", \"serais\", \"serais\", \"serait\", \"serait\", \"serait\", \"serait\", \"serait\", \"seriez\", \"seront\", \"seront\", \"servent\", \"shopping\", \"sieste\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"sijetaispresident\", \"soir\", \"son\", \"son\", \"son\", \"sondage\", \"sorte\", \"sortie\", \"sortie\", \"soutien\", \"soutien\", \"soutiens\", \"sport\", \"ss10\", \"suppression\", \"supprimerai\", \"supprimerai\", \"supprimerai\", \"supprimerais\", 
\"sur\", \"sur\", \"sur\", \"s\\u00e9nat\", \"s\\u00e9ries\", \"s\\u00e9ries\", \"ta\", \"tacos\", \"temps\", \"tiendrais\", \"tintin\", \"tous\", \"tous\", \"tous\", \"tous\", \"tout\", \"tout\", \"tout\", \"toute\", \"toute\", \"toutes\", \"toutes\", \"travail\", \"travail\", \"triste\", \"tr\\u00e8s\", \"tu\", \"tu\", \"tu\", \"tu\", \"tweeter\", \"tweeter\", \"t\\u00e9l\\u00e9\", \"un\", \"un\", \"un\", \"un\", \"un\", \"une\", \"une\", \"une\", \"une\", \"une\", \"une\", \"vacances\", \"vacances\", \"valls\", \"valls\", \"vendredi\", \"via\", \"violeurs\", \"vire\", \"virerais\", \"vitesse\", \"vivre\", \"volont\\u00e9\", \"vont\", \"vos\", \"vos\", \"votez\", \"votez\", \"vous\", \"vous\", \"vous\", \"vs\", \"vs\", \"week\", \"\\u00e7a\", \"\\u00e7a\", \"\\u00e7a\", \"\\u00e7a\", \"\\u00e9cole\", \"\\u00e9cole\", \"\\u00e9conomie\", \"\\u00e9conomique\", \"\\u00e9couter\", \"\\u00e9ducation\", \"\\u00e9lus\"]}, \"R\": 30, \"lambda.step\": 0.01, \"plot.opts\": {\"xlab\": \"PC1\", \"ylab\": \"PC2\"}, \"topic.order\": [3, 2, 6, 5, 7, 9, 1, 10, 8, 4]};\n", "\n", "function LDAvis_load_lib(url, callback){\n", "  var s = document.createElement('script');\n", "  s.src = url;\n", "  s.async = true;\n", "  s.onreadystatechange = s.onload = callback;\n", "  s.onerror = function(){console.warn(\"failed to load library \" + url);};\n", "  document.getElementsByTagName(\"head\")[0].appendChild(s);\n", "}\n", "\n", "if(typeof(LDAvis) !== \"undefined\"){\n", "   // already loaded: just create the visualization\n", "   !function(LDAvis){\n", "       new LDAvis(\"#\" + \"ldavis_el5588813375534963207284704117\", ldavis_el5588813375534963207284704117_data);\n", "   }(LDAvis);\n", "}else if(typeof define === \"function\" && define.amd){\n", "   // require.js is available: use it to load d3/LDAvis\n", "   require.config({paths: {d3: \"https://d3js.org/d3.v5\"}});\n", "   require([\"d3\"], function(d3){\n", "      window.d3 = d3;\n", "      
LDAvis_load_lib(\"https://cdn.jsdelivr.net/gh/bmabey/pyLDAvis@3.3.1/pyLDAvis/js/ldavis.v3.0.0.js\", function(){\n", "        new LDAvis(\"#\" + \"ldavis_el5588813375534963207284704117\", ldavis_el5588813375534963207284704117_data);\n", "      });\n", "    });\n", "}else{\n", "    // require.js not available: dynamically load d3 & LDAvis\n", "    LDAvis_load_lib(\"https://d3js.org/d3.v5.js\", function(){\n", "         LDAvis_load_lib(\"https://cdn.jsdelivr.net/gh/bmabey/pyLDAvis@3.3.1/pyLDAvis/js/ldavis.v3.0.0.js\", function(){\n", "                 new LDAvis(\"#\" + \"ldavis_el5588813375534963207284704117\", ldavis_el5588813375534963207284704117_data);\n", "            })\n", "         });\n", "}\n", "</script>"], "text/plain": ["PreparedData(topic_coordinates=              x         y  topics  cluster       Freq\n", "topic                                                \n", "2      0.132172  0.049678       1        1  36.881857\n", "1      0.115237  0.158473       2        1  26.106389\n", "5      0.174221  0.095581       3        1   9.645160\n", "4      0.157026 -0.190649       4        1   5.727111\n", "6     -0.021095 -0.162058       5        1   4.590570\n", "8      0.005103 -0.062020       6        1   4.341137\n", "0     -0.171929 -0.012022       7        1   3.476527\n", "9     -0.157733  0.042830       8        1   3.247126\n", "7     -0.101223  0.019969       9        1   3.118181\n", "3     -0.131778  0.060219      10        1   2.865942, topic_info=                  Term        Freq       Total Category  logprob  loglift\n", "837  sijetaispresident  423.000000  423.000000  Default  30.0000  30.0000\n", "494                les  276.000000  276.000000  Default  29.0000  29.0000\n", "460                 je  308.000000  308.000000  Default  28.0000  28.0000\n", "447        interdirais   58.000000   58.000000  Default  27.0000  27.0000\n", "397            gratuit   49.000000   49.000000  Default  26.0000  26.0000\n", "..                 ...         ...     
    ...      ...      ...      ...\n", "245                des    0.302704  139.492731  Topic10  -7.1798  -2.5807\n", "479                 la    0.301500  246.705851  Topic10  -7.1838  -3.1549\n", "646                pas    0.301246   97.723601  Topic10  -7.1846  -2.2297\n", "321                 et    0.300239  182.097554  Topic10  -7.1880  -2.8554\n", "277                 du    0.298278   91.424869  Topic10  -7.1945  -2.1730\n", "\n", "[513 rows x 6 columns], token_table=      Topic      Freq        Term\n", "term                             \n", "1         5  0.418647        0000\n", "5         5  0.860174         10h\n", "6         1  0.214435          11\n", "6         2  0.428870          11\n", "7         5  0.856212          12\n", "...     ...       ...         ...\n", "982       6  0.841685    \u00e9conomie\n", "983       2  0.801675  \u00e9conomique\n", "985       5  0.763411     \u00e9couter\n", "986       6  0.915375   \u00e9ducation\n", "987       1  0.583670        \u00e9lus\n", "\n", "[623 rows x 3 columns], R=30, lambda_step=0.01, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, topic_order=[3, 2, 6, 5, 7, 9, 1, 10, 8, 4])"]}, "execution_count": 33, "metadata": {}, "output_type": "execute_result"}], "source": ["pyLDAvis.sklearn.prepare(lda, tfidf, tfidf_vectorizer)"]}, {"cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Exercise 4: LDA\n", "\n", "Repeat the LDA after removing the stop words to obtain cleaner topics."]}, {"cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", 
"version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5"}}, "nbformat": 4, "nbformat_minor": 2}