{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.eco - Web Scraping\n", "\n", "Web scraping is a very useful practice for anyone who wants to work with information that is available online but does not necessarily exist as an *Excel* table. In short, it means retrieving information from the *Internet*."]},
{"cell_type": "markdown", "metadata": {}, "source": ["[Web scraping](https://fr.wikipedia.org/wiki/Web_scraping) refers to techniques for extracting the content of websites by means of a computer program. Today we will show you how to write and run such robots in order to quickly retrieve information useful to your current or future projects."]},
{"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]},
{"cell_type": "markdown", "metadata": {}, "source": ["## A detour via the Web: how does a website work?\n", "\n", "Although this is not a course on web development, you need a few basics to understand how a website works and how information is structured on a page."]},
{"cell_type": "markdown", "metadata": {}, "source": ["A website is a set of pages written in *HTML*, a language that describes both the content and the layout of a *Web* page.\n", "\n", "### HTML"]},
{"cell_type": "markdown", "metadata": {}, "source": ["### Tags\n", "\n", "On any web page you will find elements such as ``<head>``, ``<title>``, etc. These codes, called tags, structure the content of an *HTML* page. \n", "Examples include ``<p>``, ``<h1>``, ``<h2>``, ``<h3>``, ``<strong>`` and ``<em>``.\n", "A tag written ``< >`` marks the beginning of a section; the matching ``</ >`` marks its end. Most tags come in pairs, with an *opening tag* and a *closing tag* (for example ``<p>`` and ``</p>``)."]},
{"cell_type": "markdown", "metadata": {}, "source": ["#### Example: table tags\n", "\n", "| Tag | Description |\n", "|-----|-------------|\n", "| ``<table>`` | Table |\n", "| ``<caption>`` | Table caption |\n", "| ``<tr>`` | Table row |\n", "| ``<th>`` | Header cell |\n", "| ``<td>`` | Cell |\n", "| ``<thead>`` | Table header section |\n", "| ``<tbody>`` | Table body section |\n", "| ``<tfoot>`` | Table footer section |"]},
{"cell_type": "markdown", "metadata": {}, "source": ["##### Application: a table in HTML\n", "\n", "The following *HTML* code describes a table:"]},
{"cell_type": "raw", "metadata": {}, "source": ["<table>\n", "   <tr>\n", "      <th>First name</th>\n", "      <th>Last name</th>\n", "      <th>Occupation</th>\n", "   </tr>\n", "   <tr>\n", "      <td>Mike</td>\n", "      <td>Stuntman</td>\n", "      <td>Stunt performer</td>\n", "   </tr>\n", "   <tr>\n", "      <td>Mister</td>\n", "      <td>Pink</td>\n", "      <td>Gangster</td>\n", "   </tr>\n", "</table>"]},
{"cell_type": "markdown", "metadata": {}, "source": ["In the browser, it is rendered as:\n", "\n", "| First name | Last name | Occupation      |\n", "|------------|-----------|-----------------|\n", "| Mike       | Stuntman  | Stunt performer |\n", "| Mister     | Pink      | Gangster        |\n"]},
{"cell_type": "markdown", "metadata": {}, "source": ["#### Parent and child\n", "\n", "In HTML, the terms parent and child describe elements nested inside one another. In the following construction, for example:"]},
{"cell_type": "raw", "metadata": {}, "source": ["<div>\n", "    <p>\n", "       bla,bla\n", "    </p>\n", "</div>"]},
{"cell_type": "markdown", "metadata": {}, "source": ["the ``<div>`` element is the parent of the ``<p>`` element, and the ``<p>`` element is the child of the ``<div>`` element."]},
{"cell_type": "markdown", "metadata": {}, "source": ["----------\n", "\n", "But why learn this in order to scrape, you may ask?\n", "\n", "To retrieve information from a website reliably, you need to understand its structure and therefore its HTML code. The Python functions used for scraping are mostly designed to let you navigate between tags."]},
{"cell_type": "markdown", "metadata": {"collapsed": true}, "source": ["### Optional - CSS - styling the web page\n", "\n", "A bare piece of HTML is displayed as black text on a white background. A simple way to make the page more attractive is to add colour.\n", "\n", "The style sheet that makes the page more attractive lives in one or more [CSS](https://en.wikipedia.org/wiki/Cascading_Style_Sheets) files. Every HTML page that references this external style sheet inherits all of its definitions. 
We will come back to this in more detail in the tutorial on [Flask](http://flask.pocoo.org/) (a Python module for building websites)."]},
{"cell_type": "markdown", "metadata": {}, "source": ["## Scraping with Python\n", "\n", "This course mainly uses the [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) package, but other packages exist ([Selenium](https://selenium-python.readthedocs.io/), [Scrapy](https://scrapy.org/)...).\n", "\n", "[BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is sufficient for static HTML pages. As soon as the information you are looking for is generated by running [Javascript](https://fr.wikipedia.org/wiki/JavaScript) scripts, you will need tools such as Selenium.\n", "\n", "Similarly, if you do not know the URLs in advance, you will need a framework such as [Scrapy](https://scrapy.org/), which moves easily from one page to another (\"crawling\"). 
Scrapy is more complex to use than [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/): for more details, see the [Scrapy Tutorial](https://doc.scrapy.org/en/latest/intro/tutorial.html)."]},
{"cell_type": "markdown", "metadata": {}, "source": ["### Using BeautifulSoup\n", "\n", "The packages used to scrape HTML pages: \n", "- [BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) (``pip install beautifulsoup4``)\n", "- [urllib](https://docs.python.org/3/library/urllib.html#module-urllib) (part of the standard library)\n", "- ``lxml`` (``pip install lxml``), the parser passed to BeautifulSoup in the code below"]},
{"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["import urllib\n", "import bs4\n", "#help(bs4)"]},
{"cell_type": "markdown", "metadata": {}, "source": ["#### A first HTML page\n", "\n", "Let us start with something easy: a Wikipedia page, for example the one on the French Ligue 1 football championship: [Championnat de France de football 2016-2017](https://fr.wikipedia.org/wiki/Championnat_de_France_de_football_2016-2017). 
We want to retrieve the list of teams, together with the URLs of the Wikipedia pages of these teams."]},
{"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["b'<!DOCTYPE html>\\n<html class=\"client-nojs\" lang=\"fr\" dir=\"ltr\">\\n<head>\\n<meta charset=\"UTF-8\"/>\\n<title>Championnat de France de football 2016-2017 \\xe2\\x80\\x94 Wikip\\xc3\\xa9dia</title>\\n<script>document.documentElement.className=\"client-js\";RLCONF={\"wgBreakFrames\":false,\"wgSeparatorTransformTable\":[\",\\\\t.\",\"\\xc2\\xa0\\\\t,\"],\"wgDigitTransformTable\":[\"\",\"\"],\"wgDefaultDateFormat\":\"dmy\",\"wgMonthNames\":[\"\",\"janvier\",\"f\\xc3\\xa9vrier\",\"mars\",\"avril\",\"mai\",\"juin\",\"juillet\",\"ao\\xc3\\xbbt\",\"septembre\",\"octobre\",\"novembre\",\"d\\xc3\\xa9cembre\"],\"wgRequestId\":\"ea8aa125-b829-4dfe-a828-4222cfa58da5\",\"wgCSPNonce\":false,\"wgCanonicalNamespace\":\"\",\"wgCanonicalSpecialPageName\":false,\"wgNamespaceNumber\":0,\"wgPageName\":\"Championnat_de_France_de_football_2016-2017\",\"wgTitle\":\"Championnat de France de football 2016-2017\",\"wgCurRevisionId\":196876773,\"wgRevisionId\":196876773,\"wgArticleId\":9734718,\"wgIsArticle\":true,\"wgIsRedirect\":false,\"wgAction\":\"view\",\"wgUserName\":null,\"wgUserGroups\":[\"*\"],\"wgCategories\":[\"Page utilisant une frise chronologique\",\"Article ut'\n"]}], "source": ["# Step 1: connect to the Wikipedia page and fetch its source code\n", "from urllib import request\n", "\n", "url_ligue_1 = \"https://fr.wikipedia.org/wiki/Championnat_de_France_de_football_2016-2017\"\n", "\n", "# urlopen(...).read() returns the raw page as bytes\n", "request_text = request.urlopen(url_ligue_1).read()\n", "print(request_text[:1000])"]},
{"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["# Step 2: use the BeautifulSoup package,\n", "# which \"understands\" the tags contained in the bytes returned by urlopen(...).read()\n", "\n", "page = bs4.BeautifulSoup(request_text, \"lxml\")\n", "\n", "#print(page)"]},
{"cell_type": "markdown", "metadata": {}, "source": ["If we print the ``page`` object created with BeautifulSoup, we see that it is no longer a plain string but an HTML page with tags. We can now search for elements inside these tags.\n", "\n", "\n", "For example, to get the title of the page, we call the ``.find`` method and ask for \"title\"."]},
{"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<title>Championnat de France de football 2016-2017 \u2014 Wikip\u00e9dia</title>\n"]}], "source": ["print(page.find(\"title\"))"]},
{"cell_type": "markdown", "metadata": {}, "source": ["The ``.find`` method returns only the first occurrence of the element."]},
{"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<table><caption style=\"background-color:#99cc99;color:#000000;\">G\u00e9n\u00e9ralit\u00e9s</caption><tbody><tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Sport</th>\n", "<td>\n", "<a href=\"/wiki/Football\" title=\"Football\">Football</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Organisateur(s)</th>\n", "<td>\n", "<a href=\"/wiki/Ligue_de_football_professionnel\" title=\"Ligue de football professionnel\">LFP</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">\u00c9dition</th>\n", "<td>\n", "<abbr class=\"abbr\" title=\"Soixante-dix-neuvi\u00e8me (septante-neuvi\u00e8me)\">79<sup>e</sup></abbr></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Lieu(x)</th>\n", "<td>\n", "<span class=\"datasortkey\" data-sort-value=\"France\"><span class=\"flagicon\"><a class=\"image\" href=\"/wiki/Fichier:Flag_of_France.svg\" 
title=\"Drapeau de la France\"><img alt=\"Drapeau de la France\" class=\"noviewer thumbborder\" data-file-height=\"600\" data-file-width=\"900\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/20px-Flag_of_France.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/30px-Flag_of_France.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/40px-Flag_of_France.svg.png 2x\" width=\"20\"/></a> </span><a href=\"/wiki/France\" title=\"France\">France</a></span> et <span class=\"datasortkey\" data-sort-value=\"Monaco\"><span class=\"flagicon\"><a class=\"image\" href=\"/wiki/Fichier:Flag_of_Monaco.svg\" title=\"Drapeau de Monaco\"><img alt=\"Drapeau de Monaco\" class=\"noviewer thumbborder\" data-file-height=\"800\" data-file-width=\"1000\" decoding=\"async\" height=\"16\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Flag_of_Monaco.svg/20px-Flag_of_Monaco.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Flag_of_Monaco.svg/30px-Flag_of_Monaco.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Flag_of_Monaco.svg/40px-Flag_of_Monaco.svg.png 2x\" width=\"20\"/></a> </span><a href=\"/wiki/Monaco\" title=\"Monaco\">Monaco</a></span></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Date</th>\n", "<td>\n", "du <time class=\"nowrap date-lien\" data-sort-value=\"2016-08-12\" datetime=\"2016-08-12\"><a href=\"/wiki/12_ao%C3%BBt\" title=\"12 ao\u00fbt\">12</a> <a href=\"/wiki/Ao%C3%BBt_2016\" title=\"Ao\u00fbt 2016\">ao\u00fbt</a> <a href=\"/wiki/2016\" title=\"2016\">2016</a></time><br/>au <time class=\"nowrap date-lien\" data-sort-value=\"2017-05-20\" datetime=\"2017-05-20\"><a href=\"/wiki/20_mai\" title=\"20 mai\">20</a> <a href=\"/wiki/Mai_2017\" title=\"Mai 2017\">mai</a> <a href=\"/wiki/2017\" title=\"2017\">2017</a></time></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" 
style=\"width:10.5em;\">Participants</th>\n", "<td>\n", "20 \u00e9quipes</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Matchs jou\u00e9s</th>\n", "<td>\n", "380</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Affluence</th>\n", "<td>\n", "7965830 <small>(20963 par match)</small></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Site web officiel</th>\n", "<td>\n", "<a class=\"external text\" href=\"http://www.lfp.fr\" rel=\"nofollow\">Site officiel</a></td>\n", "</tr></tbody></table>\n"]}], "source": ["print(page.find(\"table\"))"]},
{"cell_type": "markdown", "metadata": {}, "source": ["---------------\n", "To find all occurrences, use ``.findAll()`` (``.find_all()`` is the current spelling in BeautifulSoup 4; ``.findAll()`` is the older alias)."]},
{"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["The page contains 32 <table> elements\n"]}], "source": ["print(\"The page contains\", len(page.findAll(\"table\")), \"<table> elements\")"]},
{"cell_type": "code", "execution_count": 8, "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": [" Le 2eme tableau de la page : Hi\u00e9rarchie \n", " <table><caption style=\"background-color:#99cc99;color:#000000;\">Hi\u00e9rarchie</caption><tbody><tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Hi\u00e9rarchie</th>\n", "<td>\n", "<abbr class=\"abbr\" title=\"Premier\">1<sup>er</sup></abbr>\u00a0\u00e9chelon</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Niveau inf\u00e9rieur</th>\n", "<td>\n", "<a class=\"mw-redirect\" href=\"/wiki/Championnat_de_France_de_football_de_Ligue_2_2016-2017\" title=\"Championnat de France de football de Ligue 2 2016-2017\">Ligue 2 2016-2017</a></td>\n", "</tr></tbody></table>\n", "--------------------------------------------------------\n", "Le 3eme tableau de la page : Palmar\u00e8s \n", " <table><caption 
style=\"background-color:#99cc99;color:#000000;\">Palmar\u00e8s</caption>\n", "<tbody><tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Tenant du titre</th>\n", "<td>\n", "<a href=\"/wiki/Paris_Saint-Germain_Football_Club\" title=\"Paris Saint-Germain Football Club\">Paris Saint-Germain</a> (6)</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Promu(s) en d\u00e9but de saison</th>\n", "<td>\n", "<a href=\"/wiki/Association_sportive_Nancy-Lorraine\" title=\"Association sportive Nancy-Lorraine\">AS Nancy-Lorraine</a><br/><a href=\"/wiki/Dijon_Football_C%C3%B4te-d%27Or\" title=\"Dijon Football C\u00f4te-d'Or\">Dijon FCO</a><br/><a href=\"/wiki/Football_Club_de_Metz\" title=\"Football Club de Metz\">FC Metz</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Vainqueur</th>\n", "<td>\n", "<b><a href=\"/wiki/Association_sportive_de_Monaco_Football_Club\" title=\"Association sportive de Monaco Football Club\">AS Monaco</a></b> (8)</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Deuxi\u00e8me</th>\n", "<td>\n", "<a href=\"/wiki/Paris_Saint-Germain_Football_Club\" title=\"Paris Saint-Germain Football Club\">Paris Saint-Germain</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Troisi\u00e8me</th>\n", "<td>\n", "<a class=\"mw-redirect\" href=\"/wiki/Olympique_Gymnaste_Club_Nice_C%C3%B4te_d%27Azur\" title=\"Olympique Gymnaste Club Nice C\u00f4te d'Azur\">OGC Nice</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Rel\u00e9gu\u00e9(s)</th>\n", "<td>\n", "<a class=\"mw-redirect\" href=\"/wiki/AS_Nancy-Lorraine\" title=\"AS Nancy-Lorraine\">AS Nancy-Lorraine</a><br/><a class=\"mw-redirect\" href=\"/wiki/SC_Bastia\" title=\"SC Bastia\">SC Bastia</a><br/><a class=\"mw-redirect\" href=\"/wiki/FC_Lorient\" title=\"FC Lorient\">FC Lorient</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Buts</th>\n", "<td>\n", "991 <small>(2,61 par 
match)</small></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\"><img alt=\"Averti\" data-file-height=\"666\" data-file-width=\"512\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/b1/Yellow_card.svg/10px-Yellow_card.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/b1/Yellow_card.svg/15px-Yellow_card.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/b1/Yellow_card.svg/20px-Yellow_card.svg.png 2x\" title=\"Averti\" width=\"10\"/> Cartons jaunes</th>\n", "<td>\n", "1297</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\"><a class=\"image\" href=\"/wiki/Fichier:Red_card.svg\"><img alt=\"Red card.svg\" data-file-height=\"260\" data-file-width=\"200\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Red_card.svg/10px-Red_card.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Red_card.svg/15px-Red_card.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Red_card.svg/20px-Red_card.svg.png 2x\" width=\"10\"/></a> Cartons rouges</th>\n", "<td>\n", "96</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Meilleur joueur</th>\n", "<td>\n", "<span class=\"flagicon\"><a class=\"image\" href=\"/wiki/Fichier:Flag_of_Uruguay.svg\" title=\"Drapeau : Uruguay\"><img alt=\"\" class=\"noviewer thumbborder\" data-file-height=\"630\" data-file-width=\"945\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/20px-Flag_of_Uruguay.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/30px-Flag_of_Uruguay.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/40px-Flag_of_Uruguay.svg.png 2x\" width=\"20\"/></a></span> <a href=\"/wiki/Edinson_Cavani\" title=\"Edinson Cavani\">Edinson Cavani</a></td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" 
style=\"width:10.5em;\">Meilleur(s) buteur(s)</th>\n", "<td>\n", "<span class=\"flagicon\"><a class=\"image\" href=\"/wiki/Fichier:Flag_of_Uruguay.svg\" title=\"Drapeau : Uruguay\"><img alt=\"\" class=\"noviewer thumbborder\" data-file-height=\"630\" data-file-width=\"945\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/20px-Flag_of_Uruguay.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/30px-Flag_of_Uruguay.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Flag_of_Uruguay.svg/40px-Flag_of_Uruguay.svg.png 2x\" width=\"20\"/></a></span> <a href=\"/wiki/Edinson_Cavani\" title=\"Edinson Cavani\">Edinson Cavani</a> (35)</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Meilleur(s) passeur(s)</th>\n", "<td>\n", "<span class=\"flagicon\"><a class=\"image\" href=\"/wiki/Fichier:Flag_of_France.svg\" title=\"Drapeau : France\"><img alt=\"\" class=\"noviewer thumbborder\" data-file-height=\"600\" data-file-width=\"900\" decoding=\"async\" height=\"13\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/20px-Flag_of_France.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/30px-Flag_of_France.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/40px-Flag_of_France.svg.png 2x\" width=\"20\"/></a></span> <a href=\"/wiki/Morgan_Sanson\" title=\"Morgan Sanson\">Morgan Sanson</a> (12)</td>\n", "</tr>\n", "<tr>\n", "<th scope=\"row\" style=\"width:10.5em;\">Barragiste(s)</th>\n", "<td>\n", "<a class=\"mw-redirect\" href=\"/wiki/FC_Lorient\" title=\"FC Lorient\">FC Lorient</a></td>\n", "</tr></tbody></table>\n"]}], "source": ["print(\" Le 2eme tableau de la page : Hi\u00e9rarchie \\n\", page.findAll(\"table\")[1])\n", "print(\"--------------------------------------------------------\")\n", "print(\"Le 3eme tableau de la page : Palmar\u00e8s 
\\n\",page.findAll(\"table\")[2])"]},
{"cell_type": "markdown", "metadata": {}, "source": ["### Guided exercise: get the list of Ligue 1 teams\n", "\n", "The list of teams is in the ``\"Participants\"`` table: in the page source, this table is the one with ``class=\"DebutCarte\"``. The tags surrounding the club names and URLs have the following form:"]},
{"cell_type": "markdown", "metadata": {}, "source": ["```\n", "<a href=\"club_url\" title=\"club_name\"> Club name </a>\n", "```"]},
{"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<a class=\"image\" href=\"/wiki/Fichier:France_location_map-Regions-2016.svg\"><img alt=\"France location map-Regions-2016.svg\" data-file-height=\"1922\" data-file-width=\"2000\" decoding=\"async\" height=\"288\" src=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/b1/France_location_map-Regions-2016.svg/300px-France_location_map-Regions-2016.svg.png\" srcset=\"//upload.wikimedia.org/wikipedia/commons/thumb/b/b1/France_location_map-Regions-2016.svg/450px-France_location_map-Regions-2016.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/b1/France_location_map-Regions-2016.svg/600px-France_location_map-Regions-2016.svg.png 2x\" width=\"300\"/></a> \n", "-------\n", "<a href=\"/wiki/Paris_Saint-Germain_Football_Club\" title=\"Paris Saint-Germain Football Club\">Paris SG</a> \n", "-------\n", "<a href=\"/wiki/Association_sportive_de_Monaco_Football_Club\" title=\"Association sportive de Monaco Football Club\">AS Monaco</a> \n", "-------\n", "<a href=\"/wiki/Olympique_lyonnais\" title=\"Olympique lyonnais\">Olympique lyonnais</a> \n", "-------\n", "<a href=\"/wiki/Stade_rennais_Football_Club\" title=\"Stade rennais Football Club\">Stade rennais FC</a> \n", "-------\n"]}], "source": ["for item in page.find('table', {'class': 'DebutCarte'}).findAll('a')[0:5]:
\n", "    print(item, \"\\n-------\")"]},
{"cell_type": "markdown", "metadata": {}, "source": ["We do not want the first element, which corresponds to an image rather than a club.\n", "This element is the only one without a ``title`` attribute. It is better to filter out unwanted elements by stating what a valid element must contain than by relying on their position in the list."]},
{"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<a href=\"/wiki/Paris_Saint-Germain_Football_Club\" title=\"Paris Saint-Germain Football Club\">Paris SG</a>\n", "<a href=\"/wiki/Association_sportive_de_Monaco_Football_Club\" title=\"Association sportive de Monaco Football Club\">AS Monaco</a>\n", "<a href=\"/wiki/Olympique_lyonnais\" title=\"Olympique lyonnais\">Olympique lyonnais</a>\n", "<a href=\"/wiki/Stade_rennais_Football_Club\" title=\"Stade rennais Football Club\">Stade rennais FC</a>\n"]}], "source": ["# filtering on the position in the list >>>> BAD\n", "for e, item in enumerate(page.find('table', {'class': 'DebutCarte'}).findAll('a')[0:5]):\n", "    if e == 0:\n", "        pass\n", "    else:\n", "        print(item)"]},
{"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["<a href=\"/wiki/Paris_Saint-Germain_Football_Club\" title=\"Paris Saint-Germain Football Club\">Paris SG</a>\n", "<a href=\"/wiki/Association_sportive_de_Monaco_Football_Club\" title=\"Association sportive de Monaco Football Club\">AS Monaco</a>\n", "<a href=\"/wiki/Olympique_lyonnais\" title=\"Olympique lyonnais\">Olympique lyonnais</a>\n", "<a href=\"/wiki/Stade_rennais_Football_Club\" title=\"Stade rennais Football Club\">Stade rennais FC</a>\n"]}], "source": ["# filtering on the attributes the element must have >>>> GOOD\n", "for item in page.find('table', {'class': 'DebutCarte'}).findAll('a')[0:5]:\n", "    if item.get(\"title\"):\n", "        print(item)"]},
{"cell_type": "markdown", "metadata": {}, "source": ["The last step is to extract the information we want, here the name and URL of the 20 clubs. We use two methods of the ``item`` element:\n", "\n", "- ``getText()`` returns the text displayed on the web page inside the ``<a>`` tag\n", "- ``get('xxxx')`` returns the value of the attribute named ``xxxx``\n", "\n", "Here we want the club name and its URL, so we use ``getText()`` and ``get(\"href\")``."]},
{"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["/wiki/Paris_Saint-Germain_Football_Club\n", "Paris SG\n", "/wiki/Association_sportive_de_Monaco_Football_Club\n", "AS Monaco\n", "/wiki/Olympique_lyonnais\n", "Olympique lyonnais\n", "/wiki/Stade_rennais_Football_Club\n", "Stade rennais FC\n"]}], "source": ["for item in page.find('table', {'class': 'DebutCarte'}).findAll('a')[0:5]:\n", "    if item.get(\"title\"):\n", "        print(item.get(\"href\"))\n", "        print(item.getText())"]},
{"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Paris Saint-Germain Football Club\n", "Association sportive de Monaco Football Club\n", "Olympique lyonnais\n", "Stade rennais Football Club\n"]}], "source": ["# for the official name, read the title attribute instead of the link text\n", "for item in page.find('table', {'class': 'DebutCarte'}).findAll('a')[0:5]:\n", "    if item.get(\"title\"):\n", "        print(item.get(\"title\"))"]},
{"cell_type": "markdown", "metadata": {"collapsed": true}, "source": ["We want to keep all this information in an *Excel*-like table so that we can reuse it at will. Nothing could be simpler: we use pandas, which you should be comfortable with at this stage of the course. The code below collects the names and URLs into a DataFrame; ``df.to_excel()`` or ``df.to_csv()`` can then write it to a file."]},
{"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": ["import pandas\n", "\n", "liste_noms = []\n", "liste_urls = []\n", "\n", "# keep only the links that carry a title attribute, i.e. the clubs\n", "for item in page.find('table', {'class': 'DebutCarte'}).findAll('a'):\n", "    if item.get(\"title\"):\n", "        liste_urls.append(item.get(\"href\"))\n", "        liste_noms.append(item.getText())\n", "\n", "df = pandas.DataFrame.from_dict({\"clubs\": liste_noms, \"url\": liste_urls})"]},
{"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"text/html": ["<div>\n", "<style scoped>\n", "    .dataframe tbody tr th:only-of-type {\n", "        vertical-align: middle;\n", "    }\n", "\n", "    .dataframe tbody tr th {\n", "        vertical-align: top;\n", "    }\n", "\n", "    .dataframe thead th {\n", "        text-align: right;\n", "    }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>clubs</th>\n", "      <th>url</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>Paris SG</td>\n", "      <td>/wiki/Paris_Saint-Germain_Football_Club</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>AS Monaco</td>\n", "      <td>/wiki/Association_sportive_de_Monaco_Football_...</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>Olympique lyonnais</td>\n", "      <td>/wiki/Olympique_lyonnais</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>Stade rennais FC</td>\n", "      <td>/wiki/Stade_rennais_Football_Club</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>OGC Nice</td>\n", "      
<td>/wiki/Olympique_gymnaste_club_Nice_C%C3%B4te_d...</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>"], "text/plain": ["                clubs                                                url\n", "0            Paris SG            /wiki/Paris_Saint-Germain_Football_Club\n", "1           AS Monaco  /wiki/Association_sportive_de_Monaco_Football_...\n", "2  Olympique lyonnais                           /wiki/Olympique_lyonnais\n", "3    Stade rennais FC                  /wiki/Stade_rennais_Football_Club\n", "4            OGC Nice  /wiki/Olympique_gymnaste_club_Nice_C%C3%B4te_d..."]}, "execution_count": 16, "metadata": {}, "output_type": "execute_result"}], "source": ["df.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Web scraping exercise with BeautifulSoup\n", "\n", "For this exercise, we ask you to 1) retrieve the details of the 721 Pok\u00e9mon listed on the website [pokemondb.net](http://pokemondb.net/pokedex/national). The information we ultimately want for each Pok\u00e9mon is contained in 4 tables:\n", "\n", "- Pok\u00e9dex data\n", "- Training\n", "- Breeding\n", "- Base stats\n", "\n", "For example: [Pokemon Database](http://pokemondb.net/pokedex/nincada).\n", "\n", "2) We would also like you to retrieve the image of each Pok\u00e9mon and save it in a folder (hint: use the request and [shutil](https://docs.python.org/3/library/shutil.html) modules)\n", "_for this question, you will need to look some things up on your own: not everything is covered in this tutorial_."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Browsing the web with Selenium\n", "\n", "The advantage of the [Selenium](https://pypi.python.org/pypi/selenium) package is that it can retrieve information that is not in the site's HTML source and only appears after 
JavaScript has run in the background. [Selenium](https://pypi.python.org/pypi/selenium) behaves like a user browsing the web: it clicks on links, fills in forms, and so on. In this example, we go to [Bing News](https://www.bing.com/news) and type a given topic into the search bar. The [chromedriver](https://sites.google.com/a/chromium.org/chromedriver/) version must be ``>= 2.36`` and compatible with the installed version of Chrome."]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": ["# If selenium is not installed.\n", "# !pip install selenium"]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": ["import selenium #pip install selenium\n", "# download the chrome driver from https://chromedriver.storage.googleapis.com/index.html?path=74.0.3729.6/\n", "path_to_web_driver = \"chromedriver\""]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": ["import os, sys\n", "from pyquickhelper.filehelper import download, unzip_files\n", "version = \"73.0.3683.68\"\n", "url = \"https://chromedriver.storage.googleapis.com/%s/\" % version\n", "\n", "if sys.platform.startswith(\"win\"):\n", "    if not os.path.exists(\"chromedriver_win32.zip\"):\n", "        d = download(url + \"chromedriver_win32.zip\")\n", "    if not os.path.exists(\"chromedriver.exe\"):\n", "        unzip_files(\"chromedriver_win32.zip\", where_to=\".\")\n", "elif sys.platform.startswith(\"linux\"):\n", "    if not os.path.exists(\"chromedriver_linux64.zip\"):\n", "        d = download(url + \"chromedriver_linux64.zip\")\n", "    if not os.path.exists(\"chromedriver\"):\n", "        unzip_files(\"chromedriver_linux64.zip\", where_to=\".\")\n", "elif sys.platform.startswith(\"darwin\"):\n", "    if not os.path.exists(\"chromedriver_mac64.zip\"):\n", "        d = download(url + \"chromedriver_mac64.zip\")\n", "    if not 
os.path.exists(\"chromedriver\"):\n", "        unzip_files(\"chromedriver_mac64.zip\", where_to=\".\")        "]}, {"cell_type": "markdown", "metadata": {}, "source": ["We submit the query."]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"ename": "SessionNotCreatedException", "evalue": "Message: session not created: Missing or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n", "output_type": "error", "traceback": ["\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mSessionNotCreatedException\u001b[0m                Traceback (most recent call last)", "Cell \u001b[1;32mIn [20], line 11\u001b[0m\n\u001b[0;32m      8\u001b[0m chrome_options\u001b[38;5;241m.\u001b[39madd_argument(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m--no-sandbox\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m      9\u001b[0m chrome_options\u001b[38;5;241m.\u001b[39madd_argument(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m--verbose\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m---> 11\u001b[0m browser \u001b[38;5;241m=\u001b[39m \u001b[43mwebdriver\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mChrome\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexecutable_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpath_to_web_driver\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     12\u001b[0m \u001b[43m                           \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchrome_options\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     14\u001b[0m browser\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mhttps://www.bing.com/news\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m     16\u001b[0m \u001b[38;5;66;03m# on cherche l'endroit o\u00f9 on peut remplir un formulaire\u001b[39;00m\n\u001b[0;32m     17\u001b[0m \u001b[38;5;66;03m# en 
utilisant les outils du navigateur > inspecter les \u00e9l\u00e9ments de la page\u001b[39;00m\n\u001b[0;32m     18\u001b[0m \u001b[38;5;66;03m# on voit que la barre de recherche est un \u00e9lement du code appel\u00e9 'q' comme query\u001b[39;00m\n\u001b[0;32m     19\u001b[0m \u001b[38;5;66;03m# on lui demande de chercher cet \u00e9l\u00e9ment\u001b[39;00m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chrome\\webdriver.py:69\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, service, keep_alive)\u001b[0m\n\u001b[0;32m     66\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m service:\n\u001b[0;32m     67\u001b[0m     service \u001b[38;5;241m=\u001b[39m Service(executable_path, port, service_args, service_log_path)\n\u001b[1;32m---> 69\u001b[0m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mDesiredCapabilities\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mCHROME\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mbrowserName\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mgoog\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     70\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mport\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     71\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mservice_args\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdesired_capabilities\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     72\u001b[0m \u001b[43m                 
\u001b[49m\u001b[43mservice_log_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mservice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chromium\\webdriver.py:92\u001b[0m, in \u001b[0;36mChromiumDriver.__init__\u001b[1;34m(self, browser_name, vendor_prefix, port, options, service_args, desired_capabilities, service_log_path, service, keep_alive)\u001b[0m\n\u001b[0;32m     89\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mservice\u001b[38;5;241m.\u001b[39mstart()\n\u001b[0;32m     91\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 92\u001b[0m     \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[0;32m     93\u001b[0m \u001b[43m        \u001b[49m\u001b[43mcommand_executor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mChromiumRemoteConnection\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     94\u001b[0m \u001b[43m            \u001b[49m\u001b[43mremote_server_addr\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice_url\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     95\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbrowser_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbrowser_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvendor_prefix\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvendor_prefix\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     96\u001b[0m \u001b[43m            \u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mkeep_alive\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mignore_proxy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_ignore_proxy\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     97\u001b[0m \u001b[43m        \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     98\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[0;32m     99\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mquit()\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:272\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, options)\u001b[0m\n\u001b[0;32m    270\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_authenticator_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m    271\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstart_client()\n\u001b[1;32m--> 272\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstart_session\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcapabilities\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbrowser_profile\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:364\u001b[0m, in \u001b[0;36mWebDriver.start_session\u001b[1;34m(self, capabilities, browser_profile)\u001b[0m\n\u001b[0;32m    362\u001b[0m w3c_caps \u001b[38;5;241m=\u001b[39m _make_w3c_caps(capabilities)\n\u001b[0;32m    363\u001b[0m parameters \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcapabilities\u001b[39m\u001b[38;5;124m\"\u001b[39m: w3c_caps}\n\u001b[1;32m--> 364\u001b[0m response \u001b[38;5;241m=\u001b[39m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mCommand\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mNEW_SESSION\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparameters\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    365\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124msessionId\u001b[39m\u001b[38;5;124m'\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m response:\n\u001b[0;32m    366\u001b[0m     response \u001b[38;5;241m=\u001b[39m response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m]\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:429\u001b[0m, in \u001b[0;36mWebDriver.execute\u001b[1;34m(self, driver_command, params)\u001b[0m\n\u001b[0;32m    427\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcommand_executor\u001b[38;5;241m.\u001b[39mexecute(driver_command, params)\n\u001b[0;32m    428\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response:\n\u001b[1;32m--> 429\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43merror_handler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_response\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresponse\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    430\u001b[0m     response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_unwrap_value(\n\u001b[0;32m    431\u001b[0m         response\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[0;32m    432\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m response\n", "File 
\u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\errorhandler.py:243\u001b[0m, in \u001b[0;36mErrorHandler.check_response\u001b[1;34m(self, response)\u001b[0m\n\u001b[0;32m    241\u001b[0m         alert_text \u001b[38;5;241m=\u001b[39m value[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124malert\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m    242\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace, alert_text)  \u001b[38;5;66;03m# type: ignore[call-arg]  # mypy is not smart enough here\u001b[39;00m\n\u001b[1;32m--> 243\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace)\n", "\u001b[1;31mSessionNotCreatedException\u001b[0m: Message: session not created: Missing or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n"]}], "source": ["import time\n", "\n", "from selenium import webdriver\n", "from selenium.webdriver.common.keys import Keys\n", "\n", "chrome_options = webdriver.ChromeOptions()\n", "chrome_options.add_argument('--headless')\n", "chrome_options.add_argument('--no-sandbox')\n", "chrome_options.add_argument('--verbose')\n", "\n", "browser = webdriver.Chrome(executable_path=path_to_web_driver,\n", "                           options=chrome_options)\n", "\n", "browser.get('https://www.bing.com/news')\n", "\n", "# locate the form field to fill in,\n", "# using the browser tools > inspect the page elements;\n", "# the search bar is an element named 'q', as in query,\n", "# so we ask the browser to find that element\n", "search = browser.find_element_by_name('q')\n", "print(search)\n", "print([search.text, search.tag_name, search.id])\n", "\n", 
"# send to that field the word we would have typed in the search bar\n", "search.send_keys(\"alstom\")\n", "\n", "search_button = browser.find_element_by_xpath(\"//input[@id='sb_form_go']\")\n", "\n", "#search_button = browser.find_element_by_id('search_button_homepage')\n", " \n", "search_button.click()\n", "\n", "# alternatively, press the Enter key (Keys.RETURN)\n", "#search.send_keys(Keys.RETURN)"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": ["png = browser.get_screenshot_as_png()"]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": ["from IPython.display import Image\n", "Image(png, width='500')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We extract the results."]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": ["from selenium.common.exceptions import StaleElementReferenceException\n", "links = browser.find_elements_by_xpath(\"//div/a[@class='title'][@href]\")\n", "\n", "results = []\n", "for link in links:\n", "    try:\n", "        url = link.get_attribute('href')\n", "    except StaleElementReferenceException as e:\n", "        print(\"Issue with '{0}': {1}\".format(link, e))\n", "        print(\"It might be due to slow javascript which produces the HTML page.\")\n", "        # skip this link: no url could be read for it\n", "        continue\n", "    results.append(url)\n", "\n", "len(results)"]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": ["# close the browser once everything is done\n", "browser.quit()"]}, {"cell_type": "code", "execution_count": 24, "metadata": {"scrolled": false}, "outputs": [], "source": ["print(results)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Getting information between two dates on Google News"]}, {"cell_type": "markdown", "metadata": {}, "source": ["In 
fact, the Google News example could have done without Selenium and been handled directly with BeautifulSoup and the Google URLs that we manage to guess. \n", "\n", "Here, we use the Google News URL to build a small function that returns, for each (topic, start of period, end of period) triple, relevant links from the Google search."]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": ["import time\n", "from selenium import webdriver\n", "\n", "\n", "def get_news_specific_dates (beg_date, end_date, subject, hl=\"fr\",\n", "                             gl=\"fr\", tbm=\"nws\", authuser=\"0\") :\n", "    '''\n", "    Returns, for a given query and a specific time interval,\n", "    the first 10 results of press articles published on the topic.\n", "    Uses the global ``browser``, which is closed before returning.\n", "    '''\n", "    get_string = 'https://www.google.com/search?hl={}&gl={}&tbm={}&authuser={}&q={}&tbs=cdr%3A1%2Ccd_min%3A{}%2Ccd_max%3A{}&tbm={}'.format(\n", "                                    hl, gl, tbm, authuser, subject, beg_date, end_date,tbm)\n", "    print(get_string)\n", "    browser.get(get_string)\n", "    \n", "    # The class may change when Google updates its page style.\n", "    # This happens regularly. 
In that case, use the browser's\n", "    # web debugging tools (Chrome - Developer Tools)\n", "    # links = browser.find_elements_by_xpath(\"//h3[@class='r dO0Ag']/a[@href]\")\n", "    links = browser.find_elements_by_xpath(\"//h3/a[@href]\")\n", "    print(len(links))\n", "\n", "    results = []\n", "    for link in links:\n", "        url = link.get_attribute('href')\n", "        results.append(url)\n", "    browser.quit()    \n", "    return results"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We call the function we just created."]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [{"ename": "SessionNotCreatedException", "evalue": "Message: session not created: Missing or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n", "output_type": "error", "traceback": ["\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mSessionNotCreatedException\u001b[0m                Traceback (most recent call last)", "Cell \u001b[1;32mIn [22], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m browser \u001b[38;5;241m=\u001b[39m \u001b[43mwebdriver\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mChrome\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexecutable_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpath_to_web_driver\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      2\u001b[0m \u001b[43m                           \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchrome_options\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m      3\u001b[0m articles \u001b[38;5;241m=\u001b[39m get_news_specific_dates(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m3/15/2018\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m3/31/2018\u001b[39m\u001b[38;5;124m\"\u001b[39m, 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124malstom\u001b[39m\u001b[38;5;124m\"\u001b[39m, hl\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfr\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chrome\\webdriver.py:69\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, service, keep_alive)\u001b[0m\n\u001b[0;32m     66\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m service:\n\u001b[0;32m     67\u001b[0m     service \u001b[38;5;241m=\u001b[39m Service(executable_path, port, service_args, service_log_path)\n\u001b[1;32m---> 69\u001b[0m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mDesiredCapabilities\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mCHROME\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mbrowserName\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mgoog\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     70\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mport\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     71\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mservice_args\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdesired_capabilities\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     72\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mservice_log_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mservice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chromium\\webdriver.py:92\u001b[0m, in \u001b[0;36mChromiumDriver.__init__\u001b[1;34m(self, browser_name, vendor_prefix, port, options, service_args, desired_capabilities, service_log_path, service, keep_alive)\u001b[0m\n\u001b[0;32m     89\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mservice\u001b[38;5;241m.\u001b[39mstart()\n\u001b[0;32m     91\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 92\u001b[0m     \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[0;32m     93\u001b[0m \u001b[43m        \u001b[49m\u001b[43mcommand_executor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mChromiumRemoteConnection\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     94\u001b[0m \u001b[43m            \u001b[49m\u001b[43mremote_server_addr\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice_url\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     95\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbrowser_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbrowser_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvendor_prefix\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvendor_prefix\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     96\u001b[0m \u001b[43m            \u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mkeep_alive\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mignore_proxy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_ignore_proxy\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     97\u001b[0m \u001b[43m        
\u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     98\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[0;32m     99\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mquit()\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:272\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, options)\u001b[0m\n\u001b[0;32m    270\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_authenticator_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m    271\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstart_client()\n\u001b[1;32m--> 272\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstart_session\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcapabilities\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbrowser_profile\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:364\u001b[0m, in \u001b[0;36mWebDriver.start_session\u001b[1;34m(self, capabilities, browser_profile)\u001b[0m\n\u001b[0;32m    362\u001b[0m w3c_caps \u001b[38;5;241m=\u001b[39m _make_w3c_caps(capabilities)\n\u001b[0;32m    363\u001b[0m parameters \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcapabilities\u001b[39m\u001b[38;5;124m\"\u001b[39m: w3c_caps}\n\u001b[1;32m--> 364\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mCommand\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mNEW_SESSION\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mparameters\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    365\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124msessionId\u001b[39m\u001b[38;5;124m'\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m response:\n\u001b[0;32m    366\u001b[0m     response \u001b[38;5;241m=\u001b[39m response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m]\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:429\u001b[0m, in \u001b[0;36mWebDriver.execute\u001b[1;34m(self, driver_command, params)\u001b[0m\n\u001b[0;32m    427\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcommand_executor\u001b[38;5;241m.\u001b[39mexecute(driver_command, params)\n\u001b[0;32m    428\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response:\n\u001b[1;32m--> 429\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43merror_handler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_response\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresponse\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    430\u001b[0m     response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_unwrap_value(\n\u001b[0;32m    431\u001b[0m         response\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[0;32m    432\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m response\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\errorhandler.py:243\u001b[0m, in \u001b[0;36mErrorHandler.check_response\u001b[1;34m(self, response)\u001b[0m\n\u001b[0;32m    241\u001b[0m         alert_text 
\u001b[38;5;241m=\u001b[39m value[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124malert\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m    242\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace, alert_text)  \u001b[38;5;66;03m# type: ignore[call-arg]  # mypy is not smart enough here\u001b[39;00m\n\u001b[1;32m--> 243\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace)\n", "\u001b[1;31mSessionNotCreatedException\u001b[0m: Message: session not created: Missing or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n"]}], "source": ["browser = webdriver.Chrome(executable_path=path_to_web_driver,\n", "                           options=chrome_options)\n", "articles = get_news_specific_dates(\"3/15/2018\", \"3/31/2018\", \"alstom\", hl=\"fr\")"]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": ["print(articles)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Using selenium to play 2048\n", "\n", "In this example, we use the module so that Python presses the keyboard keys itself in order to play 2048.\n", "\n", "Note: this snippet does not solve 2048; it only shows what can be done with selenium."]}, {"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [{"ename": "SessionNotCreatedException", "evalue": "Message: session not created: Missing or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n", "output_type": "error", "traceback": ["\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 
"\u001b[1;31mSessionNotCreatedException\u001b[0m                Traceback (most recent call last)", "Cell \u001b[1;32mIn [24], line 6\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mselenium\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mwebdriver\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcommon\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mkeys\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Keys\n\u001b[0;32m      4\u001b[0m \u001b[38;5;66;03m# on ouvre la page internet du jeu 2048\u001b[39;00m\n\u001b[1;32m----> 6\u001b[0m browser \u001b[38;5;241m=\u001b[39m \u001b[43mwebdriver\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mChrome\u001b[49m\u001b[43m(\u001b[49m\u001b[43mexecutable_path\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpath_to_web_driver\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m      7\u001b[0m \u001b[43m                           \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchrome_options\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m      8\u001b[0m browser\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mhttps://gabrielecirulli.github.io/2048/\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m     10\u001b[0m \u001b[38;5;66;03m# Ce qu'on va faire : une boucle qui r\u00e9p\u00e8te inlassablement la m\u00eame chose : haut / droite / bas / gauche\u001b[39;00m\n\u001b[0;32m     11\u001b[0m \n\u001b[0;32m     12\u001b[0m \u001b[38;5;66;03m# on commence par cliquer sur la page pour que les touches sachent \u001b[39;00m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chrome\\webdriver.py:69\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, service, keep_alive)\u001b[0m\n\u001b[0;32m     66\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m 
\u001b[38;5;129;01mnot\u001b[39;00m service:\n\u001b[0;32m     67\u001b[0m     service \u001b[38;5;241m=\u001b[39m Service(executable_path, port, service_args, service_log_path)\n\u001b[1;32m---> 69\u001b[0m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mDesiredCapabilities\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mCHROME\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mbrowserName\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mgoog\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[0;32m     70\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mport\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moptions\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     71\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mservice_args\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdesired_capabilities\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     72\u001b[0m \u001b[43m                 \u001b[49m\u001b[43mservice_log_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mservice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\chromium\\webdriver.py:92\u001b[0m, in \u001b[0;36mChromiumDriver.__init__\u001b[1;34m(self, browser_name, vendor_prefix, port, options, service_args, desired_capabilities, service_log_path, service, keep_alive)\u001b[0m\n\u001b[0;32m     89\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mservice\u001b[38;5;241m.\u001b[39mstart()\n\u001b[0;32m     91\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m---> 92\u001b[0m     
\u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[38;5;21;43m__init__\u001b[39;49m\u001b[43m(\u001b[49m\n\u001b[0;32m     93\u001b[0m \u001b[43m        \u001b[49m\u001b[43mcommand_executor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mChromiumRemoteConnection\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m     94\u001b[0m \u001b[43m            \u001b[49m\u001b[43mremote_server_addr\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mservice_url\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     95\u001b[0m \u001b[43m            \u001b[49m\u001b[43mbrowser_name\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbrowser_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mvendor_prefix\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvendor_prefix\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     96\u001b[0m \u001b[43m            \u001b[49m\u001b[43mkeep_alive\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mkeep_alive\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mignore_proxy\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m_ignore_proxy\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     97\u001b[0m \u001b[43m        \u001b[49m\u001b[43moptions\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moptions\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     98\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m:\n\u001b[0;32m     99\u001b[0m     \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mquit()\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:272\u001b[0m, in \u001b[0;36mWebDriver.__init__\u001b[1;34m(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, 
options)\u001b[0m\n\u001b[0;32m    270\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_authenticator_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[0;32m    271\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstart_client()\n\u001b[1;32m--> 272\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstart_session\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcapabilities\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbrowser_profile\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:364\u001b[0m, in \u001b[0;36mWebDriver.start_session\u001b[1;34m(self, capabilities, browser_profile)\u001b[0m\n\u001b[0;32m    362\u001b[0m w3c_caps \u001b[38;5;241m=\u001b[39m _make_w3c_caps(capabilities)\n\u001b[0;32m    363\u001b[0m parameters \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcapabilities\u001b[39m\u001b[38;5;124m\"\u001b[39m: w3c_caps}\n\u001b[1;32m--> 364\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mCommand\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mNEW_SESSION\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mparameters\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    365\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124msessionId\u001b[39m\u001b[38;5;124m'\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m response:\n\u001b[0;32m    366\u001b[0m     response \u001b[38;5;241m=\u001b[39m response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m]\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\webdriver.py:429\u001b[0m, in \u001b[0;36mWebDriver.execute\u001b[1;34m(self, 
driver_command, params)\u001b[0m\n\u001b[0;32m    427\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcommand_executor\u001b[38;5;241m.\u001b[39mexecute(driver_command, params)\n\u001b[0;32m    428\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m response:\n\u001b[1;32m--> 429\u001b[0m     \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43merror_handler\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcheck_response\u001b[49m\u001b[43m(\u001b[49m\u001b[43mresponse\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    430\u001b[0m     response[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_unwrap_value(\n\u001b[0;32m    431\u001b[0m         response\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mvalue\u001b[39m\u001b[38;5;124m'\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m))\n\u001b[0;32m    432\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m response\n", "File \u001b[1;32mC:\\Python3105_x64\\lib\\site-packages\\selenium\\webdriver\\remote\\errorhandler.py:243\u001b[0m, in \u001b[0;36mErrorHandler.check_response\u001b[1;34m(self, response)\u001b[0m\n\u001b[0;32m    241\u001b[0m         alert_text \u001b[38;5;241m=\u001b[39m value[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124malert\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m    242\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace, alert_text)  \u001b[38;5;66;03m# type: ignore[call-arg]  # mypy is not smart enough here\u001b[39;00m\n\u001b[1;32m--> 243\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m exception_class(message, screen, stacktrace)\n", "\u001b[1;31mSessionNotCreatedException\u001b[0m: Message: session not created: Missing 
or invalid capabilities\n  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19044 x86_64)\n"]}], "source": ["import time\n", "\n", "from selenium import webdriver\n", "from selenium.common.exceptions import NoSuchElementException\n", "from selenium.webdriver.common.by import By\n", "from selenium.webdriver.common.keys import Keys\n", "\n", "# open the 2048 game page\n", "browser = webdriver.Chrome(executable_path=path_to_web_driver,\n", "                           options=chrome_options)\n", "browser.get('https://gabrielecirulli.github.io/2048/')\n", "\n", "# The plan: a loop that tirelessly repeats the same sequence: up / right / down / left\n", "\n", "# first click on the page so that the key presses are sent to the game\n", "browser.find_element(By.CLASS_NAME, 'grid-container').click()\n", "grid = browser.find_element(By.TAG_NAME, 'body')\n", "\n", "# to know which move to play at each step, we create a dictionary\n", "direction = {0: Keys.UP, 1: Keys.RIGHT, 2: Keys.DOWN, 3: Keys.LEFT}\n", "count = 0\n", "\n", "while True:\n", "    try:  # look for the \"Try again\" button: if it is present, the game is over\n", "        retryButton = browser.find_element(By.LINK_TEXT, 'Try again')\n", "        scoreElem = browser.find_element(By.CLASS_NAME, 'score-container')\n", "        break\n", "    except NoSuchElementException:\n", "        # Do nothing. Game is not over yet\n", "        pass\n", "    # the game goes on: press the next key in the cycle\n", "    count += 1\n", "    grid.send_keys(direction[count % 4])\n", "    time.sleep(0.1)\n", "\n", "print('Final score: {} in {} moves'.format(scoreElem.text, count))\n", "browser.quit()"]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": []}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5"}}, "nbformat": 4, "nbformat_minor": 2}