module td_1a.discours_politique
Short summary
module ensae_teaching_cs.td_1a.discours_politique
Retrieve political speeches from the Internet.
Functions
function | truncated documentation |
---|---|
enumerate_speeches_from_elysees | Enumerates speeches from the Élysée. |
force_unicode | Deals with Unicode. |
get_elysee_speech_from_elysees | Retrieves the text from the Élysée. |
html_unescape | Removes HTML or XML character references and entities from a text string. keep ... |
remove_accent | Replaces French accents by regular letters. |
xmlParsingLongestDiv | Extracts the longest div section. |
Documentation
Retrieve political speeches from the Internet.
- ensae_teaching_cs.td_1a.discours_politique.enumerate_speeches_from_elysees(url='agenda', skip=0)
Enumerates speeches from the Élysée.
- Parameters:
url – subaddress; the source URL will be 'https://www.elysee.fr/' + url
skip – number of items to skip at the start of the list
- Returns:
enumerates dictionaries
Retrieve speeches by the President of the Republic
    from ensae_teaching_cs.td_1a.discours_politique import enumerate_speeches_from_elysees
    for i, disc in enumerate(enumerate_speeches_from_elysees()):
        print(disc)
Other links can be used, such as https://www.elysee.fr/recherche?query=discours. The website changed in 2018 and no longer supports XML or JSON streams.
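The url and skip parameters can be illustrated without any network access. The following is a minimal sketch, not the module's code: build_source_url and skip_first are hypothetical helpers, and the sample list is made-up data standing in for what the real function would yield.

```python
from itertools import islice

BASE_URL = "https://www.elysee.fr/"


def build_source_url(url="agenda"):
    # Hypothetical helper: the documented source is the base address plus the subaddress.
    return BASE_URL + url


def skip_first(items, skip=0):
    # Hypothetical helper: lazily drop the first `skip` items of any iterable.
    return islice(items, skip, None)


# Made-up stand-in data; the real function yields dictionaries built from the website.
sample = [{"title": "a"}, {"title": "b"}, {"title": "c"}]

print(build_source_url())           # https://www.elysee.fr/agenda
print(list(skip_first(sample, 1)))  # the last two dictionaries
```

Using islice keeps the skipping lazy, which suits a function that enumerates results rather than returning a full list.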
- ensae_teaching_cs.td_1a.discours_politique.force_unicode(text)
Deals with Unicode.
- Parameters:
text – text
- Returns:
text
- ensae_teaching_cs.td_1a.discours_politique.get_elysee_speech_from_elysees(title, url='https://www.elysee.fr/')
Retrieves the text from the Élysée.
- Parameters:
title – title of the document
url – website
- Returns:
HTML page
The function tries something like:
url + title.replace(" ","-")
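The address construction described above can be tried in isolation. This is only a sketch of the documented expression: candidate_speech_url is a hypothetical helper, the title is invented, and the real function may try other variants before fetching the page.

```python
def candidate_speech_url(title, url="https://www.elysee.fr/"):
    # Hypothetical helper mirroring the documented expression:
    # spaces in the title become hyphens, then the result is appended to the base URL.
    return url + title.replace(" ", "-")


# Invented title, used only to show the transformation.
print(candidate_speech_url("discours du president"))
# https://www.elysee.fr/discours-du-president
```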
- ensae_teaching_cs.td_1a.discours_politique.html_unescape(text)
Removes HTML or XML character references and entities from a text string. Keeps &, > and < in the source code. From Fredrik Lundh.
- Parameters:
text – text
- Returns:
cleaned text
- ensae_teaching_cs.td_1a.discours_politique.remove_accent(text)
Replaces French accents by regular letters.
- Parameters:
text – text
- Returns:
cleaned text
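A common way to obtain this effect with the standard library is Unicode decomposition. The sketch below is not the module's implementation: strip_accents is a hypothetical name, and ligatures such as "œ" are left unchanged because they do not decompose into a base letter plus an accent.

```python
import unicodedata


def strip_accents(text):
    # Decompose each character into a base letter followed by combining marks (NFD),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Élysée, président, où, garçon"))
# Elysee, president, ou, garcon
```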
- ensae_teaching_cs.td_1a.discours_politique.xmlParsingLongestDiv(text)
Extracts the longest div section.
- Parameters:
text – text of HTML page
- Returns:
text
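The idea of picking the longest div can be sketched with the standard html.parser module. This is an illustration under stated assumptions, not the module's code: LongestDivParser and longest_div_text are hypothetical names, text is attributed to the innermost open div (otherwise the outermost div would always win), and unclosed divs are ignored.

```python
from html.parser import HTMLParser


class LongestDivParser(HTMLParser):
    """Collects the text of each div, attributing text to the innermost open div."""

    def __init__(self):
        super().__init__()
        self._stack = []    # one list of text chunks per currently open div
        self.longest = ""   # longest div text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append([])

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            content = "".join(self._stack.pop()).strip()
            if len(content) > len(self.longest):
                self.longest = content

    def handle_data(self, data):
        if self._stack:
            self._stack[-1].append(data)


def longest_div_text(page):
    parser = LongestDivParser()
    parser.feed(page)
    parser.close()
    return parser.longest


# Made-up page: a short menu div, a long content div, and a nested footer div.
page = ("<html><body><div>menu</div>"
        "<div><p>My fellow citizens, good evening.</p><div>footer</div></div>"
        "</body></html>")
print(longest_div_text(page))  # My fellow citizens, good evening.
```

On a speech page, the heuristic relies on the body of the speech being longer than navigation, header, and footer blocks.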