module td_1a.discours_politique

Short summary

module ensae_teaching_cs.td_1a.discours_politique

Retrieve political speeches from the Internet.

source on GitHub

Functions

function                          truncated documentation
enumerate_speeches_from_elysees   Enumerates speeches from the Élysée website.
force_unicode                     Deals with Unicode.
get_elysee_speech_from_elysees    Retrieves the text from the Élysée website.
html_unescape                     Removes HTML or XML character references and entities from a text string. keep &, >, …
remove_accent                     Replaces French accents by regular letters.
xmlParsingLongestDiv              Extracts the longest div section.

Documentation

ensae_teaching_cs.td_1a.discours_politique.enumerate_speeches_from_elysees(url='agenda', skip=0)

Enumerates speeches from the Élysée website.

Parameters:
  • url – subaddress; the source URL is 'https://www.elysee.fr/' + url
  • skip – number of items to skip at the start of the list
Returns:

an enumeration (iterator) of dictionaries

Retrieve speeches of the President of the Republic

from ensae_teaching_cs.td_1a.discours_politique import enumerate_speeches_from_elysees

for i, disc in enumerate(enumerate_speeches_from_elysees()):
    print(disc)

Other links can be used, such as https://www.elysee.fr/recherche?query=discours. The website changed in 2018 and no longer supports XML or JSON streams.

ensae_teaching_cs.td_1a.discours_politique.force_unicode(text)

Deals with Unicode.

Parameters: text – text
Returns: text

ensae_teaching_cs.td_1a.discours_politique.get_elysee_speech_from_elysees(title, url='https://www.elysee.fr/')

Retrieves the text from the Élysée website.

Parameters:
  • title – title of the document
  • url – website
Returns:

HTML page

The function tries something like:

url + title.replace(" ","-")
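
The expression above can be tried on its own, without any network access. The helper below is a minimal sketch: the name build_speech_url and the sample title are hypothetical and are not part of the module, and the real function may try other variants of the address as well.

```python
def build_speech_url(title, url="https://www.elysee.fr/"):
    # Hypothetical helper: mirrors the documented expression
    # url + title.replace(" ", "-") and nothing more.
    return url + title.replace(" ", "-")


# Sample title chosen for illustration only.
print(build_speech_url("discours du president"))
```

Only spaces are replaced here; accents and punctuation in the title are left untouched.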

ensae_teaching_cs.td_1a.discours_politique.html_unescape(text)

Removes HTML or XML character references and entities from a text string. Keeps &, >, < in the source code. From Fredrik Lundh.

Parameters: text – text
Returns: cleaned text
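
To illustrate what removing character references and entities means, here is a short sketch based on the standard library function html.unescape. It is a stand-in, not the implementation of html_unescape: the standard function also decodes &amp;, &gt; and &lt;, whereas the function documented here states that it keeps those three. The sample string is made up for the example.

```python
import html

# Named entity (&eacute;), decimal reference (&#201;) and
# hexadecimal reference (&#xE9;) are all decoded.
raw = "Libert&eacute;, &#201;galit&eacute; &amp; d&#xE9;bats"
clean = html.unescape(raw)
print(clean)
```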

ensae_teaching_cs.td_1a.discours_politique.remove_accent(text)

Replaces French accented letters with their unaccented equivalents.

Parameters: text – text
Returns: cleaned text
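
One common way to obtain this effect is Unicode decomposition. The sketch below is an assumption about how such a replacement can be done, not the module's own code, and the name strip_accents is hypothetical. It decomposes each accented letter into a base letter plus combining marks, then drops the marks.

```python
import unicodedata


def strip_accents(text):
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Élysée, français, Noël"))
```

Ligatures such as "œ" have no decomposition and pass through unchanged with this approach.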

ensae_teaching_cs.td_1a.discours_politique.xmlParsingLongestDiv(text)

Extracts the longest div section.

Parameters: text – text of the HTML page
Returns: text
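
The idea of keeping the longest div can be sketched with the standard library parser html.parser.HTMLParser. This is an illustration under stated assumptions, not the module's implementation: the names LongestDivParser and longest_div_text are hypothetical, and the sketch assumes that text is credited to the innermost open div and that length is measured on the stripped text.

```python
from html.parser import HTMLParser


class LongestDivParser(HTMLParser):
    """Hypothetical parser: remembers the longest text found in a <div>."""

    def __init__(self):
        super().__init__()
        self._stack = []   # one list of text chunks per currently open <div>
        self.longest = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append([])

    def handle_data(self, data):
        if self._stack:
            # Assumption: text belongs to the innermost open <div> only.
            self._stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            text = "".join(self._stack.pop()).strip()
            if len(text) > len(self.longest):
                self.longest = text


def longest_div_text(html_text):
    parser = LongestDivParser()
    parser.feed(html_text)
    parser.close()
    return parser.longest


# Made-up page: a short menu div and a longer content div.
page = ("<html><body><div>menu</div>"
        "<div><p>Mes chers compatriotes, bonsoir.</p></div></body></html>")
print(longest_div_text(page))
```

On a speech page, the longest div is usually the body of the speech, which is why this heuristic is useful for skipping menus and footers.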
