module td_1a.discours_politique

Short summary

module ensae_teaching_cs.td_1a.discours_politique

Retrieves political speeches from the Internet.

source on GitHub

Functions

function – truncated documentation

enumerate_speeches_from_elysees – Enumerates speeches from the Élysée website.

force_unicode – Handles Unicode text.

get_elysee_speech_from_elysees – Retrieves the text of a document from the Élysée website.

html_unescape – Removes HTML or XML character references and entities from a text string. Keeps &, >, …

remove_accent – Replaces French accented letters with unaccented ones.

xmlParsingLongestDiv – Extracts the longest div section.

Documentation

Retrieves political speeches from the Internet.

ensae_teaching_cs.td_1a.discours_politique.enumerate_speeches_from_elysees(url='agenda', skip=0)

Enumerates speeches from the Élysée website.

Parameters
  • url – sub-address; the source URL is 'https://www.elysee.fr/' + url

  • skip – number of entries to skip at the start of the list

Returns

iterator over dictionaries

Retrieve speeches of the President of the Republic

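from ensae_teaching_cs.td_1a.discours_politique import enumerate_speeches_from_elysees
for disc in enumerate_speeches_from_elysees():
    print(disc)  # each item is a dictionary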

Other links can be used, such as https://www.elysee.fr/recherche?query=discours. The website changed in 2018 and no longer supports XML or JSON streams.

ensae_teaching_cs.td_1a.discours_politique.force_unicode(text)

Handles Unicode text.

Parameters

text – text

Returns

text

ensae_teaching_cs.td_1a.discours_politique.get_elysee_speech_from_elysees(title, url='https://www.elysee.fr/')

Retrieves the text of a document from the Élysée website.

Parameters
  • title – title of the document

  • url – website

Returns

HTML page

The function tries something like:

url + title.replace(" ","-")
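
The address construction described above can be sketched as follows. This is only an illustration: build_speech_url is a hypothetical helper, not part of the module, and the actual function may normalise the title further before requesting the page.

```python
def build_speech_url(title, url="https://www.elysee.fr/"):
    """Hypothetical helper: mirrors `url + title.replace(" ", "-")`."""
    # Spaces in the title become hyphens, then the result is appended
    # to the base address of the website.
    return url + title.replace(" ", "-")


if __name__ == "__main__":
    print(build_speech_url("discours du president"))
```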

ensae_teaching_cs.td_1a.discours_politique.html_unescape(text)

Removes HTML or XML character references and entities from a text string. The characters &, > and < are kept in the source code. Taken from Fredrik Lundh.

Parameters

text – text

Returns

cleaned text
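
A minimal sketch of this kind of unescaping, using only the standard library. It is not the module's code: unescape_sketch is a hypothetical name, and it assumes that "kept" means the entities for &, > and < are left escaped while every other reference is replaced by its character.

```python
import re
from html.entities import name2codepoint

# Assumption: these three entities stay escaped in the output.
KEPT_ENTITIES = {"amp", "gt", "lt"}


def unescape_sketch(text):
    """Replace character references and named entities, except the kept ones."""

    def fixup(match):
        ref = match.group(0)
        if ref.startswith("&#"):
            # Numeric reference, decimal (&#233;) or hexadecimal (&#xe9;).
            try:
                if ref[2:3] in ("x", "X"):
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except (ValueError, OverflowError):
                return ref  # malformed reference: leave it untouched
        name = ref[1:-1]
        if name in KEPT_ENTITIES or name not in name2codepoint:
            return ref  # kept or unknown entity: leave it untouched
        return chr(name2codepoint[name])

    return re.sub(r"&#?\w+;", fixup, text)
```

The standard library's html.unescape converts every entity, including the three kept here, so it is not a drop-in equivalent under this assumption.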

ensae_teaching_cs.td_1a.discours_politique.remove_accent(text)

Replaces French accented letters with unaccented ones.

Parameters

text – text

Returns

cleaned text
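
One common way to obtain this result is Unicode decomposition, sketched below. The name strip_accents is hypothetical, and the module itself may rely on a different technique, such as an explicit replacement table.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


if __name__ == "__main__":
    print(strip_accents("élève à Noël"))
```

Ligatures such as œ have no canonical decomposition and are left unchanged by this approach.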

ensae_teaching_cs.td_1a.discours_politique.xmlParsingLongestDiv(text)

Extracts the longest div section.

Parameters

text – text of the HTML page

Returns

text
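
The idea of keeping the longest div can be sketched with the standard html.parser module. This is an illustration under stated assumptions, not the module's implementation: longest_div_text is a hypothetical name, and "longest" is taken here to mean the div with the most text of its own, excluding text inside nested div elements.

```python
from html.parser import HTMLParser


class _LongestDiv(HTMLParser):
    """Collects the text of each div and remembers the longest one."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one text buffer per currently open div
        self.longest = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append([])

    def handle_data(self, data):
        # Text is credited to the innermost open div only.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            text = "".join(self.stack.pop()).strip()
            if len(text) > len(self.longest):
                self.longest = text


def longest_div_text(html_text):
    parser = _LongestDiv()
    parser.feed(html_text)
    parser.close()
    return parser.longest
```

On a page made of a short menu div and a long content div, the content div's text is returned; an empty string is returned when the page has no div.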
