module td_1a.discours_politique

Short summary

module ensae_teaching_cs.td_1a.discours_politique

Retrieves political speeches from the Internet.

source on GitHub

Functions

function – truncated documentation

enumerate_speeches_from_elysees – Enumerates speeches from the Élysée website.

force_unicode – Handles Unicode text.

get_elysee_speech_from_elysees – Retrieves the text of a speech from the Élysée website.

html_unescape – Removes HTML or XML character references and entities from a text string. Keeps &, >, …

remove_accent – Replaces French accented characters with unaccented letters.

xmlParsingLongestDiv – Extracts the longest div section.

Documentation

Retrieves political speeches from the Internet.

ensae_teaching_cs.td_1a.discours_politique.enumerate_speeches_from_elysees(url='agenda', skip=0)

Enumerates speeches from the Élysée website.

Parameters:
  • url – sub-address; the source URL will be 'https://www.elysee.fr/' + url

  • skip – number of entries to skip at the start of the list

Returns:

an enumeration of dictionaries

Retrieving speeches by the President of the Republic

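from ensae_teaching_cs.td_1a.discours_politique import enumerate_speeches_from_elysees

# Print each speech record produced by the enumerator.
for disc in enumerate_speeches_from_elysees():
    print(disc)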

Other links can be used, such as https://www.elysee.fr/recherche?query=discours. The website changed in 2018 and no longer supports XML or JSON streams.
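The effect of the skip parameter can be pictured with a small offline sketch. It does not contact the website: fake_speeches is a hypothetical stand-in for the real enumerator, and the "title" key is an assumption made for illustration, not the library's documented output.

```python
from itertools import islice


def fake_speeches():
    # Hypothetical stand-in for enumerate_speeches_from_elysees();
    # the "title" key is an assumption, not the library's actual schema.
    for n in range(5):
        yield {"title": f"speech {n}"}


def skip_first(items, skip=0):
    # Drop the first `skip` entries and yield the remaining ones lazily.
    return islice(items, skip, None)


remaining = list(skip_first(fake_speeches(), skip=2))
```

Here remaining holds the last three of the five fake records, which is the behavior the skip description suggests.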

ensae_teaching_cs.td_1a.discours_politique.force_unicode(text)

Handles Unicode text.

Parameters:

text – text

Returns:

text
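The description does not say how the function handles Unicode. One common approach, shown here only as a sketch, is to decode byte strings by trying a few encodings in turn. The helper name to_text and the choice of UTF-8 followed by Latin-1 are assumptions, not the library's implementation.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    # Hypothetical helper, not the library's force_unicode.
    # Strings are already Unicode in Python 3 and are returned unchanged.
    if isinstance(value, str):
        return value
    # Try each candidate encoding until one decodes without error.
    for enc in encodings:
        try:
            return value.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return value.decode("utf-8", errors="replace")
```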

ensae_teaching_cs.td_1a.discours_politique.get_elysee_speech_from_elysees(title, url='https://www.elysee.fr/')

Retrieves the text of a speech from the Élysée website.

Parameters:
  • title – title of the document

  • url – base URL of the website

Returns:

HTML page

The function tries something like:

url + title.replace(" ","-")
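The expression above can be tried on its own. In this sketch, build_speech_url is a hypothetical helper and the title is made up; whether the resulting address exists on the website is not guaranteed.

```python
def build_speech_url(title, url="https://www.elysee.fr/"):
    # Hypothetical helper mirroring the expression shown above:
    # spaces in the title become hyphens and the result is appended
    # to the base URL.
    return url + title.replace(" ", "-")


# Made-up title, used only to show the string transformation.
example = build_speech_url("discours du president")
```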

ensae_teaching_cs.td_1a.discours_politique.html_unescape(text)

Removes HTML or XML character references and entities from a text string, but keeps &, > and < as they appear in the source. Code from Fredrik Lundh.

Parameters:

text – text

Returns:

cleaned text
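The behavior described above can be approximated with the standard library. This is a sketch under assumptions, not the library's code: unescape_keep_markup is a hypothetical name, and only the named forms &amp;amp;, &amp;gt; and &amp;lt; are left untouched.

```python
import re
from html.entities import name2codepoint

# Named entities left as-is so the markup stays valid.
_KEEP = {"amp", "gt", "lt"}


def unescape_keep_markup(text):
    # Hypothetical helper, not the library's html_unescape.
    def fix(match):
        token = match.group(0)
        if token.startswith("&#"):
            # Numeric character reference, decimal or hexadecimal.
            try:
                if token[:3].lower() == "&#x":
                    return chr(int(token[3:-1], 16))
                return chr(int(token[2:-1]))
            except (ValueError, OverflowError):
                return token  # malformed reference: leave unchanged
        name = token[1:-1]
        if name in _KEEP or name not in name2codepoint:
            return token  # protected or unknown entity: leave unchanged
        return chr(name2codepoint[name])

    return re.sub(r"&#?\w+;", fix, text)
```

For example, "caf&amp;eacute;" becomes "café", while "&amp;amp;" is returned unchanged.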

ensae_teaching_cs.td_1a.discours_politique.remove_accent(text)

Replaces French accented characters with unaccented letters.

Parameters:

text – text

Returns:

cleaned text
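One way to obtain this effect, shown as a sketch that may differ from what the library does, is Unicode decomposition: each accented letter is split into a base letter plus combining marks, and the marks are dropped. The name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text):
    # Hypothetical helper, not the library's remove_accent.
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

This approach leaves ligatures such as œ unchanged, because they have no Unicode decomposition; a replacement table would be needed for those.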

ensae_teaching_cs.td_1a.discours_politique.xmlParsingLongestDiv(text)

Extracts the longest div section.

Parameters:

text – text of HTML page

Returns:

text
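The idea of picking the longest div can be sketched with the standard html.parser module. This is not the library's implementation. It assumes that the length of a div is the amount of text placed directly inside it, excluding text that belongs to nested divs; LongestDivParser and longest_div_text are hypothetical names.

```python
from html.parser import HTMLParser


class LongestDivParser(HTMLParser):
    # Hypothetical parser, not the library's xmlParsingLongestDiv.
    def __init__(self):
        super().__init__()
        self.open_divs = []  # one text buffer per currently open <div>
        self.longest = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.open_divs.append([])

    def handle_data(self, data):
        # Assumption: text counts only toward the innermost open <div>.
        if self.open_divs:
            self.open_divs[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            content = "".join(self.open_divs.pop())
            if len(content) > len(self.longest):
                self.longest = content


def longest_div_text(html):
    parser = LongestDivParser()
    parser.feed(html)
    parser.close()
    return parser.longest
```

On a page where the speech sits in a single large div, this returns the speech text and ignores short navigation blocks.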
