Data sources

Wikipedia

mlstatpy.data.wikipedia.download_dump(country, name, folder=".", unzip=True, timeout=-1, overwrite=False, fLOG=noLOG)

Downloads Wikipedia dumps from dumps.wikimedia.org/frwiki/latest/.

mlstatpy.data.wikipedia.download_pageviews(dt, folder=".", unzip=True, timeout=-1, overwrite=False, fLOG=noLOG)

Downloads Wikipedia page counts for a precise date (down to the hour). The URL follows the pattern:

https://dumps.wikimedia.org/other/pageviews/%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz
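The pattern above uses standard `strftime` directives, so it can be expanded with Python's `datetime` module. The following is a minimal sketch of that expansion only; the helper name `pageviews_url` is hypothetical and is not part of mlstatpy, and the sketch does not download anything.

```python
from datetime import datetime

# Pattern quoted in the documentation above; strftime fills in the date fields.
PAGEVIEWS_PATTERN = (
    "https://dumps.wikimedia.org/other/pageviews/"
    "%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz"
)


def pageviews_url(dt: datetime) -> str:
    """Hypothetical helper: expand the URL pattern for a given date and hour.

    Minutes and seconds of ``dt`` are ignored because the pattern
    hard-codes them as ``0000``.
    """
    return dt.strftime(PAGEVIEWS_PATTERN)


print(pageviews_url(datetime(2016, 1, 2, 13)))
```

Because the pattern only contains `%Y`, `%m`, `%d` and `%H`, two datetimes within the same hour map to the same URL.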

mlstatpy.data.wikipedia.download_titles(country, folder=".", unzip=True, timeout=-1, overwrite=False, fLOG=noLOG)

mlstatpy.data.wikipedia.enumerate_titles(filename, norm=True, encoding="utf8")

Enumerates titles from a file.
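To illustrate what enumerating titles lazily can look like, here is a minimal sketch of a generator that yields one title per line. It is not the mlstatpy implementation: the function name `iter_titles_sketch` is hypothetical, the one-title-per-line file layout is an assumption, and the behavior of the `norm` parameter is not reproduced because the documentation above does not describe it.

```python
from typing import Iterator


def iter_titles_sketch(filename: str, encoding: str = "utf8") -> Iterator[str]:
    """Hypothetical sketch: yield non-empty lines of a text file as titles.

    Assumes one title per line. The file is read lazily, so large
    title lists do not need to fit in memory.
    """
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            title = line.strip()
            if title:  # skip blank lines
                yield title
```

A generator keeps memory use flat regardless of file size, which matters for title files covering a whole Wikipedia edition.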
