Data sources
Wikipedia
mlstatpy.data.wikipedia.download_dump
(country, name, folder = ".", unzip = True, timeout = -1, overwrite = False, fLOG = noLOG)
Downloads Wikipedia dumps from dumps.wikimedia.org/frwiki/latest/.
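The dump folder above is the French edition (`frwiki`). The minimal sketch below assumes that the `country` argument is the Wikipedia language prefix, so `"fr"` maps to `frwiki`. The helpers `dump_folder_url` and `dump_file_url` are hypothetical and are not part of mlstatpy. This only illustrates how the folder URL is likely derived, not how `download_dump` builds it.

```python
# Hypothetical helpers, not part of mlstatpy.
# Assumption: the country code is the Wikipedia language prefix ("fr" -> "frwiki").


def dump_folder_url(country: str) -> str:
    """Return the folder holding the latest dumps of one Wikipedia edition."""
    return f"https://dumps.wikimedia.org/{country}wiki/latest/"


def dump_file_url(country: str, filename: str) -> str:
    """Return the URL of one file, given its full name inside that folder."""
    return dump_folder_url(country) + filename


print(dump_folder_url("fr"))  # https://dumps.wikimedia.org/frwiki/latest/
print(dump_file_url("fr", "frwiki-latest-all-titles-in-ns0.gz"))
```

How `download_dump` turns its `name` argument into a file name is not stated here, so `dump_file_url` takes the full file name instead.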
mlstatpy.data.wikipedia.download_pageviews
(dt, folder = ".", unzip = True, timeout = -1, overwrite = False, fLOG = noLOG)
Downloads the Wikipedia page view counts (pagecounts) for a specific date and hour. The URL follows the pattern:
https://dumps.wikimedia.org/other/pageviews/%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz
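The URL pattern above is a `strftime` format string, so the file for a given hour can be obtained by formatting the date with it. This sketch reuses the pattern exactly as documented. The helper `pageviews_url` is hypothetical and does not reproduce mlstatpy's internal code.

```python
from datetime import datetime

# Pattern copied from the documentation; %Y, %m, %d and %H are filled in
# from the requested date, and "0000" stays literal (minutes and seconds).
PAGEVIEWS_PATTERN = (
    "https://dumps.wikimedia.org/other/pageviews/"
    "%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz"
)


def pageviews_url(dt: datetime) -> str:
    """Hypothetical helper: URL of the hourly file covering ``dt``."""
    # Minutes and seconds of dt are ignored: one file covers one full hour.
    return dt.strftime(PAGEVIEWS_PATTERN)


print(pageviews_url(datetime(2016, 5, 1, 10, 45)))
# https://dumps.wikimedia.org/other/pageviews/2016/2016-05/pagecounts-20160501-100000.gz
```

With mlstatpy itself, the same date would be passed as `dt` to `download_pageviews`, following the signature documented above.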
mlstatpy.data.wikipedia.download_titles
(country, folder = ".", unzip = True, timeout = -1, overwrite = False, fLOG = noLOG)
Downloads Wikipedia titles from dumps.wikimedia.org/frwiki/latest/latest-all-titles-in-ns0.gz.
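The titles file is gzip-compressed, which is what the `unzip` and `overwrite` parameters relate to. The minimal sketch below assumes that `unzip=True` decompresses the archive next to the downloaded file and that `overwrite=False` keeps a previously decompressed copy. The helper `gunzip` is hypothetical. It uses only the standard `gzip` and `shutil` modules.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(path: str, overwrite: bool = False) -> str:
    """Hypothetical helper: decompress ``path`` (ending in .gz) next to it.

    Assumed semantics: an existing decompressed file is reused
    unless ``overwrite`` is True.
    """
    if not path.endswith(".gz"):
        raise ValueError(f"expected a .gz file, got {path!r}")
    target = path[:-3]
    if os.path.exists(target) and not overwrite:
        return target  # reuse the copy decompressed earlier
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks, title files are large
    return target


# Demo on a small archive created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "titles.gz")
    with gzip.open(gz_path, "wt", encoding="utf8") as f:
        f.write("Paris\nVictor_Hugo\n")
    with open(gunzip(gz_path), encoding="utf8") as f:
        content = f.read()

print(content.split())  # ['Paris', 'Victor_Hugo']
```

Streaming with `shutil.copyfileobj` avoids loading the whole archive into memory.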
mlstatpy.data.wikipedia.enumerate_titles
(filename, norm = True, encoding = "utf8")
Enumerates titles from a file.
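The generator below sketches how titles might be read from a decompressed file, one per line, using the same `norm` and `encoding` parameters. What `norm` actually does is not documented here. This sketch assumes it turns the underscores that Wikipedia stores in place of spaces back into spaces. The function `iter_titles` is hypothetical and is not the mlstatpy implementation.

```python
import os
import tempfile
from typing import Iterator


def iter_titles(filename: str, norm: bool = True,
                encoding: str = "utf8") -> Iterator[str]:
    """Hypothetical sketch: yield one title per non-empty line.

    Assumption: ``norm`` replaces underscores with spaces.
    """
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            title = line.rstrip("\r\n")
            if not title:
                continue  # skip blank lines
            yield title.replace("_", " ") if norm else title


# Demo on a tiny file; lists are built before the directory is removed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "titles.txt")
    with open(path, "w", encoding="utf8") as f:
        f.write("Victor_Hugo\nÉlysée\n\n")
    normalized = list(iter_titles(path))
    raw = list(iter_titles(path, norm=False))

print(normalized)  # ['Victor Hugo', 'Élysée']
print(raw)         # ['Victor_Hugo', 'Élysée']
```

A generator keeps memory use flat even for the millions of titles in a full dump.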