module data.population
Short summary
module actuariat_python.data.population
Various functions to download population data.
Functions
| function | truncated documentation |
|---|---|
| fecondite_france | Downloads the fertility table for France (Excel format) |
| population_france_year | Downloads the data for the French population from the INSEE website |
| table_mortalite_euro_stat | This function retrieves the mortality table from EuroStat through table de mortalité … |
| table_mortalite_france_00_02 | Downloads the mortality tables for France, assuming they are available in Excel format. |
Documentation
Various functions to download population data.
- actuariat_python.data.population.fecondite_france(url=None)
Downloads the fertility table for France (Excel format).
- Parameters
url – source (URL or file)
- Returns
DataFrame
By default, the data comes from a local file, a copy of INSEE: Fécondité selon l’âge détaillé de la mère. pandas cannot read the original file, so it is converted first. See also INSEE Bilan Démographique 2016.
- actuariat_python.data.population.population_france_year(url='https://www.insee.fr/fr/statistiques/fichier/1892086/pop-totale-france.xls', sheet_name=0, year=2020)
Downloads the data for the French population from the INSEE website.
- Parameters
url – URL of the data
sheet_name – sheet index
year – last year to find
- Returns
DataFrame
The sheet index is 0 for the whole of France and 1 for metropolitan France. The last row, labelled 1914 ou avant (1914 or earlier), aggregates several ages; it stays aggregated but its label is changed to 1914. Likewise, 100 ou plus (100 or more) is replaced by 100. By default, the data comes from INSEE, Bilan Démographique.
2017/01: pandas does not seem able to read the file (old Excel format). Convert it to a text file with Excel first.
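The relabelling of the aggregated rows described above can be sketched in plain Python. This is only an illustration, not the library's code: the helper name `normalize_label` is hypothetical, and it assumes every label is either a plain integer or an integer followed by a suffix such as `ou avant` or `ou plus`.

```python
import re

# Hypothetical helper, not part of actuariat_python: it keeps the leading
# integer of a label and drops any trailing text.
_LEADING_INT = re.compile(r"^\s*(\d+)")


def normalize_label(label):
    """Return the leading integer of a label such as '1914 ou avant'."""
    match = _LEADING_INT.match(str(label))
    if match is None:
        raise ValueError(f"no leading integer in label: {label!r}")
    return int(match.group(1))


labels = ["2019", "1914 ou avant", "100 ou plus", 42]
print([normalize_label(v) for v in labels])
```

The aggregated rows keep their aggregated values; only the label becomes numeric, which makes the column usable as an integer index.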
- actuariat_python.data.population.table_mortalite_euro_stat(url='http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/', name='demo_mlifetable.tsv.gz', final_name='mortalite.txt', whereTo='.', stop_at=None, fLOG=<function noLOG>)
Retrieves the mortality table from EuroStat through table de mortalité (this link is currently broken: data-publica no longer provides this database, so a copy is provided).
- Parameters
url – data source
name – data table name
final_name – the data is compressed and must be uncompressed into a file; this parameter defines the name of that file
whereTo – folder where the downloaded data is stored
stop_at – the overall process is quite long; if not None, only the first rows are kept
fLOG – logging function
- Returns
DataFrame
The function first checks whether the file final_name exists. If it does, the data is not downloaded again.
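The "download only if missing" check can be sketched as follows. This is a minimal illustration, not the function's actual implementation: `fetch_once` and its `download` argument are hypothetical names, and the network request is replaced by a fake callable that returns gzip-compressed bytes.

```python
import gzip
import os
import tempfile


def fetch_once(final_name, download, where_to="."):
    """Return the path of the uncompressed file, downloading only if missing.

    `download` is a hypothetical callable returning gzip-compressed bytes;
    it stands in for the real network request.
    """
    path = os.path.join(where_to, final_name)
    if os.path.exists(path):
        # The file is already there: skip the download entirely.
        return path
    raw = gzip.decompress(download())
    with open(path, "wb") as f:
        f.write(raw)
    return path


# Usage with a fake downloader that records each call.
calls = []


def fake_download():
    calls.append(1)
    return gzip.compress(b"some tab-separated content\n")


with tempfile.TemporaryDirectory() as folder:
    first = fetch_once("mortalite.txt", fake_download, where_to=folder)
    second = fetch_once("mortalite.txt", fake_download, where_to=folder)
    print(first == second, len(calls))
```

Passing the downloader as an argument keeps the sketch testable without network access.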
The header uses an unusual format: the coordinates in the first column are separated by commas:
indic_de,sex,age,geo\time 2013 2012 2011 2010 2009
The data must be preprocessed to split this information into columns. The overall process takes 4-5 minutes: about 10 seconds to download (< 10 MB) and the rest to preprocess the data (this could be improved). The processed data contains the following columns:
['annee', 'valeur', 'age', 'age_num', 'indicateur', 'genre', 'pays']
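The splitting step can be sketched with the standard library alone. The snippet below is an illustration under assumptions, not the function's actual implementation: the sample rows and values are invented, the mapping from the composite key to the output column names is a guess based on the column list above, `age_num` is left out, and cells that do not parse as plain numbers are simply skipped.

```python
import csv
import io

# Invented sample in the same layout: a composite first column,
# then one column per year, separated by tabs.
SAMPLE = (
    "indic_de,sex,age,geo\\time\t2013\t2012\n"
    "LIFEXP,F,Y1,FR\t84.5\t84.2\n"
    "LIFEXP,M,Y1,FR\t78.1\t:\n"
)


def unpivot(text):
    """Turn the wide table into one record per (key, year) pair."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    years = [h.strip() for h in header[1:]]
    records = []
    for row in reader:
        # Split the composite key into its four coordinates.
        indicateur, genre, age, pays = row[0].split(",")
        for annee, cell in zip(years, row[1:]):
            try:
                valeur = float(cell)
            except ValueError:
                continue  # the cell is not a plain number, skip it
            records.append({
                "annee": int(annee),
                "valeur": valeur,
                "age": age,
                "indicateur": indicateur,
                "genre": genre,
                "pays": pays,
            })
    return records


records = unpivot(SAMPLE)
print(records[0])
```

Each input row yields one record per year column, which is why the long format is much larger than the source file and why the preprocessing dominates the running time.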
Columns age and age_num look alike. age_num is numeric and equals age except when age_num is 85: everybody above that age falls into the same category. The table contains many indicators:
PROBSURV: Probability of surviving between two exact ages (px)
LIFEXP: Life expectancy at exact age (ex)
SURVIVORS: Number of survivors at exact age (lx)
PYLIVED: Number of person-years lived between two exact ages (Lx)
DEATHRATE: Death rate at age x (Mx)
PROBDEATH: Probability of dying between two exact ages (qx)
TOTPYLIVED: Total number of person-years lived after exact age (Tx)
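Several of these indicators are linked by the usual life-table identities, for instance px = 1 - qx and l(x+1) = lx * px for consecutive exact ages. The sketch below rebuilds a survivors column from death probabilities. The qx values are made up for illustration, the function name is hypothetical, and the starting population of 100000 is a common convention rather than something stated on this page.

```python
def survivors_from_qx(qx, radix=100000.0):
    """Build lx from death probabilities qx, starting from `radix` at the first age."""
    lx = [radix]
    for q in qx:
        px = 1.0 - q            # probability of surviving to the next exact age
        lx.append(lx[-1] * px)  # l(x+1) = lx * px
    return lx


# Made-up death probabilities for four consecutive ages.
qx = [0.004, 0.0003, 0.0002, 0.5]
lx = survivors_from_qx(qx)
print(lx)
```

Comparing such a reconstruction with the SURVIVORS values for the same country, gender and year is one way to sanity-check the preprocessed table.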
- actuariat_python.data.population.table_mortalite_france_00_02(homme=None, femme=None)
Downloads the mortality tables for France, assuming they are available in Excel format.
- Parameters
homme – table for men
femme – table for women
- Returns
DataFrame
The final DataFrame merges both sheets. The data comes from Institut des Actuaires: Références de mortalité or Références techniques.