{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Cube multidimensionnel - correction\n", "\n", "Manipulation de tables de mortalit\u00e9s fa\u00e7on OLAP, correction des exercices."]}, {"cell_type": "code", "execution_count": 1, "metadata": {"collapsed": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Populating the interactive namespace from numpy and matplotlib\n"]}, {"data": {"text/html": ["Plan\n", "
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "import pyensae\n", "from pyquickhelper.helpgen import NbImage\n", "from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On lit les donn\u00e9es puis on recr\u00e9e un [DataSet](http://xarray.pydata.org/en/stable/data-structures.html#dataset) :"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/plain": ["\n", "Dimensions: (age_num: 84, annee: 54, genre: 3, pays: 54)\n", "Coordinates:\n", " * annee (annee) int64 1960 1961 1962 1963 1964 1965 1966 1967 1968 ...\n", " * age_num (age_num) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...\n", " * pays (pays) object 'AM' 'AT' 'AZ' 'BE' 'BG' 'BY' 'CH' 'CY' 'CZ' ...\n", " * genre (genre) object 'F' 'M' 'T'\n", "Data variables:\n", " DEATHRATE (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " LIFEXP (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PROBDEATH (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PROBSURV (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " SURVIVORS (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " TOTPYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ..."]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from actuariat_python.data import table_mortalite_euro_stat \n", "table_mortalite_euro_stat()\n", "import pandas\n", "df = pandas.read_csv(\"mortalite.txt\", sep=\"\\t\", encoding=\"utf8\", low_memory=False)\n", "df2 = df[[\"annee\", \"age_num\",\"indicateur\",\"pays\",\"genre\",\"valeur\"]].dropna().reset_index(drop=True)\n", "piv = df2.pivot_table(index=[\"annee\", \"age_num\",\"pays\",\"genre\"],\n", " columns=[\"indicateur\"],\n", " values=\"valeur\")\n", "import xarray\n", "ds = xarray.Dataset.from_dataframe(piv)\n", "ds"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Exercice 1 : que font les lignes suivantes ?\n", "\n", "Le programme suivant uilise les fonctions [align nad reindex](http://xarray.pydata.org/en/stable/data-structures.html?highlight=assign#transforming-datasets) pour faire une moyenne sur une des dimensions du DataSet (le pays) puis \u00e0 ajouter une variable *meanp* contenant le r\u00e9sultat."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/plain": ["\n", "Dimensions: (age_num: 84, annee: 54, genre: 3, pays: 54)\n", "Coordinates:\n", " * annee (annee) int64 1960 1961 1962 1963 1964 1965 1966 1967 1968 ...\n", " * age_num (age_num) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...\n", " * pays (pays) object 'AM' 'AT' 'AZ' 'BE' 'BG' 'BY' 'CH' 'CY' 'CZ' ...\n", " * genre (genre) object 'F' 'M' 'T'\n", "Data variables:\n", " DEATHRATE (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " LIFEXP (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PROBDEATH (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PROBSURV (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " PYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " SURVIVORS (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " TOTPYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...\n", " LIFEEXP_add (annee, age_num, pays, genre) float64 nan nan nan nan nan ..."]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["ds.assign(LIFEEXP_add = ds.LIFEXP-1)"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": false}, "outputs": [], "source": ["meanp = ds.mean(dim=\"pays\")\n", "ds1, ds2 = xarray.align(ds, meanp, join='outer')"]}, {"cell_type": "code", "execution_count": 5, "metadata": {"collapsed": false}, "outputs": [], "source": ["joined = ds1.assign(meanp = ds2[\"LIFEXP\"])"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DEATHRATELIFEXPPROBDEATHPROBSURVPYLIVEDSURVIVORSTOTPYLIVEDmeanp
age_numanneegenrepays
11960FAMNaNNaNNaNNaNNaNNaNNaN73.52
ATNaNNaNNaNNaNNaNNaNNaN73.52
AZNaNNaNNaNNaNNaNNaNNaN73.52
BE0.0015973.70.001590.998419731697393717946573.52
BG0.0065273.20.006500.993509550295813701702373.52
\n", "
"], "text/plain": [" DEATHRATE LIFEXP PROBDEATH PROBSURV PYLIVED \\\n", "age_num annee genre pays \n", "1 1960 F AM NaN NaN NaN NaN NaN \n", " AT NaN NaN NaN NaN NaN \n", " AZ NaN NaN NaN NaN NaN \n", " BE 0.00159 73.7 0.00159 0.99841 97316 \n", " BG 0.00652 73.2 0.00650 0.99350 95502 \n", "\n", " SURVIVORS TOTPYLIVED meanp \n", "age_num annee genre pays \n", "1 1960 F AM NaN NaN 73.52 \n", " AT NaN NaN 73.52 \n", " AZ NaN NaN 73.52 \n", " BE 97393 7179465 73.52 \n", " BG 95813 7017023 73.52 "]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["joined.to_dataframe().head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Les valeurs *meanp* sont constantes quelque soient le pays \u00e0 *annee*, *age_num*, *genre* fix\u00e9s."]}, {"cell_type": "code", "execution_count": 7, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/plain": ["\n", "array(23.83243243243243)\n", "Coordinates:\n", " annee int64 2000\n", " genre object 'F'\n", " age_num float64 59.0"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["joined.sel(annee=2000, age_num=59, genre='F')[\"meanp\"]"]}, {"cell_type": "code", "execution_count": 8, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2"}}, "nbformat": 4, "nbformat_minor": 2}