.. _2018-09-18rappelspythonpandasmatplotlibrst:

===============================================
2018-09-18 - Refresher on pandas and matplotlib
===============================================

.. only:: html

    **Links:** :download:`notebook <2018-09-18_rappels_python_pandas_matplotlib.ipynb>`, :downloadlink:`html <2018-09-18_rappels_python_pandas_matplotlib2html.html>`, :download:`python <2018-09-18_rappels_python_pandas_matplotlib.py>`, :downloadlink:`slides <2018-09-18_rappels_python_pandas_matplotlib.slides.html>`, :githublink:`GitHub|_doc/notebooks/notebook_eleves/2018-2019/2018-09-18_rappels_python_pandas_matplotlib.ipynb|*`

Data manipulation on the Titanic passenger dataset, which can be downloaded from `opendatasoft `__ or `awesome-public-datasets `__.

.. code:: ipython3

    import pandas

.. code:: ipython3

    # path: a file titanic.csv stored inside a directory named titanic.csv
    df = pandas.read_csv("titanic.csv/titanic.csv")

.. code:: ipython3

    type(df)

.. parsed-literal::

    pandas.core.frame.DataFrame

.. code:: ipython3

    df.head(n=2)

.. parsed-literal::

       PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch     Ticket     Fare  Cabin  Embarked
    0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0  A/5 21171   7.2500    NaN         S
    1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0   PC 17599  71.2833    C85         C

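
Beyond ``head``, a few attributes give a quick overview of a freshly loaded DataFrame: its shape, the type inferred for each column and the number of missing values per column. The minimal sketch below runs without the CSV file: it rebuilds a hypothetical ``toy`` frame from the two rows displayed above, restricted to a few columns.

.. code:: python

    import numpy as np
    import pandas

    # hypothetical toy frame: the two rows shown above, only a few columns kept
    toy = pandas.DataFrame({
        "PassengerId": [1, 2],
        "Survived": [0, 1],
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
        "Age": [22.0, 38.0],
        "Cabin": [np.nan, "C85"],
    })

    print(toy.shape)         # (number of rows, number of columns)
    print(toy.dtypes)        # type inferred for each column
    print(toy.isna().sum())  # number of missing values per column

On the full file, ``df.isna().sum()`` quickly shows which columns, such as ``Age`` or ``Cabin``, contain missing values.
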

.. code:: ipython3

    # a list of column names (double brackets) returns a DataFrame
    subset = df[["PassengerId", "Survived", "Pclass"]]
    subset.head(n=2)

.. parsed-literal::

       PassengerId  Survived  Pclass
    0            1         0       3
    1            2         1       1

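
Selecting with a list of column names (double brackets) returns a DataFrame, while a single column name returns a Series. The minimal sketch below checks this on a small hypothetical frame that reuses the Titanic column names.

.. code:: python

    import pandas

    # hypothetical toy frame with the same column names as the Titanic data
    toy = pandas.DataFrame({"PassengerId": [1, 2, 3],
                            "Survived": [0, 1, 1],
                            "Pclass": [3, 1, 3]})

    one_col = toy["Survived"]               # single brackets -> Series
    sub = toy[["PassengerId", "Survived"]]  # list of names -> DataFrame

    print(type(one_col).__name__)  # Series
    print(type(sub).__name__)      # DataFrame
    print(list(sub.columns))       # ['PassengerId', 'Survived']

This is also why the plotting cells use ``jointure[['survie']]``: a one-column DataFrame keeps its column name, which pandas uses as the legend label.
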

.. code:: ipython3

    # Survived is 0/1, so the sum per class is the number of survivors
    survived = subset[["Survived", "Pclass"]].groupby(["Pclass"]).sum()
    # number of passengers per class
    compte = subset[["Survived", "Pclass"]].groupby(["Pclass"]).count()
    compte.columns = ['total']

.. code:: ipython3

    survived

.. parsed-literal::

            Survived
    Pclass
    1            136
    2             87
    3            119

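
Because ``Survived`` only contains 0 and 1, summing it within each group counts the survivors of that class, and the group key becomes the index of the result. A minimal check on hypothetical toy data:

.. code:: python

    import pandas

    # hypothetical toy data: 3 passengers in class 1, 2 in class 3
    toy = pandas.DataFrame({"Survived": [1, 1, 0, 0, 1],
                            "Pclass":   [1, 1, 1, 3, 3]})

    # one row per class, indexed by Pclass
    survived = toy.groupby("Pclass").sum()
    print(survived)
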

.. code:: ipython3

    compte

.. parsed-literal::

            total
    Pclass
    1         216
    2         184
    3         491

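
``count()`` returns the number of non-missing values per column, whereas ``size()`` returns the number of rows in each group. Both agree here because ``Survived`` has no missing value (the three totals add up to 891, the number of rows), but they differ as soon as a column contains NaN, as in this sketch with a hypothetical missing ``Age``:

.. code:: python

    import numpy as np
    import pandas

    # hypothetical toy data: the second passenger has no recorded age
    toy = pandas.DataFrame({"Pclass": [1, 1, 3],
                            "Age":    [22.0, np.nan, 30.0]})

    counts = toy.groupby("Pclass")["Age"].count()  # non-missing ages per class
    sizes = toy.groupby("Pclass").size()           # rows per class

    print(counts.to_dict())
    print(sizes.to_dict())
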

.. code:: ipython3

    jointure = survived.join(compte)
    jointure

.. parsed-literal::

            Survived  total
    Pclass
    1            136    216
    2             87    184
    3            119    491

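
``join`` aligns the two frames on their index (here ``Pclass``). The same table can be obtained with ``pandas.concat`` along the columns or with ``merge`` on both indexes; the sketch below checks on hypothetical per-class tables that the three approaches agree.

.. code:: python

    import pandas

    # hypothetical per-class tables sharing the same index
    idx = pandas.Index([1, 3], name="Pclass")
    survived = pandas.DataFrame({"Survived": [2, 1]}, index=idx)
    compte = pandas.DataFrame({"total": [3, 2]}, index=idx)

    j1 = survived.join(compte)                                      # index-aligned join
    j2 = pandas.concat([survived, compte], axis=1)                  # side-by-side concatenation
    j3 = survived.merge(compte, left_index=True, right_index=True)  # merge on both indexes

    print(j1)

``join`` defaults to a left join (``how="left"``), which is enough here since both tables cover the same classes.
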

.. code:: ipython3

    # survival rate per class = survivors / passengers
    jointure["survie"] = jointure['Survived'] / jointure.total
    jointure

.. parsed-literal::

            Survived  total    survie
    Pclass
    1            136    216  0.629630
    2             87    184  0.472826
    3            119    491  0.242363

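
Since ``Survived`` is a 0/1 column, the survival rate of a class is simply its mean, so the sum, count, join and division steps can be collapsed into a single ``groupby``. A sketch using named aggregation (available since pandas 0.25) on hypothetical toy data:

.. code:: python

    import pandas

    # hypothetical toy data
    toy = pandas.DataFrame({"Survived": [1, 1, 0, 0, 1],
                            "Pclass":   [1, 1, 1, 3, 3]})

    # one pass: survivors, passengers and survival rate per class
    stats = toy.groupby("Pclass").agg(
        Survived=("Survived", "sum"),
        total=("Survived", "count"),
        survie=("Survived", "mean"),
    )
    print(stats)

On the full dataset this should reproduce the ``survie`` column above, for instance 136 / 216, about 0.6296, for first class.
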

.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    # bar chart of the survival rate per class (values are fractions between 0 and 1)
    ax = jointure[['survie']].plot(kind="bar", figsize=(2, 2))
    ax.set_title("Titanic")
    ax.set_ylabel("%");

.. image:: 2018-09-18_rappels_python_pandas_matplotlib_12_0.png

.. code:: ipython3

    import matplotlib.pyplot as plt

    # two side-by-side panels: survival rate on the left, raw counts on the right
    fig, ax = plt.subplots(1, 2, figsize=(8, 3))
    jointure[['survie']].plot(kind="bar", ax=ax[0])
    ax[0].set_title("Titanic")
    ax[0].set_ylabel("%")
    jointure.drop('survie', axis=1).plot(kind="bar", ax=ax[1])
    ax[1].set_title("Titanic")
    ax[1].set_ylabel("%");

.. image:: 2018-09-18_rappels_python_pandas_matplotlib_13_0.png

.. code:: ipython3

    # writing .xlsx files requires an Excel engine such as openpyxl
    jointure.to_excel("titanic.xlsx")

.. code:: ipython3

    df.head(n=2)

.. parsed-literal::

       PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch     Ticket     Fare  Cabin  Embarked
    0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0  A/5 21171   7.2500    NaN         S
    1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0   PC 17599  71.2833    C85         C

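
The ``survie`` column holds fractions between 0 and 1, so the ``%`` axis label only matches the bars once the values are multiplied by 100; the right-hand panel shows counts rather than percentages. A minimal sketch of the two-panel figure with adjusted labels, rebuilding ``jointure`` from the values of the table shown earlier and assuming the non-interactive ``Agg`` backend so that it runs outside a notebook:

.. code:: python

    import matplotlib
    matplotlib.use("Agg")  # assumption: no display, render off-screen
    import matplotlib.pyplot as plt
    import pandas

    # survivors and passengers per class, copied from the jointure table
    jointure = pandas.DataFrame({"Survived": [136, 87, 119],
                                 "total": [216, 184, 491]},
                                index=pandas.Index([1, 2, 3], name="Pclass"))
    jointure["survie"] = jointure["Survived"] / jointure["total"]

    fig, ax = plt.subplots(1, 2, figsize=(8, 3))
    # left: survival rate converted to percent, so the "%" label is accurate
    (jointure[["survie"]] * 100).plot(kind="bar", ax=ax[0])
    ax[0].set_title("Titanic")
    ax[0].set_ylabel("%")
    # right: raw counts, labelled as such
    jointure.drop("survie", axis=1).plot(kind="bar", ax=ax[1])
    ax[1].set_title("Titanic")
    ax[1].set_ylabel("passengers")
    fig.tight_layout()
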

.. code:: ipython3

    # NumPy array of shape (891, 2): one row per passenger
    mat = df[['Survived', 'Age']].values

.. code:: ipython3

    mat.T

.. parsed-literal::

    array([[ 0.,  1.,  1., ...,  0.,  1.,  0.],
           [22., 38., 26., ..., nan, 26., 32.]])

.. code:: ipython3

    # 2 x 2 matrix; missing ages (NaN) propagate through the sums
    mat.T @ mat

.. parsed-literal::

    array([[342.,  nan],
           [ nan,  nan]])

.. code:: ipython3

    # 891 x 891 matrix of dot products between pairs of passengers
    mat @ mat.T

.. parsed-literal::

    array([[ 484.,  836.,  572., ...,   nan,  572.,  704.],
           [ 836., 1445.,  989., ...,   nan,  989., 1216.],
           [ 572.,  989.,  677., ...,   nan,  677.,  832.],
           ...,
           [  nan,   nan,   nan, ...,   nan,   nan,   nan],
           [ 572.,  989.,  677., ...,   nan,  677.,  832.],
           [ 704., 1216.,  832., ...,   nan,  832., 1024.]])

.. code:: ipython3

    df.tail(n=2)

.. parsed-literal::

         PassengerId  Survived  Pclass                   Name   Sex   Age  SibSp  Parch  Ticket   Fare  Cabin  Embarked
    889          890         1       1  Behr, Mr. Karl Howell  male  26.0      0      0  111369  30.00   C148         C
    890          891         0       3    Dooley, Mr. Patrick  male  32.0      0      0  370376   7.75    NaN         Q

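
The ``nan`` entries come from passengers whose ``Age`` is missing: one NaN in a dot product makes the whole sum NaN. The top-left entry of ``mat.T @ mat`` is the sum of the squared 0/1 values of ``Survived``, which is just the number of survivors, 342 = 136 + 87 + 119. Dropping incomplete rows before building the array gives a finite 2 x 2 matrix, as this sketch on a hypothetical three-passenger frame shows:

.. code:: python

    import numpy as np
    import pandas

    # hypothetical toy data: the third passenger has no recorded age
    toy = pandas.DataFrame({"Survived": [0, 1, 1],
                            "Age": [22.0, 38.0, np.nan]})

    # to_numpy() is the current equivalent of .values
    mat = toy[["Survived", "Age"]].to_numpy()
    print(mat.T @ mat)  # every entry involving Age is NaN

    # keep only complete rows before the product
    clean = toy[["Survived", "Age"]].dropna().to_numpy()
    gram = clean.T @ clean
    print(gram)         # finite 2 x 2 matrix
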

.. code:: ipython3

    names = list(df['Name'])
    nom = names[0]
    nom

.. parsed-literal::

    'Braund, Mr. Owen Harris'

.. code:: ipython3

    # the title sits between the first comma and the next period
    nom.split(',')[1].split('.')[0].strip().lower()

.. parsed-literal::

    'mr'

.. code:: ipython3

    # extract the title of every passenger
    mr = []
    for nom in names:
        mr.append(nom.split(',')[1].split('.')[0].strip().lower())

.. code:: ipython3

    df['mr'] = mr

.. code:: ipython3

    # number of passengers for each (sex, title) pair
    gr = df[['Sex', "mr", "PassengerId"]].groupby(['Sex', "mr"], as_index=False).count()

.. code:: ipython3

    gr.head()

.. parsed-literal::

          Sex    mr  PassengerId
    0  female    dr            1
    1  female  lady            1
    2  female  miss          182
    3  female  mlle            2
    4  female   mme            1

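
The loop can be replaced by the vectorized string methods of pandas, which apply the same split, strip and lower chain to a whole column at once. A sketch on names taken from the rows displayed above, plus one hypothetical name:

.. code:: python

    import pandas

    names = pandas.Series([
        "Braund, Mr. Owen Harris",
        "Behr, Mr. Karl Howell",
        "Dooley, Mr. Patrick",
        "Doe, Miss. Jane",  # hypothetical name, for illustration only
    ])

    # same logic as the loop: text after the comma, before the next period
    titles = (names.str.split(",").str[1]
                   .str.split(".").str[0]
                   .str.strip()
                   .str.lower())
    print(titles.tolist())  # ['mr', 'mr', 'mr', 'miss']

Applied to ``df['Name']``, the same expression should give the same column as the loop.
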

.. code:: ipython3

    # pandas >= 2.0 requires keyword arguments for pivot
    gr.pivot(index="mr", columns="Sex", values="PassengerId")

.. parsed-literal::

    Sex           female   male
    mr
    capt             NaN    1.0
    col              NaN    2.0
    don              NaN    1.0
    dr               1.0    6.0
    jonkheer         NaN    1.0
    lady             1.0    NaN
    major            NaN    2.0
    master           NaN   40.0
    miss           182.0    NaN
    mlle             2.0    NaN
    mme              1.0    NaN
    mr               NaN  517.0
    mrs            125.0    NaN
    ms               1.0    NaN
    rev              NaN    6.0
    sir              NaN    1.0
    the countess     1.0    NaN
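
``pandas.crosstab`` builds the same title by sex contingency table directly from the two columns, with integer counts and 0 instead of NaN in the empty cells. The sketch below, on hypothetical toy data, checks that it matches the ``groupby`` + ``pivot`` route once the NaN are filled with 0.

.. code:: python

    import pandas

    # hypothetical toy data: sexes and titles of five passengers
    toy = pandas.DataFrame({"PassengerId": [1, 2, 3, 4, 5],
                            "Sex": ["male", "male", "female", "female", "male"],
                            "mr":  ["mr", "mr", "miss", "mrs", "master"]})

    # direct contingency table: rows = titles, columns = sexes
    table = pandas.crosstab(toy["mr"], toy["Sex"])
    print(table)

    # same route as above: count per (sex, title), then pivot and fill the gaps
    gr = toy.groupby(["Sex", "mr"], as_index=False).count()
    pivoted = (gr.pivot(index="mr", columns="Sex", values="PassengerId")
                 .fillna(0)
                 .astype(int))
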