.. _seance6graphesmlcorrectionrst:

========================================
Plots in machine learning - solutions
========================================

.. only:: html

    **Links:** :githublink:`GitHub|_doc/notebooks/sessions/seance6_graphes_ml_correction.ipynb|*`

Solutions (work in progress) to the exercises on common plots in machine learning.

.. code:: ipython3

    %matplotlib inline
    %load_ext pyensae

.. code:: ipython3

    import matplotlib.pyplot as plt
    plt.style.use('ggplot')

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

This notebook uses data from the Wine Quality Data Set. The task is to predict the quality of a wine from its chemical characteristics.

.. code:: ipython3

    from pyensae.datasource import download_data, DownloadDataException

    uci = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
    try:
        download_data("winequality-red.csv", url=uci)
        download_data("winequality-white.csv", url=uci)
    except DownloadDataException:
        # Fall back to the backup location if the UCI server is unavailable.
        print("backup")
        download_data("winequality-red.csv", website="xd")
        download_data("winequality-white.csv", website="xd")

.. code:: ipython3

    %head winequality-red.csv

.. parsed-literal::

    "fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
    7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
    7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
    7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
    11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
    7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
    7.4;0.66;0;1.8;0.075;13;40;0.9978;3.51;0.56;9.4;5
    7.9;0.6;0.06;1.6;0.069;15;59;0.9964;3.3;0.46;9.4;5
    7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7
    7.8;0.58;0.02;2;0.073;9;18;0.9968;3.36;0.57;9.5;7

.. code:: ipython3

    import pandas

    red_wine = pandas.read_csv("winequality-red.csv", sep=";")
    red_wine["red"] = 1
    white_wine = pandas.read_csv("winequality-white.csv", sep=";")
    white_wine["red"] = 0
    wines = pandas.concat([red_wine, white_wine])
    wines.head()

.. csv-table::
    :header: "", "fixed acidity", "volatile acidity", "citric acid", "residual sugar", "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality", "red"

    0, 7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5, 1
    1, 7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5, 1
    2, 7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5, 1
    3, 11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8, 6, 1
    4, 7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5, 1

We split the data into a training set and a test set:

.. code:: ipython3

    from sklearn.model_selection import train_test_split

    # Every column except the target "quality" is used as a feature.
    X = wines[[c for c in wines.columns if c != "quality"]]
    Y = wines["quality"]
    x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
    type(x_train), type(y_train)

.. parsed-literal::

    (pandas.core.frame.DataFrame, pandas.core.series.Series)

.. code:: ipython3

    wines.shape, x_train.shape, y_train.shape

.. parsed-literal::

    ((6497, 13), (4352, 12), (4352,))

Exercise 1: write a function to automate the creation of this plot
----------------------------------------------------------------------

Exercise 2: simplify the training of each model
--------------------------------------------------

Exercise 3: grid_search
------------------------

Pick a model and estimate its parameters as well as possible.
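
One possible way to approach the grid search exercise is sketched below with scikit-learn's ``GridSearchCV``. This is only an illustration under stated assumptions: the decision tree, the parameter grid and its values are arbitrary choices, not the notebook's solution, and a synthetic dataset from ``make_classification`` stands in for the wine data so that the snippet runs without any download. With the real data, the ``x_train``, ``y_train``, ``x_test`` and ``y_test`` variables built above would be used instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the wine data (12 features, 3 classes).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Hypothetical grid: the values are illustrative, not tuned recommendations.
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 20]}

# Each of the 9 combinations is scored with 3-fold cross-validation
# on the training set only; the best one is then refit on all of it.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(x_train, y_train)

best_params = search.best_params_
# The test set is used once, to evaluate the refit best model.
test_score = search.score(x_test, y_test)
print(best_params, round(test_score, 3))
```

Keeping the test set out of the search matters: the cross-validated scores in ``search.cv_results_`` guide the choice of parameters, while ``test_score`` gives an estimate that the selection step has not seen.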