Graphes en machine learning - correction

Links: notebook, html ., PDF, python, slides ., presentation ., GitHub

Correction (en cours de rédaction) des exercices autour des graphes courants en machine learning.

%matplotlib inline
%load_ext pyensae
import matplotlib.pyplot as plt
plt.style.use('ggplot')
from jyquickhelper import add_notebook_menu
add_notebook_menu()
run previous cell, wait for 2 seconds

Le module utilise des données issue de Wine Quality Data Set pour lequel on essaye de prédire la qualité du vin en fonction de ses caractéristiques chimiques.

import pyensae
from pyensae.datasource import DownloadDataException
uci = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
try:
    pyensae.download_data("winequality-red.csv", url=uci)
    pyensae.download_data("winequality-white.csv", url=uci)
except DownloadDataException:
    print("backup")
    pyensae.download_data("winequality-red.csv", website="xd")
    pyensae.download_data("winequality-white.csv", website="xd")
%head winequality-red.csv
"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.4;0.66;0;1.8;0.075;13;40;0.9978;3.51;0.56;9.4;5
7.9;0.6;0.06;1.6;0.069;15;59;0.9964;3.3;0.46;9.4;5
7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7
7.8;0.58;0.02;2;0.073;9;18;0.9968;3.36;0.57;9.5;7

import pandas
red_wine = pandas.read_csv("winequality-red.csv", sep=";")
red_wine["red"] = 1
white_wine = pandas.read_csv("winequality-white.csv", sep=";")
white_wine["red"] = 0
wines = pandas.concat([red_wine, white_wine])
wines.head()
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality red
0 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 1
1 7.8 0.88 0.00 2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5 1
2 7.8 0.76 0.04 2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5 1
3 11.2 0.28 0.56 1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6 1
4 7.4 0.70 0.00 1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5 1

On découpe en base d’apprentissage, base de test :

from sklearn.model_selection import train_test_split
X = wines[[c for c in wines.columns if c != "quality"]]
Y = wines["quality"]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
type(x_train), type(y_train)
(pandas.core.frame.DataFrame, pandas.core.series.Series)
wines.shape, x_train.shape, y_train.shape
((6497, 13), (4352, 12), (4352,))

Exercice 1 : créer une fonction pour automatiser la création de ce graphe

Exercice 2 : simplifier l’apprentissage de chaque modèle