{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Polynomial regression and pipeline\n", "\n", "This notebook compares several polynomial regression models."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["from papierstat.datasets import load_wines_dataset\n", "data = load_wines_dataset()\n", "X = data.drop(['quality', 'color'], axis=1)\n", "y = data['quality']"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(X, y)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We normalize the data with `Normalizer`, which rescales each sample (row) to unit norm. In this particular case, it matters all the more because the polynomial features would otherwise take very large values, and numerical libraries do not handle widely different orders of magnitude well."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["from sklearn.preprocessing import Normalizer\n", "norm = Normalizer()\n", "X_train_norm = norm.fit_transform(X_train)\n", "X_test_norm = norm.transform(X_test)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The [PolynomialFeatures](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) transformation creates new features by multiplying the variables with one another. 
For degree two and three features $a, b, c$, we obtain the new features: $1, a, b, c, a^2, ab, ac, b^2, bc, c^2$."]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["1 0.189007413643138 0.17548948727814861 0.005909326000001158\n", "2 0.3090044704138045 0.3016856760353912 0.027130041999996024\n", "3 0.4065060987061494 -0.057880204420430736 0.22084438099999915\n", "4 0.5874526458338967 -3659.6472584680923 2.230189553999999\n"]}], "source": ["from time import perf_counter \n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.preprocessing import PolynomialFeatures\n", "from sklearn.pipeline import make_pipeline\n", "from sklearn.metrics import r2_score\n", "\n", "r2ts = []\n", "r2es = []\n", "degs = []\n", "tts = []\n", "models = []\n", "\n", "for d in range(1, 5):\n", " begin = perf_counter ()\n", " pipe = make_pipeline(PolynomialFeatures(degree=d), \n", " LinearRegression())\n", " pipe.fit(X_train_norm, y_train)\n", " duree = perf_counter () - begin\n", " r2t = r2_score(y_train, pipe.predict(X_train_norm))\n", " r2e = r2_score(y_test, pipe.predict(X_test_norm))\n", " degs.append(d)\n", " r2ts.append(r2t)\n", " r2es.append(r2e)\n", " tts.append(duree)\n", " models.append(pipe)\n", " print(d, r2t, r2e, duree)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tempsr2_trainr2_test
degr\u00e9
10.0059090.1890070.175489
20.0271300.3090040.301686
30.2208440.406506-0.057880
42.2301900.587453-3659.647258
\n", "
"], "text/plain": [" temps r2_train r2_test\n", "degr\u00e9 \n", "1 0.005909 0.189007 0.175489\n", "2 0.027130 0.309004 0.301686\n", "3 0.220844 0.406506 -0.057880\n", "4 2.230190 0.587453 -3659.647258"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["import pandas\n", "df = pandas.DataFrame(dict(temps=tts, r2_train=r2ts, r2_test=r2es, degr\u00e9=degs))\n", "df.set_index('degr\u00e9')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The degree-2 polynomial appears to be the best model. The computation time grows by a factor of roughly 5 to 10 with each additional degree, which reflects the rapid growth in the number of features. Still, adding cross features does help on this dataset. But from degree 3 onward, the regression performs very poorly on the test set while the score keeps increasing on the training set. Let us look at this in more detail."]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"image/png": "\n", "text/plain": ["
"]}, "metadata": {}, "output_type": "display_data"}], "source": ["import matplotlib.pyplot as plt\n", "fig, ax = plt.subplots(1, 2, figsize=(12, 4))\n", "\n", "n = 15\n", "ax[0].plot(y_train[:n].reset_index(drop=True), 'o')\n", "ax[1].plot(y_test[:n].reset_index(drop=True), 'o')\n", "ax[0].set_title('Pr\u00e9dictions sur quelques valeurs\\napprentissage')\n", "ax[1].set_title('Pr\u00e9dictions sur quelques valeurs\\ntest')\n", "for x in ax:\n", " x.set_ylim([3, 9])\n", " x.get_xaxis().set_visible(False)\n", "\n", "for model in models:\n", " d = model.get_params()['polynomialfeatures__degree']\n", " tr = model.predict(X_train_norm[:n])\n", " te = model.predict(X_test_norm[:n])\n", " ax[0].plot(tr, label=\"d=%d\" % d)\n", " ax[1].plot(te, label=\"d=%d\" % d)\n", "ax[0].legend()\n", "ax[1].legend();"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The degree-4 model looks good on the training set but goes completely astray on the test set, as if it were surprised by the values it encounters there. The model is said to [overfit](https://en.wikipedia.org/wiki/Overfitting) ([sur-apprentissage](https://fr.wikipedia.org/wiki/Surapprentissage) in French). The degree-2 polynomial works better than plain linear regression. One may wonder which cross variables affect performance. 
We use the [statsmodels](http://www.statsmodels.org/stable/index.html) library."]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": ["poly = PolynomialFeatures(degree=2)\n", "poly_feat_train = poly.fit_transform(X_train_norm)\n", "poly_feat_test = poly.transform(X_test_norm)"]}, {"cell_type": "code", "execution_count": 9, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/html": ["\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Model: OLS Adj. R-squared: 0.302
Dependent Variable: quality AIC: 10821.7528
Date: 2018-09-09 14:59 BIC: 11321.5798
No. Observations: 4872 Log-Likelihood: -5333.9
Df Model: 76 F-statistic: 28.70
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.313 Scale: 0.53135
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. t P>|t| [0.025 0.975]
const -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159
x1 -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901
x2 -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748
x3 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171
x4 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188
x5 -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551
x6 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360
x7 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221
x8 -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003
x9 -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250
x10 -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616
x11 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138
x12 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598
x13 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947
x14 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340
x15 -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213
x16 -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067
x17 -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758
x18 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770
x19 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100
x20 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770
x21 -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159
x22 -72.6985 26.4607 -2.7474 0.0060 -124.5735 -20.8234
x23 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418
x24 -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090
x25 -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694
x26 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917
x27 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453
x28 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658
x29 -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402
x30 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438
x31 -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938
x32 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228
x33 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803
x34 -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068
x35 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218
x36 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910
x37 -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006
x38 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507
x39 -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660
x40 1943.9211 1604.4419 1.2116 0.2257 -1201.5212 5089.3633
x41 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823
x42 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876
x43 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637
x44 -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432
x45 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511
x46 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474
x47 -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757
x48 -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678
x49 -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990
x50 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993
x51 -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647
x52 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948
x53 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575
x54 -9169.2287 6043.2328 -1.5173 0.1293 -21016.7378 2678.2804
x55 -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653
x56 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716
x57 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201
x58 -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103
x59 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008
x60 -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336
x61 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139
x62 -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304
x63 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964
x64 28.6082 645.1442 0.0443 0.9646 -1236.1706 1293.3869
x65 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711
x66 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978
x67 -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981
x68 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598
x69 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477
x70 -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897
x71 -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427
x72 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324
x73 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516
x74 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290
x75 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406
x76 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036
x77 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 85.458 Durbin-Watson: 1.991
Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619
Skew: 0.132 Prob(JB): 0.000
Kurtosis: 3.814 Condition No.: 4493323340279420
"], "text/plain": ["\n", "\"\"\"\n", " Results: Ordinary least squares\n", "===================================================================\n", "Model: OLS Adj. R-squared: 0.302 \n", "Dependent Variable: quality AIC: 10821.7528\n", "Date: 2018-09-09 14:59 BIC: 11321.5798\n", "No. Observations: 4872 Log-Likelihood: -5333.9 \n", "Df Model: 76 F-statistic: 28.70 \n", "Df Residuals: 4795 Prob (F-statistic): 0.00 \n", "R-squared: 0.313 Scale: 0.53135 \n", "-------------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975] \n", "-------------------------------------------------------------------\n", "const -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159\n", "x1 -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901\n", "x2 -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748\n", "x3 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171\n", "x4 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188\n", "x5 -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551\n", "x6 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360\n", "x7 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221\n", "x8 -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003\n", "x9 -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250\n", "x10 -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616\n", "x11 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138\n", "x12 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598\n", "x13 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947\n", "x14 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340\n", "x15 -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213\n", "x16 -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067\n", "x17 -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758\n", "x18 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770\n", "x19 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100\n", "x20 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770\n", "x21 -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159\n", "x22 -72.6985 26.4607 -2.7474 0.0060 
-124.5735 -20.8234\n", "x23 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418\n", "x24 -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090\n", "x25 -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694\n", "x26 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917\n", "x27 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453\n", "x28 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658\n", "x29 -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402\n", "x30 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438\n", "x31 -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938\n", "x32 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228\n", "x33 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803\n", "x34 -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068\n", "x35 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218\n", "x36 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910\n", "x37 -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006\n", "x38 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507\n", "x39 -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660\n", "x40 1943.9211 1604.4419 1.2116 0.2257 -1201.5212 5089.3633\n", "x41 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823\n", "x42 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876\n", "x43 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637\n", "x44 -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432\n", "x45 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511\n", "x46 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474\n", "x47 -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757\n", "x48 -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678\n", "x49 -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990\n", "x50 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993\n", "x51 -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647\n", "x52 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948\n", "x53 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575\n", "x54 -9169.2287 6043.2328 -1.5173 0.1293 
-21016.7378 2678.2804\n", "x55 -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653\n", "x56 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716\n", "x57 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201\n", "x58 -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103\n", "x59 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008\n", "x60 -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336\n", "x61 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139\n", "x62 -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304\n", "x63 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964\n", "x64 28.6082 645.1442 0.0443 0.9646 -1236.1706 1293.3869\n", "x65 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711\n", "x66 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978\n", "x67 -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981\n", "x68 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598\n", "x69 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477\n", "x70 -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897\n", "x71 -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427\n", "x72 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324\n", "x73 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516\n", "x74 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290\n", "x75 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406\n", "x76 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036\n", "x77 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237\n", "-------------------------------------------------------------------\n", "Omnibus: 85.458 Durbin-Watson: 1.991 \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619 \n", "Skew: 0.132 Prob(JB): 0.000 \n", "Kurtosis: 3.814 Condition No.: 4493323340279420\n", "===================================================================\n", "* The condition number is large (4e+15). 
This might indicate\n", "strong multicollinearity or other numerical problems.\n", "\"\"\""]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["from statsmodels.regression.linear_model import OLS\n", "model = OLS(y_train, poly_feat_train)\n", "results = model.fit()\n", "results.summary2()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This is not very readable. We need to attach the name of each variable and run the regression again."]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1fixed_acidityvolatile_aciditycitric_acidresidual_sugarchloridesfree_sulfur_dioxidetotal_sulfur_dioxidedensitypH...density^2density * pHdensity * sulphatesdensity * alcoholpH^2pH * sulphatespH * alcoholsulphates^2sulphates * alcoholalcohol^2
01.00.0345350.0018890.0027520.0420890.0002970.2859930.9551080.0053690.016836...0.0000290.0000900.0000130.0002780.0002830.0000410.0008720.0000060.0001260.002683
11.00.0565140.0028680.0036270.0134960.0003460.2446140.9615850.0083520.027245...0.0000700.0002280.0000310.0008880.0007420.0001010.0028960.0000140.0003940.011296
21.00.0629380.0010330.0024420.1390270.0004980.4133250.8924060.0093630.030060...0.0000880.0002810.0000310.0008620.0009040.0000990.0027670.0000110.0003030.008475
31.00.1208820.0046380.0056220.0365460.0011670.2248960.9558080.0140250.046385...0.0001970.0006510.0000950.0018530.0021520.0003130.0061290.0000460.0008910.017457
41.00.4552530.0186020.0234970.1860170.0064620.1468560.5384710.0487450.158115...0.0023760.0077070.0013600.0314970.0250000.0044120.1021680.0007790.0180300.417530
\n", "

5 rows \u00d7 78 columns

\n", "
"], "text/plain": [" 1 fixed_acidity volatile_acidity citric_acid residual_sugar \\\n", "0 1.0 0.034535 0.001889 0.002752 0.042089 \n", "1 1.0 0.056514 0.002868 0.003627 0.013496 \n", "2 1.0 0.062938 0.001033 0.002442 0.139027 \n", "3 1.0 0.120882 0.004638 0.005622 0.036546 \n", "4 1.0 0.455253 0.018602 0.023497 0.186017 \n", "\n", " chlorides free_sulfur_dioxide total_sulfur_dioxide density pH \\\n", "0 0.000297 0.285993 0.955108 0.005369 0.016836 \n", "1 0.000346 0.244614 0.961585 0.008352 0.027245 \n", "2 0.000498 0.413325 0.892406 0.009363 0.030060 \n", "3 0.001167 0.224896 0.955808 0.014025 0.046385 \n", "4 0.006462 0.146856 0.538471 0.048745 0.158115 \n", "\n", " ... density^2 density * pH density * sulphates density * alcohol \\\n", "0 ... 0.000029 0.000090 0.000013 0.000278 \n", "1 ... 0.000070 0.000228 0.000031 0.000888 \n", "2 ... 0.000088 0.000281 0.000031 0.000862 \n", "3 ... 0.000197 0.000651 0.000095 0.001853 \n", "4 ... 0.002376 0.007707 0.001360 0.031497 \n", "\n", " pH^2 pH * sulphates pH * alcohol sulphates^2 sulphates * alcohol \\\n", "0 0.000283 0.000041 0.000872 0.000006 0.000126 \n", "1 0.000742 0.000101 0.002896 0.000014 0.000394 \n", "2 0.000904 0.000099 0.002767 0.000011 0.000303 \n", "3 0.002152 0.000313 0.006129 0.000046 0.000891 \n", "4 0.025000 0.004412 0.102168 0.000779 0.018030 \n", "\n", " alcohol^2 \n", "0 0.002683 \n", "1 0.011296 \n", "2 0.008475 \n", "3 0.017457 \n", "4 0.417530 \n", "\n", "[5 rows x 78 columns]"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["# get_feature_names was removed in recent scikit-learn releases; get_feature_names_out replaces it\n", "names = poly.get_feature_names_out(input_features=data.columns[:-2])\n", "names = [n.replace(\" \", \" * \") for n in names]\n", "pft = pandas.DataFrame(poly_feat_train, columns=names)\n", "pft.head()"]}, {"cell_type": "code", "execution_count": 11, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/html": ["\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", 
"\n", " \n", "\n", "
Model: OLS Adj. R-squared: 0.302
Dependent Variable: quality AIC: 10821.7528
Date: 2018-09-09 15:02 BIC: 11321.5798
No. Observations: 4872 Log-Likelihood: -5333.9
Df Model: 76 F-statistic: 28.70
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.313 Scale: 0.53135
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Coef. Std.Err. t P>|t| [0.025 0.975]
1 -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159
fixed_acidity -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901
volatile_acidity -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748
citric_acid 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171
residual_sugar 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188
chlorides -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551
free_sulfur_dioxide 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360
total_sulfur_dioxide 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221
density -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003
pH -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250
sulphates -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616
alcohol 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138
fixed_acidity^2 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598
fixed_acidity * volatile_acidity 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947
fixed_acidity * citric_acid 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340
fixed_acidity * residual_sugar -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213
fixed_acidity * chlorides -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067
fixed_acidity * free_sulfur_dioxide -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758
fixed_acidity * total_sulfur_dioxide 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770
fixed_acidity * density 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100
fixed_acidity * pH 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770
fixed_acidity * sulphates -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159
fixed_acidity * alcohol -72.6985 26.4607 -2.7474 0.0060 -124.5735 -20.8234
volatile_acidity^2 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418
volatile_acidity * citric_acid -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090
volatile_acidity * residual_sugar -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694
volatile_acidity * chlorides 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917
volatile_acidity * free_sulfur_dioxide 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453
volatile_acidity * total_sulfur_dioxide 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658
volatile_acidity * density -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402
volatile_acidity * pH 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438
volatile_acidity * sulphates -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938
volatile_acidity * alcohol 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228
citric_acid^2 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803
citric_acid * residual_sugar -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068
citric_acid * chlorides 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218
citric_acid * free_sulfur_dioxide 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910
citric_acid * total_sulfur_dioxide -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006
citric_acid * density 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507
citric_acid * pH -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660
citric_acid * sulphates 1943.9211 1604.4419 1.2116 0.2257 -1201.5212 5089.3633
citric_acid * alcohol 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823
residual_sugar^2 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876
residual_sugar * chlorides 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637
residual_sugar * free_sulfur_dioxide -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432
residual_sugar * total_sulfur_dioxide 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511
residual_sugar * density 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474
residual_sugar * pH -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757
residual_sugar * sulphates -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678
residual_sugar * alcohol -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990
chlorides^2 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993
chlorides * free_sulfur_dioxide -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647
chlorides * total_sulfur_dioxide 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948
chlorides * density 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575
chlorides * pH -9169.2287 6043.2328 -1.5173 0.1293 -21016.7378 2678.2804
chlorides * sulphates -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653
chlorides * alcohol 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716
free_sulfur_dioxide^2 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201
free_sulfur_dioxide * total_sulfur_dioxide -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103
free_sulfur_dioxide * density 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008
free_sulfur_dioxide * pH -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336
free_sulfur_dioxide * sulphates 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139
free_sulfur_dioxide * alcohol -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304
total_sulfur_dioxide^2 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964
total_sulfur_dioxide * density 28.6082 645.1442 0.0443 0.9646 -1236.1706 1293.3869
total_sulfur_dioxide * pH 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711
total_sulfur_dioxide * sulphates 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978
total_sulfur_dioxide * alcohol -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981
density^2 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598
density * pH 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477
density * sulphates -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897
density * alcohol -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427
pH^2 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324
pH * sulphates 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516
pH * alcohol 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290
sulphates^2 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406
sulphates * alcohol 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036
alcohol^2 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Omnibus: 85.458 Durbin-Watson: 1.991
Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619
Skew: 0.132 Prob(JB): 0.000
Kurtosis: 3.814 Condition No.: 4493323340279420
"], "text/plain": ["\n", "\"\"\"\n", " Results: Ordinary least squares\n", "======================================================================================================\n", "Model: OLS Adj. R-squared: 0.302 \n", "Dependent Variable: quality AIC: 10821.7528\n", "Date: 2018-09-09 15:02 BIC: 11321.5798\n", "No. Observations: 4872 Log-Likelihood: -5333.9 \n", "Df Model: 76 F-statistic: 28.70 \n", "Df Residuals: 4795 Prob (F-statistic): 0.00 \n", "R-squared: 0.313 Scale: 0.53135 \n", "------------------------------------------------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975] \n", "------------------------------------------------------------------------------------------------------\n", "1 -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159\n", "fixed_acidity -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901\n", "volatile_acidity -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748\n", "citric_acid 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171\n", "residual_sugar 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188\n", "chlorides -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551\n", "free_sulfur_dioxide 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360\n", "total_sulfur_dioxide 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221\n", "density -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003\n", "pH -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250\n", "sulphates -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616\n", "alcohol 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138\n", "fixed_acidity^2 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598\n", "fixed_acidity * volatile_acidity 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947\n", "fixed_acidity * citric_acid 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340\n", "fixed_acidity * residual_sugar -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213\n", "fixed_acidity * chlorides -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067\n", "fixed_acidity * 
free_sulfur_dioxide -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758\n", "fixed_acidity * total_sulfur_dioxide 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770\n", "fixed_acidity * density 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100\n", "fixed_acidity * pH 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770\n", "fixed_acidity * sulphates -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159\n", "fixed_acidity * alcohol -72.6985 26.4607 -2.7474 0.0060 -124.5735 -20.8234\n", "volatile_acidity^2 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418\n", "volatile_acidity * citric_acid -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090\n", "volatile_acidity * residual_sugar -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694\n", "volatile_acidity * chlorides 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917\n", "volatile_acidity * free_sulfur_dioxide 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453\n", "volatile_acidity * total_sulfur_dioxide 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658\n", "volatile_acidity * density -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402\n", "volatile_acidity * pH 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438\n", "volatile_acidity * sulphates -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938\n", "volatile_acidity * alcohol 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228\n", "citric_acid^2 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803\n", "citric_acid * residual_sugar -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068\n", "citric_acid * chlorides 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218\n", "citric_acid * free_sulfur_dioxide 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910\n", "citric_acid * total_sulfur_dioxide -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006\n", "citric_acid * density 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507\n", "citric_acid * pH -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660\n", "citric_acid * sulphates 1943.9211 1604.4419 
1.2116 0.2257 -1201.5212 5089.3633\n", "citric_acid * alcohol 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823\n", "residual_sugar^2 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876\n", "residual_sugar * chlorides 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637\n", "residual_sugar * free_sulfur_dioxide -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432\n", "residual_sugar * total_sulfur_dioxide 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511\n", "residual_sugar * density 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474\n", "residual_sugar * pH -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757\n", "residual_sugar * sulphates -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678\n", "residual_sugar * alcohol -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990\n", "chlorides^2 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993\n", "chlorides * free_sulfur_dioxide -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647\n", "chlorides * total_sulfur_dioxide 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948\n", "chlorides * density 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575\n", "chlorides * pH -9169.2287 6043.2328 -1.5173 0.1293 -21016.7378 2678.2804\n", "chlorides * sulphates -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653\n", "chlorides * alcohol 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716\n", "free_sulfur_dioxide^2 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201\n", "free_sulfur_dioxide * total_sulfur_dioxide -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103\n", "free_sulfur_dioxide * density 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008\n", "free_sulfur_dioxide * pH -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336\n", "free_sulfur_dioxide * sulphates 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139\n", "free_sulfur_dioxide * alcohol -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304\n", "total_sulfur_dioxide^2 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964\n", "total_sulfur_dioxide * density 28.6082 
645.1442 0.0443 0.9646 -1236.1706 1293.3869\n", "total_sulfur_dioxide * pH 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711\n", "total_sulfur_dioxide * sulphates 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978\n", "total_sulfur_dioxide * alcohol -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981\n", "density^2 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598\n", "density * pH 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477\n", "density * sulphates -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897\n", "density * alcohol -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427\n", "pH^2 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324\n", "pH * sulphates 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516\n", "pH * alcohol 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290\n", "sulphates^2 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406\n", "sulphates * alcohol 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036\n", "alcohol^2 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237\n", "------------------------------------------------------------------------------------------------------\n", "Omnibus: 85.458 Durbin-Watson: 1.991 \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619 \n", "Skew: 0.132 Prob(JB): 0.000 \n", "Kurtosis: 3.814 Condition No.: 4493323340279420\n", "======================================================================================================\n", "* The condition number is large (4e+15). 
This might indicate strong multicollinearity or\n", "other numerical problems.\n", "\"\"\""]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["results.summary2(xname=pft.columns)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We keep only the variables whose [p-value](http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_metric/pvalues.html) is at most 0.05."]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/plain": ["x2 3.511159e-06\n", "x6 9.131393e-07\n", "x7 1.460269e-03\n", "x11 5.715290e-25\n", "x13 1.123675e-02\n", "x22 6.029122e-03\n", "x27 7.416579e-03\n", "x28 9.513630e-05\n", "x32 9.610320e-05\n", "x41 1.592150e-02\n", "x44 3.837865e-02\n", "x58 1.505845e-05\n", "x62 5.085920e-10\n", "x66 4.296131e-02\n", "x67 4.014502e-18\n", "x71 2.520870e-02\n", "x72 4.211861e-02\n", "x74 4.190811e-03\n", "x76 4.648129e-02\n", "dtype: float64"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["pval = results.pvalues.copy()\n", "pval[pval <= 0.05]"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/plain": ["volatile_acidity 3.511159e-06\n", "free_sulfur_dioxide 9.131393e-07\n", "total_sulfur_dioxide 1.460269e-03\n", "alcohol 5.715290e-25\n", "fixed_acidity * volatile_acidity 1.123675e-02\n", "fixed_acidity * alcohol 6.029122e-03\n", "volatile_acidity * free_sulfur_dioxide 7.416579e-03\n", "volatile_acidity * total_sulfur_dioxide 9.513630e-05\n", "volatile_acidity * alcohol 9.610320e-05\n", "citric_acid * alcohol 1.592150e-02\n", "residual_sugar * free_sulfur_dioxide 3.837865e-02\n", "free_sulfur_dioxide * total_sulfur_dioxide 1.505845e-05\n", "free_sulfur_dioxide * alcohol 5.085920e-10\n", "total_sulfur_dioxide * sulphates 4.296131e-02\n", "total_sulfur_dioxide * alcohol 4.014502e-18\n", "density * alcohol 2.520870e-02\n", "pH^2 4.211861e-02\n", "pH * alcohol 4.190811e-03\n", "sulphates * 
alcohol 4.648129e-02\n", "dtype: float64"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["pval.index = pft.columns\n", "pval[pval <= 0.05]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The model performs better, but it is harder to tell whether alcohol contributes positively to quality, because alcohol appears in more than one variable."]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0"}}, "nbformat": 4, "nbformat_minor": 2}