2A.ml - Time series - solution


Forecasting on time series.

from jyquickhelper import add_notebook_menu
add_notebook_menu()
%matplotlib inline

A time series

We load the number of sessions recorded on a website.

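import pandas
# The file is tab-separated, with one row per date.
data = pandas.read_csv("xavierdupre_sessions.csv", sep="\t")
# Use the date as the index so that plots are drawn against time.
data.set_index("Date", inplace=True)
data.plot(figsize=(12,4))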
../_images/td2a_timeseries_correction_4_1.png
data[-365:].plot(figsize=(12,4))
../_images/td2a_timeseries_correction_5_1.png

Removing seasonality without knowing the period

We use fit_seasons from the seasonal package, which returns the seasonal offsets and the trend.

from seasonal import fit_seasons
cv_seasons, trend = fit_seasons(data["Sessions"])
print(cv_seasons)
# data["cs_seasons"] = cv_seasons
data["trendcs"] = trend
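# "trendsea" is not computed in this section; it is assumed to already be a
# column of data (a trend estimated in an earlier step that is not shown here).
data[-365:].plot(y=["Sessions", "trendcs", "trendsea"], figsize=(14,4))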
[ 26.66213008  16.33420353 -86.59519495 -73.57497492  33.23110565
  52.87820674  30.87516435]
../_images/td2a_timeseries_correction_18_3.png
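
To make the idea of a seasonal profile concrete, here is a minimal sketch that estimates additive offsets by hand once the period is known. It is not how fit_seasons works internally; it swaps in a simpler technique (centered rolling mean as the trend, then the average of the detrended values at each position in the period). The function name seasonal_profile, the synthetic series, and the offsets in true_season are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def seasonal_profile(series, period=7):
    """Return (offsets, trend) for an additive seasonality of known period."""
    # A centered rolling mean over one full period cancels a zero-sum
    # seasonal pattern and leaves a crude trend estimate.
    trend = series.rolling(window=period, center=True, min_periods=period).mean()
    detrended = series - trend
    # Position of each observation inside the period: 0, 1, ..., period-1.
    phase = np.arange(len(series)) % period
    offsets = detrended.groupby(phase).mean()  # NaN edges are skipped
    # Center the offsets so that they sum to zero.
    return offsets - offsets.mean(), trend


# Synthetic data (hypothetical): linear trend + weekly pattern + noise.
rng = np.random.default_rng(0)
true_season = np.array([30.0, 20.0, -80.0, -70.0, 30.0, 50.0, 20.0])
n = 7 * 52
t = np.arange(n)
demo = pd.Series(500 + 0.5 * t + np.tile(true_season, n // 7)
                 + rng.normal(scale=5, size=n))

offsets, demo_trend = seasonal_profile(demo, period=7)
print(offsets.round(1).tolist())
```

On the sessions data, the same function could be applied to data["Sessions"] once the period has been identified.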

Autocorrelogram

This follows the statsmodels example Autoregressive Moving Average (ARMA): Sunspots data.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = plot_acf(data["Sessions"], lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = plot_pacf(data["Sessions"], lags=40, ax=ax2)
../_images/td2a_timeseries_correction_20_0.png

The correlogram confirms a period of 7, matching the seven seasonal offsets returned by fit_seasons.
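
As a rough illustration of how a correlogram reveals the period, the sketch below computes the sample autocorrelation directly with numpy and picks the lag with the highest value. The helper autocorr and the synthetic series are assumptions introduced for this example; the series is a noisy sinusoid of period 7, not the sessions data.

```python
import numpy as np


def autocorr(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # For each lag k, correlate the series with itself shifted by k steps.
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


# Synthetic data (hypothetical): period-7 signal plus noise.
rng = np.random.default_rng(1)
t = np.arange(7 * 60)
demo = 100 + 20 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=3, size=t.size)

acf = autocorr(demo, max_lag=20)
# Skip lag 0, which is always 1, and look for the strongest repetition.
best_lag = 1 + int(np.argmax(acf[1:]))
print(best_lag)
```

The strongest non-zero lag corresponds to the period; multiples of it also show high autocorrelation, which is the pattern visible in the plot above.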

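# Note: periodogram may not be available in recent statsmodels releases
# (it was deprecated); scipy.signal.periodogram is a possible replacement.
from statsmodels.tsa.stattools import periodogram
p = periodogram(data["Sessions"])
p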
array([        0.        ,  36864353.20831527,   7870812.5268101 , ...,
         4211956.47283695,   7870812.52681012,  36864353.20831531])
plt.plot(p[:25])
../_images/td2a_timeseries_correction_23_1.png
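
The periodogram can also be sketched with numpy alone, which helps to read the array printed above. The definition used here (squared modulus of the discrete Fourier transform divided by the series length) is an assumption about what the periodogram function computes, and the synthetic series is hypothetical. The dominant period is the inverse of the frequency where the spectrum peaks.

```python
import numpy as np

# Synthetic data (hypothetical): period-7 signal plus noise, 420 points.
rng = np.random.default_rng(2)
t = np.arange(7 * 60)
demo = 100 + 20 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=3, size=t.size)

# Remove the mean so that the zero-frequency term does not dominate.
centered = demo - demo.mean()
# One-sided spectrum: the full FFT of a real series is symmetric.
power = np.abs(np.fft.rfft(centered)) ** 2 / centered.size
freqs = np.fft.rfftfreq(centered.size, d=1.0)  # cycles per observation

# Ignore index 0 (frequency 0) when looking for the peak.
peak = 1 + int(np.argmax(power[1:]))
dominant_period = 1.0 / freqs[peak]
print(round(dominant_period, 3))
```

With 420 observations, a period of 7 falls exactly on frequency bin 60; on real data the peak may be spread over neighboring bins.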