Un exemple pour montrer l'équivalence entre l'ACP et une factorisation de matrice.
from jyquickhelper import add_notebook_menu
add_notebook_menu()
%matplotlib inline
def erreur_mf(M, W, H):
d = M - W @ H
a = d.ravel()
e = a @ a.T
e ** 0.5 / (M.shape[0] * M.shape[1])
return e
On crée un nuage de points avec que des coordonnées positives pour satisfaire les hypothèses de la factorisation de matrices.
from numpy.random import rand
M = rand(2, 20)
M[1,:] += 3 * M[0,:]
M
array([[ 0.81960047, 0.63887134, 0.74019269, 0.96110175, 0.0685406 , 0.11103301, 0.06033529, 0.67913157, 0.10460611, 0.98860048, 0.50497448, 0.26893866, 0.73143267, 0.32617974, 0.1332449 , 0.83328515, 0.3775355 , 0.69163261, 0.53095348, 0.15601268], [ 2.48031078, 2.2279066 , 2.85929872, 3.27833973, 0.27323095, 0.53806662, 0.48019992, 2.09428487, 0.40521666, 3.94539474, 2.36639105, 1.66857684, 3.14027534, 1.94032092, 1.22602705, 3.09679803, 1.696636 , 2.69144798, 1.84350664, 1.16862532]])
from sklearn.decomposition import NMF
mf = NMF(1)
W = mf.fit_transform(M)
H = mf.components_
erreur_mf(M, W, H)
0.19729615330190822
wh = W @ H
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(M[0,:], M[1,:], "ob")
ax.plot(wh[0,:], wh[1,:], "or")
ax.set_xlim([0,1])
ax.set_ylim([0,4])
(0, 4)
from sklearn.decomposition import PCA
pca = PCA(n_components=1)
pca.fit(M.T)
PCA(copy=True, iterated_power='auto', n_components=1, random_state=None, svd_solver='auto', tol=0.0, whiten=False)
projected_points = pca.inverse_transform(pca.transform(M.T))
pj = projected_points.T
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(M[0,:], M[1,:], "ob")
ax.plot(wh[0,:], wh[1,:], "or")
ax.plot(pj[0,:], pj[1,:], "og")
ax.set_xlim([0,1])
ax.set_ylim([0,4])
(0, 4)
Les résultats ne sont pas exactement identiques car l'ACP centre le nuage de points par défaut. On utilise celui de statsmodels pour éviter cela.
from statsmodels.multivariate.pca import PCA
pca = PCA(M.T, ncomp=1, standardize=False, demean=False, normalize=False)
pca
Principal Component Analysis(nobs: 20, nvar: 2, transformation: None, normalization: False, number of components: 1, SVD, id: 0x1c01a2861d0)
pj2 = pca.projection.T
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(M[0,:], M[1,:], "ob")
#ax.plot(wh[0,:], wh[1,:], "or")
ax.plot(pj[0,:], pj[1,:], "og")
ax.plot(pj2[0,:], pj2[1,:], "oy")
ax.set_xlim([0,1])
ax.set_ylim([0,4])
(0, 4)
On retrouve exactement les mêmes résultats.