module mlmodel._kmeans_022

Short summary

module mlinsights.mlmodel._kmeans_022

Implements k-means with norms L1 and L2.

source on GitHub

Functions

Function – truncated documentation:
  • _assign_labels_array – Compute label assignment and inertia for a dense array. Return the inertia (sum of squared distances to the centers). …

  • _assign_labels_csr – Compute label assignment and inertia for a CSR input. Return the inertia (sum of squared distances to the centers).

  • _centers_dense – M step of the K-means EM algorithm: computation of cluster centers (means).

  • _centers_sparse – M step of the K-means EM algorithm: computation of cluster centers (means).

  • _labels_inertia_precompute_dense – Computes labels and inertia using a full distance matrix. This will overwrite the ‘distances’ array in-place.

  • _labels_inertia_skl – E step of the K-means EM algorithm. Compute the labels and the inertia of the given samples and centers. This will …

Documentation

mlinsights.mlmodel._kmeans_022._assign_labels_array(X, sample_weight, x_squared_norms, centers, labels, distances)

Compute label assignment and inertia for a dense array. Return the inertia (sum of squared distances to the centers).
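
The computation described above can be sketched in plain NumPy. This is only an illustration, not the library routine: `assign_labels_dense` is a hypothetical name, and it returns `labels` and `distances` instead of filling the arrays the real function receives as arguments. It assumes squared Euclidean distances, expanded as ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 so that the precomputed `x_squared_norms` can be reused.

```python
import numpy as np


def assign_labels_dense(X, sample_weight, x_squared_norms, centers):
    """Hypothetical NumPy stand-in, not the library routine."""
    # Squared Euclidean distances: ||x||^2 - 2 x.c + ||c||^2.
    center_squared_norms = (centers ** 2).sum(axis=1)
    sq_dist = (
        x_squared_norms[:, np.newaxis]
        - 2.0 * X @ centers.T
        + center_squared_norms[np.newaxis, :]
    )
    # Rounding errors can push tiny distances slightly below zero.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    # Each sample gets the index of its closest center.
    labels = sq_dist.argmin(axis=1)
    distances = sq_dist[np.arange(X.shape[0]), labels]
    # Inertia: weighted sum of squared distances to the closest center.
    inertia = float((distances * sample_weight).sum())
    return labels, distances, inertia


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])
sample_weight = np.ones(X.shape[0])
x_squared_norms = (X ** 2).sum(axis=1)

labels, distances, inertia = assign_labels_dense(
    X, sample_weight, x_squared_norms, centers)
```

In this toy example every sample lies at squared distance 0.25 from its closest center, so the first two samples fall in cluster 0 and the last two in cluster 1.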

mlinsights.mlmodel._kmeans_022._assign_labels_csr(X, sample_weight, x_squared_norms, centers, labels, distances)

Compute label assignment and inertia for a CSR input. Return the inertia (sum of squared distances to the centers).

mlinsights.mlmodel._kmeans_022._centers_dense(X, sample_weight, labels, n_clusters, distances)

M step of the K-means EM algorithm: computation of cluster centers (means).

Parameters:
  • X – array-like, shape (n_samples, n_features)

  • sample_weight – array-like, shape (n_samples,) The weights for each observation in X.

  • labels – array of integers, shape (n_samples,) Current label assignment.

  • n_clusters – int Number of desired clusters.

  • distances – array-like, shape (n_samples,) Distance to closest cluster for each sample.

Returns:

centers : array, shape (n_clusters, n_features) The resulting centers
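
As a rough illustration of the M step on dense data, the sketch below computes each center as the weighted mean of the samples assigned to it. `centers_dense_sketch` is a hypothetical name, not the library function. As a simplification it ignores the `distances` argument and leaves the center of an empty cluster at zero, which is an assumption of this sketch and says nothing about how the library treats empty clusters.

```python
import numpy as np


def centers_dense_sketch(X, sample_weight, labels, n_clusters):
    """Hypothetical NumPy stand-in for the M step on dense data."""
    n_features = X.shape[1]
    centers = np.zeros((n_clusters, n_features), dtype=np.float64)
    # Total sample weight assigned to each cluster.
    weight_in_cluster = np.bincount(
        labels, weights=sample_weight, minlength=n_clusters)
    # Accumulate the weighted sum of the samples of each cluster.
    np.add.at(centers, labels, X * sample_weight[:, np.newaxis])
    # Divide by the cluster weight; empty clusters keep a zero center here.
    non_empty = weight_in_cluster > 0
    centers[non_empty] /= weight_in_cluster[non_empty, np.newaxis]
    return centers


X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
sample_weight = np.array([1.0, 3.0, 1.0])
labels = np.array([0, 0, 1])

centers = centers_dense_sketch(X, sample_weight, labels, n_clusters=3)
```

With these weights the first center is (0 * 1 + 2 * 3) / 4 = 1.5 on the first axis, the second center is the single sample (10, 10), and the third cluster is empty.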

mlinsights.mlmodel._kmeans_022._centers_sparse(X, sample_weight, labels, n_clusters, distances)

M step of the K-means EM algorithm: computation of cluster centers (means).

Parameters:
  • X – scipy.sparse.csr_matrix, shape (n_samples, n_features)

  • sample_weight – array-like, shape (n_samples,) The weights for each observation in X.

  • labels – array of integers, shape (n_samples,) Current label assignment.

  • n_clusters – int Number of desired clusters.

  • distances – array-like, shape (n_samples,) Distance to closest cluster for each sample.

Returns:

centers : array, shape (n_clusters, n_features) The resulting centers

mlinsights.mlmodel._kmeans_022._labels_inertia_precompute_dense(norm, X, sample_weight, centers, distances)

Computes labels and inertia using a full distance matrix.

This will overwrite the ‘distances’ array in-place.

Parameters:
  • norm – ‘L1’ or ‘L2’

  • X – numpy array, shape (n_samples, n_features) Input data.

  • sample_weight – array-like, shape (n_samples,) The weights for each observation in X.

  • centers – numpy array, shape (n_clusters, n_features) Cluster centers which data is assigned to.

  • distances – numpy array, shape (n_samples,) Pre-allocated array in which distances are stored.

Returns:

labels : numpy array, dtype=numpy.int, shape (n_samples,) Indices of clusters that samples are assigned to.

inertia : float Sum of squared distances of samples to their closest cluster center.
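
The sketch below shows how the choice of norm can change the assignment when labels are read off a full distance matrix. `labels_inertia_full_matrix` is a hypothetical name. The sketch assumes squared Euclidean distances for ‘L2’ and Manhattan distances for ‘L1’, and it sums the weighted per-sample distances to get the inertia; both choices are assumptions made for illustration and may differ from the library code. Like the documented function, it overwrites the pre-allocated `distances` array in place.

```python
import numpy as np


def labels_inertia_full_matrix(norm, X, sample_weight, centers, distances):
    """Hypothetical sketch: labels and inertia from a full distance matrix."""
    # Pairwise differences, shape (n_samples, n_clusters, n_features).
    diff = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    if norm == "L2":
        all_distances = (diff ** 2).sum(axis=2)   # squared Euclidean
    elif norm == "L1":
        all_distances = np.abs(diff).sum(axis=2)  # Manhattan
    else:
        raise ValueError(f"norm must be 'L1' or 'L2', got {norm!r}")
    labels = all_distances.argmin(axis=1)
    # Overwrite the caller's pre-allocated array in place.
    distances[:] = all_distances[np.arange(X.shape[0]), labels]
    inertia = float((distances * sample_weight).sum())
    return labels, inertia


X = np.array([[0.0, 0.0], [0.0, 5.0]])
centers = np.array([[3.0, 3.0], [0.0, 5.0]])
sample_weight = np.ones(X.shape[0])

dist_l2 = np.empty(X.shape[0])
labels_l2, inertia_l2 = labels_inertia_full_matrix(
    "L2", X, sample_weight, centers, dist_l2)

dist_l1 = np.empty(X.shape[0])
labels_l1, inertia_l1 = labels_inertia_full_matrix(
    "L1", X, sample_weight, centers, dist_l1)
```

The first sample is closer to the first center under the squared Euclidean distance (18 against 25) but closer to the second center under the Manhattan distance (5 against 6), so the two norms assign it to different clusters.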

mlinsights.mlmodel._kmeans_022._labels_inertia_skl(X, sample_weight, x_squared_norms, centers, distances=None)

E step of the K-means EM algorithm. Compute the labels and the inertia of the given samples and centers. This will compute the distances in-place.

Parameters:
  • X – float64 array-like or CSR sparse matrix, shape (n_samples, n_features) The input samples to assign to the labels.

  • sample_weight – array-like, shape (n_samples,) The weights for each observation in X.

  • x_squared_norms – array, shape (n_samples,) Precomputed squared euclidean norm of each data point, to speed up computations.

  • centers – float array, shape (n_clusters, n_features) The cluster centers.

  • distances – float array, shape (n_samples,) Pre-allocated array to be filled in with each sample’s distance to the closest center.

Returns:

labels : int array, shape (n_samples,) The resulting assignment.

inertia : float Sum of squared distances of samples to their closest cluster center.
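
To show where the E step sits in the overall algorithm, the sketch below alternates a hypothetical `e_step` with a hypothetical `m_step` until the centers stop moving. Neither helper is the library function; they assume dense input and squared Euclidean distances. The optional `distances` buffer mirrors the documented parameter: when an array is passed, it is filled in place with each sample's distance to its closest center.

```python
import numpy as np


def e_step(X, sample_weight, x_squared_norms, centers, distances=None):
    """Hypothetical E step: closest center per sample and weighted inertia."""
    sq_dist = (x_squared_norms[:, np.newaxis]
               - 2.0 * X @ centers.T
               + (centers ** 2).sum(axis=1)[np.newaxis, :])
    np.maximum(sq_dist, 0.0, out=sq_dist)  # guard against rounding errors
    labels = sq_dist.argmin(axis=1)
    closest = sq_dist[np.arange(X.shape[0]), labels]
    if distances is not None:
        distances[:] = closest  # fill the caller's buffer in place
    return labels, float((closest * sample_weight).sum())


def m_step(X, sample_weight, labels, centers):
    """Hypothetical M step: weighted mean of each non-empty cluster."""
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        mask = labels == k
        if sample_weight[mask].sum() > 0:
            new_centers[k] = np.average(
                X[mask], axis=0, weights=sample_weight[mask])
    return new_centers


X = np.array([[0.0, 0.0], [0.0, 2.0], [8.0, 0.0], [8.0, 2.0]])
sample_weight = np.ones(X.shape[0])
x_squared_norms = (X ** 2).sum(axis=1)
centers = np.array([[0.0, 0.0], [8.0, 0.0]])  # arbitrary starting centers
distances = np.empty(X.shape[0])

for _ in range(10):
    labels, inertia = e_step(
        X, sample_weight, x_squared_norms, centers, distances)
    new_centers = m_step(X, sample_weight, labels, centers)
    if np.allclose(new_centers, centers):
        break
    centers = new_centers
```

On this toy data the loop stops at the second iteration with centers (0, 1) and (8, 1): the inertia drops from 8 after the first E step to 4 once the centers have moved.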
