module mlmodel._kmeans_022

Short summary

module mlinsights.mlmodel._kmeans_022

Implements k-means with norms L1 and L2.

source on GitHub

Functions

Function – truncated documentation

  • _assign_labels_array – Compute label assignment and inertia for a dense array. Return the inertia (sum of squared distances to the centers). …

  • _assign_labels_csr – Compute label assignment and inertia for a CSR input. Return the inertia (sum of squared distances to the centers).

  • _centers_dense – M step of the K-means EM algorithm. Computation of cluster centers / means.

  • _centers_sparse – M step of the K-means EM algorithm. Computation of cluster centers / means.

  • _labels_inertia_precompute_dense – Computes labels and inertia using a full distance matrix. This will overwrite the ‘distances’ array in-place.

  • _labels_inertia_skl – E step of the K-means EM algorithm. Compute the labels and the inertia of the given samples and centers. This will …

Documentation


mlinsights.mlmodel._kmeans_022._assign_labels_array(X, sample_weight, x_squared_norms, centers, labels, distances)

Compute label assignment and inertia for a dense array. Return the inertia (sum of squared distances to the centers).
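The assignment described above can be sketched in NumPy. This is a minimal illustration, not the library's code: the helper name `assign_labels_dense` is hypothetical, it returns new arrays instead of filling `labels` and `distances` in place, and it assumes `distances` holds the squared distance to the closest center.

```python
import numpy as np

def assign_labels_dense(X, sample_weight, x_squared_norms, centers):
    """Hypothetical helper: nearest-center labels and weighted inertia."""
    # Expand ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 so that the
    # precomputed squared norms of the samples can be reused.
    c_squared_norms = (centers ** 2).sum(axis=1)
    d2 = x_squared_norms[:, None] - 2.0 * (X @ centers.T) + c_squared_norms[None, :]
    # Rounding can produce tiny negative values; clip them to zero.
    np.maximum(d2, 0.0, out=d2)
    labels = d2.argmin(axis=1)
    distances = d2[np.arange(X.shape[0]), labels]
    # Inertia: weighted sum of squared distances to the closest center.
    inertia = float((sample_weight * distances).sum())
    return labels, distances, inertia

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.5], [10.0, 10.5]])
weights = np.ones(X.shape[0])
labels, distances, inertia = assign_labels_dense(
    X, weights, (X ** 2).sum(axis=1), centers)
```

Passing `x_squared_norms` in avoids recomputing the sample norms at every iteration, since only the centers change between calls.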


mlinsights.mlmodel._kmeans_022._assign_labels_csr(X, sample_weight, x_squared_norms, centers, labels, distances)

Compute label assignment and inertia for a CSR input. Return the inertia (sum of squared distances to the centers).
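For a CSR input, the dot product between a sample and each center only needs the stored entries of that row. The sketch below works directly on the three CSR arrays (`data`, `indices`, `indptr`) with NumPy alone; the helper name `assign_labels_csr_arrays` and its argument layout are assumptions made for illustration and differ from the function documented here, which takes the sparse matrix itself.

```python
import numpy as np

def assign_labels_csr_arrays(data, indices, indptr, sample_weight,
                             x_squared_norms, centers):
    """Hypothetical helper: label assignment over raw CSR arrays."""
    n_samples = len(indptr) - 1
    c_squared_norms = (centers ** 2).sum(axis=1)
    labels = np.empty(n_samples, dtype=np.intp)
    distances = np.empty(n_samples)
    inertia = 0.0
    for i in range(n_samples):
        start, end = indptr[i], indptr[i + 1]
        cols = indices[start:end]   # column indices stored for row i
        vals = data[start:end]      # matching non-zero values
        # Dot product of row i with every center, stored entries only.
        dots = centers[:, cols] @ vals
        d2 = np.maximum(x_squared_norms[i] - 2.0 * dots + c_squared_norms, 0.0)
        labels[i] = d2.argmin()
        distances[i] = d2[labels[i]]
        inertia += sample_weight[i] * distances[i]
    return labels, distances, inertia

# CSR arrays of the matrix [[1, 0, 0], [0, 0, 2], [0, 0, 0]].
data = np.array([1.0, 2.0])
indices = np.array([0, 2])
indptr = np.array([0, 1, 2, 2])
centers = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
labels, distances, inertia = assign_labels_csr_arrays(
    data, indices, indptr, np.ones(3), np.array([1.0, 4.0, 0.0]), centers)
```

The empty third row still gets a label: its distance to each center reduces to the squared norm of that center.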


mlinsights.mlmodel._kmeans_022._centers_dense(X, sample_weight, labels, n_clusters, distances)

M step of the K-means EM algorithm. Computation of cluster centers / means.

Parameters
  • X – array-like, shape (n_samples, n_features).

  • sample_weight – array-like, shape (n_samples,). The weights for each observation in X.

  • labels – array of integers, shape (n_samples,). Current label assignment.

  • n_clusters – int. Number of desired clusters.

  • distances – array-like, shape (n_samples,). Distance to closest cluster for each sample.

Returns
  • centers – array, shape (n_clusters, n_features). The resulting centers.
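The M step amounts to a weighted mean of the samples assigned to each cluster. The sketch below is an illustration under stated assumptions: the helper name `centers_dense_sketch` is hypothetical, it omits the `distances` argument, and it leaves an empty cluster at the origin rather than handling it in any special way.

```python
import numpy as np

def centers_dense_sketch(X, sample_weight, labels, n_clusters):
    """Hypothetical helper: weighted mean of the samples in each cluster."""
    centers = np.zeros((n_clusters, X.shape[1]))
    # Total weight assigned to each cluster.
    weight_in_cluster = np.bincount(
        labels, weights=sample_weight, minlength=n_clusters)
    # Accumulate the weighted samples into the row of their cluster.
    np.add.at(centers, labels, X * sample_weight[:, None])
    # Divide by the cluster weight; empty clusters stay at zero (assumption).
    nonempty = weight_in_cluster > 0
    centers[nonempty] /= weight_in_cluster[nonempty, None]
    return centers

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
weights = np.array([1.0, 3.0, 1.0])
labels = np.array([0, 0, 1])
centers = centers_dense_sketch(X, weights, labels, n_clusters=3)
```

`np.add.at` is used instead of `centers[labels] += ...` because the latter would apply only one update per repeated index.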


mlinsights.mlmodel._kmeans_022._centers_sparse(X, sample_weight, labels, n_clusters, distances)

M step of the K-means EM algorithm. Computation of cluster centers / means.

Parameters
  • X – scipy.sparse.csr_matrix, shape (n_samples, n_features).

  • sample_weight – array-like, shape (n_samples,). The weights for each observation in X.

  • labels – array of integers, shape (n_samples,). Current label assignment.

  • n_clusters – int. Number of desired clusters.

  • distances – array-like, shape (n_samples,). Distance to closest cluster for each sample.

Returns
  • centers – array, shape (n_clusters, n_features). The resulting centers.


mlinsights.mlmodel._kmeans_022._labels_inertia_precompute_dense(norm, X, sample_weight, centers, distances)

Computes labels and inertia using a full distance matrix.

This will overwrite the ‘distances’ array in-place.

Parameters
  • norm – ‘l1’ or ‘l2’.

  • X – numpy array, shape (n_samples, n_features). Input data.

  • sample_weight – array-like, shape (n_samples,). The weights for each observation in X.

  • centers – numpy array, shape (n_clusters, n_features). Cluster centers which data is assigned to.

  • distances – numpy array, shape (n_samples,). Pre-allocated array in which distances are stored.

Returns
  • labels – numpy array, dtype=numpy.int, shape (n_samples,). Indices of clusters that samples are assigned to.

  • inertia – float. Sum of squared distances of samples to their closest cluster center.
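A full distance matrix makes the role of `norm` easy to see. The sketch below is not the library's implementation: the helper name `labels_inertia_full_matrix` is hypothetical, and it assumes that ‘l2’ means squared Euclidean distances while ‘l1’ means plain Manhattan distances. The pre-allocated `distances` array is overwritten in place, as the description above states.

```python
import numpy as np

def labels_inertia_full_matrix(norm, X, sample_weight, centers, distances):
    """Hypothetical helper: labels and inertia from a full distance matrix."""
    # Pairwise differences, shape (n_samples, n_clusters, n_features).
    diff = X[:, None, :] - centers[None, :, :]
    if norm == "l2":
        full = (diff ** 2).sum(axis=2)    # assumption: squared Euclidean
    elif norm == "l1":
        full = np.abs(diff).sum(axis=2)   # assumption: Manhattan, not squared
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    labels = full.argmin(axis=1)
    # Overwrite the caller's buffer with the distance to the closest center.
    distances[:] = full[np.arange(X.shape[0]), labels]
    inertia = float((distances * sample_weight).sum())
    return labels, inertia

X = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0], [3.0, 0.0]])
weights = np.ones(2)
buffer_l2 = np.empty(2)
labels_l2, inertia_l2 = labels_inertia_full_matrix("l2", X, weights, centers, buffer_l2)
buffer_l1 = np.empty(2)
labels_l1, inertia_l1 = labels_inertia_full_matrix("l1", X, weights, centers, buffer_l1)
```

The intermediate `diff` array needs memory proportional to n_samples × n_clusters × n_features, which is the cost of precomputing everything at once.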


mlinsights.mlmodel._kmeans_022._labels_inertia_skl(X, sample_weight, x_squared_norms, centers, precompute_distances=True, distances=None)

E step of the K-means EM algorithm. Compute the labels and the inertia of the given samples and centers. This will compute the distances in-place.

Parameters
  • X – float64 array-like or CSR sparse matrix, shape (n_samples, n_features). The input samples to assign to the labels.

  • sample_weight – array-like, shape (n_samples,). The weights for each observation in X.

  • x_squared_norms – array, shape (n_samples,). Precomputed squared Euclidean norm of each data point, to speed up computations.

  • centers – float array, shape (n_clusters, n_features). The cluster centers.

  • precompute_distances – boolean, default: True. Precompute distances (faster but takes more memory).

  • distances – float array, shape (n_samples,). Pre-allocated array to be filled in with each sample’s distance to the closest center.

Returns
  • labels – int array, shape (n_samples,). The resulting assignment.

  • inertia – float. Sum of squared distances of samples to their closest cluster center.
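The trade-off behind `precompute_distances` can be illustrated with a dense-only sketch. The helper name `labels_inertia_sketch` is hypothetical and the two branches are an assumption about how such a switch might be organised, not the library's code: one branch builds the whole distance matrix at once, the other handles one sample at a time and keeps only a single row in memory.

```python
import numpy as np

def labels_inertia_sketch(X, sample_weight, x_squared_norms, centers,
                          precompute_distances=True, distances=None):
    """Hypothetical helper: E step with or without a full distance matrix."""
    n_samples = X.shape[0]
    if distances is None:
        distances = np.zeros(n_samples)
    c_squared_norms = np.einsum("ij,ij->i", centers, centers)
    labels = np.empty(n_samples, dtype=np.intp)
    if precompute_distances:
        # One (n_samples, n_clusters) matrix: fast, but memory hungry.
        d2 = x_squared_norms[:, None] - 2.0 * (X @ centers.T) + c_squared_norms
        labels[:] = d2.argmin(axis=1)
        distances[:] = d2[np.arange(n_samples), labels]
    else:
        # One row of n_clusters values at a time: slower, little memory.
        for i in range(n_samples):
            row = x_squared_norms[i] - 2.0 * (centers @ X[i]) + c_squared_norms
            labels[i] = row.argmin()
            distances[i] = row[labels[i]]
    np.maximum(distances, 0.0, out=distances)  # guard against rounding
    inertia = float((sample_weight * distances).sum())
    return labels, inertia

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
weights = np.array([1.0, 2.0, 1.0])
norms = np.einsum("ij,ij->i", X, X)  # squared Euclidean norm of each row
buf = np.empty(3)
labels_a, inertia_a = labels_inertia_sketch(X, weights, norms, centers, True, buf)
labels_b, inertia_b = labels_inertia_sketch(X, weights, norms, centers, False)
```

Both branches return the same labels and inertia; only the peak memory use and the speed differ.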
