module mltricks.kmeans_constraint

Short summary

module papierstat.mltricks.kmeans_constraint

Implements the class ConstraintKMeans.

Classes

class truncated documentation
ConstraintKMeans Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized …

Methods

method truncated documentation
__init__  
constraint_kmeans Completes the constraint k-means.
fit Compute k-means clustering. Parameters: X : array-like or sparse matrix, shape=(n_samples, …
predict Computes the predictions.
score Returns the distances to all clusters.
transform Computes the predictions.

Documentation

class papierstat.mltricks.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True)

Bases: sklearn.cluster.k_means_.KMeans

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular k-means and continues with a modified version of it.

Computing the predictions offers a choice. The first option is to keep the predictions from the regular k-means algorithm but with the balanced clusters. The second is to compute balanced predictions over the test set. This implies that the prediction for a given observation may change depending on the set it belongs to.
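The dependence on the test set can be illustrated with a minimal sketch. It is not the implementation of ConstraintKMeans: the helpers `regular_predict` and `balanced_predict_two` are hypothetical, work on 1-D points with exactly two fixed centers, and balance the clusters by sorting points on the difference of their distances to the two centers.

```python
def regular_predict(points, centers):
    """Label each 1-D point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            for x in points]


def balanced_predict_two(points, centers):
    """Split the points evenly between two centers (illustrative only).

    Points are ordered by how much closer they are to center 0 than to
    center 1; the first half goes to cluster 0, the rest to cluster 1.
    """
    c0, c1 = centers
    order = sorted(range(len(points)),
                   key=lambda i: abs(points[i] - c0) - abs(points[i] - c1))
    half = len(points) // 2
    labels = [1] * len(points)
    for i in order[:half]:
        labels[i] = 0
    return labels


centers = [0.0, 10.0]
test_a = [1.0, 2.0, 3.0, 9.0]    # contains 3.0, mostly near center 0
test_b = [3.0, 8.0, 9.0, 11.0]   # contains 3.0, mostly near center 1

regular_a = regular_predict(test_a, centers)         # depends on the point only
balanced_a = balanced_predict_two(test_a, centers)   # depends on the whole set
balanced_b = balanced_predict_two(test_b, centers)

print(regular_a, balanced_a, balanced_b)
```

In this sketch the observation 3.0 is always labelled 0 by the regular prediction, but the balanced prediction labels it 1 within `test_a` and 0 within `test_b`.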

Parameters:
  • n_clusters – number of clusters
  • init – used by k-means
  • n_init – used by k-means
  • max_iter – used by k-means
  • tol – used by k-means
  • precompute_distances – used by k-means
  • verbose – used by k-means
  • random_state – used by k-means
  • copy_x – used by k-means
  • n_jobs – used by k-means
  • algorithm – used by k-means
  • balanced_predictions – whether to produce balanced predictions or the regular ones
  • strategy – strategy or algorithm used to abide by the constraint
  • kmeans0 – if True, applies k-means algorithm first

The parameter strategy determines how observations should be assigned to a cluster. The value can be:

  • 'distance': observations are ranked by distance to a cluster; the algorithm assigns the first point to its closest center unless that cluster has already reached the maximum size
  • 'gain': follows the algorithm described in Same-size k-Means Variation
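One possible reading of the 'distance' strategy is sketched below. This is an assumption-laden illustration, not the code of ConstraintKMeans: the function `assign_by_distance` is hypothetical, the centers are taken as fixed, and the maximum cluster size is assumed to be ceil(n_samples / n_clusters).

```python
import math


def assign_by_distance(points, centers):
    """Greedy equal-size assignment (illustrative only).

    All (point, center) pairs are ranked by distance; each point takes
    the closest center that still has room.
    """
    n, k = len(points), len(centers)
    max_size = math.ceil(n / k)  # assumed capacity of each cluster
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )
    labels = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        # Skip points already assigned and clusters already full.
        if labels[i] is None and sizes[j] < max_size:
            labels[i] = j
            sizes[j] += 1
    return labels


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
labels = assign_by_distance(points, centers)
print(labels)
```

Here three points are closest to the first center, but the capacity of two forces the point (2.0, 0.0) into the second cluster.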

__init__(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True)

_strategy_value = {'distance', 'gain'}

constraint_kmeans(X, sample_weight=None, state=None, fLOG=None)

Completes the constraint k-means.

Parameters:
  • X – features
  • sample_weight – sample weight
  • state – state
  • fLOG – logging function

fit(X, y=None, sample_weight=None, fLOG=None)

Compute k-means clustering.

Parameters:
  • X (array-like or sparse matrix, shape=(n_samples, n_features)) – training instances to cluster. Note that the data will be converted to C ordering, which causes a memory copy if the given data is not C-contiguous.
  • sample_weight – sample weight
  • y – ignored
  • fLOG – logging function

predict(X, sample_weight=None)

Computes the predictions.

Parameters:
  • X – features
Returns: prediction

score(X, y=None, sample_weight=None)

Returns the distances to all clusters.

Parameters:
  • X – features
  • y – unused
  • sample_weight – sample weight
Returns: distances
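As a rough illustration of what "distances to all clusters" can look like, the sketch below builds one row per observation and one column per cluster center. The helper `distances_to_centers` is hypothetical and uses the Euclidean distance; the actual layout and metric returned by score are not specified here.

```python
import math


def distances_to_centers(points, centers):
    """Return a list of rows, one per point, holding the Euclidean
    distance from that point to every center (illustrative only)."""
    return [[math.dist(p, c) for c in centers] for p in points]


points = [(0.0, 0.0), (3.0, 4.0)]
centers = [(0.0, 0.0), (6.0, 8.0)]
dist = distances_to_centers(points, centers)
print(dist)
```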

transform(X)

Computes the predictions.

Parameters:
  • X – features
Returns: prediction
