# Machine Learning Models

## Helpers

`mlinsights.mlmodel.model_featurizer`

(*model*, *params*)

Converts a machine learned model into a function which converts a vector into features produced by the model. It can be the output itself or intermediate results. The model can come from scikit-learn, keras or torch.
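The idea can be sketched without the library. The helper `make_featurizer` and the class `TinyLinearModel` below are hypothetical names used for illustration only, not the signature of `model_featurizer`; the sketch assumes the model exposes one of the usual scikit-learn style methods.

```python
import numpy as np


def make_featurizer(model, methods=("transform", "predict_proba", "predict")):
    """Hypothetical helper: wrap the first available method of a fitted model."""
    for name in methods:
        method = getattr(model, name, None)
        if callable(method):
            def featurize(vector, _method=method):
                # Models expect a 2D batch, so the vector becomes a single row.
                batch = np.asarray(vector, dtype=float).reshape(1, -1)
                return np.asarray(_method(batch))[0]
            return featurize
    raise TypeError("model exposes none of the methods %r" % (methods,))


class TinyLinearModel:
    """Stand-in for an already trained model (not part of mlinsights)."""

    def __init__(self, coef):
        self.coef = np.asarray(coef, dtype=float)

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef


featurizer = make_featurizer(TinyLinearModel([1.0, 2.0]))
features = featurizer([3.0, 4.0])  # 1 * 3 + 2 * 4
```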

## Clustering

`mlinsights.mlmodel.ConstraintKMeans`

(*self*, *n_clusters* = 8, *init* = ‘k-means++’, *n_init* = 10, *max_iter* = 500, *tol* = 0.0001, *verbose* = 0, *random_state* = None, *copy_x* = True, *algorithm* = ‘auto’, *balanced_predictions* = False, *strategy* = ‘gain’, *kmeans0* = True, *learning_rate* = 1.0, *history* = False)

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular KMeans and continues with a modified version of it.

Computing the predictions offers a choice. The first option is to keep the predictions from the regular k-means algorithm but with the balanced clusters. The second is to compute balanced predictions over the test set. This implies that the prediction for the same observation might change depending on the set it belongs to.

The parameter `strategy` determines how observations should be assigned to a cluster. The value can be:

`'distance'`

: observations are ranked by distance to a cluster; the algorithm deals with the furthest point first and maps it to the closest center, unless that cluster has already reached its maximum size

`'gain'`

: follows the algorithm described at

`'weights'`

: estimates weights attached to each cluster; it weights the distance to each cluster in order to balance the number of points mapped to every cluster. The strategy uses a learning rate.

The first two strategies cannot reach a good compromise without using function `_switch_clusters`, which tries every switch between clusters: two points change clusters. It keeps the number of points in each cluster and checks that the inertia is reduced.
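A minimal sketch of a distance-based balanced assignment, assuming fixed centers and a maximum cluster size of `ceil(n / k)`. The function `balanced_assign` is a hypothetical name; it does not reproduce the library's implementation and it leaves out the switching step.

```python
import numpy as np


def balanced_assign(X, centers):
    """Assign each point to its closest center that still has room."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n, k = X.shape[0], centers.shape[0]
    max_size = -(-n // k)  # ceil(n / k), an assumption of this sketch
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # Furthest points first: they have the most to lose from a bad assignment.
    order = np.argsort(-dist.min(axis=1))
    labels = np.full(n, -1, dtype=int)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(dist[i]):
            if counts[c] < max_size:
                labels[i] = c
                counts[c] += 1
                break
    return labels


# Four points sit near the first center, two near the second one.
X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1]])
centers = np.array([[0.15], [5.05]])
labels = balanced_assign(X, centers)
sizes = np.bincount(labels, minlength=2)
```

A regular k-means prediction would produce clusters of sizes 4 and 2 on this data; the balanced assignment moves one point to the second cluster, at the cost of a higher inertia.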

`mlinsights.mlmodel.KMeansL1L2`

(*self*, *n_clusters* = 8, *init* = ‘k-means++’, *n_init* = 10, *max_iter* = 300, *tol* = 0.0001, *verbose* = 0, *random_state* = None, *copy_x* = True, *algorithm* = ‘full’, *norm* = ‘L2’)

K-Means clustering with either norm L1 or L2. See notebook KMeans with norm L1 for an example.
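One common way to run k-means with norm L1 is to assign points with the L1 distance and to update each center with the coordinate-wise median, which minimizes the sum of absolute deviations. The sketch below assumes that approach; `kmeans_l1` is a hypothetical function and may differ from what `KMeansL1L2` does internally.

```python
import numpy as np


def kmeans_l1(X, centers, n_iter=10):
    """Lloyd-style iterations with L1 distances and median updates."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(centers.shape[0]):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = np.median(members, axis=0)
    # Recompute the labels so that they match the final centers.
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return centers, dist.argmin(axis=1)


# The last point is an outlier attached to the second cluster.
X = np.array([[0, 0], [1, 0], [0, 1],
              [10, 10], [11, 10], [10, 11], [11, 11], [100, 100]])
centers, labels = kmeans_l1(X, centers=[[0, 0], [10, 10]])
```

With a mean update the second center would be pulled toward the outlier; the median keeps it among the bulk of the points.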

## Trainers

`mlinsights.mlmodel.ClassifierAfterKMeans`

(*self*, *estimator* = None, *clus* = None, *kwargs*)

Applies a *k-means* (see sklearn.cluster.KMeans) for each class, then adds the distance to each cluster as a feature for a classifier. See notebook LogisticRegression and Clustering.

`mlinsights.mlmodel.IntervalRegressor`

(*self*, *estimator* = None, *n_estimators* = 10, *n_jobs* = None, *alpha* = 1.0, *verbose* = False)

Trains multiple regressors to provide a confidence interval on predictions. It only works for single regression. Every training is made with a new sample of the training data; parameter `alpha` lets the user choose the size of this sample. A smaller `alpha` increases the variance of the predictions. The current implementation draws samples at random but keeps the weight associated with each of them. Another way could be to draw a weighted sample but give them uniform weights.
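The principle can be sketched with plain least squares. The functions `fit_bootstrap_lines` and `predict_interval` are hypothetical, the sketch assumes sampling with replacement and a sample size of `alpha * n`, and it ignores sample weights.

```python
import numpy as np


def fit_bootstrap_lines(x, y, n_estimators=10, alpha=1.0, seed=0):
    """Fit one line per random sample; returns (slope, intercept) rows."""
    rng = np.random.default_rng(seed)
    n = len(x)
    size = max(2, int(alpha * n))  # assumed meaning of alpha in this sketch
    A = np.column_stack([x, np.ones(n)])
    coefs = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=size)
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        coefs.append(coef)
    return np.array(coefs)


def predict_interval(coefs, x_new):
    """Lowest, median and highest prediction among all fitted lines."""
    A = np.column_stack([x_new, np.ones(len(x_new))])
    preds = A @ coefs.T  # shape (n_points, n_estimators)
    return preds.min(axis=1), np.median(preds, axis=1), preds.max(axis=1)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)
coefs = fit_bootstrap_lines(x, y, n_estimators=10, alpha=0.5)
low, mid, high = predict_interval(coefs, np.array([0.0, 0.5, 1.0]))
```

The spread between `low` and `high` widens when `alpha` decreases, since each regressor sees fewer observations.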

`mlinsights.mlmodel.ApproximateNMFPredictor`

(*self*, *force_positive* = False, *kwargs*)

Converts sklearn.decomposition.NMF into a predictor so that the prediction does not involve training even for new observations. The class uses a sklearn.decomposition.TruncatedSVD of the components found by the sklearn.decomposition.NMF. The prediction projects the test data into the components vector space and retrieves them back into their original space. The issue is that it does not necessarily produce only positive values, as sklearn.decomposition.NMF would, unless parameter `force_positive` is True.

`mlinsights.mlmodel.PiecewiseClassifier`

(*self*, *binner* = None, *estimator* = None, *n_jobs* = None, *random_state* = None, *verbose* = False)

Uses a decision tree to split the space of features into buckets and trains a logistic regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LogisticRegression. It can also be sklearn.dummy.DummyClassifier to just get the average on each bucket.

The main issue with the `PiecewiseClassifier` is that each piece requires one example of each class in each bucket, which may not happen. To avoid that, the training picks random examples from other buckets to ensure this case does not happen.

`mlinsights.mlmodel.PiecewiseRegressor`

(*self*, *binner* = None, *estimator* = None, *n_jobs* = None, *verbose* = False)

Uses a decision tree to split the space of features into buckets and trains a linear regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LinearRegression. It can also be sklearn.dummy.DummyRegressor to just get the average on each bucket.
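The mechanism can be sketched directly with scikit-learn: a tree defines the buckets through its leaves, and one linear regression is trained per leaf. This is a sketch of the principle under these assumptions, not the code of `PiecewiseRegressor`; the function `predict_piecewise` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.abs(X[:, 0]) + rng.normal(scale=0.01, size=200)

# The binner: every leaf of the tree is one bucket.
binner = DecisionTreeRegressor(min_samples_leaf=30, random_state=0).fit(X, y)
leaves = binner.apply(X)

# One linear model per bucket.
models = {
    leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
    for leaf in np.unique(leaves)
}


def predict_piecewise(X_new):
    """Route every row to its bucket, then use that bucket's model."""
    leaf_ids = binner.apply(X_new)
    out = np.empty(len(X_new))
    for leaf, model in models.items():
        mask = leaf_ids == leaf
        if mask.any():
            out[mask] = model.predict(X_new[mask])
    return out


mae = float(np.mean(np.abs(predict_piecewise(X) - y)))
```

A single linear regression cannot fit `|x|`; the piecewise model can, because each bucket only covers a part of the curve.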

`mlinsights.mlmodel.PiecewiseTreeRegressor`

(*self*, *criterion* = ‘mselin’, *splitter* = ‘best’, *max_depth* = None, *min_samples_split* = 2, *min_samples_leaf* = 1, *min_weight_fraction_leaf* = 0.0, *max_features* = None, *random_state* = None, *max_leaf_nodes* = None, *min_impurity_decrease* = 0.0)

Implements a kind of piecewise linear regression by modifying the criterion used by the algorithm which builds a decision tree. See sklearn.tree.DecisionTreeRegressor to get the meaning of the parameters except criterion:

`mselin`

: optimizes for a piecewise linear regression

`simple`

: optimizes for a stepwise regression (equivalent to `mse`)

`mlinsights.mlmodel.QuantileMLPRegressor`

(*self*, *hidden_layer_sizes* = (100,), *activation* = ‘relu’, *solver* = ‘adam’, *alpha* = 0.0001, *batch_size* = ‘auto’, *learning_rate* = ‘constant’, *learning_rate_init* = 0.001, *power_t* = 0.5, *max_iter* = 200, *shuffle* = True, *random_state* = None, *tol* = 0.0001, *verbose* = False, *warm_start* = False, *momentum* = 0.9, *nesterovs_momentum* = True, *early_stopping* = False, *validation_fraction* = 0.1, *beta_1* = 0.9, *beta_2* = 0.999, *epsilon* = 1e-08, *n_iter_no_change* = 10, *kwargs*)

Quantile MLP Regression or neural networks regression trained with norm L1. This class inherits from sklearn.neural_network.MLPRegressor. This model optimizes the absolute-loss using LBFGS or stochastic gradient descent. See `CustomizedMultilayerPerceptron` and `absolute_loss`.

`mlinsights.mlmodel.QuantileLinearRegression`

(*self*, *fit_intercept* = True, *copy_X* = True, *n_jobs* = 1, *delta* = 0.0001, *max_iter* = 10, *quantile* = 0.5, *positive* = False, *verbose* = False)

Quantile Linear Regression or linear regression trained with norm L1. This class inherits from sklearn.linear_model.LinearRegression. See notebook Quantile Regression.

Norm L1 is chosen if `quantile=0.5`; otherwise, for other values of `quantile`, the following error is optimized…
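The error formula is not reproduced in this page. As an illustration only, the sketch below assumes the standard quantile (pinball) loss and minimizes it with iteratively reweighted least squares, one common technique for this problem. The functions `pinball_loss` and `quantile_linear_fit` are hypothetical and are not claimed to match the library's code, even though the argument names mirror the signature above.

```python
import numpy as np


def pinball_loss(y, pred, quantile=0.5):
    """Standard quantile loss; half the mean absolute error for quantile=0.5."""
    diff = y - pred
    return float(np.mean(np.maximum(quantile * diff, (quantile - 1) * diff)))


def quantile_linear_fit(x, y, quantile=0.5, max_iter=50, delta=1e-4):
    """Iteratively reweighted least squares; returns (slope, intercept)."""
    A = np.column_stack([x, np.ones(len(x))])
    w = np.ones(len(y))
    for _ in range(max_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        resid = y - A @ coef
        # Positive residuals cost `quantile`, negative ones `1 - quantile`.
        asym = np.where(resid >= 0, quantile, 1 - quantile)
        # delta avoids dividing by a residual close to zero.
        w = asym / np.maximum(np.abs(resid), delta)
    return coef


x = np.linspace(0.0, 1.0, 101)
y = 2.0 * x + 1.0
y[-5:] += 50.0  # five outliers
coef = quantile_linear_fit(x, y, quantile=0.5)
loss = pinball_loss(y, coef[0] * x + coef[1])
```

On this data, the median regression stays close to the line followed by the 96 regular points, whereas an ordinary least squares fit would be pulled toward the outliers.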

`mlinsights.mlmodel.TransformedTargetClassifier2`

(*self*, *classifier* = None, *transformer* = None)

Meta-estimator to classify on a transformed target. Useful for applying permutation transformation in classification problems.

`mlinsights.mlmodel.TransformedTargetRegressor2`

(*self*, *regressor* = None, *transformer* = None)

Meta-estimator to regress on a transformed target. Useful for applying a non-linear transformation in regression problems.

## Transforms

`mlinsights.mlmodel.CategoriesToIntegers`

(*self*, *columns* = None, *remove* = None, *skip_errors* = False, *single* = False)

Does something similar to what DictVectorizer does but in a transformer. The method `fit` retains all categories, the method `transform` transforms categories into integers. Categories are sorted by columns. If the method `transform` tries to convert a category which was not seen by method `fit`, it can raise an exception or ignore it and replace it by zero.
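The fit/transform logic can be sketched on a list of dictionaries. The functions below are hypothetical and only follow the description above; numbering known categories from 1 so that 0 stays free for unseen values is a choice made for this sketch.

```python
def fit_categories(rows, columns):
    """Retain the sorted categories of every column."""
    mapping = {}
    for col in columns:
        values = sorted({row[col] for row in rows})
        # Codes start at 1 (assumption): 0 is kept for unseen categories.
        mapping[col] = {value: i for i, value in enumerate(values, start=1)}
    return mapping


def transform_categories(rows, mapping, skip_errors=False):
    """Replace categories by integers; unseen ones raise or become 0."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, codes in mapping.items():
            value = row[col]
            if value in codes:
                new_row[col] = codes[value]
            elif skip_errors:
                new_row[col] = 0
            else:
                raise ValueError(
                    "unknown category %r in column %r" % (value, col))
        out.append(new_row)
    return out


train = [{"color": "red", "size": 1.0}, {"color": "blue", "size": 2.0}]
test = [{"color": "red", "size": 3.0}, {"color": "green", "size": 4.0}]
mapping = fit_categories(train, columns=["color"])
converted = transform_categories(test, mapping, skip_errors=True)
```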

`mlinsights.mlmodel.ExtendedFeatures`

(*self*, *kind* = ‘poly’, *poly_degree* = 2, *poly_interaction_only* = False, *poly_include_bias* = True)

Generates extended features such as polynomial features.
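For polynomial features, the expansion can be sketched with numpy and itertools. The function `poly_features` is a hypothetical stand-in whose arguments loosely mirror `poly_degree` and `poly_include_bias`; it does not cover `poly_interaction_only`.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_features(X, degree=2, include_bias=True):
    """All monomials of the columns of X up to the given degree."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    cols = [np.ones(n)] if include_bias else []
    for deg in range(1, degree + 1):
        for combo in combinations_with_replacement(range(d), deg):
            # Product of the selected columns, e.g. (0, 1) gives x0 * x1.
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)


extended = poly_features([[2.0, 3.0]], degree=2)
# Columns: 1, x0, x1, x0^2, x0*x1, x1^2
```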

`mlinsights.mlmodel.FunctionReciprocalTransformer`

(*self*, *fct*, *fct_inv* = None)

The transform is used to apply a function on the target, predict, then transform the target back before scoring. The transform implements a series of predefined functions…
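A minimal sketch of the round trip, assuming the pair `log` / `exp` for `fct` and `fct_inv` and a least squares fit in place of a real estimator:

```python
import numpy as np

x = np.linspace(0.0, 2.0, 50)
y = 3.0 * np.exp(1.5 * x)  # exponential target, linear after a log

fct, fct_inv = np.log, np.exp  # the function and its reciprocal

# Train on the transformed target...
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, fct(y), rcond=None)

# ...then bring the predictions back to the original scale before scoring.
pred = fct_inv(A @ coef)
```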

`mlinsights.mlmodel.PermutationReciprocalTransformer`

(*self*, *random_state* = None, *closest* = False)

The transform is used to permute targets, predict, then permute the target back before scoring. nan values remain nan values. Once fitted, the transform has attribute `permutation_` which keeps track of the permutation to apply.
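The permutation and its inverse can be sketched with two dictionaries. The names `mapping`, `inverse` and `apply_mapping` are hypothetical; how the library stores `permutation_` is not described here.

```python
import numpy as np


def apply_mapping(mapping, values):
    """Replace every label by its image; nan values stay nan."""
    return np.array(
        [v if np.isnan(v) else mapping[v] for v in values.tolist()])


rng = np.random.default_rng(0)
y = np.array([0.0, 1.0, 2.0, np.nan, 1.0])

classes = np.unique(y[~np.isnan(y)])
permuted = rng.permutation(classes)
mapping = dict(zip(classes.tolist(), permuted.tolist()))
inverse = {image: label for label, image in mapping.items()}

y_perm = apply_mapping(mapping, y)       # target seen by the model
y_back = apply_mapping(inverse, y_perm)  # back to the original labels
```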

`mlinsights.mlmodel.PredictableTSNE`

(*self*, *normalizer* = None, *transformer* = None, *estimator* = None, *normalize* = True, *keep_tsne_outputs* = False)

t-SNE is an interesting transform which can only be used to study data as there is no way to reproduce the result once it was fitted. That’s why the class TSNE does not have any method `transform`, only `fit_transform`. This example proposes a way to train a machine learned model which approximates the outputs of a TSNE transformer. Notebook Predictable t-SNE gives an example of how to use this class.

`mlinsights.mlmodel.TransferTransformer`

(*self*, *estimator*, *method* = None, *copy_estimator* = True, *trainable* = False)

Wraps a predictor or a transformer in a transformer. This model is frozen: it cannot be trained and only computes the predictions.
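The wrapping can be sketched in a few lines. The classes `FrozenTransformer` and `MeanOfColumns` are hypothetical illustrations of the idea and not the implementation of `TransferTransformer`.

```python
import numpy as np


class FrozenTransformer:
    """Expose the predictions of a trained model through transform."""

    def __init__(self, estimator, method="predict"):
        self.estimator = estimator
        self.method = method

    def fit(self, X, y=None):
        # Frozen: the wrapped estimator is left untouched.
        return self

    def transform(self, X):
        out = np.asarray(getattr(self.estimator, self.method)(X))
        # A transformer returns a 2D array, one row per observation.
        return out.reshape(-1, 1) if out.ndim == 1 else out


class MeanOfColumns:
    """Stand-in for an already trained predictor."""

    def predict(self, X):
        return np.asarray(X, dtype=float).mean(axis=1)


frozen = FrozenTransformer(MeanOfColumns())
out = frozen.fit([[0.0, 0.0]]).transform([[1.0, 3.0], [2.0, 4.0]])
```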

`mlinsights.mlmodel.TraceableCountVectorizer`

(*self*, *input* = ‘content’, *encoding* = ‘utf-8’, *decode_error* = ‘strict’, *strip_accents* = None, *lowercase* = True, *preprocessor* = None, *tokenizer* = None, *stop_words* = None, *token_pattern* = ‘(?u)\b\w\w+\b’, *ngram_range* = (1, 1), *analyzer* = ‘word’, *max_df* = 1.0, *min_df* = 1, *max_features* = None, *vocabulary* = None, *binary* = False, *dtype* = <class ‘numpy.int64’>)

Inherits from `NGramsMixin`, which overloads method `_word_ngrams` to keep more information about n-grams but still produces the same outputs as CountVectorizer.

`mlinsights.mlmodel.TraceableTfidfVectorizer`

(*self*, *input* = ‘content’, *encoding* = ‘utf-8’, *decode_error* = ‘strict’, *strip_accents* = None, *lowercase* = True, *preprocessor* = None, *tokenizer* = None, *analyzer* = ‘word’, *stop_words* = None, *token_pattern* = ‘(?u)\b\w\w+\b’, *ngram_range* = (1, 1), *max_df* = 1.0, *min_df* = 1, *max_features* = None, *vocabulary* = None, *binary* = False, *dtype* = <class ‘numpy.float64’>, *norm* = ‘l2’, *use_idf* = True, *smooth_idf* = True, *sublinear_tf* = False)

Inherits from `NGramsMixin`, which overloads method `_word_ngrams` to keep more information about n-grams but still produces the same outputs as TfidfVectorizer.

## Exploration

The following implementations play with the scikit-learn API: they overwrite the code handling parameters.

`mlinsights.sklapi.SkBaseTransformLearner`

(*self*, *model* = None, *method* = None, *kwargs*)

A *transform* which hides a *learner*: it converts method `predict` into `transform`. This way, two learners can be inserted into the same pipeline. There is another, shorter implementation with class `TransferTransformer`.

`mlinsights.sklapi.SkBaseTransformStacking`

(*self*, *models* = None, *method* = None, *kwargs*)

A *transform* which hides several *learners*, arranged according to the stacking method.

## Exploration in C

The following classes require scikit-learn *>= 0.21*,
otherwise, they do not get compiled.

`mlinsights.mlmodel.piecewise_tree_regression_criterion.SimpleRegressorCriterion`

(*self*, *args*, *kwargs*)

Implements the mean square error criterion in a non-efficient way. The code was inspired by hellinger_distance_criterion.pyx, Cython example of exposing C-computed arrays in Python without data copies, and _criterion.pyx. This implementation is not efficient but was made that way on purpose. It adds the features to the class.

The next class follows a similar design but is a much faster implementation, close to what scikit-learn implements.

`mlinsights.mlmodel.piecewise_tree_regression_criterion_fast.SimpleRegressorCriterionFast`

(*self*, *args*, *kwargs*)

Criterion which computes the mean square error assuming points falling into one node are approximated by a constant. The implementation follows the same design used in `SimpleRegressorCriterion`. This implementation is faster as it computes cumulated sums and avoids loops to compute intermediate gains.
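The benefit of cumulated sums can be illustrated in numpy. Both functions are hypothetical sketches, not the Cython code: they return the squared error of every possible split of targets already sorted along one feature, first with an explicit loop, then with cumulated sums of `y` and `y**2`.

```python
import numpy as np


def split_errors_naive(y):
    """Recompute both means for every split position: O(n^2) overall."""
    errors = []
    for i in range(1, len(y)):
        left, right = y[:i], y[i:]
        errors.append(((left - left.mean()) ** 2).sum()
                      + ((right - right.mean()) ** 2).sum())
    return np.array(errors)


def split_errors_cumsum(y):
    """Same values from cumulated sums: O(n) overall."""
    n = len(y)
    s1 = np.cumsum(y)
    s2 = np.cumsum(y ** 2)
    k = np.arange(1, n)  # number of points on the left side
    # sum (y - mean)^2 = sum y^2 - (sum y)^2 / count, on each side
    left = s2[:-1] - s1[:-1] ** 2 / k
    right = (s2[-1] - s2[:-1]) - (s1[-1] - s1[:-1]) ** 2 / (n - k)
    return left + right


y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
naive = split_errors_naive(y)
fast = split_errors_cumsum(y)
best = int(np.argmin(fast))  # split after best + 1 points
```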

The next one implements a criterion which optimizes the mean square error assuming the points falling into one node of the tree are approximated by a line. The mean square error is the error made with a linear regressor and not a constant anymore.

`mlinsights.mlmodel.piecewise_tree_regression_criterion_linear.LinearRegressorCriterion`

(*self*, *args*, *kwargs*)

Criterion which computes the mean square error assuming points falling into one node are approximated by a line (linear regression). The implementation follows the same design used in `SimpleRegressorCriterion` and is even slower, as the criterion is more complex to compute.
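A sketch of such a criterion for a single feature: every candidate split is scored by the squared error of one least squares line per side. The functions `linear_sse` and `best_linear_split` are hypothetical; refitting a regression for every candidate split is what makes this criterion expensive.

```python
import numpy as np


def linear_sse(x, y):
    """Squared error left by the best line through the points (x, y)."""
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def best_linear_split(x, y, min_leaf=2):
    """Split index minimizing the error of one line per side."""
    best_index, best_error = None, np.inf
    for i in range(min_leaf, len(x) - min_leaf + 1):
        error = linear_sse(x[:i], y[:i]) + linear_sse(x[i:], y[i:])
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error


# Two line segments which meet nowhere: the break is at x = 5.
x = np.arange(10, dtype=float)
y = np.where(x < 5, x, 20.0 - 2.0 * x)
index, error = best_linear_split(x, y)
```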