module `mlmodel.piecewise_estimator`#

Short summary#

module mlinsights.mlmodel.piecewise_estimator

Implements a piecewise linear regression.

Classes#

class	truncated documentation
`PiecewiseClassifier`	Uses a decision tree to split the space of features into buckets and trains a logistic regression (default) …
`PiecewiseEstimator`	Uses a decision tree to split the space of features into buckets and trains a linear regression on each of them. …
`PiecewiseRegressor`	Uses a decision tree to split the space of features into buckets and trains a linear regression (default) on …

Functions#

function	truncated documentation
`_decision_function_piecewise_estimator`
`_fit_piecewise_estimator`
`_predict_piecewise_estimator`
`_predict_proba_piecewise_estimator`

Properties#

property	truncated documentation
`_repr_html_`	HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …
`_repr_html_`	HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …
`_repr_html_`	HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …
`n_estimators_`	Returns the number of estimators, which equals the number of buckets the data was split into.
`n_estimators_`	Returns the number of estimators, which equals the number of buckets the data was split into.
`n_estimators_`	Returns the number of estimators, which equals the number of buckets the data was split into.

Methods#

method	truncated documentation
`__init__`
`__init__`
`__init__`
`_apply_predict_method`	Generic predict method, works for predict_proba and decision_function as well.
`_apply_predict_method`	Generic predict method, works for predict_proba and decision_function as well.
`_apply_predict_method`	Generic predict method, works for predict_proba and decision_function as well.
`_mapping_train`
`_mapping_train`
`_mapping_train`
`decision_function`	Computes the decision function values (confidence scores).
`fit`	Trains the binner and an estimator on every bucket.
`fit`	Trains the binner and an estimator on every bucket.
`fit`	Trains the binner and an estimator on every bucket.
`predict`	Computes the predictions.
`predict`	Computes the predictions.
`predict_proba`	Computes the predicted probabilities.
`transform_bins`	Maps every row to a tree in self.estimators_.
`transform_bins`	Maps every row to a tree in self.estimators_.
`transform_bins`	Maps every row to a tree in self.estimators_.

Documentation#

Implements a piecewise linear regression.

source on GitHub

class mlinsights.mlmodel.piecewise_estimator.PiecewiseClassifier(binner=None, estimator=None, n_jobs=None, random_state=None, verbose=False)#

Bases: PiecewiseEstimator, ClassifierMixin

Uses a decision tree to split the space of features into buckets and trains a logistic regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LogisticRegression. It can also be sklearn.dummy.DummyClassifier to just get the average on each bucket.

The main issue with PiecewiseClassifier is that every bucket needs at least one example of each class, which may not happen. To avoid that, training picks random examples from other buckets to fill in the missing classes.
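
The borrowing step described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the helper `fill_missing_classes`, the bucket layout (a dictionary of `(x, label)` pairs) and the `seed` argument are all assumptions made for the example.

```python
import random


def fill_missing_classes(buckets, all_classes, seed=0):
    """Return a copy of `buckets` in which every bucket holds each class.

    `buckets` maps a bucket id to a list of (x, label) pairs. A class that is
    missing from a bucket is filled with one example drawn at random from the
    other buckets. Hypothetical helper, not part of mlinsights.
    """
    rng = random.Random(seed)
    filled = {}
    for bucket_id, rows in buckets.items():
        rows = list(rows)  # copy so the input is left untouched
        present = {label for _, label in rows}
        for cls in sorted(all_classes - present):
            # Candidate donors: rows of the missing class in any other bucket.
            donors = [
                row
                for other_id, other_rows in buckets.items()
                if other_id != bucket_id
                for row in other_rows
                if row[1] == cls
            ]
            if donors:
                rows.append(rng.choice(donors))
        filled[bucket_id] = rows
    return filled


buckets = {
    "left": [(0.1, 0), (0.2, 0), (0.3, 0)],   # only class 0
    "right": [(0.8, 1), (0.9, 0), (1.0, 1)],  # both classes
}
filled = fill_missing_classes(buckets, all_classes={0, 1})
```

After the call, the `"left"` bucket contains one borrowed example of class 1, so a classifier can be trained on it; the `"right"` bucket is unchanged.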

Parameters:

binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
random_state – used to pick random examples when a bucket does not contain enough examples of each class
verbose – boolean, or 'tqdm' to show a tqdm progress bar while fitting the estimators

binner allows the following values:

'tree': the model is sklearn.tree.DecisionTreeClassifier
'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model

estimator allows the following values:

None: the model is sklearn.linear_model.LogisticRegression
any instantiated model

__init__(binner=None, estimator=None, n_jobs=None, random_state=None, verbose=False)#

Parameters:

binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
random_state – used to pick random examples when a bucket does not contain enough examples of each class
verbose – boolean, or 'tqdm' to show a tqdm progress bar while fitting the estimators

binner allows the following values:

'tree': the model is sklearn.tree.DecisionTreeClassifier
'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model

estimator allows the following values:

None: the model is sklearn.linear_model.LogisticRegression
any instantiated model

decision_function(X)#

Computes the decision function values (confidence scores).

Parameters: X – features; X is converted into an array if it is a dataframe
Returns: decision function values

predict(X)#

Computes the predictions.

Parameters: X – features; X is converted into an array if it is a dataframe
Returns: predictions

predict_proba(X)#

Computes the predicted probabilities.

Parameters: X – features; X is converted into an array if it is a dataframe
Returns: predicted probabilities
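
To make the difference between decision_function and predict_proba concrete, here is a small sketch for the binary case. It assumes the estimator trained on a bucket is a binary logistic regression (the default named above), where the decision value is the linear score and the probability is its logistic sigmoid. The helper names and the numbers are made up for illustration.

```python
import math


def decision_score(weights, intercept, x):
    """Linear score w . x + b of a binary logistic regression (hypothetical helper)."""
    return sum(w * v for w, v in zip(weights, x)) + intercept


def proba_from_score(score):
    """Turn a decision score into [P(class 0), P(class 1)] with the sigmoid."""
    p1 = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p1, p1]


# Illustrative coefficients for one bucket; not taken from a fitted model.
score = decision_score([2.0, -1.0], 0.5, [1.0, 0.5])  # 2.0 - 0.5 + 0.5
proba = proba_from_score(score)
label = int(score > 0)  # a positive score means class 1
```

The score is unbounded and signed, whereas the two probabilities are in [0, 1] and sum to one; both lead to the same predicted label.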

class mlinsights.mlmodel.piecewise_estimator.PiecewiseEstimator(binner=None, estimator=None, n_jobs=None, verbose=False)#

Bases: BaseEstimator

Uses a decision tree to split the space of features into buckets and trains a linear regression on each of them. The second estimator can be a sklearn.linear_model.LinearRegression for a regression or a sklearn.linear_model.LogisticRegression for a classifier. It can also be sklearn.dummy.DummyRegressor or sklearn.dummy.DummyClassifier to just get the average on each bucket. When the buckets are defined by a decision tree and the estimator is linear, PiecewiseTreeRegressor optimizes the buckets based on the results of a linear regression, which usually gives better accuracy.
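
The principle of "split into buckets, then fit one linear model per bucket" can be sketched without any library. In the sketch below a single fixed threshold stands in for the decision tree, and ordinary least squares on one feature stands in for the linear regression. All function names are hypothetical, and this is not how the class is implemented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on the rows of one bucket."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant feature: fall back to the mean target
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x


def bucket_of(x, threshold):
    """Stand-in for the binner: one split, two buckets."""
    return 0 if x <= threshold else 1


def fit_piecewise(xs, ys, threshold):
    """Fit one line per bucket (assumes both buckets are non-empty)."""
    models = {}
    for b in (0, 1):
        bx = [x for x in xs if bucket_of(x, threshold) == b]
        by = [y for x, y in zip(xs, ys) if bucket_of(x, threshold) == b]
        models[b] = fit_line(bx, by)
    return models


def predict_piecewise(models, x, threshold):
    """Route x to its bucket and apply that bucket's line."""
    a, b = models[bucket_of(x, threshold)]
    return a * x + b


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 2.0, 10.0, 9.0, 8.0]  # slope +1 on the left, -1 on the right
models = fit_piecewise(xs, ys, threshold=2.5)
```

A single straight line cannot follow this data, but the two per-bucket lines fit it exactly: `predict_piecewise(models, 1.5, 2.5)` follows the left line and `predict_piecewise(models, 4.5, 2.5)` the right one.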

Parameters:

binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
verbose – boolean, or 'tqdm' to show a tqdm progress bar while fitting the estimators

binner must be provided. It allows the following values:

'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model

estimator allows the following values:

None: the model is sklearn.linear_model.LinearRegression
any instantiated model

__init__(binner=None, estimator=None, n_jobs=None, verbose=False)#

Parameters:

binner – transformer or predictor which creates the buckets
estimator – predictor trained on every bucket
n_jobs – number of parallel jobs (for training and predicting)
verbose – boolean, or 'tqdm' to show a tqdm progress bar while fitting the estimators

binner must be provided. It allows the following values:

'bins': the model is sklearn.preprocessing.KBinsDiscretizer
any instantiated model

estimator allows the following values:

None: the model is sklearn.linear_model.LinearRegression
any instantiated model

_apply_predict_method(X, method, parallelized, dimout)#

Generic predict method, works for predict_proba and decision_function as well.

_mapping_train(X, binner)#

fit(X, y, sample_weight=None)#

Trains the binner and an estimator on every bucket.

Parameters:

X – features, X is converted into an array if X is a dataframe
y – target
sample_weight – sample weights

Returns:

self: returns an instance of self.

Fitted attributes:

binner_: binner
estimators_: dictionary of estimators, each of them mapped to a leaf of the tree
mean_estimator_: estimator trained on the whole dataset, used when the binner cannot find a bucket for a new observation
dim_: dimension of the output
mean_: average targets
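
One way to picture how these attributes could work together at prediction time is a dictionary lookup with a fallback. The sketch below treats estimators as plain callables and reads the global estimator as the fallback for rows whose bucket has no dedicated model. The function name, the bucket ids and the toy models are assumptions for illustration, not the library's code.

```python
def predict_with_fallback(estimators, mean_estimator, bucket_ids, X):
    """Predict each row with its bucket's model, else with the global model.

    `estimators` maps a bucket id to a callable; `mean_estimator` is the
    callable trained on the whole dataset. Hypothetical helper.
    """
    preds = []
    for bucket_id, x in zip(bucket_ids, X):
        model = estimators.get(bucket_id, mean_estimator)
        preds.append(model(x))
    return preds


# Toy per-bucket models keyed by made-up leaf ids 3 and 4.
estimators = {3: lambda x: 2.0 * x, 4: lambda x: 10.0 - x}
mean_estimator = lambda x: 5.0  # stand-in for the model fitted on all rows

# The third row lands in bucket 7, which has no dedicated estimator.
preds = predict_with_fallback(estimators, mean_estimator, [3, 4, 7], [1.0, 2.0, 3.0])
```

The first two rows are handled by their own bucket's model, and the third one falls back to the global model.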

property n_estimators_#

Returns the number of estimators, which equals the number of buckets the data was split into.

transform_bins(X)#

Maps every row to a tree in self.estimators_.
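
As a rough illustration of mapping rows to buckets, the sketch below assigns one-dimensional values to intervals defined by sorted inner edges, which is the kind of assignment an interval-based binner produces. With a decision-tree binner the bucket would instead be identified by the leaf a row falls into. The helper `to_bucket` and the edge values are hypothetical, and this does not reproduce what transform_bins does internally.

```python
from bisect import bisect_right


def to_bucket(x, inner_edges):
    """Index of the interval containing x, given sorted inner bin edges."""
    return bisect_right(inner_edges, x)


inner_edges = [1.0, 2.0]  # three buckets: below 1, [1, 2), and 2 or above
rows = [0.5, 1.0, 1.7, 2.0, 9.0]
bucket_ids = [to_bucket(x, inner_edges) for x in rows]

# Group row positions by bucket, as a per-bucket training loop would need.
rows_per_bucket = {}
for position, bucket_id in enumerate(bucket_ids):
    rows_per_bucket.setdefault(bucket_id, []).append(position)
```

Each bucket id can then serve as the key used to look up the estimator trained on that bucket.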

class mlinsights.mlmodel.piecewise_estimator.PiecewiseRegressor(binner=None, estimator=None, n_jobs=None, verbose=False)#

Bases: PiecewiseEstimator, RegressorMixin

Uses a decision tree to split the space of features into buckets and trains a linear regression (default) on each of them. The second estimator is usually a sklearn.linear_model.LinearRegression. It can also be sklearn.dummy.DummyRegressor to just get the average on each bucket.

Parameters: