module mlmodel.decision_tree_logreg

Inheritance diagram of mlinsights.mlmodel.decision_tree_logreg

Short summary

module mlinsights.mlmodel.decision_tree_logreg

Builds a tree of logistic regressions.

source on GitHub

Classes

  • _DecisionTreeLogisticRegressionNode – Describes the tree structure held by class DecisionTreeLogisticRegression. See also notebook Decision Tree and Logistic Regression. …

  • DecisionTreeLogisticRegression – Fits a logistic regression, then fits two more logistic regressions, one for the observations on each side of the border. …

Functions

  • likelihood – Computes \sum_i y_i f(\theta (x_i - x_0)) + (1 - y_i) (1 - f(\theta (x_i - x_0))) where f(x) = \frac{1}{1 + e^{-x}}. …

  • logistic – Computes \frac{1}{1 + e^{-x}}.

Properties

  • _repr_html_ – HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …

  • tree_depth_ (DecisionTreeLogisticRegression) – Returns the maximum depth of the tree.

  • tree_depth_ (_DecisionTreeLogisticRegressionNode) – Returns the maximum depth of the tree.

Methods

  • __init__ (DecisionTreeLogisticRegression) – constructor

  • __init__ (_DecisionTreeLogisticRegressionNode) – constructor

  • decision_function (DecisionTreeLogisticRegression) – Calls decision_function.

  • decision_path (DecisionTreeLogisticRegression) – Returns the decision path.

  • decision_path (_DecisionTreeLogisticRegressionNode) – Fills the decision path matrix.

  • enumerate_leaves_index (_DecisionTreeLogisticRegressionNode) – Returns the leaves index.

  • fit (DecisionTreeLogisticRegression) – Builds the tree model.

  • fit (_DecisionTreeLogisticRegressionNode) – Fits a logistic regression, then splits the sample into positive and negative examples, finally tries to fit …

  • fit_improve (_DecisionTreeLogisticRegressionNode) – The method only works on a linear classifier; it changes the intercept in order to stay within the constraints …

  • get_leaves_index (DecisionTreeLogisticRegression) – Returns the index of every leaf.

  • predict (DecisionTreeLogisticRegression) – Runs the predictions.

  • predict (_DecisionTreeLogisticRegressionNode) – Predicts.

  • predict_proba (DecisionTreeLogisticRegression) – Converts predictions into probabilities.

  • predict_proba (_DecisionTreeLogisticRegressionNode) – Returns the classification probabilities.

Documentation

Builds a tree of logistic regressions.

source on GitHub

class mlinsights.mlmodel.decision_tree_logreg.DecisionTreeLogisticRegression(estimator=None, max_depth=20, min_samples_split=2, min_samples_leaf=2, min_weight_fraction_leaf=0.0, fit_improve_algo='auto', p1p2=0.09, gamma=1.0, verbose=0)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Fits a logistic regression, then fits two more logistic regressions, one for the observations on each side of the border. It goes on until a tree is built. It only handles binary classification. The built tree cannot be deeper than the maximum recursion depth allowed by Python.

Parameters
  • estimator – binary classification estimator; if None, a logistic regression is used. The theoretical model is defined with a logistic regression but it could be any binary classifier.

  • max_depth – int, default=20. The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. It must be below the maximum recursion depth allowed by Python.

  • min_samples_split – int or float, default=2. The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

  • min_samples_leaf – int or float, default=2. The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model:

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

  • min_weight_fraction_leaf – float, default=0.0 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • fit_improve_algo – string, one of the following values:

    • 'auto': chooses the best option below, 'none' for every non-linear model, 'intercept_sort' for linear models

    • 'none': does nothing once the binary classifier is fit

    • 'intercept_sort': if one side of the classifier is too small, the method changes the intercept to the best possible value verifying the constraints

    • 'intercept_sort_always': always chooses the best intercept possible

  • p1p2 – threshold in [0, 1]. For every split, probabilities p_1 and p_2 define the ratio of samples in both splits; if p_1 p_2 is below the threshold, method fit_improve is called.

  • gamma – weight of the coefficient p (1 - p). When the model tries to improve the linear classifier, it looks for a better intercept which maximizes the likelihood and verifies the constraints. In order to force the classifier to choose a value which splits the dataset into two almost equal folds, the function maximizes likelihood + \gamma p (1 - p) where p is the proportion of samples falling in the first fold.

  • verbose – prints out information about the training

Fitted attributes:

  • classes_: ndarray of shape (n_classes,) or list of ndarray

    The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).

  • tree_: Tree

    The underlying Tree object.

source on GitHub
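The recursive scheme can be illustrated with scikit-learn alone: one root logistic regression whose decision border splits the sample, then one child classifier fitted on each side. This is a one-level sketch of the idea, not the library's implementation; DecisionTreeLogisticRegression automates the recursion, the stopping criteria, and the intercept adjustment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
# A target a single linear classifier cannot fully capture.
y = (X[:, 0] + X[:, 1] ** 2 > 0).astype(int)

# Root: one logistic regression on the whole sample.
root = LogisticRegression().fit(X, y)
side = root.decision_function(X) >= 0  # which side of the border

# Children: one logistic regression per side of the border.
left = LogisticRegression().fit(X[~side], y[~side])
right = LogisticRegression().fit(X[side], y[side])

def predict_tree(X_new):
    """Routes every observation through the root, then the matching child."""
    s = root.decision_function(X_new) >= 0
    out = np.empty(len(X_new), dtype=int)
    out[~s] = left.predict(X_new[~s])
    out[s] = right.predict(X_new[s])
    return out

acc_tree = (predict_tree(X) == y).mean()
acc_root = (root.predict(X) == y).mean()
```

A deeper tree repeats the same split on each child until the leaves are pure or a size constraint stops the recursion.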

constructor

__init__(estimator=None, max_depth=20, min_samples_split=2, min_samples_leaf=2, min_weight_fraction_leaf=0.0, fit_improve_algo='auto', p1p2=0.09, gamma=1.0, verbose=0)[source]

constructor

_fit_improve_algo_values = (None, 'none', 'auto', 'intercept_sort', 'intercept_sort_always')
decision_function(X)[source]

Calls decision_function.

source on GitHub

decision_path(X, check_input=True)[source]

Returns the decision path.

Parameters
  • X – inputs

  • check_input – unused

Returns

sparse matrix

source on GitHub
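The sparse matrix presumably follows the scikit-learn convention: one row per sample, one column per node, with a nonzero entry (i, j) when sample i traverses node j. A small hypothetical three-node example (node 0 is the root, nodes 1 and 2 its children):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Samples 0 and 1 go left (root -> node 1), sample 2 goes right (root -> node 2).
rows = [0, 0, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 0, 2]
path = csr_matrix((np.ones(6), (rows, cols)), shape=(3, 3))

# Every sample visits the root; column sums count the samples
# reaching each node.
dense = path.toarray()
```
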

fit(X, y, sample_weight=None)[source]

Builds the tree model.

Parameters
  • X – numpy array or sparse matrix of shape [n_samples,n_features] Training data

  • y – numpy array of shape [n_samples, n_targets] Target values. Will be cast to X’s dtype if necessary

  • sample_weight – numpy array of shape [n_samples] Individual weights for each sample

Returns

self : returns an instance of self.

source on GitHub

get_leaves_index()[source]

Returns the index of every leaf.

source on GitHub

predict(X)[source]

Runs the predictions.

source on GitHub

predict_proba(X)[source]

Converts predictions into probabilities.

source on GitHub

property tree_depth_

Returns the maximum depth of the tree.

source on GitHub

class mlinsights.mlmodel.decision_tree_logreg._DecisionTreeLogisticRegressionNode(estimator, threshold=0.5, depth=1, index=0)[source]

Bases: object

Describes the tree structure held by class DecisionTreeLogisticRegression. See also notebook Decision Tree and Logistic Regression.

source on GitHub

constructor

Parameters

estimator – binary estimator

source on GitHub

__init__(estimator, threshold=0.5, depth=1, index=0)[source]

constructor

Parameters

estimator – binary estimator

source on GitHub

decision_path(X, mat, indices)[source]

Fills the decision path matrix.

Parameters
  • X – features

  • mat – decision path (allocated matrix)

source on GitHub

enumerate_leaves_index()[source]

Returns the leaves index.

source on GitHub

fit(X, y, sample_weight, dtlr, total_N)[source]

Fits a logistic regression, then splits the sample into positive and negative examples, finally tries to fit logistic regressions on both subsamples. This method only works on a linear classifier.

Parameters
  • X – features

  • y – binary labels

  • sample_weight – weights of every sample

  • dtlr – DecisionTreeLogisticRegression

  • total_N – total number of observations

Returns

last index

source on GitHub

fit_improve(dtlr, total_N, X, y, sample_weight)[source]

The method only works on a linear classifier; it changes the intercept in order to stay within the constraints imposed by min_samples_leaf and min_weight_fraction_leaf. The algorithm has a significant cost as it sorts every observation and chooses the best intercept.

Returns

probabilities

source on GitHub
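The search can be sketched as a scan over candidate intercepts placed between consecutive sorted decision scores, keeping the one that maximizes likelihood + \gamma p (1 - p). A simplified illustration of the idea, not the module's actual code (best_intercept and its arguments are hypothetical names):

```python
import numpy as np

def logistic(x):
    """Computes 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def best_intercept(scores, y, gamma=1.0):
    """Scans candidate intercept shifts taken between consecutive sorted
    decision scores, returning the shift maximizing
    likelihood + gamma * p * (1 - p)."""
    order = np.argsort(scores)
    s, t = scores[order], y[order]
    best, best_b = -np.inf, 0.0
    for i in range(1, len(s)):
        b = -(s[i - 1] + s[i]) / 2      # cut halfway between two scores
        f = logistic(s + b)
        lik = np.sum(t * f + (1 - t) * (1 - f))
        p = i / len(s)                  # fraction falling in the first fold
        obj = lik + gamma * p * (1 - p)
        if obj > best:
            best, best_b = obj, b
    return best_b

scores = np.array([-2.0, -1.0, 0.5, 1.0, 3.0])
y = np.array([0, 0, 1, 1, 1])
b = best_intercept(scores, y)
```

The gamma * p * (1 - p) term penalizes cuts that leave one side almost empty, which is exactly the constraint fit_improve enforces.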

predict(X)[source]

Predicts.

source on GitHub

predict_proba(X)[source]

Returns the classification probabilities.

Parameters

X – features

Returns

probabilities

source on GitHub

property tree_depth_

Returns the maximum depth of the tree.

source on GitHub

mlinsights.mlmodel.decision_tree_logreg.likelihood(x, y, theta=1.0, th=0.0)[source]

Computes \sum_i y_i f(\theta (x_i - x_0)) + (1 - y_i) (1 - f(\theta (x_i - x_0))) where f(x) = \frac{1}{1 + e^{-x}}.

source on GitHub

mlinsights.mlmodel.decision_tree_logreg.logistic(x)[source]

Computes \frac{1}{1 + e^{-x}}.

source on GitHub
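Both helpers map directly to short NumPy expressions. A sketch of the documented formulas, assuming the parameter th plays the role of x_0 (the module's actual return shapes may differ):

```python
import numpy as np

def logistic(x):
    """Computes 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def likelihood(x, y, theta=1.0, th=0.0):
    """Computes sum_i y_i * f(theta * (x_i - th))
    + (1 - y_i) * (1 - f(theta * (x_i - th)))
    where f is the logistic function."""
    f = logistic(theta * (x - th))
    return np.sum(y * f + (1 - y) * (1 - f))

# A well-separated sample: the likelihood approaches the sample size.
x = np.array([-10.0, 10.0])
y = np.array([0.0, 1.0])
lik = likelihood(x, y)
```
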