Examples

  1. DictVectorizer or CategoriesToIntegers

  2. Stacking several learners in a scikit-learn pipeline.

  3. Use two learners in the same pipeline

DictVectorizer or CategoriesToIntegers

Example that encodes a text (categorical) column as numeric indicator columns, one per category:

<<<

import pandas
from mlinsights.mlmodel import CategoriesToIntegers
df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)

>>>

       cat=a  cat=b
    0    1.0    NaN
    1    NaN    1.0

(original entry : categories_to_integers.py:docstring of mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers, line 9)
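
For comparison, the heading also mentions DictVectorizer. The sketch below is not part of the original docstring; it assumes scikit-learn's `DictVectorizer` from `sklearn.feature_extraction` and feeds it the same two records as plain dictionaries. It produces columns with the same `cat=a` / `cat=b` names, but fills absent categories with 0 rather than NaN.

```python
from sklearn.feature_extraction import DictVectorizer

# Same two records as in the example above, as plain dictionaries.
rows = [{"cat": "a"}, {"cat": "b"}]

# sparse=False returns a dense numpy array instead of a scipy sparse matrix.
vec = DictVectorizer(sparse=False)
dense = vec.fit_transform(rows)

# feature_names_ lists one "<key>=<value>" column per category seen in fit.
names = vec.feature_names_
print(names)
print(dense)
```

Unlike the DataFrame-based transform above, DictVectorizer takes a list of dictionaries and returns an array, so the column names have to be read from `feature_names_`.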

Stacking several learners in a scikit-learn pipeline.

This transform gathers the outputs of several learners. These features are then used as the input of a stacking model.

<<<

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from mlinsights.sklapi import SkBaseTransformStacking

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

trans = SkBaseTransformStacking([LogisticRegression(),
                                 DecisionTreeClassifier()])
trans.fit(X_train, y_train)
pred = trans.transform(X_test)
print(pred[3:])

>>>

    [[0 0]
     [2 2]
     [2 2]
     [2 2]
     [1 1]
     [0 0]
     [1 1]
     [1 1]
     [0 0]
     [1 1]
     [2 2]
     [0 0]
     [1 1]
     [0 0]
     [0 0]
     [1 1]
     [2 2]
     [0 0]
     [0 0]
     [1 1]
     [1 1]
     [0 0]
     [1 1]
     [2 2]
     [1 1]
     [1 1]
     [0 0]
     [1 1]
     [1 1]
     [0 0]
     [1 1]
     [0 0]
     [2 2]
     [0 0]
     [1 1]]

(original entry : sklearn_base_transform_stacking.py:docstring of mlinsights.sklapi.sklearn_base_transform_stacking.SkBaseTransformStacking, line 4)
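
To make the idea concrete without mlinsights, the following sketch builds the same kind of features by hand with plain scikit-learn and numpy. It is only an illustration and not the implementation of SkBaseTransformStacking: the helper `stack_predictions` is hypothetical, and the choice of a decision tree as the final model is an assumption made for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First level: two learners trained independently on the same data.
learners = [LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(random_state=0)]
for learner in learners:
    learner.fit(X_train, y_train)


def stack_predictions(models, X):
    """Hypothetical helper: one column of predicted labels per learner."""
    return np.column_stack([m.predict(X) for m in models])


train_feats = stack_predictions(learners, X_train)
test_feats = stack_predictions(learners, X_test)

# Second level: the stacking model only sees the learners' predictions.
final = DecisionTreeClassifier(random_state=0)
final.fit(train_feats, y_train)
score = final.score(test_feats, y_test)
print(test_feats.shape, score)
```

Each row of `test_feats` has one entry per learner, which matches the two-column output printed above. Training the final model on predictions made for the same training rows is kept here for brevity; a careful stacking setup would usually rely on out-of-fold predictions instead.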

Use two learners in the same pipeline

A scikit-learn pipeline accepts a learner only as its last step, so two learners cannot be chained directly. A class such as SkBaseTransformLearner works around this by disguising a learner as a transform.

<<<

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from mlinsights.sklapi import SkBaseTransformLearner

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

try:
    pipe = make_pipeline(LogisticRegression(),
                         DecisionTreeClassifier())
except Exception as e:
    print("ERROR:")
    print(e)
    print('.')

pipe = make_pipeline(SkBaseTransformLearner(LogisticRegression()),
                     DecisionTreeClassifier())
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
score = accuracy_score(y_test, pred)
print("pipeline with two learners:", score)

>>>

    ERROR:
    All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression()' (type <class 'sklearn.linear_model._logistic.LogisticRegression'>) doesn't
    .
    /usr/local/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
    
    Increase the number of iterations (max_iter) or scale the data as shown in:
        https://scikit-learn.org/stable/modules/preprocessing.html
    Please also refer to the documentation for alternative solver options:
        https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
      extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
    pipeline with two learners: 0.9473684210526315

(original entry : sklearn_base_transform_learner.py:docstring of mlinsights.sklapi.sklearn_base_transform_learner.SkBaseTransformLearner, line 7)
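
The principle behind such a wrapper can be sketched in a few lines. The class `LearnerAsTransform` below is hypothetical and is not the mlinsights implementation: it assumes the wrapped learner exposes `predict_proba` and simply returns those probabilities from `transform`, which is enough for the pipeline to accept it as an intermediate step.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


class LearnerAsTransform(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: exposes a learner's probabilities as features."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed by the caller stays untouched.
        self.model_ = clone(self.model).fit(X, y)
        return self

    def transform(self, X):
        # Assumption: the wrapped learner implements predict_proba.
        return self.model_.predict_proba(X)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The wrapped learner is now a valid intermediate step; the tree is the
# final learner and is trained on the class probabilities.
pipe = make_pipeline(LearnerAsTransform(LogisticRegression(max_iter=1000)),
                     DecisionTreeClassifier(random_state=0))
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)
probas = pipe.steps[0][1].transform(X_test)
print(probas.shape, score)
```

Setting `max_iter=1000` is a choice made for this sketch so that the solver converges on the unscaled iris data; the original example keeps the default and therefore shows a ConvergenceWarning.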