Examples

  1. DictVectorizer or CategoriesToIntegers
  2. Stacking several learners in a scikit-learn pipeline.
  3. Use two learners in the same pipeline

DictVectorizer or CategoriesToIntegers

Example which converts a text category column into numerical indicator columns:

<<<

import pandas
from mlinsights.mlmodel import CategoriesToIntegers
df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)

>>>

       cat=a  cat=b
    0    1.0    NaN
    1    NaN    1.0

(original entry : categories_to_integers.py:docstring of mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers, line 9)
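For comparison, here is a minimal sketch of the other option named in the title, scikit-learn's `DictVectorizer`, applied to the same two records. The variable names `records`, `vec` and `matrix` are chosen for this illustration only.

```python
from sklearn.feature_extraction import DictVectorizer

# Same two records as in the example above, given as plain dictionaries.
records = [{"cat": "a"}, {"cat": "b"}]

# sparse=False returns a dense numpy array instead of a scipy sparse matrix.
vec = DictVectorizer(sparse=False)
matrix = vec.fit_transform(records)

# One column per (key, value) pair seen during fit.
print(vec.feature_names_)
print(matrix)
```

The columns are also named `cat=a` and `cat=b`, but absent categories are encoded as 0.0, whereas the CategoriesToIntegers output above shows NaN.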

Stacking several learners in a scikit-learn pipeline.

This transform gathers the predictions of several learners into features, which then serve as input to a stacking model.

<<<

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from mlinsights.sklapi import SkBaseTransformStacking

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

trans = SkBaseTransformStacking([LogisticRegression(),
                                 DecisionTreeClassifier()])
trans.fit(X_train, y_train)
pred = trans.transform(X_test)
print(pred[3:])

>>>

    [[1 1]
     [2 2]
     [1 1]
     [0 0]
     [2 2]
     [2 2]
     [2 2]
     [1 1]
     [0 0]
     [2 2]
     [0 0]
     [2 1]
     [1 1]
     [2 2]
     [2 2]
     [2 2]
     [2 2]
     [0 0]
     [2 2]
     [1 1]
     [1 1]
     [2 2]
     [0 0]
     [2 2]
     [1 2]
     [0 0]
     [1 1]
     [2 2]
     [2 1]
     [2 2]
     [2 2]
     [0 0]
     [0 0]
     [0 0]
     [1 1]]

(original entry : sklearn_base_transform_stacking.py:docstring of mlinsights.sklapi.sklearn_base_transform_stacking.SkBaseTransformStacking, line 4)
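The idea behind stacking can be sketched by hand with scikit-learn and numpy only. This is an illustration of the principle, not the implementation used by SkBaseTransformStacking; the helper `stack_predictions` and the choice of a decision tree as the second-level model are assumptions made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First level: fit every learner on the training data.
learners = [LogisticRegression(max_iter=1000),
            DecisionTreeClassifier(random_state=0)]
for learner in learners:
    learner.fit(X_train, y_train)


def stack_predictions(X):
    # Hypothetical helper: one column per learner, holding its predicted labels.
    return np.column_stack([learner.predict(X) for learner in learners])


# Second level: the stacking model is trained on the learners' predictions.
final = DecisionTreeClassifier(random_state=0)
final.fit(stack_predictions(X_train), y_train)

stacked_test = stack_predictions(X_test)
score = final.score(stacked_test, y_test)
print(stacked_test.shape)
print("stacked model accuracy:", score)
```

For simplicity, the second-level model here is trained on predictions made for the same rows the first-level learners were fitted on, which tends to be optimistic. Out-of-fold predictions are a common remedy.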

Use two learners in the same pipeline

A scikit-learn pipeline accepts a learner only as its last step. To use two learners in the same pipeline, the first one must be wrapped in a class such as SkBaseTransformLearner, which disguises a learner as a transform.

<<<

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from mlinsights.sklapi import SkBaseTransformLearner

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

try:
    pipe = make_pipeline(LogisticRegression(),
                         DecisionTreeClassifier())
    # Recent scikit-learn releases only validate the steps when fit is called.
    pipe.fit(X_train, y_train)
except Exception as e:
    print("ERROR:")
    print(e)
    print('.')

pipe = make_pipeline(SkBaseTransformLearner(LogisticRegression()),
                     DecisionTreeClassifier())
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
score = accuracy_score(y_test, pred)
print("pipeline with two learners:", score)

>>>

    ERROR:
    All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                       intercept_scaling=1, l1_ratio=None, max_iter=100,
                       multi_class='warn', n_jobs=None, penalty='l2',
                       random_state=None, solver='warn', tol=0.0001, verbose=0,
                       warm_start=False)' (type <class 'sklearn.linear_model.logistic.LogisticRegression'>) doesn't
    .
    pipeline with two learners: 0.9473684210526315

(original entry : sklearn_base_transform_learner.py:docstring of mlinsights.sklapi.sklearn_base_transform_learner.SkBaseTransformLearner, line 7)
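To make the idea of disguising a learner as a transform more concrete, here is a minimal sketch of such a wrapper written with scikit-learn only. The class `LearnerAsTransform` is hypothetical and is not the code of SkBaseTransformLearner. It assumes the wrapped learner implements `predict_proba` and simply returns the predicted probabilities from `transform`.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


class LearnerAsTransform(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper exposing a learner's predictions through transform."""

    def __init__(self, learner):
        self.learner = learner

    def fit(self, X, y=None):
        # Fit a copy so the estimator passed to the constructor stays untouched.
        self.learner_ = clone(self.learner).fit(X, y)
        return self

    def transform(self, X):
        # Assumption: the learner has predict_proba (one column per class).
        return self.learner_.predict_proba(X)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The wrapped learner is now a valid intermediate step of the pipeline.
pipe = make_pipeline(LearnerAsTransform(LogisticRegression(max_iter=1000)),
                     DecisionTreeClassifier(random_state=0))
pipe.fit(X_train, y_train)
features = pipe[0].transform(X_test)
score = pipe.score(X_test, y_test)
print(features.shape)
print("pipeline with two learners:", score)
```

The final decision tree only sees the three class probabilities produced by the logistic regression, not the original four iris features.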