module grammar_sklearn.g_sklearn_main

Short summary

module mlprodict.grammar_sklearn.g_sklearn_main

Main functions to convert a machine-learned scikit-learn model into a grammar model.

source on GitHub

Functions

function        truncated documentation
sklearn2graph   Converts any kind of scikit-learn model into a grammar model.

Documentation


mlprodict.grammar_sklearn.g_sklearn_main.sklearn2graph(model, output_names=None, **kwargs)

Converts any kind of scikit-learn model into a grammar model.

Parameters
  • model – scikit-learn model

  • output_names – names of the outputs

  • kwargs – additional parameters, sent to the converter

Returns

converter to grammar model

Short list of additional parameters:
  • with_loop – the pseudo code includes loops; this option is not available everywhere.

If output_names is None, default names are given to the inputs and outputs. The following example shows how to use this function: a scikit-learn model is trained and converted into a graph which implements the prediction function in the grammar language.

<<<

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
y[y == 2] = 1
lr = LogisticRegression()
lr.fit(X, y)

# gr is the grammar graph which implements the scoring model.
from mlprodict.grammar_sklearn import sklearn2graph
gr = sklearn2graph(lr, output_names=['Prediction', 'Score'])

# We can even check what the function should produce as a score.
# Types are strict.
import numpy
X = numpy.array([[numpy.float32(1), numpy.float32(2)]])
e2 = gr.execute(Features=X[0, :])
print(e2)

# We display the result in JSON.
ser = gr.export(lang='json', hook={'array': lambda v: v.tolist(),
                                   'float32': lambda v: float(v)})
import json
print(json.dumps(ser, sort_keys=True, indent=2))

>>>

    [  0.    -11.264]
    {
      "action": {
        "name": "LogisticRegression",
        "variants": [
          {
            "action": {
              "name": "return",
              "variants": [
                {
                  "action": {
                    "name": "concat",
                    "variants": [
                      {
                        "action": {
                          "name": "sign",
                          "variants": [
                            {
                              "action": {
                                "name": "+",
                                "variants": [
                                  {
                                    "action": {
                                      "name": "adot",
                                      "variants": [
                                        {
                                          "cache": false,
                                          "comment": null,
                                          "name": "cst",
                                          "value": [
                                            3.3882975578308105,
                                            -3.164527654647827
                                          ]
                                        },
                                        {
                                          "cache": false,
                                          "name": "var",
                                          "value": "Features"
                                        }
                                      ]
                                    },
                                    "cache": false,
                                    "input": [
                                      "float32:(2,)",
                                      "float32:(2,)"
                                    ],
                                    "output": "float32"
                                  },
                                  {
                                    "cache": false,
                                    "comment": null,
                                    "name": "cst",
                                    "value": -8.323304176330566
                                  }
                                ]
                              },
                              "cache": false,
                              "input": [
                                "float32",
                                "float32"
                              ],
                              "output": "float32"
                            }
                          ]
                        },
                        "cache": false,
                        "input": [
                          "float32"
                        ],
                        "output": "float32"
                      },
                      {
                        "action": {
                          "name": "+",
                          "variants": [
                            {
                              "action": {
                                "name": "adot",
                                "variants": [
                                  {
                                    "cache": false,
                                    "comment": null,
                                    "name": "cst",
                                    "value": [
                                      3.3882975578308105,
                                      -3.164527654647827
                                    ]
                                  },
                                  {
                                    "cache": false,
                                    "name": "var",
                                    "value": "Features"
                                  }
                                ]
                              },
                              "cache": false,
                              "input": [
                                "float32:(2,)",
                                "float32:(2,)"
                              ],
                              "output": "float32"
                            },
                            {
                              "cache": false,
                              "comment": null,
                              "name": "cst",
                              "value": -8.323304176330566
                            }
                          ]
                        },
                        "cache": true,
                        "input": [
                          "float32",
                          "float32"
                        ],
                        "output": "float32"
                      }
                    ]
                  },
                  "cache": false,
                  "input": [
                    "float32",
                    "float32"
                  ],
                  "output": "float32:(2,)"
                }
              ]
            },
            "cache": false,
            "input": [
              "float32:(2,)"
            ],
            "output": "float32:(2,)"
          }
        ]
      },
      "cache": false,
      "input": [
        "float32:(2,)"
      ],
      "input_names": [
        "Features"
      ],
      "output": "float32:(2,)",
      "output_names": [
        "Prediction",
        "Score"
      ]
    }
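
The exported structure is a nested dictionary: every operator carries an "action" with a "name" and a list of "variants" (its operands), while the leaves are "cst" or "var" nodes with a "value". As a reading aid, the following minimal sketch walks such a dictionary and renders it as a one-line expression. The helper render is hypothetical and not part of mlprodict; it only assumes the layout shown in the output above, and tree is a hand-written excerpt of it with rounded constants.

```python
def render(node):
    """Render an exported grammar node as a compact expression string."""
    if "action" in node:
        # Operator node: the operands are listed under action["variants"].
        action = node["action"]
        args = ", ".join(render(child) for child in action["variants"])
        return "{}({})".format(action["name"], args)
    if node["name"] == "var":
        # Variable leaf: "value" holds the variable name.
        return node["value"]
    # Constant leaf: "value" holds a number or a list of numbers.
    return repr(node["value"])


# Hand-written excerpt of the JSON above (constants rounded for brevity):
# adot(coefficients, Features) + bias.
tree = {
    "action": {
        "name": "+",
        "variants": [
            {"action": {"name": "adot", "variants": [
                {"name": "cst", "value": [3.39, -3.16]},
                {"name": "var", "value": "Features"},
            ]}},
            {"name": "cst", "value": -8.32},
        ],
    }
}

print(render(tree))  # +(adot([3.39, -3.16], Features), -8.32)
```

Applied to the dictionary ser from the example, the same helper should produce an expression starting with LogisticRegression(return(concat(sign(...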

For this particular example, the function calls sklearn_logistic_regression, and the code which builds the grammar model looks like:

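model = LogisticRegression()
model.fit(...)

# Flatten the coefficients and cast the intercept to float32.
coef = model.coef_.ravel()
bias = numpy.float32(model.intercept_[0])

# Leaves of the graph: two constants and the input variable.
gr_coef = MLActionCst(coef)
gr_var = MLActionVar(coef, input_names)
gr_bias = MLActionCst(bias)
# score = dot(coef, features) + bias
gr_dot = MLActionTensorDot(gr_coef, gr_var)
gr_dist = MLActionAdd(gr_dot, gr_bias)
# Output: the sign of the score concatenated with the score itself.
gr_sign = MLActionSign(gr_dist)
gr_conc = MLActionConcat(gr_sign, gr_dist)
gr_final = MLModel(gr_conc, output_names, name="LogisticRegression")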
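
Read as ordinary arithmetic, the graph built above computes a dot product plus a bias, then concatenates the sign of that score with the score itself. The following is a minimal sketch in plain Python, using the coefficients shown in the JSON export. The name predict is hypothetical, float32 rounding is ignored, and the convention that sign returns 1 for a non-negative score and 0 otherwise is an assumption inferred from the output [0., -11.264] shown earlier.

```python
# Coefficients and bias copied from the JSON export above.
coef = [3.3882975578308105, -3.164527654647827]
bias = -8.323304176330566


def predict(features):
    """Return [label, score], combining the steps in the same order as the graph."""
    # adot: dot product of the coefficients and the features.
    dot = sum(c * x for c, x in zip(coef, features))
    # +: add the bias to obtain the decision score.
    score = dot + bias
    # sign: assumed to map a non-negative score to 1 and a negative one to 0.
    label = 1.0 if score >= 0 else 0.0
    # concat: pack the label and the score into a single output.
    return [label, score]


# Same input as in the example; the score should be close to -11.264.
print(predict([1.0, 2.0]))
```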

Internally, the function represents any kind of function as a graph. This graph can easily be exported to other formats, such as Python or any other programming language. The graph is not meant to be evaluated directly: evaluation is slow because of the extra checks run throughout to make sure types are consistent. The current implementation supports conversion into C.
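
To make the idea of exporting a graph concrete, here is a toy sketch, not the mlprodict implementation. The three node classes (Cst, Var, Add) are hypothetical: each one can evaluate itself and emit a C expression, and none of them performs the type checks mentioned above.

```python
class Cst:
    """Constant node."""

    def __init__(self, value):
        self.value = value

    def execute(self, **inputs):
        return self.value

    def to_c(self):
        # Constants are emitted with an explicit cast to float.
        return "(float){!r}".format(self.value)


class Var:
    """Input variable node, looked up by name at evaluation time."""

    def __init__(self, name):
        self.name = name

    def execute(self, **inputs):
        return inputs[self.name]

    def to_c(self):
        return self.name


class Add:
    """Addition node with two children."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def execute(self, **inputs):
        return self.left.execute(**inputs) + self.right.execute(**inputs)

    def to_c(self):
        return "({} + {})".format(self.left.to_c(), self.right.to_c())


graph = Add(Var("x"), Cst(-8.5))
print(graph.execute(x=2.0))  # -6.5
print(graph.to_c())          # (x + (float)-8.5)
```

The same tree serves both purposes: execute walks it to compute a value, while to_c walks it to produce source code, which is why adding a new export language only requires one more method per node type.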

<<<

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
y[y == 2] = 1
lr = LogisticRegression()
lr.fit(X, y)

# gr is the grammar tree which implements the scoring model.
from mlprodict.grammar_sklearn import sklearn2graph
gr = sklearn2graph(lr, output_names=['Prediction', 'Score'])

# We export the result as C code.
ccode = gr.export(lang='c')
# We print the code after removing the comment lines.
print("\n".join(line for line in ccode['code'].split("\n") if "//" not in line))

>>>

    int LogisticRegression (float* pred, float* Features)
    {
        float pred0c0c00c0[2] = {(float)3.3882975578308105, (float)-3.164527654647827};
        float* pred0c0c00c1 = Features;
        float pred0c0c00;
        adot_float_float(&pred0c0c00, pred0c0c00c0, pred0c0c00c1, 2);
        float pred0c0c01 = (float)-8.323304176330566;
        float pred0c0c0 = pred0c0c00 + pred0c0c01;
        float pred0c0;
        sign_float(&pred0c0, pred0c0c0);
        float pred0[2];
        concat_float_float(pred0, pred0c0, pred0c0c0);
        memcpy(pred, pred0, 2*sizeof(float));
        return 0;
    }

Functions adot, sign and concat are implemented in module mlprodict.grammar_sklearn.cc.c_compilation. Function compile_c_function can compile the generated code with cffi.

from mlprodict.grammar_sklearn.cc.c_compilation import compile_c_function
# code_c is the C source exported above.
code_c = ccode['code']
fct = compile_c_function(code_c, 2)
e2 = fct(X[0, :])
print(e2)

The output is the same as the prediction given by scikit-learn.
