module grammar_sklearn.g_sklearn_main¶
Short summary¶
module mlprodict.grammar_sklearn.g_sklearn_main
Main functions to convert a machine learned model from scikit-learn into a grammar model.
Functions¶
function | truncated documentation
---|---
sklearn2graph | Converts any kind of scikit-learn model into a grammar model.
Documentation¶
Main functions to convert a machine learned model from scikit-learn into a grammar model.
- mlprodict.grammar_sklearn.g_sklearn_main.sklearn2graph(model, output_names=None, **kwargs)¶
Converts any kind of scikit-learn model into a grammar model.
- Parameters
model – scikit-learn model
output_names – names of the outputs
kwargs – additional parameters, sent to the converter
- Returns
converter to grammar model
Short list of additional parameters:

- with_loop: the pseudo code includes loops; this option is not available everywhere.
If output_names is None, default names are given to the inputs and outputs. The following example shows how to use this function: a scikit-learn model is trained and converted into a graph which implements the prediction function in the grammar language.
<<<
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, :2]
y = iris.target
y[y == 2] = 1
lr = LogisticRegression()
lr.fit(X, y)

# grammar is the expected scoring model.
from mlprodict.grammar_sklearn import sklearn2graph
gr = sklearn2graph(lr, output_names=['Prediction', 'Score'])

# We can even check what the function should produce as a score.
# Types are strict.
import numpy
X = numpy.array([[numpy.float32(1), numpy.float32(2)]])
e2 = gr.execute(Features=X[0, :])
print(e2)

# We display the result in JSON.
ser = gr.export(lang='json',
                hook={'array': lambda v: v.tolist(),
                      'float32': lambda v: float(v)})
import json
print(json.dumps(ser, sort_keys=True, indent=2))
>>>
[ 0. -11.264]
{
  "action": {
    "name": "LogisticRegression",
    "variants": [
      {
        "action": {
          "name": "return",
          "variants": [
            {
              "action": {
                "name": "concat",
                "variants": [
                  {
                    "action": {
                      "name": "sign",
                      "variants": [
                        {
                          "action": {
                            "name": "+",
                            "variants": [
                              {
                                "action": {
                                  "name": "adot",
                                  "variants": [
                                    {
                                      "cache": false,
                                      "comment": null,
                                      "name": "cst",
                                      "value": [3.3882975578308105, -3.164527654647827]
                                    },
                                    {
                                      "cache": false,
                                      "name": "var",
                                      "value": "Features"
                                    }
                                  ]
                                },
                                "cache": false,
                                "input": ["float32:(2,)", "float32:(2,)"],
                                "output": "float32"
                              },
                              {
                                "cache": false,
                                "comment": null,
                                "name": "cst",
                                "value": -8.323304176330566
                              }
                            ]
                          },
                          "cache": false,
                          "input": ["float32", "float32"],
                          "output": "float32"
                        }
                      ]
                    },
                    "cache": false,
                    "input": ["float32"],
                    "output": "float32"
                  },
                  {
                    "action": {
                      "name": "+",
                      "variants": [
                        {
                          "action": {
                            "name": "adot",
                            "variants": [
                              {
                                "cache": false,
                                "comment": null,
                                "name": "cst",
                                "value": [3.3882975578308105, -3.164527654647827]
                              },
                              {
                                "cache": false,
                                "name": "var",
                                "value": "Features"
                              }
                            ]
                          },
                          "cache": false,
                          "input": ["float32:(2,)", "float32:(2,)"],
                          "output": "float32"
                        },
                        {
                          "cache": false,
                          "comment": null,
                          "name": "cst",
                          "value": -8.323304176330566
                        }
                      ]
                    },
                    "cache": true,
                    "input": ["float32", "float32"],
                    "output": "float32"
                  }
                ]
              },
              "cache": false,
              "input": ["float32", "float32"],
              "output": "float32:(2,)"
            }
          ]
        },
        "cache": false,
        "input": ["float32:(2,)"],
        "output": "float32:(2,)"
      }
    ]
  },
  "cache": false,
  "input": ["float32:(2,)"],
  "input_names": ["Features"],
  "output": "float32:(2,)",
  "output_names": ["Prediction", "Score"]
}
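The exported structure is a plain nested dictionary, so it can be post-processed with ordinary Python. As a small illustration, the sketch below walks such a tree and lists the operation names it contains; `collect_ops` and `sample` are hypothetical helpers written for this example, not part of mlprodict.

```python
# Hypothetical miniature of the exported structure: a 'sign' node wrapping
# a '+' node whose variants are a constant and a variable, mirroring the
# shape of the JSON output above.
sample = {
    "action": {
        "name": "sign",
        "variants": [
            {"action": {"name": "+", "variants": [
                {"cache": False, "name": "cst", "value": -8.3},
                {"cache": False, "name": "var", "value": "Features"},
            ]}, "cache": False},
        ],
    },
    "cache": False,
}

def collect_ops(node, ops=None):
    """Depth-first walk collecting every operation or leaf name."""
    if ops is None:
        ops = []
    action = node.get("action")
    if action is not None:
        ops.append(action["name"])
        for child in action.get("variants", []):
            collect_ops(child, ops)
    else:
        name = node.get("name")
        if name is not None:
            ops.append(name)
    return ops

print(collect_ops(sample))  # ['sign', '+', 'cst', 'var']
```

The same walk applied to the full export above would list the whole pipeline, from `LogisticRegression` down to the `cst` and `var` leaves.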
For this particular example, the function is calling
sklearn_logistic_regression
and the code which produces the model looks like:

model = LogisticRegression()
model.fit(...)
coef = model.coef_.ravel()
bias = numpy.float32(model.intercept_[0])

gr_coef = MLActionCst(coef)
gr_var = MLActionVar(coef, input_names)
gr_bias = MLActionCst(bias)
gr_dot = MLActionTensorDot(gr_coef, gr_var)
gr_dist = MLActionAdd(gr_dot, gr_bias)
gr_sign = MLActionSign(gr_dist)
gr_conc = MLActionConcat(gr_sign, gr_dist)
gr_final = MLModel(gr_conc, output_names, name="LogisticRegression")
The function internally represents any kind of function as a graph. This graph can easily be exported to any format, Python or any other programming language. The goal is not to evaluate it, as evaluation is slow due to the extra checks run all along the way to make sure types are consistent. The current implementation supports conversion into C.
<<<
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, :2]
y = iris.target
y[y == 2] = 1
lr = LogisticRegression()
lr.fit(X, y)

# a grammar tree is the expected scoring model.
from mlprodict.grammar_sklearn import sklearn2graph
gr = sklearn2graph(lr, output_names=['Prediction', 'Score'])

# We export the result as C code.
ccode = gr.export(lang='c')
# We print after a little bit of cleaning.
print("\n".join(_ for _ in ccode['code'].split("\n") if "//" not in _))
>>>
int LogisticRegression (float* pred, float* Features)
{
    float pred0c0c00c0[2] = {(float)3.3882975578308105, (float)-3.164527654647827};
    float* pred0c0c00c1 = Features;
    float pred0c0c00;
    adot_float_float(&pred0c0c00, pred0c0c00c0, pred0c0c00c1, 2);
    float pred0c0c01 = (float)-8.323304176330566;
    float pred0c0c0 = pred0c0c00 + pred0c0c01;
    float pred0c0;
    sign_float(&pred0c0, pred0c0c0);
    float pred0[2];
    concat_float_float(pred0, pred0c0, pred0c0c0);
    memcpy(pred, pred0, 2*sizeof(float));
    return 0;
}
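To make the generated code easier to follow, here is a small numpy re-enactment of the same computation, with the coefficients and bias copied from the generated function. The 0/1 thresholding is an assumption inferred from the printed output of gr.execute above (prediction 0 for a negative score), not mlprodict's actual sign implementation.

```python
import numpy

# Coefficients and bias copied from the generated C function above.
coef = numpy.array([3.3882975578308105, -3.164527654647827], dtype=numpy.float32)
bias = numpy.float32(-8.323304176330566)

def predict(features):
    """Re-enacts adot -> '+' -> sign -> concat from the generated code."""
    score = coef.dot(features) + bias                   # adot then '+'
    # Assumed thresholding: negative scores map to 0, as the output
    # [ 0. -11.264] shown earlier suggests.
    label = numpy.float32(1.0 if score >= 0 else 0.0)
    return numpy.array([label, score])                  # concat -> [Prediction, Score]

x = numpy.array([1, 2], dtype=numpy.float32)
print(predict(x))  # close to the [ 0. -11.264] shown earlier
```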
Functions
adot
, sign
, concat
are implemented in module mlprodict.grammar_sklearn.cc.c_compilation.
Function compile_c_function can compile this with cffi.

from mlprodict.grammar_sklearn.cc.c_compilation import compile_c_function
fct = compile_c_function(code_c, 2)
e2 = fct(X[0, :])
print(e2)
The output is the same as the prediction given by scikit-learn.
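That claim can be checked directly against scikit-learn. The sketch below retrains the same binary model as in the examples above, recomputes the score exactly as the generated code does (dot product plus intercept, then thresholding), and verifies it agrees with scikit-learn's own decision_function and predict; it covers the binary case only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import numpy

# Re-train the same binary model as in the examples above.
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
y[y == 2] = 1
lr = LogisticRegression().fit(X, y)

# Same computation as the generated code: adot(coef, x) + bias, then threshold.
scores = X.dot(lr.coef_.ravel()) + lr.intercept_[0]
preds = (scores > 0).astype(int)

assert numpy.allclose(scores, lr.decision_function(X))
assert (preds == lr.predict(X)).all()
print("graph computation matches scikit-learn")
```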