module onnx_conv.convert

Short summary

module mlprodict.onnx_conv.convert

Overloads a conversion function.


Functions

function – truncated documentation

  • _fix_opset_skl2onnx

  • _replace_tensor_type

  • convert_scorer – Converts a scorer into ONNX, assuming a converter is associated with it. It wraps the function …

  • get_inputs_from_data – Produces input data for onnx runtime.

  • guess_initial_types – Guesses initial types from an array or a dataframe.

  • guess_schema_from_data – Guesses initial types from a dataset.

  • guess_schema_from_model – Guesses initial types from a model.

  • to_onnx – Converts a model using sklearn-onnx.

Documentation

mlprodict.onnx_conv.convert._fix_opset_skl2onnx()
mlprodict.onnx_conv.convert._replace_tensor_type(schema, tensor_type)
mlprodict.onnx_conv.convert.convert_scorer(fct, initial_types, name=None, target_opset=None, options=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, white_op=None, black_op=None, final_types=None, verbose=0)

Converts a scorer into ONNX, assuming a converter is associated with it. It wraps the function into a custom transformer, then calls convert_sklearn from sklearn-onnx.

Parameters:
  • fct – function to convert (or a scorer from scikit-learn)

  • initial_types – type information for the inputs

  • name – name of the produced model

  • target_opset – target opset to use for the conversion, if different from the default one

  • options – additional parameters for the conversion

  • custom_conversion_functions – a dictionary specifying user-defined conversion functions; it takes precedence over registered converters

  • custom_shape_calculators – a dictionary specifying user-defined shape calculators; it takes precedence over registered shape calculators

  • custom_parsers – parsers determine which outputs are expected for a particular task; default parsers are defined for classifiers, regressors and pipelines but they can be overwritten; custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • white_op – white list of ONNX nodes allowed while converting a pipeline; if empty, all are allowed

  • black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted

  • final_types – a Python list; works the same way as initial_types but is optional; it overwrites the type (if type is not None) and the name of every output.

  • verbose – displays information while converting

Returns:

ONNX graph
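custom_conversion_functions, custom_shape_calculators and custom_parsers all take precedence over what is already registered. The lookup this implies can be sketched as follows; the registry, the MyScorer type and both parser functions are hypothetical and only reuse the parser signature documented above. This is an illustration of the precedence rule, not the sklearn-onnx internals.

```python
# Sketch of the precedence rule: a custom dictionary overrides the registry.
# MyScorer, default_parser, my_parser, registered_parsers and resolve are
# hypothetical names used only for illustration.


class MyScorer:
    """Stand-in for a model type that needs a parser."""


def default_parser(scope, model, inputs, custom_parsers=None):
    # Same signature as documented for the values of custom_parsers.
    return "default"


def my_parser(scope, model, inputs, custom_parsers=None):
    return "custom"


# Hypothetical registry of functions shipped with the converter.
registered_parsers = {MyScorer: default_parser}


def resolve(model_type, custom, registered):
    """Return the function to use for model_type, custom entries first."""
    if custom and model_type in custom:
        return custom[model_type]
    return registered[model_type]


print(resolve(MyScorer, {MyScorer: my_parser}, registered_parsers)(None, MyScorer(), []))
# custom
print(resolve(MyScorer, None, registered_parsers)(None, MyScorer(), []))
# default
```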


mlprodict.onnx_conv.convert.get_inputs_from_data(X, schema=None)

Produces input data for onnx runtime.

Parameters:
Returns:

input data
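For a dataframe, input data for a runtime usually means one named input per column with shape (n_rows, 1), which is what the dataframe example in the to_onnx documentation does with reshape. A minimal sketch of that idea, assuming plain Python lists stand in for numpy arrays and a dict of columns stands in for a dataframe; inputs_from_columns is a hypothetical helper, not the code of get_inputs_from_data, and the schema parameter is left out.

```python
def inputs_from_columns(columns):
    """Turn {name: [v0, v1, ...]} into {name: [[v0], [v1], ...]}.

    Each column becomes one named input with shape (n_rows, 1),
    mirroring reshape((n, 1)) on a numpy array.
    """
    # All inputs must describe the same rows.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return {name: [[v] for v in values] for name, values in columns.items()}


feeds = inputs_from_columns({"pH": [3.51, 3.2], "color": ["red", "red"]})
print(feeds)
# {'pH': [[3.51], [3.2]], 'color': [['red'], ['red']]}
```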


mlprodict.onnx_conv.convert.guess_initial_types(X, initial_types)

Guesses initial types from an array or a dataframe.

Parameters:
  • X – array or dataframe

  • initial_types – hints about X

Returns:

data types
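One way to guess initial types is to map the dtype of every column to a tensor type, giving each input the shape [None, 1]. The sketch below assumes dtypes are given as strings, represents each tensor type by the name of the corresponding skl2onnx class rather than an instance, and assumes hints, when given, are returned unchanged. guess_types_sketch and DTYPE_TO_TENSOR_TYPE are hypothetical; the real guess_initial_types may behave differently.

```python
# Hypothetical mapping from dtype names to skl2onnx tensor type class names.
DTYPE_TO_TENSOR_TYPE = {
    "float32": "FloatTensorType",
    "float64": "DoubleTensorType",
    "int64": "Int64TensorType",
    "object": "StringTensorType",
}


def guess_types_sketch(column_dtypes, initial_types=None):
    """Guess (name, tensor type name, shape) for every column."""
    # Assumption: explicit hints win and are returned as they are.
    if initial_types is not None:
        return initial_types
    guessed = []
    for name, dtype in column_dtypes.items():
        if dtype not in DTYPE_TO_TENSOR_TYPE:
            raise TypeError(f"unsupported dtype {dtype!r} for column {name!r}")
        # None means any number of rows, 1 means one value per row.
        guessed.append((name, DTYPE_TO_TENSOR_TYPE[dtype], [None, 1]))
    return guessed


print(guess_types_sketch({"pH": "float32", "color": "object"}))
# [('pH', 'FloatTensorType', [None, 1]), ('color', 'StringTensorType', [None, 1])]
```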


mlprodict.onnx_conv.convert.guess_schema_from_data(X, tensor_type=None, schema=None)

Guesses initial types from a dataset.

Parameters:
  • X – dataset (dataframe, array)

  • tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType with this one

  • schema – known schema

Returns:

schema (list of typed and named columns)
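The tensor_type parameter can be read as a substitution on the schema: every float or double tensor type becomes tensor_type, and other types are kept. A sketch under that reading, assuming the schema is a list of (name, type name, shape) tuples with types spelled as skl2onnx class names; replace_tensor_type_sketch is hypothetical and is not the code of _replace_tensor_type.

```python
# Float and double tensor types, spelled as skl2onnx class names.
FLOATING_TYPES = ("FloatTensorType", "DoubleTensorType")


def replace_tensor_type_sketch(schema, tensor_type=None):
    """Return a copy of schema where floating tensor types become tensor_type."""
    if tensor_type is None:
        # Nothing to replace: keep the schema as it is.
        return list(schema)
    replaced = []
    for name, type_name, shape in schema:
        if type_name in FLOATING_TYPES:
            type_name = tensor_type
        replaced.append((name, type_name, shape))
    return replaced


schema = [("pH", "FloatTensorType", [None, 1]),
          ("color", "StringTensorType", [None, 1])]
print(replace_tensor_type_sketch(schema, "DoubleTensorType"))
# [('pH', 'DoubleTensorType', [None, 1]), ('color', 'StringTensorType', [None, 1])]
```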


mlprodict.onnx_conv.convert.guess_schema_from_model(model, tensor_type=None, schema=None)

Guesses initial types from a model.

Parameters:
  • model – model

  • tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType with this one

  • schema – known schema

Returns:

schema (list of typed and named columns)


mlprodict.onnx_conv.convert.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0)

Converts a model using sklearn-onnx.

Parameters:
  • model – model to convert, or a function wrapped into _PredictScorer with function make_scorer

  • X – training set (at least one row); can be None; it is used to infer the input types (initial_types)

  • initial_types – if X is None, then initial_types must be defined

  • name – name of the produced model

  • target_opset – target opset to use for the conversion, if different from the default one

  • options – additional parameters for the conversion

  • rewrite_ops – rewrites some existing converters; the changes are permanent

  • white_op – white list of ONNX nodes allowed while converting a pipeline; if empty, all are allowed

  • black_op – black list of ONNX nodes forbidden while converting a pipeline; if empty, none are blacklisted

  • final_types – a Python list; works the same way as initial_types but is optional; it overwrites the type (if type is not None) and the name of every output.

  • rename_strategy – renames every name in the graph to select shorter names, see onnx_rename_names

  • verbose – display information while converting the model

Returns:

converted model
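The white_op and black_op semantics can be expressed as a predicate on operator types: an empty or missing white list allows everything, and an empty or missing black list forbids nothing. A sketch under that reading; is_allowed and forbidden_ops are hypothetical helpers, and the sketch does not reproduce how mlprodict reports a rejected node.

```python
def is_allowed(op_type, white_op=None, black_op=None):
    """Apply the white_op / black_op semantics to one ONNX operator type."""
    # A missing or empty white list allows every operator.
    if white_op and op_type not in white_op:
        return False
    # A missing or empty black list forbids nothing.
    if black_op and op_type in black_op:
        return False
    return True


def forbidden_ops(op_types, white_op=None, black_op=None):
    """Return the sorted operator types that the lists reject."""
    return sorted({op for op in op_types if not is_allowed(op, white_op, black_op)})


graph_ops = ["Cast", "Concat", "TreeEnsembleRegressor", "Identity"]
print(forbidden_ops(graph_ops))
# []
print(forbidden_ops(graph_ops, black_op={"TreeEnsembleRegressor"}))
# ['TreeEnsembleRegressor']
print(forbidden_ops(graph_ops, white_op={"Cast", "Concat"}))
# ['Identity', 'TreeEnsembleRegressor']
```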

This function rewrites to_onnx from sklearn-onnx but may change a few converters if rewrite_ops is True. For example, ONNX only supports TreeEnsembleRegressor for float, not for double; double support becomes available if rewrite_ops=True.

How to deal with a dataframe as input?

Each column of the dataframe is considered as a named input. The first step is to make sure that every column type is correct. pandas tends to select the least generic type to hold the content of one column. ONNX does not automatically cast the data it receives. The data must have the same type when the model is converted and when the converted model receives the data to predict.

<<<

from io import StringIO
from textwrap import dedent
import numpy
import pandas
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
    __SCHEMA__
    7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
    7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
    7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
    11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
    ''')
text = text.replace(
    "__SCHEMA__",
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,"
    "alcohol,quality,color")

X_train = pandas.read_csv(StringIO(text))
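# pandas infers float64 (or int64) for numeric columns; cast them to
# float32 so the types are the same at conversion and prediction time.
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)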
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
        ("others", "passthrough", numeric_features)
    ])),
])

pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}

onxp = oinf.run(inputs)
print(onxp)

>>>

    [[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01
      3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
     [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01
      6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
     [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01
      5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
     [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01
      6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
    {'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02,
            1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02,
            2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02,
            1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02,
            1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00,
            6.000e+00]], dtype=float32)}
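Because ONNX does not cast its inputs, a cheap safeguard is to compare the dtypes received at prediction time with those seen at conversion time. pandas.read_csv infers float64 for decimal columns, which is why the example casts them to float32 first. A sketch, assuming dtypes are compared by name; find_type_mismatches is a hypothetical helper, not part of mlprodict.

```python
def find_type_mismatches(expected, received):
    """Return (name, expected dtype, received dtype) for every mismatch.

    A missing input is reported with None as the received dtype.
    """
    mismatches = []
    for name, dtype in expected.items():
        got = received.get(name)
        if got != dtype:
            mismatches.append((name, dtype, got))
    return mismatches


# Types seen at conversion time versus types of the data to predict.
expected = {"pH": "float32", "color": "object"}
received = {"pH": "float64", "color": "object"}
print(find_type_mismatches(expected, received))
# [('pH', 'float32', 'float64')]
```

With pandas, either mapping could be built as {c: str(t) for c, t in df.dtypes.items()}.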

Changed in version 0.7: Parameter rename_strategy was added.
