module onnx_conv.convert#

Short summary#

module mlprodict.onnx_conv.convert

Overloads a conversion function.

source on GitHub

Classes#

class

truncated documentation

_ParamEncoder

Functions#

function

truncated documentation

_fix_opset_skl2onnx

_guess_s2o_type

_guess_type_

_merge_initial_types

_new_options

_replace_tensor_type

_to_onnx_function_column_transformer

_to_onnx_function_pipeline

convert_scorer

Converts a scorer into ONNX assuming there exists a converter associated with it. The function wraps the function …

get_column_index

Returns a tuple (variable index, column index in that variable). The function has two different behaviours, one when …

get_column_indices

Returns the requested graph inputs based on their indices or names. See get_column_index().

get_inputs_from_data

Produces input data for onnx runtime.

get_sklearn_json_params

Retrieves all the parameters of a scikit-learn model.

guess_initial_types

Guesses initial types from an array or a dataframe.

guess_schema_from_data

Guesses initial types from a dataset.

guess_schema_from_model

Guesses initial types from a model.

to_onnx

Converts a model using sklearn-onnx.

to_onnx_function

Converts a model using sklearn-onnx. The function works the same as to_onnx() but …

Methods#

method

truncated documentation

default

Documentation#

Overloads a conversion function.

source on GitHub

class mlprodict.onnx_conv.convert._ParamEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)#

Bases: JSONEncoder

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(obj)#

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)

mlprodict.onnx_conv.convert._fix_opset_skl2onnx()#

mlprodict.onnx_conv.convert._guess_s2o_type(vtype: ValueInfoProto)#

mlprodict.onnx_conv.convert._guess_type_(X, itype, dtype)#

mlprodict.onnx_conv.convert._merge_initial_types(i_types, transform_inputs, merge)#

mlprodict.onnx_conv.convert._new_options(options, prefix, sklop)#

mlprodict.onnx_conv.convert._replace_tensor_type(schema, tensor_type)#

mlprodict.onnx_conv.convert._to_onnx_function_column_transformer(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#

mlprodict.onnx_conv.convert._to_onnx_function_pipeline(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#

mlprodict.onnx_conv.convert.convert_scorer(fct, initial_types, name=None, target_opset=None, options=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, white_op=None, black_op=None, final_types=None, verbose=0)#

Converts a scorer into ONNX assuming there exists a converter associated with it. The function wraps the function into a custom transformer, then calls convert_sklearn from sklearn-onnx.

Parameters:
  • fct – function to convert (or a scorer from scikit-learn)

  • initial_types – types information

  • name – name of the produced model

  • target_opset – converts with a different target opset than the default one

  • options – additional parameters for the conversion

  • custom_conversion_functions – a dictionary for specifying the user customized conversion function, it takes precedence over registered converters

  • custom_shape_calculators – a dictionary for specifying the user customized shape calculator; it takes precedence over registered shape calculators.

  • custom_parsers – parsers determine which outputs are expected for which particular task; default parsers are defined for classifiers, regressors and pipelines but they can be rewritten; custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list, works the same way as initial_types but is not mandatory; it is used to overwrite the type (if the type is not None) and the name of every output.

  • verbose – displays information while converting

Returns:

ONNX graph
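
As a minimal, hypothetical sketch (not taken from the original documentation), the call could look as follows. It assumes a converter is registered for the wrapped metric; score_cdist_sum, which ships with mlprodict, is used as an illustration, but the import path and the two input names are assumptions.

# Hypothetical sketch of convert_scorer; the metric, its import path and the
# input names 'X'/'Y' are assumptions, not taken from the original docs.
from sklearn.metrics import make_scorer
from skl2onnx.common.data_types import FloatTensorType
from mlprodict.onnx_conv.convert import convert_scorer
from mlprodict.onnx_conv.scorers.cdist_score import score_cdist_sum  # assumed path

scorer = make_scorer(score_cdist_sum, metric='sqeuclidean',
                     greater_is_better=False)
# one tensor type per array the metric compares
initial_types = [('X', FloatTensorType([None, 4])),
                 ('Y', FloatTensorType([None, 4]))]
onx = convert_scorer(scorer, initial_types, name="score_cdist_sum")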

source on GitHub

mlprodict.onnx_conv.convert.get_column_index(i, inputs)#

Returns a tuple (variable index, column index in that variable). The function has two different behaviours, one when i (column index) is an integer, another one when i is a string (column name). If i is a string, the function looks for an input with this name and returns (index, 0). If i is an integer, let's assume we have two inputs I0 = FloatTensorType([None, 2]) and I1 = FloatTensorType([None, 3]); in this case, here are the results:

get_column_index(0, inputs) -> (0, 0)
get_column_index(1, inputs) -> (0, 1)
get_column_index(2, inputs) -> (1, 0)
get_column_index(3, inputs) -> (1, 1)
get_column_index(4, inputs) -> (1, 2)

source on GitHub

mlprodict.onnx_conv.convert.get_column_indices(indices, inputs, multiple)#

Returns the requested graph inputs based on their indices or names. See get_column_index.

Parameters:
  • indices – variables indices or names

  • inputs – graph inputs

  • multiple – allows columns to come from multiple variables

Returns:

a tuple (variable name, list of requested indices) if multiple is False, a dictionary { var_index: [ list of requested indices ] } if multiple is True

source on GitHub

mlprodict.onnx_conv.convert.get_inputs_from_data(X, schema=None)#

Produces input data for onnx runtime.

Parameters:
  • X – data

  • schema – known schema if available

Returns:

input data
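
As a small sketch (assumed behaviour, not from the original documentation), the function can be used to build the dictionary fed to an ONNX runtime:

# Small sketch, assuming the documented signature get_inputs_from_data(X, schema=None);
# the exact keys of the returned dictionary depend on the guessed schema.
import numpy
from mlprodict.onnx_conv.convert import get_inputs_from_data

X = numpy.random.randn(3, 2).astype(numpy.float32)
inputs = get_inputs_from_data(X)
print(inputs)  # expected: a dictionary mapping input names to numpy arrays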

source on GitHub

mlprodict.onnx_conv.convert.get_sklearn_json_params(model)#

Retrieves all the parameters of a scikit-learn model.
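
A short sketch of a possible call (assumption: the parameters come from model.get_params() and are serialized with the _ParamEncoder documented above):

# Sketch: dump the hyperparameters of a scikit-learn estimator.
from sklearn.linear_model import LogisticRegression
from mlprodict.onnx_conv.convert import get_sklearn_json_params

print(get_sklearn_json_params(LogisticRegression(max_iter=200)))
# likely a JSON-serialized view of model.get_params() (assumption)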

source on GitHub

mlprodict.onnx_conv.convert.guess_initial_types(X, initial_types)#

Guesses initial types from an array or a dataframe.

Parameters:
  • X – array or dataframe

  • initial_types – hints about X

Returns:

data types
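
A short sketch (the printed result is an assumption, not from the original documentation): a float32 array is expected to map to a FloatTensorType with a dynamic first dimension.

# Sketch: guess initial types from a plain numpy array.
import numpy
from mlprodict.onnx_conv.convert import guess_initial_types

X = numpy.random.randn(4, 3).astype(numpy.float32)
print(guess_initial_types(X, None))
# expected to look like [('X', FloatTensorType([None, 3]))]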

source on GitHub

mlprodict.onnx_conv.convert.guess_schema_from_data(X, tensor_type=None, schema=None)#

Guesses initial types from a dataset.

Parameters:
  • X – dataset (dataframe, array)

  • tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one

  • schema – known schema

Returns:

schema (list of typed and named columns)
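
A sketch assuming a pandas dataframe as input: each column should become a typed, named entry of the schema; the column names and dtypes below are made up for illustration.

# Sketch: guess a schema from a small dataframe mixing floats and strings.
import numpy
import pandas
from mlprodict.onnx_conv.convert import guess_schema_from_data

df = pandas.DataFrame({
    "alcohol": numpy.array([9.4, 9.8], dtype=numpy.float32),
    "color": ["red", "white"]})
print(guess_schema_from_data(df))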

source on GitHub

mlprodict.onnx_conv.convert.guess_schema_from_model(model, tensor_type=None, schema=None)#

Guesses initial types from a model.

Parameters:
  • model – model

  • tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one

  • schema – known schema

Returns:

schema (list of typed and named columns)

source on GitHub

mlprodict.onnx_conv.convert.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, as_function=False, prefix_name=None, run_shape=False, single_function=True)#

Converts a model using sklearn-onnx.

Parameters:
  • model – model to convert or a function wrapped into _PredictScorer with function make_scorer

  • X – training set (at least one row), can be None; it is used to infer the input types (initial_types)

  • initial_types – if X is None, then initial_types must be defined

  • name – name of the produced model

  • target_opset – converts with a different target opset than the default one

  • options – additional parameters for the conversion

  • rewrite_ops – rewrites some existing converters, the changes are permanent

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list, works the same way as initial_types but is not mandatory; it is used to overwrite the type (if the type is not None) and the name of every output.

  • rename_strategy – rename any name in the graph, select shorter names, see onnx_rename_names

  • verbose – display information while converting the model

  • as_function – exposes every model in a pipeline as a function, the main graph contains the pipeline structure, see Use function when converting into ONNX for an example

  • prefix_name – used if as_function is True, to give a prefix to variables in a pipeline

  • run_shape – run shape inference

  • single_function – if as_function is True, the function returns one graph with one call to the main function when single_function is True, or a list of nodes corresponding to the graph structure otherwise

Returns:

converted model

The function rewrites function to_onnx from sklearn-onnx but may change a few converters if rewrite_ops is True. For example, ONNX only supports TreeEnsembleRegressor for floats but not for doubles; a double implementation becomes available if rewrite_ops=True.
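
A short sketch of that behaviour (a minimal illustration, assuming a tree ensemble trained on float64 data):

# Sketch: converting a tree ensemble trained on doubles; with rewrite_ops=True,
# mlprodict may replace the standard converter so double inputs can be kept.
import numpy
from sklearn.ensemble import RandomForestRegressor
from mlprodict.onnx_conv import to_onnx

X = numpy.random.randn(20, 3)          # float64 by default
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=3).fit(X, y)
onx = to_onnx(model, X, rewrite_ops=True)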

How to deal with a dataframe as input?

Each column of the dataframe is considered as a named input. The first step is to make sure that every column type is correct. pandas tends to select the least generic type to hold the content of one column. ONNX does not automatically cast the data it receives. The data must have the same type when the model is converted and when the converted model receives the data to predict.

<<<

from io import StringIO
from textwrap import dedent
import numpy
import pandas
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
    __SCHEMA__
    7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
    7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
    7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
    11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
    ''')
text = text.replace(
    "__SCHEMA__",
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,"
    "alcohol,quality,color")

X_train = pandas.read_csv(StringIO(text))
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
        ("others", "passthrough", numeric_features)
    ])),
])

pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}

onxp = oinf.run(inputs)
print(onxp)

>>>

    [[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01
      3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
     [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01
      6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
     [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01
      5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
     [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01
      6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
    {'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02,
            1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02,
            2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02,
            1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02,
            1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00,
            6.000e+00]], dtype=float32)}

Changed in version 0.9: Parameter as_function was added.

source on GitHub

mlprodict.onnx_conv.convert.to_onnx_function(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#

Converts a model using sklearn-onnx. The function works the same as to_onnx but every model is exported as a single function and the main graph represents the pipeline structure.

Parameters:
  • model – model to convert or a function wrapped into _PredictScorer with function make_scorer

  • X – training set (at least one row), can be None; it is used to infer the input types (initial_types)

  • initial_types – if X is None, then initial_types must be defined

  • name – name of the produced model

  • target_opset – converts with a different target opset than the default one

  • options – additional parameters for the conversion

  • rewrite_ops – rewrites some existing converters, the changes are permanent

  • white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed

  • black_op – black list of ONNX nodes disallowed while converting a pipeline, if empty, none are blacklisted

  • final_types – a python list, works the same way as initial_types but is not mandatory; it is used to overwrite the type (if the type is not None) and the name of every output.

  • rename_strategy – rename any name in the graph, select shorter names, see onnx_rename_names

  • verbose – display information while converting the model

  • prefix_name – prefix for variable names

  • run_shape – run shape inference on the final onnx model

  • single_function – if True, the main graph only includes one node calling the main function

Returns:

converted model
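
A hypothetical sketch (not from the original documentation) of exporting the steps of a small pipeline as ONNX functions called from the main graph; the pipeline and the prefix value are made up for illustration.

# Sketch: convert a two-step pipeline, each step exported as an ONNX function.
import numpy
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from mlprodict.onnx_conv.convert import to_onnx_function

X = numpy.random.randn(20, 3).astype(numpy.float32)
y = (X.sum(axis=1) > 0).astype(numpy.int64)
pipe = Pipeline([("scale", StandardScaler()),
                 ("lr", LogisticRegression())]).fit(X, y)
onx = to_onnx_function(pipe, X, prefix_name="p_")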

source on GitHub