scikit-learn API and ONNX graph in pipelines#

This is the main class, which makes it easy to insert the predictions of an ONNX file into a scikit-learn pipeline.

OnnxPipeline#

mlprodict.sklapi.OnnxPipeline (self, steps, memory = None, verbose = False, output_name = None, enforce_float32 = True, runtime = 'python', options = None, white_op = None, black_op = None, final_types = None, op_version = None)

The pipeline overrides method fit: it trains every step and converts it into ONNX before training the next step, in order to minimize discrepancies. By default, ONNX uses float (single precision) and not double, which is the default for scikit-learn. This may introduce discrepancies when a non-continuous model (in the mathematical sense), such as a tree ensemble, is part of the pipeline.
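The effect of single precision on a non-continuous model can be illustrated without any ONNX dependency. The sketch below is only an illustration, not code from mlprodict: `to_float32` and `stump` are hypothetical helpers, and the one-node "tree" stands in for a real tree ensemble. An input that sits just above a threshold in double precision collapses onto the threshold once both are rounded to float32, so the prediction flips, whereas a continuous (linear) model only moves by a tiny amount.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def stump(x: float, threshold: float) -> int:
    """One-node decision tree (hypothetical): a discontinuous function of x."""
    return 1 if x > threshold else 0


threshold = 0.1
x = 0.1 + 1e-12  # strictly above the threshold in double precision

# Double precision: x and threshold are distinct numbers.
pred_double = stump(x, threshold)

# Single precision: both values round to the same float32 number.
pred_single = stump(to_float32(x), to_float32(threshold))

# A continuous model (here y = 2 * x) barely changes under the same rounding.
linear_gap = abs(2 * x - 2 * to_float32(x))

print(pred_double, pred_single, linear_gap)
```

The two predictions of the stump differ even though the input is the same, which is the kind of discrepancy the pipeline tries to limit by converting each step before training the next one.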

fit (self, X, y = None, **fit_params)

Fits the model: fits all the transforms one after the other and transforms the data, then fits the final estimator on the transformed data.
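The order of operations described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation of OnnxPipeline: the toy steps (`Center`, `Scale`, `SlopeRegressor`), the function `fit_pipeline`, and the `convert` hook are all hypothetical, and `convert` merely stands in for the conversion of a fitted step into ONNX.

```python
class Center:
    """Toy transform: subtracts the mean seen during fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Scale:
    """Toy transform: divides by the largest absolute value seen during fit."""

    def fit(self, X, y=None):
        self.max_abs_ = max(abs(x) for x in X) or 1.0
        return self

    def transform(self, X):
        return [x / self.max_abs_ for x in X]


class SlopeRegressor:
    """Toy final estimator: least-squares fit of y = coef * x."""

    def fit(self, X, y):
        self.coef_ = sum(x * t for x, t in zip(X, y)) / sum(x * x for x in X)
        return self

    def predict(self, X):
        return [self.coef_ * x for x in X]


def fit_pipeline(steps, X, y=None, convert=lambda step: step):
    """Fit the steps in order; `convert` is a hypothetical conversion hook."""
    data = X
    fitted = []
    for step in steps[:-1]:
        step.fit(data, y)
        converted = convert(step)          # stands in for the ONNX conversion
        data = converted.transform(data)   # next step trains on converted output
        fitted.append(converted)
    steps[-1].fit(data, y)                 # final estimator sees transformed data
    fitted.append(convert(steps[-1]))
    return fitted, data


steps = [Center(), Scale(), SlopeRegressor()]
fitted, transformed = fit_pipeline(steps, [1.0, 3.0, 5.0], [-2.0, 0.0, 2.0])
print(transformed, fitted[-1].coef_)
```

Because each following step is trained on the output of the converted step rather than the original one, any rounding introduced by the conversion is already present in its training data.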

OnnxTransformer#

mlprodict.sklapi.OnnxTransformer (self, onnx_bytes, output_name = None, enforce_float32 = True, runtime = 'python', change_batch_size = None, reshape = False)

Calls onnxruntime or the runtime implemented in this package to transform input based on an ONNX graph. It follows the scikit-learn API so that it can be included in a scikit-learn pipeline. See notebook Transfer Learning with ONNX for an example.

enumerate_create (onnx_bytes, output_names = None, enforce_float32 = True)

Creates multiple OnnxTransformer instances, one for each requested intermediate node.

onnx_bytes : bytes

output_names : string

requested output names, or None to request all of them and have method transform store all of them in a dataframe

enforce_float32 : boolean

onnxruntime only supports float32, scikit-learn usually uses double floats, this parameter ensures that every array of double floats is converted into single floats

fit (self, X = None, y = None, **fit_params)

Loads the ONNX model.

fit_transform (self, X, y = None, **inputs)

Loads the ONNX model and runs the predictions.

onnx_converter (self)

Returns a converter for this model. If not overloaded, it fetches the converter mapped to the first scikit-learn parent it can find.

onnx_parser (self)

Returns a parser for this model.

onnx_shape_calculator (self)

transform (self, X, y = None, **inputs)

Runs the predictions. If X is a dataframe, the function assumes every column is a separate input; otherwise, X is considered as the first input and inputs can be used to specify extra inputs.
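The input handling described for transform can be pictured with a small helper. The code below is a hedged sketch of the rule only, not the library's code: `build_feeds` and `TinyFrame` are hypothetical names, and `TinyFrame` is a minimal stand-in for a dataframe so that the example runs without pandas.

```python
class TinyFrame(dict):
    """Minimal stand-in for a dataframe: maps column names to columns."""

    @property
    def columns(self):
        return list(self.keys())


def build_feeds(X, input_names, **inputs):
    """Map X (and optional extra inputs) to named model inputs (hypothetical)."""
    if hasattr(X, "columns"):
        # Dataframe-like: every column is a separate input.
        return {name: X[name] for name in X.columns}
    # Otherwise X is the first input, extra inputs are passed by name.
    feeds = {input_names[0]: X}
    feeds.update(inputs)
    return feeds


frame_feeds = build_feeds(TinyFrame(a=[1.0, 2.0], b=[3.0, 4.0]), ["a", "b"])
array_feeds = build_feeds([[1.0, 2.0]], ["X", "W"], W=[[0.5]])
print(frame_feeds)
print(array_feeds)
```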

Speedup scikit-learn pipeline with ONNX#

These classes wrap an existing scikit-learn pipeline and replace the inference methods (transform, predict, predict_proba) with another runtime built after the model is converted into ONNX. See example Compares numba, numpy, onnxruntime for simple functions for further details.

mlprodict.sklapi.OnnxSpeedupClassifier (self, estimator, runtime = 'python', enforce_float32 = True, target_opset = None, conv_options = None, nopython = True)
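The wrapping pattern shared by these classes (train with the original estimator, infer with a converted runtime, keep a raw method for comparison) can be sketched as follows. Everything here is an assumption made for illustration: `MeanRegressor`, `convert_to_runtime`, and `SpeedupRegressorSketch` are hypothetical, and the converter simply freezes a fitted parameter into a function instead of producing an ONNX graph.

```python
import math


class MeanRegressor:
    """Toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def convert_to_runtime(estimator):
    """Hypothetical stand-in for the ONNX conversion of a fitted estimator."""
    mean = estimator.mean_
    return lambda X: [mean] * len(X)


class SpeedupRegressorSketch:
    """Trains with the wrapped estimator, predicts with the converted runtime."""

    def __init__(self, estimator, converter=convert_to_runtime):
        self.estimator = estimator
        self.converter = converter

    def fit(self, X, y):
        self.estimator_ = self.estimator.fit(X, y)
        self.runtime_ = self.converter(self.estimator_)
        return self

    def predict(self, X):
        return self.runtime_(X)            # converted runtime

    def raw_predict(self, X):
        return self.estimator_.predict(X)  # original estimator

    def assert_almost_equal(self, X, tol=1e-6):
        for fast, raw in zip(self.predict(X), self.raw_predict(X)):
            assert math.isclose(fast, raw, abs_tol=tol), (fast, raw)


model = SpeedupRegressorSketch(MeanRegressor()).fit([[0.0], [1.0]], [1.0, 3.0])
model.assert_almost_equal([[5.0], [6.0]])
print(model.predict([[5.0]]))
```

Keeping both prediction paths available is what makes a check such as assert_almost_equal possible after training.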

Trains with scikit-learn, transforms with ONNX.

assert_almost_equal (self, X, **kwargs)

Checks that ONNX and scikit-learn produce the same outputs.

fit (self, X, y, sample_weight = None)

Trains the base estimator.

predict (self, X)

Transforms with ONNX.

predict_proba (self, X)

Transforms with ONNX.

raw_predict (self, X)

Transforms with scikit-learn.

raw_predict_proba (self, X)

Transforms with scikit-learn.

mlprodict.sklapi.OnnxSpeedupRegressor (self, estimator, runtime = 'python', enforce_float32 = True, target_opset = None, conv_options = None, nopython = True)

Trains with scikit-learn, transforms with ONNX.

assert_almost_equal (self, X, **kwargs)

Checks that ONNX and scikit-learn produce the same outputs.

fit (self, X, y, sample_weight = None)

Trains the base estimator.

predict (self, X)

Transforms with ONNX.

raw_predict (self, X)

Transforms with scikit-learn.

mlprodict.sklapi.OnnxSpeedupTransformer (self, estimator, runtime = 'python', enforce_float32 = True, target_opset = None, conv_options = None, nopython = True)

Trains with scikit-learn, transforms with ONNX.

assert_almost_equal (self, X, **kwargs)

Checks that ONNX and scikit-learn produce the same outputs.

fit (self, X, y = None, sample_weight = None)

Trains the base estimator.

raw_transform (self, X)

Transforms with scikit-learn.

transform (self, X)

Transforms with ONNX.

Tokenizers#

mlprodict.sklapi.onnx_tokenizer.GPT2TokenizerTransformer (self, vocab, merges, padding_length = -1, opset = None)

Wraps GPT2Tokenizer into a scikit-learn transformer.

fit (self, X, y = None, sample_weight = None)

The model is not trained, but this method is still needed to set the instance up and make it ready to transform.

transform (self, X)

Applies the tokenizer to an array of strings.

mlprodict.sklapi.onnx_tokenizer.SentencePieceTokenizerTransformer (self, model, nbest_size = 1, alpha = 0.5, reverse = False, add_bos = False, add_eos = False, opset = None)

Wraps SentencePieceTokenizer into a scikit-learn transformer.

fit (self, X, y = None, sample_weight = None)

The model is not trained, but this method is still needed to set the instance up and make it ready to transform.

transform (self, X)

Applies the tokenizer to an array of strings.