# Python Runtime for ONNX¶

This runtime does not depend on scikit-learn, only on numpy and scipy, and ships custom C++ implementations (cython, pybind11).

## Inference¶

The main class reads an ONNX file and computes predictions based on a runtime implemented in Python. The ONNX model relies on the operators listed in Python Runtime for ONNX operators.

`mlprodict.onnxrt.OnnxInference`

(*self*, *onnx_or_bytes_or_stream*, *runtime* = None, *skip_run* = False, *inplace* = True, *input_inplace* = False, *ir_version* = None, *target_opset* = None, *runtime_options* = None)

Loads an ONNX file or object or stream. Computes the output of the ONNX graph. Several runtimes are available.

`'python'`

: the runtime implements every onnx operator needed to run a scikit-learn model by using numpy or C++ code.

`'python_compiled'`

: the same runtime as the previous one, except every operator is called from a compiled function (`_build_compile_run`) instead of a method going through the list of operators

`'onnxruntime1'`

: uses onnxruntime

`'onnxruntime2'`

: this mode is mostly used for debugging as Python handles calling every operator but onnxruntime is called for each of them

`build_intermediate`

(self) Builds every possible ONNX file which computes one specific intermediate output from the inputs.

`check_model`

(self) Checks the model follows ONNX conventions.

`display_sequence`

(self, verbose=1) Shows the sequence of nodes to run if `runtime == 'python'`.

`global_index`

(self, name) Maps every name to one integer to avoid using dictionaries when running the predictions.
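The idea behind `global_index` can be sketched as a small interning helper. This is an illustrative re-implementation, not the actual mlprodict code (the class `NameIndexSketch` is hypothetical): every name is mapped once to a position, so running predictions only needs integer indexing into a flat list.

```python
class NameIndexSketch:
    """Interns result names to integers (illustrative sketch of the
    idea behind global_index, not the actual mlprodict code)."""

    def __init__(self):
        self._index = {}

    def global_index(self, name):
        # first call assigns the next free slot, later calls reuse it
        if name not in self._index:
            self._index[name] = len(self._index)
        return self._index[name]


idx = NameIndexSketch()
slots = [idx.global_index(n) for n in ["X", "W", "Y", "X"]]
print(slots)  # [0, 1, 2, 0]
```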

`reduce_size`

(self, pickable=False) Reduces the memory footprint as much as possible.

`run`

(self, inputs, clean_right_away=False, intermediate=False, verbose=0, node_time=False, fLOG=None) Computes the predictions for this ONNX graph.

`shape_inference`

(self) Infers the shape of the outputs with the onnx package.

`switch_initializers_dtype`

(self, model=None, dtype_in=<class 'numpy.float32'>, dtype_out=<class 'numpy.float64'>) Switches all initializers to `numpy.float64`. If *model* is None, a simple cast is done. Otherwise, the function assumes the model is a scikit-learn pipeline. This only works if the runtime is `'python'`.

`to_sequence`

(self) Produces a graph to facilitate the execution.

One example…

## Python to ONNX¶

`mlprodict.onnx_grammar.translate_fct2onnx`

(*fct*, *context* = None, *cpl* = False, *context_cpl* = None, *output_names* = None, *dtype* = <class 'numpy.float32'>, *verbose* = 0, *fLOG* = None)

Translates a function into ONNX. The code it produces uses classes OnnxAbs, OnnxAdd, …

## ONNX Export¶

`mlprodict.onnxrt.onnx_inference_exports.OnnxInferenceExport`

(*self*, *oinf*)

Implements methods to export an instance of `OnnxInference` into json or dot.

## ONNX Structure¶

`mlprodict.onnxrt.onnx_inference_manipulations.enumerate_model_node_outputs`

(*model*, *add_node* = False)

Enumerates all the nodes of a model.

`mlprodict.onnxrt.onnx_inference_manipulations.select_model_inputs_outputs`

(*model*, *outputs* = None, *inputs* = None)

Takes a model and changes its outputs.

## Validation¶

`mlprodict.onnxrt.validate.enumerate_validated_operator_opsets`

(*verbose* = 0, *opset_min* = -1, *opset_max* = -1, *check_runtime* = True, *debug* = False, *runtime* = 'python', *models* = None, *dump_folder* = None, *store_models* = False, *benchmark* = False, *skip_models* = None, *assume_finite* = True, *node_time* = False, *fLOG* = <built-in function print>, *filter_exp* = None, *versions* = False, *extended_list* = False, *time_kwargs* = None, *dump_all* = False, *n_features* = None, *skip_long_test* = True, *fail_bad_results* = False, *filter_scenario* = None, *time_kwargs_fact* = None, *time_limit* = 4, *n_jobs* = None)

Tests all possible configurations for all possible operators and returns the results.

`mlprodict.onnxrt.validate.side_by_side.side_by_side_by_values`

(*sessions*, *args*, *inputs* = None, *kwargs*)

Compares the execution of two sessions. It calls method `OnnxInference.run` with `intermediate=True` and compares the results.
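The core of such a comparison can be sketched with numpy alone. This is an illustrative sketch, not the actual `side_by_side_by_values` code: given two dictionaries of intermediate results (name to array, as produced by a run with `intermediate=True`), it reports the largest absolute difference per shared name.

```python
import numpy as np


def side_by_side_sketch(inter1, inter2):
    # Sketch of the comparison idea behind side_by_side_by_values:
    # for every result name present in both runs, report the largest
    # absolute difference, or flag a shape mismatch.
    report = {}
    for name in sorted(set(inter1) & set(inter2)):
        a, b = np.asarray(inter1[name]), np.asarray(inter2[name])
        report[name] = (float(np.abs(a - b).max())
                        if a.shape == b.shape else "shape mismatch")
    return report


r = side_by_side_sketch({"Y": np.array([1.0, 2.0])},
                        {"Y": np.array([1.0, 2.5])})
print(r)  # {'Y': 0.5}
```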

`mlprodict.onnxrt.validate.summary_report`

(*df*, *add_cols* = None, *add_index* = None)

Finalizes the results computed by function `enumerate_validated_operator_opsets`.

`mlprodict.onnxrt.model_checker.onnx_shaker`

(*oinf*, *inputs*, *output_fct*, *n* = 100, *dtype* = <class 'numpy.float32'>, *force* = 1)

Shakes an ONNX model. Explores the ranges for every prediction. Uses `astype_range`.
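The idea behind exploring ranges can be sketched with numpy's `nextafter`. This is an illustrative sketch under an assumption about what `astype_range` does (it is not the actual mlprodict code): cast values to a lower precision, then return the closest representable neighbours on each side, which bound the rounding interval.

```python
import numpy as np


def astype_range_sketch(values, dtype=np.float32):
    # Hypothetical re-implementation of the idea: cast to the lower
    # precision, then return the nearest representable float below
    # and above every cast value.
    cast = values.astype(dtype)
    lower = np.nextafter(cast, np.full_like(cast, -np.inf))
    upper = np.nextafter(cast, np.full_like(cast, np.inf))
    return lower, upper


x = np.array([1.0, 0.1, 1e-3], dtype=np.float64)
lo, hi = astype_range_sketch(x)
```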

`mlprodict.onnxrt.validate.validate_graph.plot_validate_benchmark`

## C++ classes¶

**Gather**

`mlprodict.onnxrt.ops_cpu.op_gather_.GatherDouble`

(*self*, *arg0*)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

`mlprodict.onnxrt.ops_cpu.op_gather_.GatherFloat`

(*self*, *arg0*)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

`mlprodict.onnxrt.ops_cpu.op_gather_.GatherInt64`

(*self*, *arg0*)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.
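The semantics these classes implement can be illustrated with numpy alone (an illustrative equivalent, not the C++ code itself): ONNX Gather picks slices of the data tensor along an axis, which matches `numpy.take`.

```python
import numpy as np


def gather_sketch(data, indices, axis=0):
    # Illustrative numpy equivalent of the ONNX Gather operator
    # implemented by GatherFloat / GatherDouble / GatherInt64:
    # selects slices of `data` along `axis`.
    return np.take(data, indices, axis=axis)


data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]])
print(gather_sketch(data, np.array([0, 2]), axis=0))
```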

**ArrayFeatureExtractor**

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_double`

(*arg0*, *arg1*)

array_feature_extractor_double(arg0: numpy.ndarray[float64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float64]

C++ implementation of operator ArrayFeatureExtractor for float64. The function only works with contiguous arrays.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_float`

(*arg0*, *arg1*)

array_feature_extractor_float(arg0: numpy.ndarray[float32], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float32]

C++ implementation of operator ArrayFeatureExtractor for float32. The function only works with contiguous arrays.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_int64`

(*arg0*, *arg1*)

array_feature_extractor_int64(arg0: numpy.ndarray[int64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[int64]

C++ implementation of operator ArrayFeatureExtractor for int64. The function only works with contiguous arrays.
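The behaviour of these functions can be illustrated with a simplified numpy equivalent (an illustrative sketch; the exact ONNX output-shape rules for multi-dimensional indices are not reproduced): ArrayFeatureExtractor selects the listed positions along the last axis.

```python
import numpy as np


def array_feature_extractor_sketch(data, indices):
    # Simplified numpy equivalent of ArrayFeatureExtractor:
    # picks the listed positions along the last axis, e.g. a
    # subset of features for every row of a 2D matrix.
    return data[..., indices]


data = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
print(array_feature_extractor_sketch(data, [0, 2]))
```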

**SVM**

`mlprodict.onnxrt.ops_cpu.op_svm_classifier_.RuntimeSVMClassifier`

`mlprodict.onnxrt.ops_cpu.op_svm_regressor_.RuntimeSVMRegressor`

**Tree Ensemble**

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierDouble`

(*self*)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierFloat`

(*self*)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorDouble`

(*self*)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorFloat`

(*self*)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

**Still tree ensembles but refactored.**

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPDouble`

(*self*, *arg0*, *arg1*)

Implements double runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPFloat`

(*self*, *arg0*, *arg1*)

Implements float runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPDouble`

(*self*, *arg0*, *arg1*)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

`mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPFloat`

(*self*, *arg0*, *arg1*)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

**Topk**

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_double`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_max_double(arg0: numpy.ndarray[float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_float`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_max_float(arg0: numpy.ndarray[float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_int64`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_max_int64(arg0: numpy.ndarray[int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_double`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_min_double(arg0: numpy.ndarray[float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_float`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_min_float(arg0: numpy.ndarray[float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_int64`

(*arg0*, *arg1*, *arg2*, *arg3*)

topk_element_min_int64(arg0: numpy.ndarray[int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The function is parallelized for more than *th_para* rows. It only does it on the last axis.

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_double`

(*arg0*, *arg1*)

topk_element_fetch_double(arg0: numpy.ndarray[float64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float64]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_float`

(*arg0*, *arg1*)

topk_element_fetch_float(arg0: numpy.ndarray[float32], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float32]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).

`mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_int64`

(*arg0*, *arg1*)

topk_element_fetch_int64(arg0: numpy.ndarray[int64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[int64]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).
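The two-step design above (compute indices, then fetch values) can be mirrored with numpy. This is an illustrative sketch of the semantics, not the C++ code, and it ignores the parallelization threshold:

```python
import numpy as np


def topk_indices_sketch(mat, k, largest=True):
    # Mirrors the role of topk_element_max_* / topk_element_min_*:
    # for every row, the indices of the k largest (or smallest)
    # values on the last axis, ordered by value.
    sign = -1 if largest else 1
    part = np.argpartition(sign * mat, k - 1, axis=-1)[..., :k]
    order = np.argsort(np.take_along_axis(sign * mat, part, axis=-1), axis=-1)
    return np.take_along_axis(part, order, axis=-1)


def topk_fetch_sketch(mat, indices):
    # Mirrors topk_element_fetch_*: gathers the values at the
    # indices computed above, row by row.
    return np.take_along_axis(mat, indices, axis=-1)


mat = np.array([[1.0, 5.0, 3.0], [4.0, 0.0, 2.0]])
idx = topk_indices_sketch(mat, 2)
val = topk_fetch_sketch(mat, idx)
print(idx)  # indices of the two largest values per row
print(val)  # the corresponding values
```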

## Optimisation¶

The following functions reduce the number of ONNX operators in a graph while keeping the same results. The original graph is left unchanged.

`mlprodict.onnxrt.optim.onnx_remove_node`

(*onnx_model*, *recursive* = True, *debug_info* = None)

Removes as many nodes as possible without changing the outcome. It applies `onnx_remove_node_identity`, then `onnx_remove_node_redundant`.

`mlprodict.onnxrt.optim.onnx_remove_node_identity`

(*onnx_model*, *recursive* = True, *debug_info* = None)

Removes as many *Identity* nodes as possible. The function looks into every node, and into subgraphs if *recursive* is True. Unless such a node directly connects one input to one output, it is removed and every other node gets its inputs or outputs renamed accordingly.
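The renaming idea can be sketched on a toy graph representation. This is an illustrative sketch, not the actual mlprodict code: nodes are `(op_type, inputs, outputs)` tuples, and it assumes Identity nodes are not chained (one renaming pass suffices).

```python
def remove_identity_sketch(nodes, graph_inputs, graph_outputs):
    # Toy illustration of the idea behind onnx_remove_node_identity.
    # An Identity node that does not directly bridge a graph input
    # to a graph output is dropped, and its output name is replaced
    # by its input name in every remaining node.
    rename = {}
    kept = []
    for op, inputs, outputs in nodes:
        if (op == "Identity"
                and not (inputs[0] in graph_inputs
                         and outputs[0] in graph_outputs)):
            rename[outputs[0]] = inputs[0]
        else:
            kept.append((op, inputs, outputs))
    return [(op, [rename.get(i, i) for i in inputs], outputs)
            for op, inputs, outputs in kept]


nodes = [("Identity", ["X"], ["X2"]),
         ("Abs", ["X2"], ["Y"])]
print(remove_identity_sketch(nodes, {"X"}, {"Y"}))
# [('Abs', ['X'], ['Y'])]
```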

`mlprodict.onnxrt.optim.onnx_remove_node_redundant`

(*onnx_model*, *recursive* = True, *debug_info* = None, *max_hash_size* = 1000)

Removes redundant parts of the graph. A redundant part is a set of nodes which takes the same inputs and produces the same outputs. It first looks into duplicated initializers, then into nodes taking the same inputs and sharing the same type and parameters.
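The deduplication idea can be sketched on the same toy node representation (an illustrative sketch, not the actual mlprodict code): two nodes with the same type, attributes and inputs necessarily produce the same values, so the duplicate is dropped and its outputs are mapped onto those of the first occurrence.

```python
def remove_redundant_sketch(nodes):
    # Toy illustration of the deduplication behind
    # onnx_remove_node_redundant, on (op_type, attrs, inputs,
    # outputs) tuples. The key identifies nodes that compute the
    # same thing; later duplicates only feed the rename map.
    seen, rename, kept = {}, {}, []
    for op, attrs, inputs, outputs in nodes:
        key = (op, attrs, tuple(rename.get(i, i) for i in inputs))
        if key in seen:
            for old, new in zip(outputs, seen[key]):
                rename[old] = new
        else:
            seen[key] = outputs
            kept.append((op, attrs,
                         [rename.get(i, i) for i in inputs], outputs))
    return kept


nodes = [("Add", (), ["X", "Y"], ["S1"]),
         ("Add", (), ["X", "Y"], ["S2"]),
         ("Mul", (), ["S1", "S2"], ["P"])]
print(remove_redundant_sketch(nodes))
# [('Add', (), ['X', 'Y'], ['S1']), ('Mul', (), ['S1', 'S1'], ['P'])]
```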

## Shapes¶

The computation of the predictions through ONNX may
be optimized if the shape of every node is known. For example,
one possible optimisation is to compute inplace every time
it is possible, but that is only valid when the input and the
output have the same size. We could compute the predictions
for one sample and check that the sizes match, but that could be luck.
We could also guess from a couple of samples with different sizes
and assume sizes are polynomial functions of the input size.
But on rare occasions, that could be luck too.
So one way of doing it is to implement a method
`_set_shape_inference_runtime`
which works the same way as method `_run_sequence_runtime`
but handles shapes instead. The following class tries to implement
a way to keep track of shapes along the graph.

`mlprodict.onnxrt.shape_object.ShapeObject`

(*self*, *shape*, *dtype* = None, *use_n1* = False, *name* = None)

Handles mathematical operations around shapes. It stores a type (a numpy type) and a name to somehow have an idea of where the shape comes from in the ONNX graph. The shape itself is defined by a list of `DimensionObject` or `ShapeOperator`, or None if the shape is unknown. A dimension is an integer or a variable encoded as a string; the variable is a way to tell that the dimension may vary.
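The kind of symbolic bookkeeping this enables can be sketched in plain Python. This is an illustrative sketch, not the actual `ShapeObject` API: dimensions are integers or strings (a string such as `'N'` stands for a batch size that may vary), and numpy-style broadcasting is applied dimension by dimension, keeping a variable whenever the value is unknown.

```python
def broadcast_shapes_sketch(sh1, sh2):
    # Sketch of symbolic broadcasting: each dimension is an int
    # or a string variable; same rank is assumed for brevity.
    res = []
    for d1, d2 in zip(sh1, sh2):
        if d1 == d2:
            res.append(d1)
        elif d1 == 1:
            res.append(d2)
        elif d2 == 1:
            res.append(d1)
        elif isinstance(d1, str) or isinstance(d2, str):
            # at least one side is unknown: the result stays symbolic
            res.append(d1 if isinstance(d1, str) else d2)
        else:
            raise ValueError(f"incompatible dimensions {d1} and {d2}")
    return res


print(broadcast_shapes_sketch(["N", 1], [1, 10]))  # ['N', 10]
```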