Python Runtime for ONNX

This runtime does not depend on scikit-learn, only on numpy and scipy, and includes custom implementations in C++ (cython, pybind11).

Inference

The main class reads an ONNX file and computes predictions with a runtime implemented in Python. The ONNX model relies on the operators documented in Python Runtime for ONNX operators.

mlprodict.onnxrt.OnnxInference (self, onnx_or_bytes_or_stream, runtime = None, skip_run = False, inplace = True, input_inplace = False, ir_version = None, target_opset = None, runtime_options = None)

Loads an ONNX file or object or stream. Computes the output of the ONNX graph. Several runtimes are available.

  • 'python': the runtime implements every onnx operator needed to run a scikit-learn model by using numpy or C++ code.

  • 'python_compiled': the same runtime as the previous one, except every operator is called from a compiled function (_build_compile_run) instead of a method going through the list of operators

  • 'onnxruntime1': uses onnxruntime

  • 'onnxruntime2': this mode is mostly used for debugging; Python handles the walk through the graph but onnxruntime is called for every single operator
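The 'python' runtime essentially walks the node list in order and calls a numpy implementation for each operator. A minimal sketch of that execution loop, assuming a simplified tuple-based node representation and a small operator table (these are illustrations, not mlprodict's actual classes):

```python
import numpy as np

# Illustrative operator table: maps an ONNX operator name to a
# numpy implementation (mlprodict has one class per operator).
_OPS = {
    "Add": np.add,
    "Mul": np.multiply,
    "Abs": np.abs,
}

def run_sequence(nodes, inputs):
    """Executes a list of (op_type, input_names, output_names) tuples
    in topological order, keeping intermediate results in a dict."""
    results = dict(inputs)
    for op_type, input_names, output_names in nodes:
        args = [results[n] for n in input_names]
        results[output_names[0]] = _OPS[op_type](*args)
    return results

# Tiny graph computing Y = |X1 + X2| * X1.
nodes = [
    ("Add", ["X1", "X2"], ["T1"]),
    ("Abs", ["T1"], ["T2"]),
    ("Mul", ["T2", "X1"], ["Y"]),
]
res = run_sequence(nodes, {"X1": np.array([1.0, -2.0]),
                           "X2": np.array([-3.0, 1.0])})
```

The real runtime also validates inputs, handles multiple outputs per node and optional inplace computation, but the control flow is the same.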

build_intermediate (self)

Builds every possible ONNX file which computes one specific intermediate output from the inputs.

check_model (self)

Checks that the model follows ONNX conventions.

display_sequence (self, verbose = 1)

Shows the sequence of nodes to run if runtime=='python'.

global_index (self, name)

Maps every name to one integer to avoid using dictionaries when running the predictions.

reduce_size (self, pickable = False)

Reduces the memory footprint as much as possible.

run (self, inputs, clean_right_away = False, intermediate = False, verbose = 0, node_time = False, fLOG = None)

Computes the predictions for this onnx graph.

shape_inference (self)

Infers the shape of the outputs with the onnx package.

switch_initializers_dtype (self, model = None, dtype_in = <class 'numpy.float32'>, dtype_out = <class 'numpy.float64'>)

Switches all initializers from dtype_in to dtype_out (numpy.float64 by default). If model is None, a simple cast is done; otherwise, the function assumes the model is a scikit-learn pipeline. This only works if the runtime is 'python'.

to_sequence (self)

Produces a graph to facilitate the execution.

One example…

Python to ONNX

mlprodict.onnx_grammar.translate_fct2onnx (fct, context = None, cpl = False, context_cpl = None, output_names = None, dtype = <class 'numpy.float32'>, verbose = 0, fLOG = None)

Translates a function into ONNX. The code it produces uses classes OnnxAbs, OnnxAdd, …

ONNX Export

mlprodict.onnxrt.onnx_inference_exports.OnnxInferenceExport (self, oinf)

Implements methods to export an instance of OnnxInference to json or dot.

ONNX Structure

mlprodict.onnxrt.onnx_inference_manipulations.enumerate_model_node_outputs (model, add_node = False)

Enumerates the outputs of every node of a model.

mlprodict.onnxrt.onnx_inference_manipulations.select_model_inputs_outputs (model, outputs = None, inputs = None)

Takes a model and changes its outputs, and optionally its inputs.

Validation

mlprodict.onnxrt.validate.enumerate_validated_operator_opsets (verbose = 0, opset_min = -1, opset_max = -1, check_runtime = True, debug = False, runtime = 'python', models = None, dump_folder = None, store_models = False, benchmark = False, skip_models = None, assume_finite = True, node_time = False, fLOG = <built-in function print>, filter_exp = None, versions = False, extended_list = False, time_kwargs = None, dump_all = False, n_features = None, skip_long_test = True, fail_bad_results = False, filter_scenario = None, time_kwargs_fact = None, time_limit = 4, n_jobs = None)

Tests all possible configurations for all possible operators and returns the results.

mlprodict.onnxrt.validate.side_by_side.side_by_side_by_values (sessions, *args, inputs = None, **kwargs)

Compares the execution of two sessions. It calls method OnnxInference.run with intermediate=True and compares the results.
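Once both runs return their intermediate results as dictionaries, the comparison is a name-by-name check of the shared values. A sketch of that idea, assuming dictionaries of numpy arrays (the helper below is illustrative, not the library's implementation):

```python
import numpy as np

def compare_intermediates(res1, res2, atol=1e-5):
    """Compares two dictionaries of intermediate results name by name
    and reports whether each shared value matches within a tolerance."""
    report = {}
    for name in sorted(set(res1) & set(res2)):
        diff = float(np.max(np.abs(np.asarray(res1[name]) -
                                   np.asarray(res2[name]))))
        report[name] = diff <= atol
    return report

r1 = {"T1": np.array([1.0, 2.0]), "Y": np.array([3.0, 4.0])}
r2 = {"T1": np.array([1.0, 2.0]), "Y": np.array([3.0, 4.5])}
report = compare_intermediates(r1, r2)
```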

mlprodict.onnxrt.validate.summary_report (df, add_cols = None, add_index = None)

Finalizes the results computed by function enumerate_validated_operator_opsets.

mlprodict.onnxrt.model_checker.onnx_shaker (oinf, inputs, output_fct, n = 100, dtype = <class 'numpy.float32'>, force = 1)

Shakes an ONNX model: explores the ranges of every prediction. Uses astype_range.
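The underlying idea is to measure how sensitive the predictions are to the rounding introduced by float32. One crude way to build an interval around every float32 value is to look at its neighbouring representable numbers; the helper below illustrates that idea and is not the library's astype_range:

```python
import numpy as np

def float_interval(x32):
    """For each float32 value, returns the previous and next
    representable float32 values, a crude proxy for the
    uncertainty introduced by the cast."""
    lo = np.nextafter(x32, np.float32(-np.inf))
    hi = np.nextafter(x32, np.float32(np.inf))
    return lo, hi

x = np.array([1.0], dtype=np.float32)
lo, hi = float_interval(x)
```

Sampling inputs inside such intervals and running the model n times gives a range for every prediction.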

mlprodict.onnxrt.validate.validate_graph.plot_validate_benchmark

C++ classes

Gather

mlprodict.onnxrt.ops_cpu.op_gather_.GatherDouble (self, arg0)

Implements the runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherFloat (self, arg0)

Implements the runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherInt64 (self, arg0)

Implements the runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.
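Gather selects slices of a tensor along an axis according to an index tensor; numpy's take has the same semantics. A rough numpy equivalent of these C++ classes (a sketch of the operator's behaviour, not the implementation above):

```python
import numpy as np

def gather(data, indices, axis=0):
    """Same semantics as ONNX Gather: selects entries of `data`
    along `axis` using the (possibly multi-dimensional) `indices`."""
    return np.take(data, indices, axis=axis)

x = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]])
idx = np.array([[0, 1], [1, 2]])
y = gather(x, idx, axis=0)  # the gathered axis takes the shape of idx
```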

ArrayFeatureExtractor

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_double (arg0, arg1)

array_feature_extractor_double(arg0: numpy.ndarray[float64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float64]

C++ implementation of operator ArrayFeatureExtractor for float64. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_float (arg0, arg1)

array_feature_extractor_float(arg0: numpy.ndarray[float32], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float32]

C++ implementation of operator ArrayFeatureExtractor for float32. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_int64 (arg0, arg1)

array_feature_extractor_int64(arg0: numpy.ndarray[int64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[int64]

C++ implementation of operator ArrayFeatureExtractor for int64. The function only works with contiguous arrays.
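ArrayFeatureExtractor picks the requested indices along the last axis of the input, which is how scikit-learn converters select columns. A numpy sketch of the same behaviour (illustrative, simplified with respect to the full ONNX shape rules):

```python
import numpy as np

def array_feature_extractor(data, indices):
    """Selects `indices` along the last axis of `data`,
    like ONNX ArrayFeatureExtractor."""
    return data[..., indices.ravel()]

x = np.array([[0.0, 1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0, 7.0]])
cols = np.array([1, 3], dtype=np.int64)
y = array_feature_extractor(x, cols)  # keeps columns 1 and 3
```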

SVM

mlprodict.onnxrt.ops_cpu.op_svm_classifier_.RuntimeSVMClassifier

mlprodict.onnxrt.ops_cpu.op_svm_regressor_.RuntimeSVMRegressor

Tree Ensemble

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierDouble (self)

Implements the runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierFloat (self)

Implements the runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorDouble (self)

Implements the double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorFloat (self)

Implements the float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

The following classes are refactored versions of the same tree ensemble runtimes.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPDouble (self, arg0, arg1)

Implements the double runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPFloat (self, arg0, arg1)

Implements the float runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPDouble (self, arg0, arg1)

Implements the double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPFloat (self, arg0, arg1)

Implements the float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.
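A TreeEnsembleRegressor stores its trees as flat arrays of node attributes; prediction walks each tree from the root down to a leaf and accumulates the leaf values. A much simplified pure-numpy sketch of that traversal for a single tree with the BRANCH_LEQ rule (the flat-array layout below is illustrative, not the exact attribute encoding of the ONNX operator):

```python
import numpy as np

def predict_tree(x, feature, threshold, left, right, value):
    """Walks one binary decision tree stored as flat arrays.
    `left[i] == -1` marks node i as a leaf with output `value[i]`."""
    preds = np.empty(x.shape[0])
    for row in range(x.shape[0]):
        i = 0
        while left[i] != -1:  # internal node: go left if x <= threshold
            i = left[i] if x[row, feature[i]] <= threshold[i] else right[i]
        preds[row] = value[i]
    return preds

# One tree: the root splits on feature 0 at 0.5, leaves return 10 and 20.
feature = np.array([0, -1, -1])
threshold = np.array([0.5, 0.0, 0.0])
left = np.array([1, -1, -1])
right = np.array([2, -1, -1])
value = np.array([0.0, 10.0, 20.0])
x = np.array([[0.2], [0.9]])
p = predict_tree(x, feature, threshold, left, right, value)
```

An ensemble repeats this walk for every tree and aggregates the per-tree outputs (sum or average); the C++ classes also parallelize over rows and trees.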

Topk

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_double (arg0, arg1, arg2, arg3)

topk_element_max_double(arg0: numpy.ndarray[float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_float (arg0, arg1, arg2, arg3)

topk_element_max_float(arg0: numpy.ndarray[float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_int64 (arg0, arg1, arg2, arg3)

topk_element_max_int64(arg0: numpy.ndarray[int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_double (arg0, arg1, arg2, arg3)

topk_element_min_double(arg0: numpy.ndarray[float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_float (arg0, arg1, arg2, arg3)

topk_element_min_float(arg0: numpy.ndarray[float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_int64 (arg0, arg1, arg2, arg3)

topk_element_min_int64(arg0: numpy.ndarray[int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The computation is parallelized when there are more than th_para rows and only applies to the last axis.
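The topk_element_max_* functions return the indices of the k largest values on the last axis, sorted from largest to smallest (the min variants do the opposite). A numpy equivalent of the max variant, with a simplified signature compared to the C++ one:

```python
import numpy as np

def topk_max_indices(data, k):
    """Indices of the k largest values along the last axis,
    ordered from largest to smallest."""
    # argpartition finds the k largest (unordered), argsort orders them.
    part = np.argpartition(-data, k - 1, axis=-1)[..., :k]
    order = np.argsort(np.take_along_axis(-data, part, axis=-1), axis=-1)
    return np.take_along_axis(part, order, axis=-1)

x = np.array([[1.0, 5.0, 3.0, 4.0],
              [9.0, 2.0, 8.0, 7.0]])
idx = topk_max_indices(x, 2)
```

Partitioning first keeps the cost close to O(n) per row instead of fully sorting every row, which is presumably why the C++ version is implemented this way too.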

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_double (arg0, arg1)

topk_element_fetch_double(arg0: numpy.ndarray[float64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float64]

Fetches the top k elements given their indices on each row (the last dimension for a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_float (arg0, arg1)

topk_element_fetch_float(arg0: numpy.ndarray[float32], arg1: numpy.ndarray[int64]) -> numpy.ndarray[float32]

Fetches the top k elements given their indices on each row (the last dimension for a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_int64 (arg0, arg1)

topk_element_fetch_int64(arg0: numpy.ndarray[int64], arg1: numpy.ndarray[int64]) -> numpy.ndarray[int64]

Fetches the top k elements given their indices on each row (the last dimension for a multi-dimensional array).
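Once the indices are known, fetching the values is a gather along the last axis, which numpy exposes directly as take_along_axis. A sketch of the fetch step:

```python
import numpy as np

def topk_fetch(data, indices):
    """Picks the values at `indices` on each row (last axis)."""
    return np.take_along_axis(data, indices, axis=-1)

x = np.array([[1.0, 5.0, 3.0], [9.0, 2.0, 8.0]])
idx = np.array([[1, 2], [0, 2]])
vals = topk_fetch(x, idx)
```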

Optimisation

The following functions reduce the number of ONNX operators in a graph while keeping the same results. The original graph is left unchanged.

mlprodict.onnxrt.optim.onnx_remove_node (onnx_model, recursive = True, debug_info = None)

Removes as many nodes as possible without changing the outcome. It applies onnx_remove_node_identity, then onnx_remove_node_redundant.

mlprodict.onnxrt.optim.onnx_remove_node_identity (onnx_model, recursive = True, debug_info = None)

Removes as many Identity nodes as possible. The function looks for Identity nodes in every node, and in subgraphs if recursive is True. Unless such a node directly connects one input to one output, it is removed and the inputs or outputs of every other node are renamed accordingly.
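Removing an Identity node amounts to a renaming: consumers of the node's output are rewired to its input. A sketch on a simplified node representation (plain tuples, not onnx protos, and without the input/output edge case mentioned above):

```python
def remove_identity(nodes):
    """Drops Identity nodes from a list of (op_type, inputs, outputs)
    tuples and renames downstream inputs accordingly."""
    renamed = {}
    kept = []
    for op_type, inputs, outputs in nodes:
        inputs = [renamed.get(n, n) for n in inputs]
        if op_type == "Identity":
            # downstream consumers of outputs[0] should read inputs[0]
            renamed[outputs[0]] = inputs[0]
            continue
        kept.append((op_type, inputs, outputs))
    return kept

nodes = [
    ("Abs", ["X"], ["T1"]),
    ("Identity", ["T1"], ["T2"]),
    ("Add", ["T2", "X"], ["Y"]),
]
optimized = remove_identity(nodes)
```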

mlprodict.onnxrt.optim.onnx_remove_node_redundant (onnx_model, recursive = True, debug_info = None, max_hash_size = 1000)

Removes redundant parts of the graph. A redundant part is a set of nodes which take the same inputs and produce the same outputs. The function first looks for duplicated initializers, then for nodes taking the same inputs and sharing the same type and parameters.
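The core of the deduplication is to key every node by its type and (renamed) inputs and merge duplicates. A sketch on the same simplified tuple representation, ignoring attributes and initializers for brevity:

```python
def remove_redundant(nodes):
    """Merges nodes that share the same operator type and inputs,
    renaming the outputs of the duplicates."""
    seen = {}
    renamed = {}
    kept = []
    for op_type, inputs, outputs in nodes:
        inputs = tuple(renamed.get(n, n) for n in inputs)
        key = (op_type, inputs)
        if key in seen:
            # duplicate: reuse the outputs of the first occurrence
            for dup, orig in zip(outputs, seen[key]):
                renamed[dup] = orig
            continue
        seen[key] = outputs
        kept.append((op_type, list(inputs), outputs))
    return kept

nodes = [
    ("Abs", ["X"], ["T1"]),
    ("Abs", ["X"], ["T2"]),      # duplicate of the first node
    ("Add", ["T1", "T2"], ["Y"]),
]
optimized = remove_redundant(nodes)
```

The real function also hashes large initializers (max_hash_size) and compares node attributes, not just inputs.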

Shapes

The computation of the predictions through ONNX may be optimized if the shape of every node is known. For example, one possible optimisation is to compute inplace whenever possible, but that is only valid when the input and output have the same size. We could compute the predictions for one sample and check that the sizes match, but that could be a coincidence. We could also guess from a couple of samples with different sizes and assume the sizes are polynomial functions of the input size, but on rare occasions that could be a coincidence too. So one way of doing it is to implement a method _set_shape_inference_runtime which works the same way as method _run_sequence_runtime but handles shapes instead. The following class tries to implement a way to keep track of shapes along the graph.

mlprodict.onnxrt.shape_object.ShapeObject (self, shape, dtype = None, use_n1 = False, name = None)

Handles mathematical operations on shapes. It stores a type (a numpy type) and a name which gives an idea of where the shape comes from in the ONNX graph. The shape itself is defined by a list of DimensionObject or ShapeOperator, or None if the shape is unknown. A dimension is an integer or a variable encoded as a string; the variable is a way to tell that the dimension may vary.
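A minimal illustration of shape arithmetic with symbolic dimensions, assuming a toy Dim class (not mlprodict's DimensionObject): a dimension holds an int or a variable name, an operation on two ints computes the result, and any other combination produces a symbolic expression.

```python
class Dim:
    """A toy symbolic dimension: an int or a variable name."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        a, b = self.value, other.value
        if isinstance(a, int) and isinstance(b, int):
            return Dim(a + b)          # both known: compute
        return Dim(f"{a}+{b}")          # symbolic: exact size unknown

d = Dim("N") + Dim(4)   # symbolic result
e = Dim(3) + Dim(4)     # numeric result
```

Propagating such objects through the graph yields, for every node, a shape expressed in terms of the batch dimension, which is enough to decide whether inplace computation is safe.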