Python Runtime for ONNX

This runtime does not depend on scikit-learn, only on numpy and scipy, and it includes custom implementations in C++ (cython, pybind11).

Inference

The main class reads an ONNX file and computes predictions based on a runtime implemented in Python. The ONNX model relies on the operators documented in Python Runtime for ONNX operators.

mlprodict.onnxrt.OnnxInference (self, onnx_or_bytes_or_stream, runtime = None, skip_run = False, inplace = True, input_inplace = False, ir_version = None, target_opset = None, runtime_options = None, session_options = None, inside_loop = False, static_inputs = None, new_outputs = None, new_opset = None, device = None)

Loads an ONNX file or object or stream. Computes the output of the ONNX graph. Several runtimes are available.

  • 'python': the runtime implements every ONNX operator needed to run a scikit-learn model, using numpy or C++ code.

  • 'python_compiled': the same runtime as the previous one, except every operator is called from a compiled function (_build_compile_run) instead of a method going through the list of operators.

  • 'onnxruntime1': uses onnxruntime

  • 'onnxruntime2': this mode is mostly used for debugging; Python handles the call to every operator, but onnxruntime is invoked for each of them. This process may fail due to wrong type inference, especially if the graph includes custom nodes; in that case, it is better to compute the output of the intermediate nodes. It is much slower, as every node is computed for every output, but more robust.
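
As a quick illustration, here is a minimal sketch of selecting a runtime; the file name model.onnx is a placeholder, not part of the library:

    from mlprodict.onnxrt import OnnxInference

    with open('model.onnx', 'rb') as f:  # hypothetical model file
        model_bytes = f.read()

    # One instance per runtime; 'python' is the default.
    oinf_py = OnnxInference(model_bytes, runtime='python')
    oinf_cpl = OnnxInference(model_bytes, runtime='python_compiled')
    oinf_ort = OnnxInference(model_bytes, runtime='onnxruntime1')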

build_intermediate (self, outputs = None, verbose = 0, overwrite_types = None, fLOG = None)

Builds every possible ONNX file which computes one specific intermediate output from the inputs.

check_model (self)

Checks that the model follows ONNX conventions.

display_sequence (self, verbose = 1)

Shows the sequence of nodes to run if runtime=='python'.

get_execution_order (self)

This function returns a dictionary {(kind, name): (order, op)}, where name can be a node name or a result name; in the latter case, it gets the same execution order as the node which created it. The function returns None if the order is not available (the selected runtime does not return it). kind is either 'node' or 'result'. If two nodes have the same name, the returned order is the last one. Initializers get an execution order equal to -1, inputs to 0, and all other results are >= 1.
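
A hedged sketch of inspecting this dictionary; oinf is assumed to be an OnnxInference instance and the node name is hypothetical:

    order = oinf.get_execution_order()
    if order is not None:
        # e.g. order[('node', 'LinearRegressor')] -> (1, <node>)
        for (kind, name), (pos, op) in sorted(
                order.items(), key=lambda kv: kv[1][0]):
            print(pos, kind, name)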

get_profiling (self, as_df = False)

Returns the profiling after a couple of executions.

global_index (self, name)

Maps every name to one integer to avoid using dictionaries when running the predictions.

infer_shapes (self)

Computes expected shapes.

infer_sizes (self, inputs, context = None)

Computes expected sizes.

infer_types (self)

Computes expected types.

reduce_size (self, pickable = False)

Reduces the memory footprint as much as possible.

run (self, inputs, clean_right_away = False, intermediate = False, verbose = 0, node_time = False, overwrite_types = None, yield_ops = None, fLOG = None)

Computes the predictions for this onnx graph.

run2onnx (self, inputs, verbose = 0, fLOG = None, as_parameter = True, suffix = '_DBG', param_name = None, node_type = 'DEBUG', domain = 'DEBUG', domain_opset = 1)

Executes the graph with the given inputs, then adds the intermediate results into ONNX nodes in the original graph. Once saved, it can be inspected with a tool such as netron.

shape_inference (self)

Infers the shape of the outputs with the onnx package.

switch_initializers_dtype (self, model = None, dtype_in = <class 'numpy.float32'>, dtype_out = <class 'numpy.float64'>)

Switches all initializers to numpy.float64. If model is None, a simple cast is done. Otherwise, the function assumes the model is a scikit-learn pipeline. This only works if the runtime is 'python'.

to_sequence (self)

Produces a graph to facilitate the execution.

One example…
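A minimal sketch, assuming a model stored in model.onnx whose single input is named 'X' (both names are placeholders):

    import numpy
    from mlprodict.onnxrt import OnnxInference

    oinf = OnnxInference('model.onnx')
    X = numpy.random.randn(5, 4).astype(numpy.float32)
    pred = oinf.run({'X': X})                      # dictionary output name -> array
    inter = oinf.run({'X': X}, intermediate=True)  # also returns intermediate results
    print(pred, list(inter))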

Python to ONNX

mlprodict.onnx_tools.onnx_grammar.translate_fct2onnx (fct, context = None, cpl = False, context_cpl = None, output_names = None, dtype = <class 'numpy.float32'>, verbose = 0, fLOG = None)

Translates a function into ONNX. The code it produces uses classes OnnxAbs, OnnxAdd, …
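
A hedged sketch translating a small numpy function; the context keys follow the library's examples but should be treated as assumptions here:

    import numpy
    from mlprodict.onnx_tools.onnx_grammar import translate_fct2onnx

    def custom_fct(x, y):
        # only operations with a known ONNX counterpart
        z = x + numpy.transpose(y, axes=[1, 0])
        return x * z

    onnx_code = translate_fct2onnx(
        custom_fct, context={'numpy.transpose': numpy.transpose},
        output_names=['Z'])
    print(onnx_code)  # python code building the graph with OnnxAdd, OnnxTranspose, ...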

ONNX Export

mlprodict.onnxrt.onnx_inference_exports.OnnxInferenceExport (self, oinf)

Implements methods to export an instance of OnnxInference into json, dot, text, or python.
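
A hedged sketch, assuming these export methods are exposed on an OnnxInference instance oinf:

    dot = oinf.to_dot()    # DOT graph, can be rendered with graphviz
    js = oinf.to_json()    # JSON description of the graph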

ONNX Structure

mlprodict.onnx_tools.onnx_manipulations.enumerate_model_node_outputs (model, add_node = False, order = False)

Enumerates all the node outputs of a model.

mlprodict.onnx_tools.onnx_manipulations.select_model_inputs_outputs (model, outputs = None, inputs = None, infer_shapes = False, overwrite = None, remove_unused = True, verbose = 0, fLOG = None)

Takes a model and changes its outputs.
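
A hedged sketch: listing node outputs, then truncating the graph at one of them (the file and output names are placeholders):

    import onnx
    from mlprodict.onnx_tools.onnx_manipulations import (
        enumerate_model_node_outputs, select_model_inputs_outputs)

    model = onnx.load('model.onnx')
    for name in enumerate_model_node_outputs(model):
        print(name)
    # keep only the subgraph needed to compute one intermediate result
    sub_model = select_model_inputs_outputs(model, outputs=['intermediate_name'])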

onnxruntime

mlprodict.onnxrt.onnx_inference_ort.device_to_providers

mlprodict.onnxrt.onnx_inference_ort.get_ort_device

Validation

mlprodict.onnxrt.validate.enumerate_validated_operator_opsets (verbose = 0, opset_min = -1, opset_max = -1, check_runtime = True, debug = False, runtime = 'python', models = None, dump_folder = None, store_models = False, benchmark = False, skip_models = None, assume_finite = True, node_time = False, fLOG = <built-in function print>, filter_exp = None, versions = False, extended_list = False, time_kwargs = None, dump_all = False, n_features = None, skip_long_test = True, fail_bad_results = False, filter_scenario = None, time_kwargs_fact = None, time_limit = 4, n_jobs = None)

Tests all possible configurations for all possible operators and returns the results.

mlprodict.onnxrt.validate.side_by_side.side_by_side_by_values (sessions, *args, inputs = None, return_results = False, **kwargs)

Compares the execution of two sessions. It calls method OnnxInference.run with value intermediate=True and compares the results.
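
A hedged sketch, assuming two OnnxInference instances built on the same model with different runtimes and an input array X:

    from mlprodict.onnxrt.validate.side_by_side import side_by_side_by_values

    rows = side_by_side_by_values([oinf_py, oinf_ort], inputs={'X': X})
    # one row per intermediate result, comparing the sessions' values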

mlprodict.onnxrt.validate.summary_report (df, add_cols = None, add_index = None)

Finalizes the results computed by function enumerate_validated_operator_opsets.
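
A hedged sketch of the usual pipeline (the model list is a placeholder):

    import pandas
    from mlprodict.onnxrt.validate import (
        enumerate_validated_operator_opsets, summary_report)

    rows = list(enumerate_validated_operator_opsets(
        verbose=0, models=['LinearRegression'], runtime='python'))
    df = pandas.DataFrame(rows)
    report = summary_report(df)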

mlprodict.onnxrt.validate.validate_graph.plot_validate_benchmark

C++ classes

Gather

mlprodict.onnxrt.ops_cpu.op_gather_.GatherDouble (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherFloat (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherInt64 (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.
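
The semantics of Gather can be sketched in pure numpy with numpy.take:

    import numpy

    data = numpy.arange(12, dtype=numpy.float32).reshape(3, 4)
    indices = numpy.array([2, 0], dtype=numpy.int64)
    out = numpy.take(data, indices, axis=0)  # rows 2 and 0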

ArrayFeatureExtractor

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_double (arg0, arg1)

array_feature_extractor_double(arg0: numpy.ndarray[numpy.float64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float64]

C++ implementation of operator ArrayFeatureExtractor for float64. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_float (arg0, arg1)

array_feature_extractor_float(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float32]

C++ implementation of operator ArrayFeatureExtractor for float32. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_int64 (arg0, arg1)

array_feature_extractor_int64(arg0: numpy.ndarray[numpy.int64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.int64]

C++ implementation of operator ArrayFeatureExtractor for int64. The function only works with contiguous arrays.
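
A minimal sketch of the float64 variant; note the contiguity requirement mentioned above:

    import numpy
    from mlprodict.onnxrt.ops_cpu._op_onnx_numpy import array_feature_extractor_double

    x = numpy.ascontiguousarray(
        numpy.arange(12, dtype=numpy.float64).reshape(3, 4))
    idx = numpy.array([0, 2], dtype=numpy.int64)
    selected = array_feature_extractor_double(x, idx)  # columns 0 and 2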

SVM

mlprodict.onnxrt.ops_cpu.op_svm_classifier_.RuntimeSVMClassifier

mlprodict.onnxrt.ops_cpu.op_svm_regressor_.RuntimeSVMRegressor

Tree Ensemble

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierDouble (self)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierFloat (self)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorDouble (self)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorFloat (self)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

Still tree ensembles, but with a refactored implementation.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPDouble (self, arg0, arg1, arg2, arg3)

Implements double runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPFloat (self, arg0, arg1, arg2, arg3)

Implements float runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPDouble (self, arg0, arg1, arg2, arg3)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPFloat (self, arg0, arg1, arg2, arg3)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

TopK

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_double (arg0, arg1, arg2, arg3)

topk_element_max_double(arg0: numpy.ndarray[numpy.float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_float (arg0, arg1, arg2, arg3)

topk_element_max_float(arg0: numpy.ndarray[numpy.float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_int64 (arg0, arg1, arg2, arg3)

topk_element_max_int64(arg0: numpy.ndarray[numpy.int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_double (arg0, arg1, arg2, arg3)

topk_element_min_double(arg0: numpy.ndarray[numpy.float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_float (arg0, arg1, arg2, arg3)

topk_element_min_float(arg0: numpy.ndarray[numpy.float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_int64 (arg0, arg1, arg2, arg3)

topk_element_min_int64(arg0: numpy.ndarray[numpy.int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. It is parallelized for more than th_para rows, and only along the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_double (arg0, arg1)

topk_element_fetch_double(arg0: numpy.ndarray[numpy.float64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float64]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_float (arg0, arg1)

topk_element_fetch_float(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float32]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_int64 (arg0, arg1)

topk_element_fetch_int64(arg0: numpy.ndarray[numpy.int64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.int64]

Fetches the top k elements knowing their indices on each row (= last dimension for a multi-dimensional array).
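
A pure-numpy sketch of the semantics these helpers implement (top-k indices along the last axis, then the fetch step):

    import numpy

    def topk_last_axis(values, k, largest=True):
        order = numpy.argsort(values, axis=-1)
        if largest:
            order = order[..., ::-1]
        idx = order[..., :k]                                # what topk_element_* returns
        vals = numpy.take_along_axis(values, idx, axis=-1)  # what topk_element_fetch_* returns
        return vals, idx

    x = numpy.array([[3., 1., 2.], [0., 5., 4.]], dtype=numpy.float32)
    vals, idx = topk_last_axis(x, 2)
    # vals -> [[3., 2.], [5., 4.]], idx -> [[0, 2], [1, 2]]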

Shapes

The computation of the predictions through ONNX may be optimized if the shape of every node is known. For example, one possible optimization is to compute inplace whenever possible, but that is only valid if the input and the output have the same size. We could compute the predictions for a sample and check that the sizes are the same, but that could be luck. We could also guess from a couple of samples with different sizes and assume the sizes are polynomial functions of the input size, but on rare occasions that could be luck too. So one way of doing it is to implement a method _set_shape_inference_runtime which works the same way as method _run_sequence_runtime but handles shapes instead. The following class tries to implement a way to keep track of shapes along the graph.

mlprodict.onnxrt.shape_object.ShapeObject (self, shape, dtype = None, use_n1 = False, name = None, subtype = None)

Handles mathematical operations around shapes. It stores a type (numpy type), and a name to somehow have an idea of where the shape comes from in the ONNX graph. The shape itself is defined by a list of DimensionObject or ShapeOperator or None if the shape is unknown. A dimension is an integer or a variable encoded as a string. This variable is a way to tell the dimension may vary.
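
A minimal sketch, following the description above (a string dimension encodes a variable):

    import numpy
    from mlprodict.onnxrt.shape_object import ShapeObject

    fixed = ShapeObject((10, 4), dtype=numpy.float32, name='X')
    batch = ShapeObject(('N', 4), dtype=numpy.float32, name='Y')  # variable first dimension
    print(fixed, batch)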