Python Runtime for ONNX#

This runtime does not depend on scikit-learn, only on numpy and scipy, and comes with custom implementations in C++ (cython, pybind11).

Inference#

The main class reads an ONNX file and computes predictions based on a runtime implemented in Python. The ONNX model relies on the operators listed in Python Runtime for ONNX operators.

mlprodict.onnxrt.OnnxInference (self, onnx_or_bytes_or_stream, runtime = None, skip_run = False, inplace = True, input_inplace = False, ir_version = None, target_opset = None, runtime_options = None, session_options = None, inside_loop = False, static_inputs = None, new_outputs = None, new_opset = None, existing_functions = None)

Loads an ONNX file, object, or stream and computes the output of the ONNX graph. Several runtimes are available (a short example follows the list).

  • 'python': the runtime implements every onnx operator needed to run a scikit-learn model by using numpy or C++ code.

  • 'python_compiled': the same runtime as the previous one except every operator is called from a compiled function (_build_compile_run) instead of a method going through the list of operators

  • 'onnxruntime1': uses onnxruntime (or onnxruntime1-cuda, …)

  • 'onnxruntime2': this mode is mostly used for debugging; python handles calling every operator but onnxruntime is called for each of them. This process may fail due to wrong type inference, especially if the graph includes custom nodes; in that case, it is better to compute the output of the intermediate nodes. It is much slower, as every node is computed for every output, but more robust.
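
A minimal sketch of loading and running a model with the 'python' runtime (the file name model.onnx and the input name 'X' are hypothetical):

    import numpy
    from mlprodict.onnxrt import OnnxInference

    with open("model.onnx", "rb") as f:      # hypothetical model file
        model_bytes = f.read()
    oinf = OnnxInference(model_bytes, runtime='python')
    X = numpy.random.rand(3, 4).astype(numpy.float32)
    res = oinf.run({'X': X})                 # 'X' assumes the graph input name
    print(res)                               # {output name: numpy array}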

get_profiling (self, as_df = False)

Returns the profiling after a couple of executions.
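
A hedged sketch, reusing model_bytes and X from the example above; the runtime_options flag is an assumption about how profiling is enabled:

    # assumption: profiling is enabled through runtime_options at creation
    oinf = OnnxInference(model_bytes, runtime='onnxruntime1',
                         runtime_options={'enable_profiling': True})
    oinf.run({'X': X})                       # run at least once
    print(oinf.get_profiling(as_df=True))    # profiling as a pandas DataFrame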

run (self, inputs, clean_right_away = False, intermediate = False, verbose = 0, node_time = False, overwrite_types = None, yield_ops = None, fLOG = None, context = None, attributes = None)

Computes the predictions for this onnx graph.
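
As a short illustration, intermediate=True makes run return every intermediate result as well (reusing oinf and X from the first example):

    inter = oinf.run({'X': X}, intermediate=True)
    # inter maps every result name in the graph, including intermediate
    # ones, to its computed value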

run2onnx (self, inputs, verbose = 0, fLOG = None, as_parameter = True, suffix = '_DBG', param_name = None, node_type = 'DEBUG', domain = 'DEBUG', domain_opset = 1, attributes = None)

Executes the graph with the given inputs, then adds the intermediate results into ONNX nodes in the original graph. Once saved, it can be inspected with a tool such as netron.

shape_inference (self)

Infers the shape of the outputs with the onnx package.

mlprodict.onnxrt.onnx_micro_inference.OnnxMicroRuntime

Implements a micro runtime for ONNX graphs. It does not implement all the operator types.

The following is technically implemented as a runtime but it does shape inference.

mlprodict.onnxrt.OnnxShapeInference (self, model_onnx)

run (self, inputs = None)

Runs shape and type inference given known inputs.

The execution produces a result of type:

mlprodict.onnxrt.ops_shape.shape_container.ShapeContainer (self)

Stores all inferred shapes as ShapeResult.

Attributes:

  • shapes: dictionary { result name: ShapeResult }

  • names: some dimensions are unknown and represented as variables; this dictionary keeps track of them

  • names_rev: reverse dictionary of names

get (self)

Returns the value of attribute resolved_ (method resolve() must have been called first).
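
A short sketch of the whole flow, assuming run resolves the variables so that get can be called right after (model.onnx is hypothetical):

    import onnx
    from mlprodict.onnxrt import OnnxShapeInference

    model_onnx = onnx.load("model.onnx")
    container = OnnxShapeInference(model_onnx).run()
    print(container.get())   # {result name: ShapeResult}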

Method get returns a dictionary mapping result names to the following type:

mlprodict.onnxrt.ops_shape.shape_result.ShapeResult (self, name, shape = None, dtype = None, sparse = False, mtype = OnnxKind.Tensor, constraints = None)

Contains information about shape and type of a result in an onnx graph.

broadcast (sh1, sh2, name = None, dtype = None, same_type = True)

Broadcasts dimensions for an element-wise operator.
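
A sketch of broadcasting two shapes, assuming shapes can be passed as plain tuples of integers:

    import numpy
    from mlprodict.onnxrt.ops_shape.shape_result import ShapeResult

    sh1 = ShapeResult('X', (1, 4), numpy.float32)
    sh2 = ShapeResult('Y', (3, 1), numpy.float32)
    # numpy-style broadcasting should produce shape (3, 4)
    print(ShapeResult.broadcast(sh1, sh2, name='Z'))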

copy (self, deep = False)

Returns a copy of the result.

is_compatible (self, shape)

Tells if this shape is compatible with the given tuple.

merge (self, other_result)

Merges constraints from other_result into self.

n_dims (self)

Returns the number of dimensions if it is a tensor. Raises an exception otherwise.

resolve (self, variables)

Resolves variables in a shape using values stored in variables. It does not copy any constraints.

Backend validation#

mlprodict.tools.onnx_backend.enumerate_onnx_tests

mlprodict.tools.onnx_backend.OnnxBackendTest

Python to ONNX#

mlprodict.onnx_tools.onnx_grammar.translate_fct2onnx (fct, context = None, cpl = False, context_cpl = None, output_names = None, dtype = <class 'numpy.float32'>, verbose = 0, fLOG = None)

Translates a function into ONNX. The code it produces uses classes OnnxAbs, OnnxAdd, …
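
A sketch based on the documented usage; the function trs and the context mapping are illustrative:

    import numpy
    from mlprodict.onnx_tools.onnx_grammar import translate_fct2onnx

    def trs(x, y):
        z = x + numpy.transpose(y, axes=[1, 0])
        return x * z

    # context declares the non-local functions the translator may map to ONNX
    onnx_code = translate_fct2onnx(
        trs, context={'numpy.transpose': numpy.transpose},
        output_names=['Z'])
    print(onnx_code)   # python code building the graph with OnnxAdd, OnnxMul, ...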

ONNX Export#

mlprodict.onnxrt.onnx_inference_exports.OnnxInferenceExport (self, oinf)

Implements methods to export an instance of OnnxInference into json, dot, text, or python.
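
These methods are exposed on OnnxInference itself; a short sketch reusing oinf from the first example:

    print(oinf.to_dot())    # DOT graph, can be rendered with graphviz
    print(oinf.to_json())   # JSON description of the graph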

ONNX Structure#

mlprodict.onnx_tools.onnx_manipulations.enumerate_model_node_outputs (model, add_node = False, order = False)

Enumerates all the node outputs of a model.

mlprodict.onnx_tools.onnx_manipulations.select_model_inputs_outputs (model, outputs = None, inputs = None, infer_shapes = False, overwrite = None, remove_unused = True, verbose = 0, fLOG = None)

Takes a model and changes its outputs.
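
A short sketch combining both functions (the file name and the intermediate result name 'inter' are hypothetical):

    import onnx
    from mlprodict.onnx_tools.onnx_manipulations import (
        enumerate_model_node_outputs, select_model_inputs_outputs)

    model = onnx.load("model.onnx")
    for name in enumerate_model_node_outputs(model):
        print(name)   # every node output name in the graph
    # same graph, but result 'inter' becomes the only output
    sub_model = select_model_inputs_outputs(model, outputs=['inter'])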

onnxruntime#

mlprodict.onnxrt.onnx_inference_ort.device_to_providers

mlprodict.onnxrt.onnx_inference_ort.get_ort_device

Validation of scikit-learn models#

mlprodict.onnxrt.validate.enumerate_validated_operator_opsets (verbose = 0, opset_min = -1, opset_max = -1, check_runtime = True, debug = False, runtime = 'python', models = None, dump_folder = None, store_models = False, benchmark = False, skip_models = None, assume_finite = True, node_time = False, fLOG = <built-in function print>, filter_exp = None, versions = False, extended_list = False, time_kwargs = None, dump_all = False, n_features = None, skip_long_test = True, fail_bad_results = False, filter_scenario = None, time_kwargs_fact = None, time_limit = 4, n_jobs = None)

Tests all possible configurations for all possible operators and returns the results.

mlprodict.onnxrt.validate.side_by_side.side_by_side_by_values (sessions, *args, inputs = None, return_results = False, **kwargs)

Compares the execution of two sessions. It calls method OnnxInference.run with intermediate=True and compares the results.
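
A hedged sketch, assuming oinf1 and oinf2 are two OnnxInference instances of the same model created with different runtimes:

    from mlprodict.onnxrt.validate.side_by_side import side_by_side_by_values

    rows = side_by_side_by_values([oinf1, oinf2], inputs={'X': X})
    # rows lists, for every intermediate result, how the runtimes compare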

mlprodict.onnxrt.validate.summary_report (df, add_cols = None, add_index = None)

Finalizes the results computed by function enumerate_validated_operator_opsets.
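
A sketch restricting the sweep to a single model to keep it short (the model choice is illustrative):

    from pandas import DataFrame
    from mlprodict.onnxrt.validate import (
        enumerate_validated_operator_opsets, summary_report)

    rows = list(enumerate_validated_operator_opsets(
        verbose=0, models={'LogisticRegression'},
        opset_min=12, opset_max=12))
    print(summary_report(DataFrame(rows)))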

mlprodict.onnxrt.validate.validate_graph.plot_validate_benchmark

C++ classes#

Conv

mlprodict.onnxrt.ops_cpu.op_conv_helper_.im2col_1d_inplace_float (result, data, kernel_shape, fill_value)

im2col_1d_inplace_float(result: numpy.ndarray[numpy.float32], data: numpy.ndarray[numpy.float32], kernel_shape: numpy.ndarray[numpy.int64], fill_value: float) -> None

Applies im2col_1d on a single vector. The function duplicates the one-dimensional tensor so that the convolution can be done through a matrix multiplication. It returns a matrix Nxk where N is the tensor dimension and k the kernel shape.
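
A sketch, assuming the result matrix must be preallocated with shape (N, k):

    import numpy
    from mlprodict.onnxrt.ops_cpu.op_conv_helper_ import im2col_1d_inplace_float

    data = numpy.arange(5).astype(numpy.float32)
    kernel_shape = numpy.array([3], dtype=numpy.int64)
    result = numpy.empty((data.shape[0], 3), dtype=numpy.float32)
    # fill_value pads the positions falling outside the tensor
    im2col_1d_inplace_float(result, data, kernel_shape, 0.0)
    print(result)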

Gather

mlprodict.onnxrt.ops_cpu.op_gather_.GatherDouble (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherFloat (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

mlprodict.onnxrt.ops_cpu.op_gather_.GatherInt64 (self, arg0)

Implements runtime for operator Gather. The code is inspired by tfidfvectorizer.cc in onnxruntime.

ArrayFeatureExtractor

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_double (arg0, arg1)

array_feature_extractor_double(arg0: numpy.ndarray[numpy.float64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float64]

C++ implementation of operator ArrayFeatureExtractor for float64. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_float (arg0, arg1)

array_feature_extractor_float(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float32]

C++ implementation of operator ArrayFeatureExtractor for float32. The function only works with contiguous arrays.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.array_feature_extractor_int64 (arg0, arg1)

array_feature_extractor_int64(arg0: numpy.ndarray[numpy.int64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.int64]

C++ implementation of operator ArrayFeatureExtractor for int64. The function only works with contiguous arrays.
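
A short sketch; as in the ONNX operator, indices should select along the last axis:

    import numpy
    from mlprodict.onnxrt.ops_cpu._op_onnx_numpy import array_feature_extractor_float

    data = numpy.ascontiguousarray(
        [[0., 1., 2.], [3., 4., 5.]], dtype=numpy.float32)
    indices = numpy.array([2, 0], dtype=numpy.int64)
    # should return [[2. 0.], [5. 3.]]
    print(array_feature_extractor_float(data, indices))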

SVM

mlprodict.onnxrt.ops_cpu.op_svm_classifier_.RuntimeSVMClassifier

mlprodict.onnxrt.ops_cpu.op_svm_regressor_.RuntimeSVMRegressor

Tree Ensemble

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierDouble (self)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_.RuntimeTreeEnsembleClassifierFloat (self)

Implements runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorDouble (self)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_.RuntimeTreeEnsembleRegressorFloat (self)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

Still tree ensembles, but refactored.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPDouble (self, arg0, arg1, arg2, arg3, arg4)

Implements double runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_classifier_p_.RuntimeTreeEnsembleClassifierPFloat (self, arg0, arg1, arg2, arg3, arg4)

Implements float runtime for operator TreeEnsembleClassifier. The code is inspired by tree_ensemble_classifier.cc in onnxruntime. Supports float only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPDouble (self, arg0, arg1, arg2, arg3, arg4)

Implements double runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports double only.

mlprodict.onnxrt.ops_cpu.op_tree_ensemble_regressor_p_.RuntimeTreeEnsembleRegressorPFloat (self, arg0, arg1, arg2, arg3, arg4)

Implements float runtime for operator TreeEnsembleRegressor. The code is inspired by tree_ensemble_regressor.cc in onnxruntime. Supports float only.

TopK

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_double (arg0, arg1, arg2, arg3)

topk_element_max_double(arg0: numpy.ndarray[numpy.float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_float (arg0, arg1, arg2, arg3)

topk_element_max_float(arg0: numpy.ndarray[numpy.float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_max_int64 (arg0, arg1, arg2, arg3)

topk_element_max_int64(arg0: numpy.ndarray[numpy.int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_double (arg0, arg1, arg2, arg3)

topk_element_min_double(arg0: numpy.ndarray[numpy.float64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float64. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_float (arg0, arg1, arg2, arg3)

topk_element_min_float(arg0: numpy.ndarray[numpy.float32], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for float32. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_min_int64 (arg0, arg1, arg2, arg3)

topk_element_min_int64(arg0: numpy.ndarray[numpy.int64], arg1: int, arg2: bool, arg3: int) -> numpy.ndarray[numpy.int64]

C++ implementation of operator TopK for int64. The function only works with contiguous arrays. The function is parallelized for more than th_para rows. It only operates on the last axis.

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_double (arg0, arg1)

topk_element_fetch_double(arg0: numpy.ndarray[numpy.float64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float64]

Fetches the top k elements knowing their indices on each row (= the last dimension of a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_float (arg0, arg1)

topk_element_fetch_float(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.float32]

Fetches the top k elements knowing their indices on each row (= the last dimension of a multi-dimensional array).

mlprodict.onnxrt.ops_cpu._op_onnx_numpy.topk_element_fetch_int64 (arg0, arg1)

topk_element_fetch_int64(arg0: numpy.ndarray[numpy.int64], arg1: numpy.ndarray[numpy.int64]) -> numpy.ndarray[numpy.int64]

Fetches the top k elements knowing their indices on each row (= the last dimension of a multi-dimensional array).
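
A sketch chaining selection and fetching; the meaning of the positional arguments (k, sorted, th_para) is an assumption read off the signatures:

    import numpy
    from mlprodict.onnxrt.ops_cpu._op_onnx_numpy import (
        topk_element_max_float, topk_element_fetch_float)

    X = numpy.array([[1., 7., 3., 5.]], dtype=numpy.float32)
    # assumed argument order: (array, k, sorted, th_para)
    idx = topk_element_max_float(X, 2, True, 50)
    print(idx)                               # indices of the 2 largest values per row
    print(topk_element_fetch_float(X, idx))  # the values themselves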