module onnxrt.validate.validate

Short summary

module mlprodict.onnxrt.validate.validate

Validates the runtime for many scikit-learn operators. The submodule relies on onnxconverter_common and sklearn-onnx.

Functions

function – truncated documentation

  • _call_conv_runtime_opset

  • _call_runtime – Private.

  • _check_run_benchmark

  • _dofit_model

  • _enumerate_validated_operator_opsets_ops

  • _enumerate_validated_operator_opsets_version

  • _retrieve_problems_extra – Used by enumerate_compatible_opset().

  • _run_skl_prediction

  • enumerate_compatible_opset – Lists all compatible opsets for a specific model.

  • enumerate_validated_operator_opsets – Tests all possible configurations for all possible operators and returns the results.

Documentation

mlprodict.onnxrt.validate.validate._call_conv_runtime_opset(obs, opsets, debug, new_conv_options, model, prob, scenario, extra, extras, conv_options, init_types, inst, optimisations, verbose, benchmark, runtime, filter_scenario, check_runtime, X_test, y_test, ypred, Xort_test, method_name, output_index, kwargs, time_limit, fLOG)
mlprodict.onnxrt.validate.validate._call_runtime(obs_op, conv, opset, debug, inst, runtime, X_test, y_test, init_types, method_name, output_index, ypred, Xort_test, model, dump_folder, benchmark, node_time, fLOG, verbose, store_models, time_kwargs, dump_all, skip_long_test, time_limit)

Private.

mlprodict.onnxrt.validate.validate._check_run_benchmark(benchmark, stat_onnx, bench_memo, runtime)
mlprodict.onnxrt.validate.validate._dofit_model(dofit, obs, inst, X_train, y_train, X_test, y_test, Xort_test, init_types, store_models, debug, verbose, fLOG)
mlprodict.onnxrt.validate.validate._enumerate_validated_operator_opsets_ops(extended_list, models, skip_models)
mlprodict.onnxrt.validate.validate._enumerate_validated_operator_opsets_version(runtime)
mlprodict.onnxrt.validate.validate._retrieve_problems_extra(model, verbose, fLOG, extended_list)

Used by enumerate_compatible_opset.

mlprodict.onnxrt.validate.validate._run_skl_prediction(obs, check_runtime, assume_finite, inst, method_name, predict_kwargs, X_test, benchmark, debug, verbose, time_kwargs, skip_long_test, time_kwargs_fact, fLOG)
mlprodict.onnxrt.validate.validate.enumerate_compatible_opset(model, opset_min=-1, opset_max=-1, check_runtime=True, debug=False, runtime='python', dump_folder=None, store_models=False, benchmark=False, assume_finite=True, node_time=False, fLOG=<built-in function print>, filter_exp=None, verbose=0, time_kwargs=None, extended_list=False, dump_all=False, n_features=None, skip_long_test=True, filter_scenario=None, time_kwargs_fact=None, time_limit=4, n_jobs=None)

Lists all compatible opsets for a specific model.

Parameters:
  • model – operator class

  • opset_min – starts with this opset

  • opset_max – ends with this opset (None to use current onnx opset)

  • check_runtime – checks that runtime can consume the model and compute predictions

  • debug – catch exception (True) or not (False)

  • runtime – test a specific runtime, by default 'python'

  • dump_folder – dump information to replicate in case of mismatch

  • dump_all – dumps all models, not only the ones which fail

  • store_models – if True, the function also stores the fitted model and its conversion into ONNX

  • benchmark – if True, measures the time taken by each function to predict for different numbers of rows

  • fLOG – logging function

  • filter_exp – function which tells whether the experiment must be run (None to run all); it takes model, problem as input

  • filter_scenario – second function which tells whether the experiment must be run (None to run all); it takes model, problem, scenario, extra, options as input

  • node_time – collect time for each node in the ONNX graph

  • assume_finite – see config_context. If True, validation for finiteness is skipped, which saves time but may lead to crashes. If False, validation for finiteness is performed, which avoids errors.

  • verbose – verbosity

  • extended_list – extends the list to custom converters and problems

  • time_kwargs – to define a more precise way to measure a model

  • n_features – modifies the short datasets used to train the models so that they use exactly this number of features; it can also be a list to test multiple datasets

  • skip_long_test – skips tests for high values of N if they seem too long

  • time_kwargs_fact – see _multiply_time_kwargs

  • time_limit – stops benchmarking once this amount of time has been spent

  • n_jobs – n_jobs is set to the number of CPUs by default unless this value is changed

Returns:

dictionaries, each row has the following keys: opset, exception if any, conversion time, problem chosen to test the conversion…
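
Because each result is a plain dictionary, the rows can be summarised without extra dependencies. The sketch below counts, per opset, how many rows recorded an exception. The key names 'opset' and 'exception' and the sample rows are hypothetical placeholders chosen for illustration; the actual key names should be checked against real output.

```python
from collections import Counter


def count_failures_by_opset(rows, opset_key="opset", exc_key="exception"):
    """Return {opset: (number of failed rows, total number of rows)}.

    opset_key and exc_key are assumed key names, not confirmed ones.
    """
    total = Counter()
    failed = Counter()
    for row in rows:
        opset = row.get(opset_key)
        total[opset] += 1
        # A row counts as failed when it stores something under exc_key.
        if row.get(exc_key) is not None:
            failed[opset] += 1
    return {opset: (failed[opset], total[opset]) for opset in total}


# Hypothetical rows standing in for the dictionaries produced by the function.
sample_rows = [
    {"opset": 10, "exception": None},
    {"opset": 11, "exception": None},
    {"opset": 11, "exception": RuntimeError("mismatch")},
]
summary = count_failures_by_opset(sample_rows)
print(summary)
```

The helper accepts any iterable of dictionaries, so it should work whether the rows are consumed lazily or collected into a list first.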

The function requires sklearn-onnx. The outcome can be seen on the pages referenced by scikit-learn Converters and Benchmarks. The parameter time_kwargs is a dictionary which defines how many times the same predictions are repeated in order to produce more precise figures. The default value (used if time_kwargs is None) is returned by the following code:

<<<

from mlprodict.onnxrt.validate.validate_helper import default_time_kwargs
import pprint
pprint.pprint(default_time_kwargs())

>>>

    {1: {'number': 15, 'repeat': 20},
     10: {'number': 10, 'repeat': 20},
     100: {'number': 4, 'repeat': 10},
     1000: {'number': 4, 'repeat': 4},
     10000: {'number': 2, 'repeat': 2}}

The parameter time_kwargs_fact multiplies these values for some specific models. For example, 'lin' multiplies them by 10 when the model is linear.
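
To make the effect of such a factor concrete, here is a minimal sketch of scaling a time_kwargs dictionary. The helper scale_time_kwargs is hypothetical and is not the library's _multiply_time_kwargs; it assumes that both 'number' and 'repeat' are multiplied by the same factor, which the text above does not confirm.

```python
def scale_time_kwargs(time_kwargs, factor):
    """Return a new dictionary with 'number' and 'repeat' scaled by factor.

    Assumption: both counters are scaled; the real rule may differ.
    """
    scaled = {}
    for n_rows, params in time_kwargs.items():
        scaled[n_rows] = {
            # Never go below one measurement.
            "number": max(1, int(round(params["number"] * factor))),
            "repeat": max(1, int(round(params["repeat"] * factor))),
        }
    return scaled


# Default values copied from the output shown above.
defaults = {
    1: {"number": 15, "repeat": 20},
    10: {"number": 10, "repeat": 20},
    100: {"number": 4, "repeat": 10},
    1000: {"number": 4, "repeat": 4},
    10000: {"number": 2, "repeat": 2},
}
scaled = scale_time_kwargs(defaults, 10)
print(scaled[1])
```

The input dictionary is left untouched, so the defaults can be reused for models that need no extra repetitions.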

mlprodict.onnxrt.validate.validate.enumerate_validated_operator_opsets(verbose=0, opset_min=-1, opset_max=-1, check_runtime=True, debug=False, runtime='python', models=None, dump_folder=None, store_models=False, benchmark=False, skip_models=None, assume_finite=True, node_time=False, fLOG=<built-in function print>, filter_exp=None, versions=False, extended_list=False, time_kwargs=None, dump_all=False, n_features=None, skip_long_test=True, fail_bad_results=False, filter_scenario=None, time_kwargs_fact=None, time_limit=4, n_jobs=None)

Tests all possible configurations for all possible operators and returns the results.

Parameters:
  • verbose – integer 0, 1, 2

  • opset_min – checks conversion starting from the opset, -1 to get the last one

  • opset_max – checks conversion up to this opset, None means __max_supported_opset__

  • check_runtime – checks the python runtime

  • models – only process a small list of operators, set of model names

  • debug – stops whenever an exception is raised

  • runtime – test a specific runtime, by default 'python'

  • dump_folder – dump information to replicate in case of mismatch

  • dump_all – dumps all models, not only the ones which fail

  • store_models – if True, the function also stores the fitted model and its conversion into ONNX

  • benchmark – if True, measures the time taken by each function to predict for different numbers of rows

  • filter_exp – function which tells whether the experiment must be run (None to run all); it takes model, problem as input

  • filter_scenario – second function which tells whether the experiment must be run (None to run all); it takes model, problem, scenario, extra, options as input

  • skip_models – models to skip

  • assume_finite – see config_context. If True, validation for finiteness is skipped, which saves time but may lead to crashes. If False, validation for finiteness is performed, which avoids errors.

  • node_time – measure time execution for every node in the graph

  • versions – adds columns with the versions of the packages used: numpy, scikit-learn, onnx, onnxruntime, sklearn-onnx

  • extended_list – also checks the models for which this module implements a converter

  • time_kwargs – to define a more precise way to measure a model

  • n_features – modifies the short datasets used to train the models so that they use exactly this number of features; it can also be a list to test multiple datasets

  • skip_long_test – skips tests for high values of N if they seem too long

  • fail_bad_results – fails if the results are not aligned with scikit-learn

  • time_kwargs_fact – see _multiply_time_kwargs

  • time_limit – skips the rest of the test after this limit (in seconds)

  • n_jobs – n_jobs is set to the number of CPUs by default unless this value is changed

  • fLOG – logging function

Returns:

list of dictionaries
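
The filter_exp and filter_scenario parameters listed above accept ordinary callables. The sketch below shows the two expected argument lists. It assumes that model is a class, that problem is a string label, and that options is empty or None when no converter option is requested; the problem labels and the stand-in class are illustrative only.

```python
def filter_exp(model, problem):
    """Keep a few operators and skip problems whose label contains '-64'."""
    wanted = {"LinearRegression", "LogisticRegression"}
    return model.__name__ in wanted and "-64" not in problem


def filter_scenario(model, problem, scenario, extra, options):
    """Keep only scenarios which do not request converter options."""
    return not options


class LinearRegression:
    """Stand-in class so that the sketch runs without scikit-learn."""


# Hypothetical problem labels, used only to exercise the callables.
print(filter_exp(LinearRegression, "b-reg"))
print(filter_exp(LinearRegression, "~b-reg-64"))
print(filter_scenario(LinearRegression, "b-reg", "default", {}, None))
```

Both callables would then be passed by keyword, as filter_exp=filter_exp and filter_scenario=filter_scenario.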

The function is available through command line validate_runtime. The default for time_kwargs is the following:

<<<

from mlprodict.onnxrt.validate.validate_helper import default_time_kwargs
import pprint
pprint.pprint(default_time_kwargs())

>>>

    {1: {'number': 15, 'repeat': 20},
     10: {'number': 10, 'repeat': 20},
     100: {'number': 4, 'repeat': 10},
     1000: {'number': 4, 'repeat': 4},
     10000: {'number': 2, 'repeat': 2}}
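
The structure of time_kwargs (one entry per number of rows, each with 'number' and 'repeat') resembles the arguments of the standard timeit.repeat function. The sketch below uses that standard function to show how such a dictionary could drive a measurement. It assumes the two counters carry the same meaning as in timeit; the helper measure and its make_input argument are hypothetical and are not part of the module.

```python
import timeit


def measure(fct, time_kwargs, make_input):
    """Time fct on inputs of several sizes, following time_kwargs."""
    results = {}
    for n_rows, params in time_kwargs.items():
        data = make_input(n_rows)
        # timeit.repeat returns one total duration per repetition,
        # each covering `number` consecutive calls.
        totals = timeit.repeat(lambda: fct(data),
                               number=params["number"],
                               repeat=params["repeat"])
        per_call = [t / params["number"] for t in totals]
        results[n_rows] = {"min": min(per_call), "max": max(per_call)}
    return results


# Small counters keep the demonstration fast.
small_kwargs = {1: {"number": 3, "repeat": 2},
                10: {"number": 2, "repeat": 2}}
results = measure(sum, small_kwargs, lambda n: list(range(n)))
print(sorted(results))
```

Larger values of 'number' and 'repeat' smooth out noise for fast predictions, which is consistent with the defaults using more repetitions for fewer rows.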
