module onnxrt.validate.validate_problems

Short summary

module mlprodict.onnxrt.validate.validate_problems

Validates the runtime for many scikit-learn operators. The submodule relies on onnxconverter_common and sklearn-onnx.

source on GitHub

Functions

function                                           truncated documentation
-------------------------------------------------  -----------------------
_modify_dimension                                  Increases or reduces the number of features of a features matrix.
_problem_for_cl_decision_function                  Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.
_problem_for_cl_decision_function_binary           Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based …
_problem_for_clnoproba                             Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.
_problem_for_clnoproba_binary                      Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based …
_problem_for_clustering                            Returns X, initial_types, method, name, X runtime for a clustering problem. It is based on the Iris dataset.
_problem_for_clustering_scores                     Returns X, initial_types, method, name, X runtime for a clustering problem, the score part, not the cluster. It …
_problem_for_dict_vectorizer                       Returns a problem for the sklearn.feature_extraction.DictVectorizer.
_problem_for_feature_hasher                        Returns a problem for the sklearn.feature_extraction.FeatureHasher.
_problem_for_label_encoder                         Returns a problem for the sklearn.preprocessing.LabelEncoder.
_problem_for_mixture                               Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris …
_problem_for_numerical_scoring                     Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.
_problem_for_numerical_trainable_transform         Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.
_problem_for_numerical_trainable_transform_cl      Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.
_problem_for_numerical_transform                   Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.
_problem_for_numerical_transform_positive          Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.
_problem_for_one_hot_encoder                       Returns a problem for the sklearn.preprocessing.OneHotEncoder.
_problem_for_outlier                               Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.
_problem_for_predictor_binary_classification       Returns X, y, initial_types, method, node name, X runtime for a binary classification problem. It is based on Iris …
_problem_for_predictor_multi_classification        Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris …
_problem_for_predictor_multi_classification_label  Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris …
_problem_for_predictor_multi_regression            Returns X, y, initial_types, method, name, X runtime for a multi-output regression problem. It is based on the Iris dataset.
_problem_for_predictor_regression                  Returns X, y, initial_types, method, name, X runtime for a regression problem. It is based on the Iris dataset.
_problem_for_tfidf_transformer                     Returns a problem for the sklearn.feature_extraction.text.TfidfTransformer.
_problem_for_tfidf_vectorizer                      Returns a problem for the sklearn.feature_extraction.text.TfidfVectorizer.
find_suitable_problem                              Determines problems suitable for a given scikit-learn operator. It may be …

Documentation

Validates the runtime for many scikit-learn operators. The submodule relies on onnxconverter_common and sklearn-onnx.

mlprodict.onnxrt.validate.validate_problems._modify_dimension(X, n_features, seed=19)

Increases or reduces the number of features of a features matrix.

Parameters:
  • X – features matrix

  • n_features – number of features

  • seed – random seed (to get the same dataset at each call)

Returns:

new features matrix
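As an illustration of what increasing or reducing the number of features can mean, here is a minimal sketch. It is not the library's implementation: the name modify_dimension_sketch, the choice to keep the first columns when reducing, and the choice to append noisy copies of existing columns when increasing are all assumptions made for this example. Only the parameters X, n_features and seed come from the signature above.

```python
import numpy as np


def modify_dimension_sketch(X, n_features, seed=19):
    """Hypothetical stand-in: return a copy of X with n_features columns."""
    X = np.asarray(X)
    n_rows, n_cols = X.shape
    if n_features is None or n_features == n_cols:
        return X.copy()
    if n_features < n_cols:
        # Reducing (assumption): keep the first n_features columns.
        return X[:, :n_features].copy()
    # Increasing (assumption): append noisy copies of existing columns,
    # cycling through them. A fixed seed gives the same dataset at each call.
    rng = np.random.RandomState(seed)
    out = np.empty((n_rows, n_features), dtype=X.dtype)
    out[:, :n_cols] = X
    for i in range(n_cols, n_features):
        col = X[:, i % n_cols]
        noise = rng.randn(n_rows) * col.std() * 0.1
        out[:, i] = (col + noise).astype(X.dtype)
    return out


X = np.arange(12, dtype=np.float32).reshape(4, 3)
wider = modify_dimension_sketch(X, 5)      # two extra columns
narrower = modify_dimension_sketch(X, 2)   # first two columns only
print(wider.shape, narrower.shape)
```

Because the random generator is created inside the function from the seed, two calls with the same arguments return identical matrices, which is the behaviour the seed parameter is documented to provide.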

mlprodict.onnxrt.validate.validate_problems._problem_for_cl_decision_function(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_cl_decision_function_binary(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_clnoproba(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_clnoproba_binary(dtype=numpy.float32, n_features=None, add_nan=False)

Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_clustering(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a clustering problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_clustering_scores(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a clustering problem, the score part, not the cluster. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_dict_vectorizer(dtype=numpy.float32, n_features=None)

Returns a problem for the sklearn.feature_extraction.DictVectorizer.

mlprodict.onnxrt.validate.validate_problems._problem_for_feature_hasher(dtype=numpy.float32, n_features=None)

Returns a problem for the sklearn.feature_extraction.FeatureHasher.

mlprodict.onnxrt.validate.validate_problems._problem_for_label_encoder(dtype=numpy.int64, n_features=None)

Returns a problem for the sklearn.preprocessing.LabelEncoder.

mlprodict.onnxrt.validate.validate_problems._problem_for_mixture(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_scoring(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_trainable_transform(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_trainable_transform_cl(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_transform(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_transform_positive(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_one_hot_encoder(dtype=numpy.float32, n_features=None)

Returns a problem for the sklearn.preprocessing.OneHotEncoder.

mlprodict.onnxrt.validate.validate_problems._problem_for_outlier(dtype=numpy.float32, n_features=None)

Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_binary_classification(dtype=numpy.float32, n_features=None, add_nan=False)

Returns X, y, initial_types, method, node name, X runtime for a binary classification problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_classification(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_classification_label(dtype=numpy.float32, n_features=None)

Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_regression(many_output=False, options=None, n_features=None, nbrows=None, dtype=numpy.float32, **kwargs)

Returns X, y, initial_types, method, name, X runtime for a multi-output regression problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_regression(many_output=False, options=None, n_features=None, nbrows=None, dtype=numpy.float32, add_nan=False, **kwargs)

Returns X, y, initial_types, method, name, X runtime for a regression problem. It is based on the Iris dataset.

mlprodict.onnxrt.validate.validate_problems._problem_for_tfidf_transformer(dtype=numpy.float32, n_features=None)

Returns a problem for the sklearn.feature_extraction.text.TfidfTransformer.

mlprodict.onnxrt.validate.validate_problems._problem_for_tfidf_vectorizer(dtype=numpy.float32, n_features=None)

Returns a problem for the sklearn.feature_extraction.text.TfidfVectorizer.

mlprodict.onnxrt.validate.validate_problems.find_suitable_problem(model)

Determines problems suitable for a given scikit-learn operator. It may be one of the following:

  • b-cl: binary classification

  • m-cl: multi-class classification

  • m-label: multi-label classification (multiple labels possible at the same time)

  • reg: regression

  • m-reg: regression multi-output

  • num-tr: transform numerical features

  • num-tr-pos: transform numerical positive features

  • scoring: transform numerical features, target is usually needed

  • outlier: outlier prediction

  • linearsvc: classifier without predict_proba

  • cluster: similar to transform

  • num+y-tr: similar to transform with targets

  • num+y-tr-cl: similar to transform with classes

  • num-tr-clu: similar to cluster, but returns scores or distances instead of cluster

  • key-col: list of dictionaries

  • text-col: one column of text

Problem names may carry one or more suffixes:

  • nofit: the predictions happen without the model being fitted. This is the case for sklearn.gaussian_process.GaussianProcessRegressor.

  • -cov: method predict was called with parameter return_cov=True.

  • -std: method predict was called with parameter return_std=True.

  • -NSV: creates an input variable like the following [('X', FloatTensorType([None, None]))]. That’s a way to bypass onnxruntime shape checking, as one part of the graph is designed to handle any kind of dimensions but apparently, if the input shape is precise, every part of the graph has to be precise. The strings are used as variables, which means the shape is at the same time precise and imprecise.

  • -64: the model computes in double precision.

  • -nop: the classifier does not implement method predict_proba.

  • -1d: one-dimensional problem (one feature).

  • -dec: checks method decision_function.
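To make the naming scheme concrete, the sketch below splits a problem name into its base problem and its suffixes. The helper split_problem_name and the SUFFIXES dictionary are hypothetical and are not part of mlprodict. The sketch assumes suffixes are always appended at the end of the name and separated by hyphens. The leading ~ seen in some names of the table further down is not defined in this section, so the helper only records its presence.

```python
# Suffixes described in this section (wording shortened).
SUFFIXES = {
    "nofit": "predictions happen without the model being fitted",
    "cov": "predict called with return_cov=True",
    "std": "predict called with return_std=True",
    "NSV": "input declared as FloatTensorType([None, None])",
    "64": "double precision computations",
    "nop": "classifier without predict_proba",
    "1d": "one feature",
    "dec": "checks decision_function",
}


def split_problem_name(name):
    """Hypothetical helper: return (has_tilde, base_problem, suffixes)."""
    has_tilde = name.startswith("~")
    tokens = name.lstrip("~").split("-")
    suffixes = []
    # Peel known suffixes off the right-hand side, one token at a time,
    # always leaving at least one token for the base problem.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        suffixes.insert(0, tokens.pop())
    return has_tilde, "-".join(tokens), suffixes


for problem in ["b-cl", "~num-tr-clu-64", "~b-reg-64"]:
    tilde, base, suffixes = split_problem_name(problem)
    notes = [SUFFIXES[s] for s in suffixes]
    print(problem, "->", base, suffixes, notes)
```

Peeling tokens from the right keeps base names that themselves contain hyphens, such as num-tr-clu or num+y-tr-cl, in one piece.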

The following script lists scikit-learn models and the problems they can be fitted on.

<<<

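from mlprodict.onnxrt.validate.validate import (
    sklearn_operators, find_suitable_problem)
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame

# each entry of res gives an operator name ('name') and its class ('cl')
res = sklearn_operators()
rows = []
for model in res[:20]:  # only the first 20 operators
    name = model['name']
    row = dict(name=name)
    try:
        prob = find_suitable_problem(model['cl'])
        if prob is None:
            # no suitable problem: the operator is left out of the table
            continue
        for p in prob:
            row[p] = 'X'  # one column per suitable problem
    except RuntimeError:
        # the operator is kept with an empty row
        pass
    rows.append(row)
# one row per operator, sorted by name, rendered as an RST table
df = DataFrame(rows).set_index('name')
df = df.sort_index()
print(df2rst(df, index=True))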

>>>

Columns: name, b-cl, m-cl, cluster, ~b-clu-64, ~num-tr-clu, ~num-tr-clu-64, b-reg, m-reg, ~b-reg-64, ~m-reg-64, outlier, num+y-tr, num-tr

AffinityPropagation: X X
Birch: X X X X
BisectingKMeans: X X X X
CCA: X X X X X
CalibratedClassifierCV: X X
DictionaryLearning: X
EllipticEnvelope: X
FastICA: X
IncrementalPCA: X
KMeans: X X X X
KernelPCA: X
MeanShift: X X
MiniBatchDictionaryLearning: X
MiniBatchKMeans: X X X X
MiniBatchNMF: X
MiniBatchSparsePCA: X
PLSCanonical: X X X X X
PLSRegression: X X X X X
PLSSVD: X
TransformedTargetRegressor: X X X X

The list is truncated. The full list can be found at scikit-learn Converters and Benchmarks.
