module onnxrt.validate.validate_problems
Short summary
module mlprodict.onnxrt.validate.validate_problems
Validates the runtime for many scikit-learn operators. The submodule relies on onnxconverter_common and sklearn-onnx.
Functions
function | truncated documentation |
---|---|
_modify_dimension | Increases or reduces the number of features. |
_problem_for_cl_decision_function | Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset. |
_problem_for_cl_decision_function_binary | Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based … |
_problem_for_clnoproba | Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset. |
_problem_for_clnoproba_binary | Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based … |
_problem_for_clustering | Returns X, initial_types, method, name, X runtime for a clustering problem. It is based on Iris dataset. |
_problem_for_clustering_scores | Returns X, initial_types, method, name, X runtime for a clustering problem, the score part, not the cluster. It … |
_problem_for_dict_vectorizer | Returns a problem for the sklearn.feature_extraction.DictVectorizer. |
_problem_for_feature_hasher | Returns a problem for the sklearn.feature_extraction.FeatureHasher. |
_problem_for_label_encoder | Returns a problem for the sklearn.preprocessing.LabelEncoder. |
_problem_for_mixture | Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris … |
_problem_for_numerical_scoring | Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset. |
_problem_for_numerical_trainable_transform | Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset. |
_problem_for_numerical_trainable_transform_cl | Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset. |
_problem_for_numerical_transform | Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset. |
_problem_for_numerical_transform_positive | Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset. |
_problem_for_one_hot_encoder | Returns a problem for the sklearn.preprocessing.OneHotEncoder. |
_problem_for_outlier | Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset. |
_problem_for_predictor_binary_classification | Returns X, y, initial_types, method, node name, X runtime for a binary classification problem. It is based on Iris … |
_problem_for_predictor_multi_classification | Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris … |
_problem_for_predictor_multi_classification_label | Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris … |
_problem_for_predictor_multi_regression | Returns X, y, initial_types, method, name, X runtime for a multi-output regression problem. It is based on Iris dataset. |
_problem_for_predictor_regression | Returns X, y, initial_types, method, name, X runtime for a regression problem. It is based on Iris dataset. |
_problem_for_tfidf_transformer | Returns a problem for the sklearn.feature_extraction.text.TfidfTransformer. |
_problem_for_tfidf_vectorizer | Returns a problem for the sklearn.feature_extraction.text.TfidfVectorizer. |
find_suitable_problem | Determines problems suitable for a given scikit-learn operator. It may be |
Documentation
Validates the runtime for many scikit-learn operators. The submodule relies on onnxconverter_common and sklearn-onnx.
- mlprodict.onnxrt.validate.validate_problems._modify_dimension(X, n_features, seed=19)
Increases or reduces the number of features.
- Parameters:
X – features matrix
n_features – number of features
seed – random seed (to get the same dataset at each call)
- Returns:
new features matrix
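The behaviour described for _modify_dimension can be pictured with a small sketch. This is not the library's implementation: the function name modify_dimension_sketch, the use of plain lists instead of a NumPy array, and the choice to build extra columns as noisy copies of existing ones are all assumptions made for illustration. Only the signature (X, n_features, seed=19) and the goal (more or fewer features, reproducible across calls) come from the documentation above.

```python
import random


def modify_dimension_sketch(X, n_features, seed=19):
    """Return a copy of X with exactly n_features columns (illustrative only)."""
    if not X:
        return []
    current = len(X[0])
    if n_features is None or n_features == current:
        return [list(row) for row in X]
    if n_features < current:
        # Reduce: keep the first n_features columns.
        return [list(row[:n_features]) for row in X]
    # Increase: append noisy copies of randomly chosen existing columns
    # (an assumption, the real function may build new columns differently).
    rng = random.Random(seed)  # fixed seed gives the same dataset at each call
    new_rows = [list(row) for row in X]
    for _ in range(n_features - current):
        source = rng.randrange(current)
        for row, new_row in zip(X, new_rows):
            new_row.append(row[source] + rng.gauss(0.0, 0.01))
    return new_rows


X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]
wider = modify_dimension_sketch(X, 6)
narrower = modify_dimension_sketch(X, 2)
print(len(wider[0]), len(narrower[0]))  # prints: 6 2
```

Because the generator is seeded inside the function, two calls with the same arguments return the same matrix, which is the property the seed parameter is documented to provide.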
- mlprodict.onnxrt.validate.validate_problems._problem_for_cl_decision_function(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_cl_decision_function_binary(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_clnoproba(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_clnoproba_binary(dtype=numpy.float32, n_features=None, add_nan=False)
Returns X, y, initial_types, method, name, X runtime for a scoring problem. Binary classification. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_clustering(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a clustering problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_clustering_scores(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a clustering problem, the score part, not the cluster. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_dict_vectorizer(dtype=numpy.float32, n_features=None)
Returns a problem for the sklearn.feature_extraction.DictVectorizer.
- mlprodict.onnxrt.validate.validate_problems._problem_for_feature_hasher(dtype=numpy.float32, n_features=None)
Returns a problem for the sklearn.feature_extraction.FeatureHasher.
- mlprodict.onnxrt.validate.validate_problems._problem_for_label_encoder(dtype=numpy.int64, n_features=None)
Returns a problem for the sklearn.preprocessing.LabelEncoder.
- mlprodict.onnxrt.validate.validate_problems._problem_for_mixture(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_scoring(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, name, X runtime for a scoring problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_trainable_transform(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_trainable_transform_cl(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_transform(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_numerical_transform_positive(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_one_hot_encoder(dtype=numpy.float32, n_features=None)
Returns a problem for the sklearn.preprocessing.OneHotEncoder.
- mlprodict.onnxrt.validate.validate_problems._problem_for_outlier(dtype=numpy.float32, n_features=None)
Returns X, initial_types, method, name, X runtime for a transformation problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_binary_classification(dtype=numpy.float32, n_features=None, add_nan=False)
Returns X, y, initial_types, method, node name, X runtime for a binary classification problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_classification(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_classification_label(dtype=numpy.float32, n_features=None)
Returns X, y, initial_types, method, node name, X runtime for a multi-class classification problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_multi_regression(many_output=False, options=None, n_features=None, nbrows=None, dtype=numpy.float32, **kwargs)
Returns X, y, initial_types, method, name, X runtime for a multi-output regression problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_predictor_regression(many_output=False, options=None, n_features=None, nbrows=None, dtype=numpy.float32, add_nan=False, **kwargs)
Returns X, y, initial_types, method, name, X runtime for a regression problem. It is based on Iris dataset.
- mlprodict.onnxrt.validate.validate_problems._problem_for_tfidf_transformer(dtype=numpy.float32, n_features=None)
Returns a problem for the sklearn.feature_extraction.text.TfidfTransformer.
- mlprodict.onnxrt.validate.validate_problems._problem_for_tfidf_vectorizer(dtype=numpy.float32, n_features=None)
Returns a problem for the sklearn.feature_extraction.text.TfidfVectorizer.
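Most of the _problem_for_* functions above are described as returning the same group of values: the features, optionally the targets, the initial types, a method, a name and the features given to the runtime. The sketch below only illustrates that shape. The class ProblemSketch, the function toy_binary_problem, the four-row dataset, the string values for method and name, and the tuple used in place of an sklearn-onnx tensor type are all hypothetical; the real functions return their own objects built from the Iris dataset.

```python
from typing import List, NamedTuple, Optional, Tuple


class ProblemSketch(NamedTuple):
    """Hypothetical container mirroring the documented return values."""
    X: List[List[float]]          # features used to fit the model
    y: Optional[List[int]]        # targets, None when the problem has none
    initial_types: List[Tuple[str, Tuple[str, Tuple[Optional[int], ...]]]]
    method: str                   # method whose output is compared
    name: str                     # output (or node) name to look at
    X_runtime: List[List[float]]  # features handed to the runtime


def toy_binary_problem(n_features: int = 2) -> ProblemSketch:
    # Tiny hand-written sample standing in for the Iris dataset.
    data = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
            [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9]]
    labels = [0, 0, 1, 1]
    X = [row[:n_features] for row in data]
    # Stand-in for something like [('X', FloatTensorType([None, n_features]))].
    initial_types = [("X", ("float32", (None, n_features)))]
    return ProblemSketch(X=X, y=labels, initial_types=initial_types,
                         method="predict_proba", name="probabilities",
                         X_runtime=X[:2])


problem = toy_binary_problem(3)
print(len(problem), problem.initial_types)
```

A named tuple keeps the positional unpacking used in the descriptions (X, y, initial_types, ...) while also letting callers read fields by name.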
- mlprodict.onnxrt.validate.validate_problems.find_suitable_problem(model)
Determines problems suitable for a given scikit-learn operator. It may be one of:
b-cl: binary classification
m-cl: multi-class classification
m-label: multi-label classification (multiple labels possible at the same time)
reg: regression
m-reg: multi-output regression
num-tr: transform numerical features
num-tr-pos: transform numerical positive features
scoring: transform numerical features, target is usually needed
outlier: outlier prediction
linearsvc: classifier without predict_proba
cluster: similar to transform
num+y-tr: similar to transform with targets
num+y-tr-cl: similar to transform with classes
num-tr-clu: similar to cluster, but returns scores or distances instead of cluster
key-col: list of dictionaries
text-col: one column of text
Suffix nofit indicates that predictions happen without the model being fitted. This is the case for sklearn.gaussian_process.GaussianProcessRegressor. The suffix -cov indicates that method predict was called with parameter return_cov=True, and -std that it was called with parameter return_std=True.
The suffix -NSV creates an input variable like [('X', FloatTensorType([None, None]))]. That is a way to bypass onnxruntime shape checking: one part of the graph is designed to handle any kind of dimensions but apparently, if the input shape is precise, every part of the graph has to be precise. The strings are used as variables, which means the shape is at the same time precise and imprecise.
Suffix -64 means the model will do double computations. Suffix -nop means the classifier does not implement method predict_proba. Suffix -1d means a one-dimension problem (one feature). Suffix -dec checks method decision_function.
The following script gives the list of scikit-learn models and the problems they can be fitted on.
<<<
from mlprodict.onnxrt.validate.validate import (
    sklearn_operators, find_suitable_problem)
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame

res = sklearn_operators()
rows = []
for model in res[:20]:
    name = model['name']
    row = dict(name=name)
    try:
        prob = find_suitable_problem(model['cl'])
        if prob is None:
            continue
        # one column per suitable problem, marked with an X
        for p in prob:
            row[p] = 'X'
    except RuntimeError:
        pass
    rows.append(row)
df = DataFrame(rows).set_index('name')
df = df.sort_index()
print(df2rst(df, index=True))
>>>
Columns: name, b-cl, m-cl, cluster, ~b-clu-64, ~num-tr-clu, ~num-tr-clu-64, b-reg, m-reg, ~b-reg-64, ~m-reg-64, outlier, num+y-tr, num-tr
An X marks each problem suitable for a model. Number of X marks per model:
name | problems marked X |
---|---|
AffinityPropagation | 2 |
Birch | 4 |
BisectingKMeans | 4 |
CCA | 5 |
CalibratedClassifierCV | 2 |
DictionaryLearning | 1 |
EllipticEnvelope | 1 |
FastICA | 1 |
IncrementalPCA | 1 |
KMeans | 4 |
KernelPCA | 1 |
MeanShift | 2 |
MiniBatchDictionaryLearning | 1 |
MiniBatchKMeans | 4 |
MiniBatchNMF | 1 |
MiniBatchSparsePCA | 1 |
PLSCanonical | 5 |
PLSRegression | 5 |
PLSSVD | 1 |
TransformedTargetRegressor | 4 |
The list is truncated. The full list can be found at scikit-learn Converters and Benchmarks.
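Problem names such as ~b-reg-64 combine a base problem with the suffixes described for find_suitable_problem. The sketch below separates the two. It is only an illustration: split_problem_name and the SUFFIXES table are hypothetical helpers, not part of mlprodict, and the table covers only the hyphenated suffixes listed above (-cov, -std, -NSV, -64, -nop, -1d, -dec). The nofit suffix is left out because its exact spelling inside a problem name is not given here.

```python
# Suffix meanings taken from the description of find_suitable_problem.
SUFFIXES = {
    "-cov": "predict called with return_cov=True",
    "-std": "predict called with return_std=True",
    "-NSV": "input declared as FloatTensorType([None, None])",
    "-64": "double computations",
    "-nop": "classifier without predict_proba",
    "-1d": "one feature only",
    "-dec": "checks method decision_function",
}


def split_problem_name(problem):
    """Split a problem name into (base, suffixes), suffixes in reading order."""
    base = problem
    found = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            # Strip one known suffix from the right, then start over.
            if base.endswith(suffix) and len(base) > len(suffix):
                base = base[:-len(suffix)]
                found.insert(0, suffix)
                stripped = True
                break
    return base, found


print(split_problem_name("~b-reg-64"))       # ('~b-reg', ['-64'])
print(split_problem_name("~num-tr-clu-64"))  # ('~num-tr-clu', ['-64'])
# Made-up combination, only to show that several suffixes are peeled off.
print(split_problem_name("~b-reg-std-64"))   # ('~b-reg', ['-std', '-64'])
```

Stripping from the right avoids confusing the hyphens inside base names such as num-tr-clu with suffix separators.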