ONNX Converters Coverage and Benchmarks

sklearn-onnx converts many scikit-learn models into ONNX. Each converted model is tested against several runtimes. The following pages show which models are correctly converted and compare the predictions obtained with every runtime (see Runtimes for ONNX). They also display figures comparing the processing speed of each runtime with scikit-learn. The benchmark evaluates every model on a dataset inspired by the Iris dataset, hence with four features, for different numbers of observations N = 1, 10, 100, 1000, 10000 and 100000. Measures for high values of N may be missing because the measures for smaller values already took too long.
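The batch sizes above can be illustrated with a minimal sketch that draws N rows from a four-feature matrix for each value of N. The helper make_batches is hypothetical, and so is the strategy (sampling rows with replacement, then casting to float32): how mlprodict actually builds its Iris-inspired datasets is not described here.

```python
import numpy as np

# Batch sizes listed in the benchmark description.
N_VALUES = [1, 10, 100, 1_000, 10_000, 100_000]


def make_batches(base, sizes=N_VALUES, seed=0):
    """Hypothetical helper: return {N: array of shape (N, n_features)}.

    Rows are drawn with replacement from ``base`` (for instance the
    150 x 4 Iris feature matrix) and cast to float32 (an assumption).
    """
    base = np.asarray(base)
    rng = np.random.default_rng(seed)
    batches = {}
    for n in sizes:
        idx = rng.integers(0, base.shape[0], size=n)
        batches[n] = base[idx].astype(np.float32)
    return batches


# Stand-in for the Iris features: 150 rows, 4 columns.
base = np.random.default_rng(1).normal(size=(150, 4))
batches = make_batches(base)
for n, X in batches.items():
    print(n, X.shape, X.dtype)
```

Fixing the seed keeps the batches identical from one run to the next, so timings for different runtimes are measured on the same data.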

Another benchmark based on asv is available and shows similar results, but it also measures memory peaks: ASV Benchmark.
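asv records memory peaks on its own. As a rough, Python-only analogue, a minimal sketch with the standard tracemalloc module measures the peak memory allocated through Python's allocators while a function runs; unlike a process-level measure, it ignores memory that native libraries allocate outside those allocators. The helper peak_python_memory is hypothetical.

```python
import tracemalloc


def peak_python_memory(func, *args, **kwargs):
    """Run ``func`` and return (result, peak bytes traced during the call).

    Only allocations made through Python's allocators are counted.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # Must be read before tracing stops.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Example: building a list of 100,000 floats.
result, peak = peak_python_memory(lambda n: [0.0] * n, 100_000)
print(len(result), peak)
```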

Versions

All results were obtained with the following module versions:

<<<

from mlprodict.onnxrt.validate.validate_helper import modules_list
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame
print(df2rst(DataFrame(modules_list())))

>>>

===========  ========
name         version
===========  ========
mlprodict    0.4.1266
numpy        1.19.2
onnx         1.7.1076
onnxmltools  1.7.92
onnxruntime  1.5.99
pandas       1.1.2
scipy        1.5.2
skl2onnx     1.7.1081
sklearn      0.23.2
===========  ========
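The table above is produced by mlprodict's modules_list. Without mlprodict, a similar table can be sketched with the standard importlib.metadata module (Python 3.8+). The package list simply mirrors the table, and installed_versions is a hypothetical helper; note that sklearn is installed under the distribution name scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI ("sklearn" is installed as "scikit-learn").
PACKAGES = ["mlprodict", "numpy", "onnx", "onnxmltools", "onnxruntime",
            "pandas", "scipy", "skl2onnx", "scikit-learn"]


def installed_versions(packages=PACKAGES):
    """Return {distribution: version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:<14}{ver or 'not installed'}")
```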

On:

onnxruntime is compiled with the following options:
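A minimal sketch for recording the context of a benchmark run uses the standard platform module and, when onnxruntime is importable, its get_device and get_available_providers functions. It reports the device and execution providers of the installed onnxruntime, not the full list of compile options. The helper runtime_context is hypothetical.

```python
import platform


def runtime_context():
    """Collect machine details and, if available, basic onnxruntime information."""
    info = {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "python": platform.python_version(),
    }
    try:
        import onnxruntime
    except ImportError:
        # onnxruntime is optional for this sketch.
        return info
    info["onnxruntime"] = onnxruntime.__version__
    info["device"] = onnxruntime.get_device()
    info["providers"] = onnxruntime.get_available_providers()
    return info


for key, value in runtime_context().items():
    print(f"{key}: {value}")
```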

Supported models

Every model is tested on a predefined list of standard problems created from the Iris dataset. Function find_suitable_problem returns the problems considered for a given model; the table below marks them with an X.

<<<

from mlprodict.onnxrt.validate.validate import sklearn_operators, find_suitable_problem
from pyquickhelper.pandashelper import df2rst
from pandas import DataFrame
res = sklearn_operators(extended=True)
rows = []
for model in res:
    name = model['name']
    row = dict(name=name)
    try:
        prob = find_suitable_problem(model['cl'])
        if prob is None:
            continue
        for p in prob:
            row[p] = 'X'
    except RuntimeError:
        pass
    rows.append(row)
df = DataFrame(rows).set_index('name')
df = df.sort_index()
cols = list(sorted(df.columns))
df = df[cols]
print(df2rst(df, index=True))

>>>

name

b-cl

b-reg

bow

cluster

int-col

key-int-col

key-str-col

m-cl

m-reg

num+y-tr

num+y-tr-cl

num-tr

num-tr-pos

one-hot

outlier

text-col

~b-cl-64

~b-cl-dec

~b-cl-f100

~b-cl-nan

~b-cl-nop

~b-cl-nop-64

~b-clu-64

~b-reg-1d

~b-reg-64

~b-reg-NF-64

~b-reg-NF-cov-64

~b-reg-NF-std-64

~b-reg-NSV-64

~b-reg-cov-64

~b-reg-f100

~b-reg-nan

~b-reg-nan-64

~b-reg-std-NSV-64

~m-cl-dec

~m-cl-nop

~m-label

~m-reg-64

~num+y-tr-1d

~num-tr-clu

~num-tr-clu-64

ARDRegression

X

X

AdaBoostClassifier

X

X

X

AdaBoostRegressor

X

X

AdditiveChi2Sampler

X

AffinityPropagation

X

X

BaggingClassifier

X

X

BaggingRegressor

X

X

X

X

BayesianGaussianMixture

X

X

X

BayesianRidge

X

X

BernoulliNB

X

X

BernoulliRBM

X

Binarizer

X

Birch

X

X

X

X

Booster

CCA

X

X

X

X

X

CalibratedClassifierCV

X

X

CategoricalNB

X

X

X

ClassifierChain

X

X

X

ComplementNB

X

X

CountVectorizer

X

DecisionTreeClassifier

X

X

X

X

X

DecisionTreeRegressor

X

X

X

X

X

DictVectorizer

X

DictionaryLearning

X

ElasticNet

X

X

X

X

ElasticNetCV

X

X

EllipticEnvelope

X

ExtraTreeClassifier

X

X

X

X

X

ExtraTreeRegressor

X

X

X

X

ExtraTreesClassifier

X

X

X

ExtraTreesRegressor

X

X

X

X

FactorAnalysis

X

FastICA

X

FeatureHasher

X

FunctionTransformer

X

GammaRegressor

X

X

X

X

GaussianMixture

X

X

X

GaussianNB

X

X

GaussianProcessClassifier

X

X

X

GaussianProcessRegressor

X

X

X

X

X

X

X

X

GaussianRandomProjection

X

GenericUnivariateSelect

X

GradientBoostingClassifier

X

X

GradientBoostingRegressor

X

X

GridSearchCV

X

X

X

X

X

X

X

X

X

HashingVectorizer

X

HistGradientBoostingClassifier

X

X

X

X

HistGradientBoostingRegressor

X

X

X

X

HuberRegressor

X

X

IncrementalPCA

X

IsolationForest

X

IsotonicRegression

X

X

IterativeImputer

X

KBinsDiscretizer

X

KMeans

X

X

X

X

KNNImputer

X

KNeighborsClassifier

X

X

X

KNeighborsRegressor

X

X

X

X

KNeighborsTransformer

X

KernelCenterer

X

KernelPCA

X

KernelRidge

X

X

X

X

LGBMClassifier

X

X

X

LGBMRegressor

X

X

LabelBinarizer

X

LabelEncoder

X

LabelPropagation

X

X

LabelSpreading

X

X

Lars

X

X

X

X

LarsCV

X

X

Lasso

X

X

X

X

LassoCV

X

X

LassoLars

X

X

X

X

LassoLarsCV

X

X

LassoLarsIC

X

X

LatentDirichletAllocation

X

LinearDiscriminantAnalysis

X

X

LinearRegression

X

X

X

X

LinearSVC

X

X

LinearSVR

X

X

LocalOutlierFactor

X

LogisticRegression

X

X

X

X

X

LogisticRegressionCV

X

X

MLPClassifier

X

X

X

MLPRegressor

X

X

X

X

MaxAbsScaler

X

MeanShift

X

X

MinMaxScaler

X

MiniBatchDictionaryLearning

X

MiniBatchKMeans

X

X

X

X

MiniBatchSparsePCA

X

MissingIndicator

X

MultiLabelBinarizer

X

MultiOutputClassifier

X

X

MultiOutputRegressor

X

MultiTaskElasticNet

X

MultiTaskElasticNetCV

X

MultiTaskLasso

X

MultiTaskLassoCV

X

MultinomialNB

X

X

NMF

X

NearestCentroid

X

X

NeighborhoodComponentsAnalysis

X

Normalizer

X

NuSVC

X

X

X

X

NuSVR

X

X

Nystroem

X

OneClassSVM

X

OneHotEncoder

X

OneVsOneClassifier

X

X

OneVsRestClassifier

X

X

OrdinalEncoder

X

OrthogonalMatchingPursuit

X

X

X

X

OrthogonalMatchingPursuitCV

X

X

OutputCodeClassifier

X

X

PCA

X

PLSCanonical

X

X

X

X

X

PLSRegression

X

X

X

X

X

PLSSVD

X

PassiveAggressiveClassifier

X

X

PassiveAggressiveRegressor

X

X

Perceptron

X

X

X

X

PoissonRegressor

X

X

X

X

PolynomialFeatures

X

PowerTransformer

X

QuadraticDiscriminantAnalysis

X

X

QuantileTransformer

X

RANSACRegressor

X

X

X

X

RBFSampler

X

RFE

X

RFECV

X

RadiusNeighborsClassifier

X

X

RadiusNeighborsRegressor

X

X

X

X

RadiusNeighborsTransformer

X

RandomForestClassifier

X

X

X

X

RandomForestRegressor

X

X

X

X

RandomTreesEmbedding

X

RandomizedSearchCV

X

X

RegressorChain

X

X

X

X

Ridge

X

X

X

X

RidgeCV

X

X

X

X

RidgeClassifier

X

X

X

RidgeClassifierCV

X

X

X

RobustScaler

X

SGDClassifier

X

X

X

X

SGDRegressor

X

X

SVC

X

X

X

X

SVR

X

X

SelectFdr

X

SelectFpr

X

SelectFromModel

X

SelectFwe

X

SelectKBest

X

SelectPercentile

X

SimpleImputer

X

SkewedChi2Sampler

X

SparseCoder

X

SparsePCA

X

SparseRandomProjection

X

StackingClassifier

X

StackingRegressor

X

StandardScaler

X

TfidfTransformer

X

TfidfVectorizer

X

TheilSenRegressor

X

X

TransferTransformer

X

TransformedTargetRegressor

X

X

X

X

TruncatedSVD

X

TweedieRegressor

X

X

X

X

VarianceThreshold

X

VotingClassifier

X

X

VotingRegressor

X

X

X

X

XGBClassifier

X

X

X

XGBRegressor

X

X

Summary graph

The following graph summarizes the performance of every supported model and compares the Python runtime and onnxruntime with scikit-learn under the same conditions. It displays the ratio r between the processing time of the runtime and the processing time of scikit-learn. Above 1, the runtime is r times slower than scikit-learn; below 1, it is 1/r times faster.
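The ratio can be made concrete with a small sketch: r is the runtime's time divided by scikit-learn's time on the same batch, so r = 2 reads "2 times slower" and r = 0.25 reads "4 times faster". The helpers speed_ratio and describe are hypothetical; taking the best of several timeit repeats is one common way to reduce noise, not necessarily what the benchmark does.

```python
import timeit


def speed_ratio(runtime_call, sklearn_call, number=100, repeat=5):
    """Ratio r = time(runtime) / time(scikit-learn), best of ``repeat`` runs each."""
    t_rt = min(timeit.repeat(runtime_call, number=number, repeat=repeat))
    t_skl = min(timeit.repeat(sklearn_call, number=number, repeat=repeat))
    return t_rt / t_skl


def describe(r):
    """Turn a ratio into the wording used to read the graph."""
    if r > 1:
        return f"{r:.2f} times slower than scikit-learn"
    if r < 1:
        return f"{1 / r:.2f} times faster than scikit-learn"
    return "as fast as scikit-learn"


print(describe(2.0))   # 2.00 times slower than scikit-learn
print(describe(0.25))  # 4.00 times faster than scikit-learn
```

In practice the two callables would wrap, for instance, a scikit-learn predict call and the equivalent ONNX runtime call on the same input.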

import pandas
import matplotlib.pyplot as plt
import numpy
from mlprodict.plotting.plotting_validate_graph import _model_name

# Benchmark summaries for the compiled Python runtime and for onnxruntime.
df1 = pandas.read_excel("bench_sum_python_compiled.xlsx")
df2 = pandas.read_excel("bench_sum_onnxruntime1.xlsx")

# If the number of features is not stored, use 4 (the Iris-like dataset).
if 'n_features' not in df1.columns:
    df1["n_features"] = 4
if 'n_features' not in df2.columns:
    df2["n_features"] = 4
df1['optim'] = df1['optim'].fillna('')
df2['optim'] = df2['optim'].fillna('')

# Most recent opset column, e.g. 'opset12'.
last_opset = max(int(c[5:]) for c in df1.columns if c.startswith("opset"))
opset_col = 'opset%d' % last_opset

# Keep the opset number only when the cell reports "OK <opset>".
ok_last = "OK %d" % last_opset
df1['opset'] = df1[opset_col].fillna('').astype(str).apply(
    lambda x: str(last_opset) if ok_last in x else "")
df2['opset'] = df2[opset_col].fillna('').astype(str).apply(
    lambda x: str(last_opset) if ok_last in x else "")

def make_label(row):
    # One label per (model, problem, scenario, optim, features, opset).
    label = "{} [{}-{}|{}] D{}-o{}".format(
        row["name"], row["problem"], row["scenario"], row["optim"],
        row["n_features"], row["opset"])
    return label.replace("-default|", "-*]")


df1["label"] = df1.apply(make_label, axis=1)
df2["label"] = df2.apply(make_label, axis=1)
indices = ['label']
values = ['RT/SKL-N=1', 'N=10', 'N=100', 'N=1000', 'N=10000', 'N=100000']
df1 = df1[indices + values]
df2 = df2[indices + values]
df = df1.merge(df2, on="label", suffixes=("__pyrtc", "__ort"), how='outer')

na = df["RT/SKL-N=1__pyrtc"].isnull() & df["RT/SKL-N=1__ort"].isnull()
dfp = df[~na].sort_values("label", ascending=False).reset_index(drop=True)

# Extra rows for the legend: a separator line, then the average
# (and, in rleg, the median) ratio of each runtime over all models.
ncol = (dfp.shape[1] - 1) // 2
dfp_legend = dfp.iloc[:3, :].copy()
dfp_legend.iloc[:, 1:] = numpy.nan
dfp_legend.iloc[1, 1:1+ncol] = dfp.iloc[:, 1:1+ncol].mean()
dfp_legend.iloc[2, 1+ncol:] = dfp.iloc[:, 1+ncol:].mean()
dfp_legend.iloc[1, 0] = "avg_" + dfp_legend.columns[1].split('__')[-1]
dfp_legend.iloc[2, 0] = "avg_" + dfp_legend.columns[1+ncol].split('__')[-1]
dfp_legend.iloc[0, 0] = "------"

rleg = dfp_legend.iloc[::-1, :].copy()
rleg.iloc[1, 1:1+ncol] = dfp.iloc[:, 1:1+ncol].median()
rleg.iloc[0, 1+ncol:] = dfp.iloc[:, 1+ncol:].median()
rleg.iloc[1, 0] = "med_" + dfp_legend.columns[1].split('__')[-1]
rleg.iloc[0, 0] = "med_" + dfp_legend.columns[1+ncol].split('__')[-1]

# draw lines between models
dfp = dfp.sort_values('label', ascending=False).copy()
vals = dfp.iloc[:, 1:].values.ravel()
xlim = [max(1e-3, min(0.5, min(vals))), min(1000, max(2, max(vals)))]
i = 0
while i < dfp.shape[0] - 1:
    i += 1
    label = dfp.iloc[i, 0]
    if '[' not in label:
        continue
    prev = dfp.iloc[i-1, 0]
    if '[' not in prev:
        continue
    label = label.split()[0]
    prev = prev.split()[0]
    if _model_name(label) == _model_name(prev):
        continue

    blank = dfp.iloc[:1,:].copy()
    blank.iloc[0, 0] = '------'
    blank.iloc[0, 1:] = xlim[0]
    dfp = pandas.concat([dfp[:i], blank, dfp[i:]])
    i += 1
dfp = dfp.reset_index(drop=True).copy()

# add exhaustive statistics
dfp = pandas.concat([rleg, dfp, dfp_legend]).reset_index(drop=True)
dfp["x"] = numpy.arange(0, dfp.shape[0])

# plot
total = dfp.shape[0] * 0.5
fig = plt.figure(figsize=(14, total))

# One panel per value of N; panels get wider from left to right (power p).
n_panels = (dfp.shape[1] - 1) // 2
ax = [None] * n_panels
p = 1.2
b = 0.35
for i in range(n_panels):
    x1 = i * 1. / n_panels
    x2 = (i + 0.95) * 1. / n_panels
    x1 = b + (0.99 - b) * x1 ** p
    x2 = b + (0.99 - b) * x2 ** p
    # Axes are not shared: with sharey=ax[i-1], every panel
    # showed the same labels.
    ax[i] = fig.add_axes([x1, 0.1, x2 - x1, 0.8])

# fig, ax = plt.subplots(1, (dfp.shape[1]-1) // 2, figsize=(14, total),
#                        sharex=False, sharey=True)
x = dfp['x']
height = total / dfp.shape[0] * 0.65
# Bars of both runtimes are drawn side by side, shifted by half a bar.
dec = {'pyrtc': 1, 'ort': -1}
for c in df.columns[1:]:
    place, runtime = c.split('__')
    index = values.index(place)
    yl = dfp.loc[:, c].fillna(0)
    xl = x + dec[runtime] * height / 2
    ax[index].barh(xl, yl, label=runtime, height=height)
    ax[index].set_title(place)
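# Reference lines: ratio 1 (same speed as scikit-learn) in green,
# ratios 2 and 5 (slower) as red dashed lines.
for i in range(len(ax)):
    ax[i].plot([1, 1], [min(x), max(x)], 'g-')
    ax[i].plot([2, 2], [min(x), max(x)], 'r--')
    ax[i].plot([5, 5], [min(x), max(x)], 'r--', lw=3)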
    ax[i].set_xscale('log')
    ax[i].set_xlim(xlim)
    ax[i].set_ylim([min(x) - 2, max(x) + 1])

for i in range(1, len(ax)):
    ax[i].set_yticklabels([])

ax[0].set_yticks(x)
ax[0].set_yticklabels(dfp['label'])
fig.subplots_adjust(left=0.35)

plt.show()


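[Figure: RT/SKL ratio per model, Python runtime (pyrtc) vs onnxruntime (ort), one panel per N from 1 to 100000 (_images/onnx_bench-1.png)]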
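The plotting script inserts a "------" row between consecutive labels that belong to different models, so the bars of each model form a visual group. That grouping step can be sketched in plain Python. add_separators is a hypothetical helper, and taking the first word of the label as the model name stands in for the script's _model_name helper, whose exact normalization is not shown here.

```python
def add_separators(labels, sep="------"):
    """Insert ``sep`` between consecutive labels whose model names differ.

    Only labels containing "[" (model rows) are compared, as in the script.
    """
    out = []
    for i, label in enumerate(labels):
        prev = labels[i - 1] if i > 0 else None
        if (prev is not None and "[" in label and "[" in prev
                and label.split()[0] != prev.split()[0]):
            out.append(sep)
        out.append(label)
    return out


labels = ["SVR [b-reg] D4-o12", "SVC [m-cl] D4-o12", "SVC [b-cl] D4-o12"]
print(add_separators(labels))
```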