Benchmark (ONNX) for common datasets (regression) with k-NN#

Overview#

The following graph plots the ratio between onnxruntime and scikit-learn prediction times. It looks into multiple models on a single dataset, diabetes, and measures the prediction time for the following models (a sketch of this model list appears after the bullet points):

  • ADA: AdaBoostRegressor()

  • DT: DecisionTreeRegressor(max_depth=6)

  • GBT: GradientBoostingRegressor(max_depth=6, n_estimators=100)

  • KNN: KNeighborsRegressor()

  • KNN-cdist: KNeighborsRegressor(); the conversion to ONNX is run with the option {'optim': 'cdist'} so that pairwise distances are computed with a dedicated CDist operator (see the conversion sketch after the runtime list below)

  • LGB: LGBMRegressor(max_depth=6, n_estimators=100)

  • LR: LinearRegression()

  • MLP: MLPRegressor()

  • NuSVR: NuSVR()

  • RF: RandomForestRegressor(max_depth=6, n_estimators=100)

  • SVR: SVR()

  • XGB: XGBRegressor(max_depth=6, n_estimators=100)

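As a rough illustration of the list above (a sketch, not the code actually run by the benchmark), the estimators can be gathered in a dictionary keyed by the labels used in the graphs; LGBMRegressor and XGBRegressor come from the optional lightgbm and xgboost packages:

    # Hypothetical helper: the benchmarked estimators, keyed by the labels used in the graphs.
    from sklearn.ensemble import (
        AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor)
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import NuSVR, SVR
    from sklearn.tree import DecisionTreeRegressor
    from lightgbm import LGBMRegressor
    from xgboost import XGBRegressor

    MODELS = {
        "ADA": AdaBoostRegressor(),
        "DT": DecisionTreeRegressor(max_depth=6),
        "GBT": GradientBoostingRegressor(max_depth=6, n_estimators=100),
        "KNN": KNeighborsRegressor(),
        # KNN-cdist uses the same estimator, only the ONNX conversion option differs.
        "KNN-cdist": KNeighborsRegressor(),
        "LGB": LGBMRegressor(max_depth=6, n_estimators=100),
        "LR": LinearRegression(),
        "MLP": MLPRegressor(),
        "NuSVR": NuSVR(),
        "RF": RandomForestRegressor(max_depth=6, n_estimators=100),
        "SVR": SVR(),
        "XGB": XGBRegressor(max_depth=6, n_estimators=100),
    }
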
The predictor follows a StandardScaler (or a MinMaxScaler if the model is a Naive Bayes one) in a pipeline if norm=True, or is used on its own if norm=False. The pipeline looks like make_pipeline(StandardScaler(), estimator()). Four runtimes are tested (a conversion sketch using them follows the list):

  • skl: scikit-learn,

  • ort: onnxruntime,

  • pyrt: mlprodict; it relies on numpy for most of the operators, except trees and SVMs which use a modified version of the C++ code embedded in onnxruntime,

  • pyrtc: same runtime as pyrt but the graph logic is replaced by a function dynamically compiled when the ONNX file is loaded.

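The comparison can be reproduced in spirit with the following sketch. It assumes skl2onnx, onnxruntime and mlprodict are installed; the input name 'X' and the output name 'variable' are the usual skl2onnx defaults for a regressor and may differ for other converters. The option {'optim': 'cdist'} reproduces the KNN-cdist configuration listed above.

    # Minimal sketch (not the benchmark script): train a pipeline, convert it to ONNX,
    # then predict with onnxruntime and with the two mlprodict runtimes.
    import numpy
    from sklearn.datasets import load_diabetes
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from skl2onnx import to_onnx
    from onnxruntime import InferenceSession
    from mlprodict.onnxrt import OnnxInference

    X, y = load_diabetes(return_X_y=True)
    X = X.astype(numpy.float32)

    # skl: the scikit-learn baseline
    pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
    pipe.fit(X, y)
    expected = pipe.predict(X)

    # conversion to ONNX, with the cdist optimisation for the KNN part (KNN-cdist)
    onx = to_onnx(pipe, X, options={KNeighborsRegressor: {'optim': 'cdist'}})

    # ort: onnxruntime
    sess = InferenceSession(onx.SerializeToString())
    got_ort = sess.run(None, {'X': X})[0]

    # pyrt: mlprodict python runtime (interpreted graph logic)
    oinf = OnnxInference(onx, runtime='python')
    got_pyrt = oinf.run({'X': X})['variable']

    # pyrtc: same runtime with the graph logic compiled when the ONNX file is loaded
    oinf_c = OnnxInference(onx, runtime='python_compiled')
    got_pyrtc = oinf_c.run({'X': X})['variable']
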
[figure: ../_images/onnxruntime_datasets_num_reg_knn-1.png]

Graph X = number of observations to predict#

[figure: ../_images/onnxruntime_datasets_num_reg_knn-2.png]

Graph computing time per observation#

The following graph shows the computing cost per observation depending on the batch size. scikit-learn is clearly optimized for batch predictions, as happens during training. A sketch of this per-observation measure follows.

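A simplified way to measure this per-observation cost (the benchmark itself relies on pymlbenchmark; this is only a sketch reusing pipe, sess and X from the snippet above):

    # Sketch: per-observation latency as a function of the batch size.
    from timeit import timeit

    for batch_size in [1, 10, 100, len(X)]:
        batch = X[:batch_size]
        n_runs = 20
        t_skl = timeit(lambda: pipe.predict(batch), number=n_runs) / n_runs
        t_ort = timeit(lambda: sess.run(None, {'X': batch}), number=n_runs) / n_runs
        print(f"batch={len(batch):4d}  "
              f"skl={t_skl / len(batch):.3e} s/obs  "
              f"ort={t_ort / len(batch):.3e} s/obs")
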
[figure: ../_images/onnxruntime_datasets_num_reg_knn-3.png]

Graph of differences between scikit-learn and onnxruntime#


Graph of differences between scikit-learn and python runtime#

[figure: ../_images/onnxruntime_datasets_num_reg_knn-5.png]
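
The differences shown in the two previous graphs can be quantified as the maximum absolute discrepancy between the scikit-learn predictions and those of each ONNX runtime (again a sketch reusing the objects defined above, not the code which generated the figures):

    # Sketch: discrepancy between scikit-learn and the ONNX runtimes.
    import numpy

    diff_ort = numpy.max(numpy.abs(expected.ravel() - got_ort.ravel()))
    diff_pyrt = numpy.max(numpy.abs(expected.ravel() - got_pyrt.ravel()))
    print(f"max |skl - ort|  = {diff_ort:.3e}")
    print(f"max |skl - pyrt| = {diff_pyrt:.3e}")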

Configuration#

<<<

import os
from pyquickhelper.pandashelper import df2rst
import pandas
name = os.path.join(
    __WD__, "../../onnx/results/bench_plot_datasets_num_reg_knn.time.csv")
df = pandas.read_csv(name)
print(df2rst(df, number_format=4))

>>>

=============  ===========================================================  ====================
name           version                                                      value
=============  ===========================================================  ====================
date           2019-12-20
python         3.7.2 (default, Mar 1 2019, 18:34:21) [GCC 6.3.0 20170516]
platform       linux
OS             Linux-4.9.0-8-amd64-x86_64-with-debian-9.6
machine        x86_64
processor
release        4.9.0-8-amd64
architecture   ('64bit', '')
mlprodict      0.3
numpy          1.17.4                                                       openblas, language=c
onnx           1.6.34                                                       opset=12
onnxruntime    1.1.995                                                      CPU-DNNL-MKL-ML
pandas         0.25.3
skl2onnx       1.6.994
sklearn        0.22
=============  ===========================================================  ====================

Raw results#

bench_plot_datasets_num.csv

<<<

import os
from pyquickhelper.pandashelper import df2rst
from pymlbenchmark.benchmark.bench_helper import bench_pivot
import pandas
name = os.path.join(
    __WD__, "../../onnx/results/bench_plot_datasets_num_reg_knn.perf.csv")
df = pandas.read_csv(name)
print(df2rst(df, number_format=4))