Deploy machine learned models with ONNX


Xavier Dupré - Senior Data Scientist at Microsoft - Computer Science Teacher at ENSAE

Most machine learning libraries are optimized to train models, not necessarily to compute fast predictions in online web services. ONNX, an open source initiative started in 2017 by Microsoft and Facebook, is one answer to this problem. This presentation describes the concept and illustrates it with a demo mixing deep learning, scikit-learn and ML.net, the open source machine learning library written in C# and developed by Microsoft.

from jyquickhelper import add_notebook_menu
add_notebook_menu(last_level=2)
from pyquickhelper.helpgen import NbImage

Open source tools in this talk

import keras, lightgbm, nimbusml, onnx, onnxmltools, onnxruntime, sklearn, torch, xgboost
mods = [keras, lightgbm, nimbusml, onnx, onnxmltools, onnxruntime, sklearn, torch, xgboost]
for m in mods:
    print(m.__name__, m.__version__)
Using TensorFlow backend.
keras 2.2.4
lightgbm 2.2.1
nimbusml 0.6.2
onnx 1.3.0
onnxmltools 1.3.0.1000
onnxruntime 0.1.3
sklearn 0.20.0
torch 0.4.1
xgboost 0.80

ML.net

  • Open sourced in 2018
  • ML.net
  • Machine learning library written in C#
  • Used in many Microsoft services (Bing, …)
  • In development at Microsoft for three years
NbImage("mlnet.png", width=500)
../_images/onnx_deploy_8_0.png

nimbusml

  • Open sourced in 2018
  • nimbusml, a Python wrapper on ML.net
  • In development for a year
NbImage("nimbusml.png", width=300)
../_images/onnx_deploy_10_0.png

onnx

  • Serialisation library specialized for machine learning models, based on Google.Protobuf
  • Open source in 2017
  • onnx
NbImage("onnx.png", width=500)
../_images/onnx_deploy_12_0.png

onnxmltools

  • Open source in 2017
  • Converters for several machine learning libraries, including scikit-learn
  • onnxmltools
  • In development for a year
NbImage("onnxmltools.png")
../_images/onnx_deploy_14_0.png

onnxruntime

NbImage("onnxruntime.png", width=400)
../_images/onnx_deploy_16_0.png

The deployment problem

Learn and predict

  • Two different purposes, not necessarily aligned for optimization
  • Learn: computation optimized for a large number of observations (batch prediction)
  • Predict: computation optimized for one observation (one-off prediction)
  • Machine learning libraries optimize the learn scenario.

Illustration with a linear regression

We consider a dataset available in scikit-learn: diabetes

measures_lr = []
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
diabetes_X_train = diabetes.data[:-20]
diabetes_X_test  = diabetes.data[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test  = diabetes.target[-20:]
diabetes_X_train[:1]
array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187235, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990842, -0.01764613]])

scikit-learn

from sklearn.linear_model import LinearRegression
clr = LinearRegression()
clr.fit(diabetes_X_train, diabetes_y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
clr.predict(diabetes_X_test[:1])
array([197.61846908])
from jupytalk.benchmark import timeexec
measures_lr += [timeexec("sklearn",
                         "clr.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 33.72 µs deviation 9.62 µs (with 50 runs) in [27.32 µs, 54.97 µs]
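
timeexec from jupytalk is a thin benchmarking helper. A rough equivalent built on the standard timeit module (a sketch, not the actual implementation) looks like this:

import timeit
import numpy

def simple_timeexec(legend, stmt, context, number=50, repeat=200):
    # Time the statement repeatedly and report seconds per single run,
    # mimicking what jupytalk.benchmark.timeexec returns.
    res = numpy.array(timeit.repeat(stmt, globals=context,
                                    number=number, repeat=repeat)) / number
    return dict(legend=legend, average=res.mean(), deviation=res.std())

simple_timeexec("sklearn", "clr.predict(diabetes_X_test[:1])", globals())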

pure python

def python_prediction(X, coef, intercept):
    s = intercept
    for a, b in zip(X, coef):
        s += a * b
    return s

python_prediction(diabetes_X_test[0], clr.coef_, clr.intercept_)
197.61846907503298
measures_lr += [timeexec("python", "python_prediction(diabetes_X_test[0], clr.coef_, clr.intercept_)",
                         context=globals())]
Average: 4.75 µs deviation 2.13 µs (with 50 runs) in [3.75 µs, 8.79 µs]
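
A vectorized variant with numpy (added here for illustration, it was not part of the benchmark) reduces the loop to a single dot product:

import numpy

def numpy_prediction(X, coef, intercept):
    # Same linear prediction as above, expressed as one dot product.
    return numpy.dot(X, coef) + intercept

numpy_prediction(diabetes_X_test[0], clr.coef_, clr.intercept_)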

nimbusml

nimbusml was released on 11/3/2018 and wraps ML.net for Python.

from nimbusml.linear_model import OrdinaryLeastSquaresRegressor
nlr = OrdinaryLeastSquaresRegressor()
nlr.fit(diabetes_X_train, diabetes_y_train)
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
Trainer solving for 11 parameters across 422 examples
Coefficient of determination R2 = 0.512226221150138, or 0.500358245995641 (adjusted)
Not training a calibrator because it is not needed.
Elapsed time: 00:00:01.1302834
OrdinaryLeastSquaresRegressor(caching='Auto', feature=None, l2_weight=1e-06,
               label=None, normalize='Auto',
               per_parameter_significance=True, weight=None)
nlr.predict(diabetes_X_test[:1])
0    197.618484
Name: Score, dtype: float32
measures_lr += [timeexec("nimbusml", "nlr.predict(diabetes_X_test[:1])",
                         context=globals(), number=5, repeat=5)]
Average: 247.79 ms deviation 8.08 ms (with 5 runs) in [238.47 ms, 261.94 ms]

ml.net (C#)

The code is longer and some subtleties are hidden.

from jupytalk.benchmark import make_dataframe
df = make_dataframe(diabetes_y_train, diabetes_X_train)
df.head(n=2)
Label F0 F1 F2 F3 F4 F5 F6 F7 F8 F9
0 151.0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401 -0.002592 0.019908 -0.017646
1 75.0 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412 -0.039493 -0.068330 -0.092204
df.to_csv("diabetes.csv", index=False)
%load_ext csharpyml
%%mlnet ReturnMLClass5

public class TrainTestDiabetes
{
    string _dataset;
    ScikitPipeline _pipeline;

    public TrainTestDiabetes(string ds)
    {
        _dataset = ds;
    }

    public void Train()
    {
        using (var env = new ConsoleEnvironment())
        {
            var df = DataFrameIO.ReadCsv(_dataset, sep: ',',
                                         dtypes: new ColumnType[] { NumberType.R4 });
            var concat = "Concat{col=Features:F0,F1,F2,F3,F4,F5,F6,F7,F8,F9}";
            var pipe = new ScikitPipeline(new[] { concat }, "ols");
            pipe.Train(df, "Features", "Label");
            _pipeline = pipe;
        }
    }

    public DataFrame Predict(double[] features)
    {
        DataFrame pred = null;
        var df = new DataFrame();
        df.AddColumn("Label", new float[] { 0f });
        for (int i = 0; i < features.Length; ++i)
            df.AddColumn(string.Format("F{0}", i), new float[] { (float)features[i] });
        _pipeline.Predict(df, ref pred);
        return pred;
    }
}

public static TrainTestDiabetes ReturnMLClass5(string ds)
{
    return new TrainTestDiabetes(ds);
}
<function csharpy.runtime.compile.create_cs_function.<locals>.<lambda>(*params)>
tt = ReturnMLClass5("diabetes.csv")
tt.Train()
cs = tt.Predict(diabetes_X_test[0])
from csharpyml.binaries import CSDataFrame
csdf = CSDataFrame(cs)
csdf.to_df()
Label F0 F1 F2 F3 F4 F5 F6 F7 F8 ... Features.1 Features.2 Features.3 Features.4 Features.5 Features.6 Features.7 Features.8 Features.9 Score
0 0.0 -0.078165 0.05068 0.077863 0.052858 0.078236 0.064447 0.02655 -0.002592 0.040672 ... 0.05068 0.077863 0.052858 0.078236 0.064447 0.02655 -0.002592 0.040672 -0.009362 197.6185

1 rows × 22 columns

measures_lr += [timeexec("mlnet (+python)", "tt.Predict(diabetes_X_test[0])",
                         context=globals())]
Average: 27.80 µs deviation 12.47 µs (with 50 runs) in [18.97 µs, 42.76 µs]

Summary

import pandas
df = pandas.DataFrame(data=measures_lr)
df = df.set_index("legend").sort_values("average")
df
average code deviation first first3 last3 max5 min5 repeat run
legend
python 0.000005 python_prediction(diabetes_X_test[0], clr.coef... 0.000002 0.000010 0.000007 0.000004 0.000009 0.000004 200 50
mlnet (+python) 0.000028 tt.Predict(diabetes_X_test[0]) 0.000012 0.000067 0.000047 0.000033 0.000043 0.000019 200 50
sklearn 0.000034 clr.predict(diabetes_X_test[:1]) 0.000010 0.000049 0.000051 0.000029 0.000055 0.000027 200 50
nimbusml 0.247790 nlr.predict(diabetes_X_test[:1]) 0.008081 0.261936 0.250285 0.243303 0.261936 0.238469 5 5
%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nLinear Regression");
../_images/onnx_deploy_46_0.png

Illustration with a random forest

measures_rf = []

scikit-learn

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=10)
rf.fit(diabetes_X_train, diabetes_y_train)
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
measures_rf += [timeexec("sklearn", "rf.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 533.16 µs deviation 108.76 µs (with 50 runs) in [452.52 µs, 782.06 µs]

XGBoost

from xgboost import XGBRegressor
xg = XGBRegressor(n_estimators=10)
xg.fit(diabetes_X_train, diabetes_y_train)
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=10,
       n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)
measures_rf += [timeexec("xgboost", "xg.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 45.95 µs deviation 14.17 µs (with 50 runs) in [36.94 µs, 71.66 µs]

LightGBM

from lightgbm import LGBMRegressor
lg = LGBMRegressor(n_estimators=10)
lg.fit(diabetes_X_train, diabetes_y_train)
LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
       importance_type='split', learning_rate=0.1, max_depth=-1,
       min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
       n_estimators=10, n_jobs=-1, num_leaves=31, objective=None,
       random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
       subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
measures_rf += [timeexec("lightgbm", "lg.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 83.82 µs deviation 13.27 µs (with 50 runs) in [72.53 µs, 107.26 µs]

pure python

This would require reimplementing the prediction function.
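
A sketch of what such a reimplementation could look like, walking the arrays exposed by scikit-learn's tree_ attribute (illustration only, not benchmarked here):

def python_tree_prediction(tree, x):
    # Walk one scikit-learn decision tree (tree = estimator.tree_)
    # down to a leaf for a single observation x.
    node = 0
    while tree.children_left[node] != -1:  # -1 marks a leaf
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    return tree.value[node][0][0]

def python_forest_prediction(forest, x):
    # A random forest regressor averages the predictions of its trees.
    return sum(python_tree_prediction(est.tree_, x)
               for est in forest.estimators_) / len(forest.estimators_)

python_forest_prediction(rf, diabetes_X_test[0]) would then mimic rf.predict on one observation.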

nimbusml

from nimbusml.ensemble import FastTreesRegressor
nrf = FastTreesRegressor(num_trees=10)
nrf.fit(diabetes_X_train, diabetes_y_train)
Not adding a normalizer.
Making per-feature arrays
Changing data from row-wise to column-wise
Processed 422 instances
Binning and forming Feature objects
Reserved memory for tree learner: 170508 bytes
Starting to train ...
Not training a calibrator because it is not needed.
Elapsed time: 00:00:00.4688640
FastTreesRegressor(allow_empty_trees=True, bagging_size=0,
          baseline_alpha_risk=None, baseline_scores_formula=None,
          best_step_trees=False, bias=0.0, bundling='None', caching='Auto',
          categorical_split=False, compress_ensemble=False,
          disk_transpose=None, dropout_rate=0.0, early_stopping_metrics=1,
          early_stopping_rule=None, enable_pruning=False,
          entropy_coefficient=0.0, example_fraction=0.7,
          execution_times=False, feature=None, feature_compression_level=1,
          feature_flocks=True, feature_fraction=1.0,
          feature_reuse_penalty=0.0, feature_select_seed=123,
          filter_zero_lambdas=False, first_use_penalty=0.0,
          gain_conf_level=0.0, get_derivatives_sample_rate=1,
          group_id=None, histogram_pool_size=-1, label=None,
          learning_rate=0.2, max_categorical_groups_per_node=64,
          max_categorical_split_points=64, max_tree_output=100.0,
          max_trees_after_compression=-1,
          min_docs_for_categorical_split=100,
          min_docs_percentage_split=0.001, min_split=10, min_step_size=0.0,
          normalize='Auto', num_bins=255, num_leaves=20,
          num_post_bracket_steps=0, num_trees=10,
          optimizer='GradientDescent', parallel_trainer=None,
          position_discount_freeform=None, pruning_threshold=0.004,
          pruning_window_size=5, random_start=False, random_state=123,
          shrinkage=1.0, smoothing=0.0, softmax_temperature=0.0,
          sparsify_threshold=0.7, split_fraction=1.0,
          test_frequency=2147483647, train_threads=None,
          use_line_search=False, use_tolerant_pruning=False, weight=None,
          write_last_ensemble=False)
measures_rf += [timeexec("nimbusml", "nrf.predict(diabetes_X_test[:1])", context=globals(), number=5, repeat=5)]
Average: 200.72 ms deviation 10.10 ms (with 5 runs) in [191.44 ms, 219.30 ms]

ml.net

%%mlnet ReturnMLClassRF

public class TrainTestDiabetesRF
{
    string _dataset;
    ScikitPipeline _pipeline;

    public TrainTestDiabetesRF(string ds)
    {
        _dataset = ds;
    }

    public void Train()
    {
        using (var env = new ConsoleEnvironment())
        {
            var df = DataFrameIO.ReadCsv(_dataset, sep: ',',
                                         dtypes: new ColumnType[] { NumberType.R4 });
            var concat = "Concat{col=Features:F0,F1,F2,F3,F4,F5,F6,F7,F8,F9}";
            var pipe = new ScikitPipeline(new[] { concat }, "ftr{iter=10}");
            pipe.Train(df, "Features", "Label");
            _pipeline = pipe;
        }
    }

    public DataFrame Predict(double[] features)
    {
        DataFrame pred = null;
        var df = new DataFrame();
        df.AddColumn("Label", new float[] { 0f });
        for (int i = 0; i < features.Length; ++i)
            df.AddColumn(string.Format("F{0}", i), new float[] { (float)features[i] });
        _pipeline.Predict(df, ref pred);
        return pred;
    }

    public DataFrame PredictBatch(int nf, double[] features)
    {
        DataFrame pred = null;
        var df = new DataFrame();
        int N = features.Length / nf;
        df.AddColumn("Label", Enumerable.Range(0, N).Select(i => (float)features[nf * i]).ToArray());
        for (int i = 0; i < nf; ++i)
            df.AddColumn(string.Format("F{0}", i),
                         Enumerable.Range(0, N).Select(k => (float)features[nf * k + i]).ToArray());
        _pipeline.Predict(df, ref pred);
        return pred;
    }

    public void Read(string name)
    {
        _pipeline = new ScikitPipeline(name);
    }

    public void Save(string name)
    {
        _pipeline.Save(name, true);
    }
}

public static TrainTestDiabetesRF ReturnMLClassRF(string ds)
{
    return new TrainTestDiabetesRF(ds);
}
<function csharpy.runtime.compile.create_cs_function.<locals>.<lambda>(*params)>
trf = ReturnMLClassRF("diabetes.csv")
trf.Train()
measures_rf += [timeexec("mlnet (+python)",
                         "trf.Predict(diabetes_X_test[0])",
                         context=globals())]
Average: 26.33 µs deviation 24.78 µs (with 50 runs) in [20.42 µs, 34.38 µs]

Summary

df = pandas.DataFrame(data=measures_rf)
df = df.set_index("legend").sort_values("average")
df
average code deviation first first3 last3 max5 min5 repeat run
legend
mlnet (+python) 0.000026 trf.Predict(diabetes_X_test[0]) 0.000025 0.000371 0.000137 0.000028 0.000034 0.000020 200 50
xgboost 0.000046 xg.predict(diabetes_X_test[:1]) 0.000014 0.000111 0.000078 0.000038 0.000072 0.000037 200 50
lightgbm 0.000084 lg.predict(diabetes_X_test[:1]) 0.000013 0.000151 0.000107 0.000072 0.000107 0.000073 200 50
sklearn 0.000533 rf.predict(diabetes_X_test[:1]) 0.000109 0.000560 0.000524 0.000499 0.000782 0.000453 200 50
nimbusml 0.200717 nrf.predict(diabetes_X_test[:1]) 0.010102 0.203112 0.195962 0.203012 0.219300 0.191441 5 5
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nRandom Forest (10 trees)");
../_images/onnx_deploy_68_0.png

Keep in mind

  • The trained trees are not necessarily the same across libraries.
  • Predictive performance is not compared.
  • Only the order of magnitude matters here.

What is batch prediction?

  • Instead of running 1 prediction N times
  • We run N predictions once
import numpy
memo = []
batch = [1, 2, 5, 7, 8, 10, 100, 200, 500, 1000, 2000,
         3000, 4000, 5000, 10000, 20000, 50000,
         100000, 200000, 400000, ]

number = 10
repeat = 10
for i in batch:
    if i <= diabetes_X_test.shape[0]:
        mx = diabetes_X_test[:i]
    else:
        mxs = [diabetes_X_test] * (i // diabetes_X_test.shape[0] + 1)
        mx = numpy.vstack(mxs)
        mx = mx[:i]

    print("batch", "=", i)
    number = 10 if i <= 10000 else 2

    memo.append(timeexec("sklearn %d" % i, "rf.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "sklearn"

    memo.append(timeexec("xgboost %d" % i, "xg.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "xgboost"

    memo.append(timeexec("lightgbm %d" % i, "lg.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "lightgbm"

    memo.append(timeexec("nimbusml %d" % i, "nrf.predict(mx)",
                         repeat=2, number=2, context=globals()))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "nimbusml"

    memo.append(timeexec("mlnet %d" % i, "trf.PredictBatch(10, mx.ravel())",
                         repeat=min(4, repeat), number=min(4, number), context=globals()))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "mlnet (+python)"
batch = 1
Average: 1.09 ms deviation 362.34 µs (with 10 runs) in [449.86 µs, 1.63 ms]
Average: 44.75 µs deviation 22.57 µs (with 10 runs) in [36.23 µs, 112.32 µs]
Average: 118.07 µs deviation 43.36 µs (with 10 runs) in [73.76 µs, 170.39 µs]
Average: 183.65 ms deviation 4.88 ms (with 2 runs) in [178.77 ms, 188.53 ms]
Average: 423.51 µs deviation 677.27 µs (with 4 runs) in [25.68 µs, 1.60 ms]
batch = 2
Average: 574.02 µs deviation 100.80 µs (with 10 runs) in [430.81 µs, 825.09 µs]
Average: 49.26 µs deviation 26.08 µs (with 10 runs) in [36.03 µs, 126.85 µs]
Average: 97.99 µs deviation 36.68 µs (with 10 runs) in [70.40 µs, 162.49 µs]
Average: 201.22 ms deviation 9.69 ms (with 2 runs) in [191.53 ms, 210.90 ms]
Average: 58.67 µs deviation 45.19 µs (with 4 runs) in [30.32 µs, 136.79 µs]
batch = 5
Average: 613.05 µs deviation 158.19 µs (with 10 runs) in [445.31 µs, 992.75 µs]
Average: 66.06 µs deviation 22.38 µs (with 10 runs) in [47.45 µs, 129.78 µs]
Average: 128.58 µs deviation 22.65 µs (with 10 runs) in [114.29 µs, 195.16 µs]
Average: 202.56 ms deviation 1.32 ms (with 2 runs) in [201.24 ms, 203.88 ms]
Average: 99.46 µs deviation 45.30 µs (with 4 runs) in [51.36 µs, 173.53 µs]
batch = 7
Average: 704.79 µs deviation 170.51 µs (with 10 runs) in [443.58 µs, 917.29 µs]
Average: 88.73 µs deviation 21.88 µs (with 10 runs) in [57.76 µs, 133.77 µs]
Average: 111.39 µs deviation 32.52 µs (with 10 runs) in [78.74 µs, 165.49 µs]
Average: 187.77 ms deviation 2.66 ms (with 2 runs) in [185.11 ms, 190.43 ms]
Average: 89.51 µs deviation 35.44 µs (with 4 runs) in [64.69 µs, 150.72 µs]
batch = 8
Average: 599.34 µs deviation 149.89 µs (with 10 runs) in [449.70 µs, 938.79 µs]
Average: 47.31 µs deviation 18.38 µs (with 10 runs) in [39.07 µs, 100.46 µs]
Average: 84.85 µs deviation 13.38 µs (with 10 runs) in [79.05 µs, 124.80 µs]
Average: 203.27 ms deviation 4.49 ms (with 2 runs) in [198.77 ms, 207.76 ms]
Average: 126.54 µs deviation 52.07 µs (with 4 runs) in [78.62 µs, 214.12 µs]
batch = 10
Average: 624.00 µs deviation 162.77 µs (with 10 runs) in [455.07 µs, 990.46 µs]
Average: 66.39 µs deviation 17.53 µs (with 10 runs) in [54.83 µs, 118.08 µs]
Average: 92.00 µs deviation 22.07 µs (with 10 runs) in [78.93 µs, 153.88 µs]
Average: 261.13 ms deviation 54.64 ms (with 2 runs) in [206.49 ms, 315.77 ms]
Average: 163.65 µs deviation 31.05 µs (with 4 runs) in [131.85 µs, 214.81 µs]
batch = 100
Average: 692.62 µs deviation 113.09 µs (with 10 runs) in [512.16 µs, 896.36 µs]
Average: 83.00 µs deviation 28.15 µs (with 10 runs) in [61.71 µs, 155.10 µs]
Average: 220.71 µs deviation 44.36 µs (with 10 runs) in [177.38 µs, 310.52 µs]
Average: 197.85 ms deviation 10.12 ms (with 2 runs) in [187.73 ms, 207.98 ms]
Average: 919.46 µs deviation 114.15 µs (with 4 runs) in [734.12 µs, 1.03 ms]
batch = 200
Average: 753.20 µs deviation 179.59 µs (with 10 runs) in [554.94 µs, 1.03 ms]
Average: 116.67 µs deviation 28.56 µs (with 10 runs) in [83.44 µs, 182.40 µs]
Average: 296.90 µs deviation 21.28 µs (with 10 runs) in [277.29 µs, 349.83 µs]
Average: 205.52 ms deviation 8.52 ms (with 2 runs) in [197.00 ms, 214.03 ms]
Average: 1.76 ms deviation 451.29 µs (with 4 runs) in [1.44 ms, 2.54 ms]
batch = 500
Average: 748.39 µs deviation 116.65 µs (with 10 runs) in [674.92 µs, 1.07 ms]
Average: 163.28 µs deviation 17.11 µs (with 10 runs) in [150.32 µs, 211.87 µs]
Average: 614.15 µs deviation 47.82 µs (with 10 runs) in [571.61 µs, 743.15 µs]
Average: 194.69 ms deviation 4.90 ms (with 2 runs) in [189.79 ms, 199.59 ms]
Average: 3.94 ms deviation 263.47 µs (with 4 runs) in [3.55 ms, 4.27 ms]
batch = 1000
Average: 899.24 µs deviation 42.31 µs (with 10 runs) in [833.70 µs, 998.00 µs]
Average: 285.60 µs deviation 43.57 µs (with 10 runs) in [252.13 µs, 404.74 µs]
Average: 1.18 ms deviation 94.64 µs (with 10 runs) in [1.10 ms, 1.42 ms]
Average: 192.36 ms deviation 2.26 ms (with 2 runs) in [190.10 ms, 194.62 ms]
Average: 7.42 ms deviation 26.76 µs (with 4 runs) in [7.38 ms, 7.45 ms]
batch = 2000
Average: 1.31 ms deviation 120.70 µs (with 10 runs) in [1.21 ms, 1.63 ms]
Average: 523.26 µs deviation 57.52 µs (with 10 runs) in [475.06 µs, 679.19 µs]
Average: 2.39 ms deviation 223.51 µs (with 10 runs) in [2.12 ms, 2.78 ms]
Average: 328.07 ms deviation 65.75 ms (with 2 runs) in [262.32 ms, 393.83 ms]
Average: 20.18 ms deviation 1.39 ms (with 4 runs) in [18.16 ms, 22.08 ms]
batch = 3000
Average: 2.46 ms deviation 343.11 µs (with 10 runs) in [1.94 ms, 2.98 ms]
Average: 842.25 µs deviation 47.85 µs (with 10 runs) in [749.00 µs, 929.11 µs]
Average: 3.59 ms deviation 210.25 µs (with 10 runs) in [3.30 ms, 4.00 ms]
Average: 203.53 ms deviation 11.45 ms (with 2 runs) in [192.08 ms, 214.97 ms]
Average: 27.22 ms deviation 3.58 ms (with 4 runs) in [22.87 ms, 32.33 ms]
batch = 4000
Average: 3.13 ms deviation 752.84 µs (with 10 runs) in [2.32 ms, 4.50 ms]
Average: 1.31 ms deviation 188.63 µs (with 10 runs) in [1.04 ms, 1.68 ms]
Average: 4.73 ms deviation 574.07 µs (with 10 runs) in [4.27 ms, 5.88 ms]
Average: 193.47 ms deviation 4.77 ms (with 2 runs) in [188.70 ms, 198.25 ms]
Average: 32.11 ms deviation 1.24 ms (with 4 runs) in [30.44 ms, 33.76 ms]
batch = 5000
Average: 2.77 ms deviation 303.78 µs (with 10 runs) in [2.39 ms, 3.42 ms]
Average: 1.25 ms deviation 83.17 µs (with 10 runs) in [1.17 ms, 1.41 ms]
Average: 5.70 ms deviation 352.30 µs (with 10 runs) in [5.21 ms, 6.43 ms]
Average: 199.38 ms deviation 11.69 ms (with 2 runs) in [187.69 ms, 211.07 ms]
Average: 46.80 ms deviation 5.56 ms (with 4 runs) in [38.57 ms, 53.04 ms]
batch = 10000
Average: 4.48 ms deviation 362.14 µs (with 10 runs) in [4.05 ms, 5.20 ms]
Average: 2.42 ms deviation 117.54 µs (with 10 runs) in [2.30 ms, 2.70 ms]
Average: 10.73 ms deviation 193.20 µs (with 10 runs) in [10.48 ms, 11.21 ms]
Average: 206.34 ms deviation 12.64 ms (with 2 runs) in [193.70 ms, 218.98 ms]
Average: 83.42 ms deviation 4.68 ms (with 4 runs) in [78.36 ms, 90.48 ms]
batch = 20000
Average: 8.30 ms deviation 630.93 µs (with 2 runs) in [7.66 ms, 10.10 ms]
Average: 5.87 ms deviation 203.11 µs (with 2 runs) in [5.58 ms, 6.22 ms]
Average: 22.55 ms deviation 2.14 ms (with 2 runs) in [20.79 ms, 28.44 ms]
Average: 219.12 ms deviation 2.18 ms (with 2 runs) in [216.93 ms, 221.30 ms]
Average: 170.93 ms deviation 12.15 ms (with 2 runs) in [154.61 ms, 188.88 ms]
batch = 50000
Average: 20.29 ms deviation 1.16 ms (with 2 runs) in [19.31 ms, 23.44 ms]
Average: 16.82 ms deviation 2.40 ms (with 2 runs) in [14.80 ms, 22.99 ms]
Average: 62.07 ms deviation 11.67 ms (with 2 runs) in [52.14 ms, 83.67 ms]
Average: 225.02 ms deviation 597.14 µs (with 2 runs) in [224.42 ms, 225.62 ms]
Average: 403.81 ms deviation 17.18 ms (with 2 runs) in [382.44 ms, 429.30 ms]
batch = 100000
Average: 38.56 ms deviation 668.49 µs (with 2 runs) in [37.59 ms, 39.60 ms]
Average: 34.35 ms deviation 3.23 ms (with 2 runs) in [32.66 ms, 43.94 ms]
Average: 110.96 ms deviation 7.62 ms (with 2 runs) in [104.67 ms, 130.42 ms]
Average: 292.40 ms deviation 17.23 ms (with 2 runs) in [275.17 ms, 309.63 ms]
Average: 818.34 ms deviation 17.06 ms (with 2 runs) in [798.83 ms, 836.21 ms]
batch = 200000
Average: 91.00 ms deviation 2.41 ms (with 2 runs) in [88.26 ms, 96.05 ms]
Average: 70.40 ms deviation 9.45 ms (with 2 runs) in [64.58 ms, 97.42 ms]
Average: 218.47 ms deviation 12.04 ms (with 2 runs) in [206.40 ms, 241.31 ms]
Average: 397.57 ms deviation 10.73 ms (with 2 runs) in [386.84 ms, 408.30 ms]
Average: 1.49 s deviation 31.59 ms (with 2 runs) in [1.45 s, 1.53 s]
batch = 400000
Average: 196.19 ms deviation 4.51 ms (with 2 runs) in [187.93 ms, 202.83 ms]
Average: 144.07 ms deviation 19.90 ms (with 2 runs) in [129.53 ms, 202.40 ms]
Average: 427.58 ms deviation 11.61 ms (with 2 runs) in [411.34 ms, 448.11 ms]
Average: 784.17 ms deviation 61.18 ms (with 2 runs) in [722.99 ms, 845.34 ms]
Average: 3.09 s deviation 69.31 ms (with 2 runs) in [3.00 s, 3.17 s]
dfb = pandas.DataFrame(memo)[["average", "lib", "batch"]]
piv = dfb.pivot("batch", "lib", "average")
for c in piv.columns:
    piv["ave_" + c] = piv[c] / piv.index
libs = list(c for c in piv.columns if "ave_" in c)
ax = piv.plot(y=libs, logy=True, logx=True, figsize=(10, 5))
ax.set_title("Computation time per observation when computed in a batch")
ax.set_ylabel("sec")
ax.set_xlabel("batch size")
ax.grid(True);
../_images/onnx_deploy_72_0.png

If we could switch libraries…

  • Export a model from one library to another.
  • Choose the library whose optimisation is better for one-off prediction.

Let’s save the model from ml.net:

trf.Save("rf-mlnet.zip")

And load it back in nimbusml:

from nimbusml import Pipeline
pipe = Pipeline()
pipe.load_model("rf-mlnet.zip")
df = make_dataframe(diabetes_y_test.astype(numpy.float32),
                    diabetes_X_test.astype(numpy.float32))
pipe.predict(df).head()
Score
0 193.684052
1 128.217041
2 157.777420
3 70.880699
4 172.210541

It works because ML.net and nimbusml share the same core library.

ONNX

ONNX = language to describe models

  • Standard format to describe machine learning models
  • Easier to exchange and export

ONNX = machine learning oriented

Can represent any mathematical function handling numerical and text features.
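
As an illustration (a minimal sketch with onnx.helper, not part of the original talk), a small graph computing MatMul(X, A) + B can be described directly:

from onnx import helper, TensorProto

# Declare the inputs and the output of the graph.
X = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 2])
A = helper.make_tensor_value_info('A', TensorProto.FLOAT, [2, 2])
B = helper.make_tensor_value_info('B', TensorProto.FLOAT, [1, 2])
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, 2])

# Two nodes: a MatMul followed by an Add.
node1 = helper.make_node('MatMul', ['X', 'A'], ['XA'])
node2 = helper.make_node('Add', ['XA', 'B'], ['Y'])
graph = helper.make_graph([node1, node2], 'linear_graph', [X, A, B], [Y])
model = helper.make_model(graph, producer_name='demo')

Converters build graphs like this one automatically from trained models.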

NbImage("onnxop.png", width=600)
../_images/onnx_deploy_84_0.png

actively supported

  • Microsoft
  • Facebook
  • first created to deploy deep learning models
  • extended to other models

Train somewhere, predict somewhere else

The same code cannot be optimized for both training and prediction.

Training             Predicting
Batch prediction     One-off prediction
Huge memory          Small memory
Huge data            Small data
                     High latency

Libraries for predictions

  • Optimized for predictions
  • Optimized for a device

ONNX Runtime

ONNX Runtime for inferencing machine learning models now in preview

Dedicated runtime for:

  • CPU
  • GPU
NbImage("onnxrt.png", width=800)
../_images/onnx_deploy_90_0.png

ONNX visually

ONNX can represent any data processing pipeline.

Let’s see it with a cool feature of nimbusml: pipeline visualization.

import pandas
from nimbusml.preprocessing.schema import ColumnConcatenator
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
from nimbusml import Pipeline, FileDataStream, Role
df = pandas.DataFrame(dict(catA=["A", "B", "A"], catB=["D", "F", "F"], Label=[0, 1, 0]))
df
catA catB Label
0 A D 0
1 B F 1
2 A F 0
transform_1 = OneHotVectorizer() << {'catAs':'catA'}
transform_2 = OneHotVectorizer() << {'catBs':'catB'}
transform_3 = ColumnConcatenator() << {'Features': ['catAs', 'catBs']}
algo = LogisticRegressionBinaryClassifier() << {Role.Feature:'Features',
                                                Role.Label: "Label"}
pipeline = Pipeline([transform_1, transform_2, transform_3, algo])
from nimbusml.utils.exports import dot_export_pipeline
dot_vis = dot_export_pipeline(pipeline, df)
print(dot_vis)
digraph{
  orientation=portrait;
  sch0[label="<f0> catA|<f1> catB|<f2> Label",shape=record,fontsize=8];
  node1[label="OneHotVectorizer",shape=box,style="filled,rounded",color=cyan,fontsize=12];
  sch0:f0 -> node1;
  sch1[label="<f0> catAs",shape=record,fontsize=8];
  node1 -> sch1:f0;
  node2[label="OneHotVectorizer",shape=box,style="filled,rounded",color=cyan,fontsize=12];
  sch0:f1 -> node2;
  sch2[label="<f0> catBs",shape=record,fontsize=8];
  node2 -> sch2:f0;
  node3[label="ColumnConcatenator",shape=box,style="filled,rounded",color=cyan,fontsize=12];
  sch1:f0 -> node3;
  sch2:f0 -> node3;
  sch3[label="<f0> Features",shape=record,fontsize=8];
  node3 -> sch3:f0;
  node4[label="LogisticRegressionBinaryClassifier",shape=box,style="filled,rounded",color=yellow,fontsize=12];
  sch3:f0 -> node4 [label="Feature",fontsize=8];
  sch0:f2 -> node4 [label="Label",fontsize=8];
  sch4[label="<f0> PredictedLabel|<f1> PredictedProba|<f2> Score",shape=record,fontsize=8];
  node4 -> sch4:f0;
  node4 -> sch4:f1;
  node4 -> sch4:f2;
}
from jyquickhelper import RenderJsDot
RenderJsDot(dot_vis, width='40%')

ONNX export is not yet available in nimbusml, but it is in ML.net.

df.to_csv("data_train.csv", index=False)
FileDataStream.read_csv_pandas(df).schema.to_string()
'col=catA:TX:0 col=catB:TX:1 col=Label:I8:2 header=+'
%%maml

chain

cmd = train{
    data = data_train.csv
    loader = text{col=catA:TX:0 col=catB:TX:1 col=Label:R4:2
                  header=+ sep=,}

    xf = Categorical {col=catAs:catA}
    xf = Categorical {col=catBs:catB}
    xf = concat {col=Features:catAs,catBs}
    tr = lr

    out = ft_example.zip
}

cmd = saveonnx{
    in = ft_example.zip
    onnx = ft_example.onnx
    domain = ai.onnx.ml
    idrop = Label
}
-
=====================================================================================
Executing: train{data = data_train.csv
    loader = text{col=catA:TX:0 col=catB:TX:1 col=Label:R4:2
                  header=+ sep=,}
    xf = Categorical {col=catAs:catA}
    xf = Categorical {col=catBs:catB}
    xf = concat {col=Features:catAs,catBs}
    tr = lr
    out = ft_example.zip}
=====================================================================================
maml.exe Train tr=lr loader=text{col=catA:TX:0 col=catB:TX:1 col=Label:R4:2
                  header=+ sep=,} data=data_train.csv out=ft_example.zip xf=Categorical{col=catAs:catA} xf=Categorical{col=catBs:catB} xf=concat{col=Features:catAs,catBs}
Not adding a normalizer.
LBFGS multi-threading will attempt to load dataset into memory. In case of out-of-memory issues, add 'numThreads=1' to the trainer arguments and 'cache=-' to the command line arguments to turn off multi-threading.
Beginning optimization
num vars: 5
improvement criterion: Mean Improvement
L1 regularization selected 1 of 5 weights.
Not training a calibrator because it is not needed.
[1] 'Building term dictionary' started.
[1] (00:00.04)      3 examples      Total Terms: 2
[1] 'Building term dictionary' finished in 00:00:00.0428871.
[2] 'Building term dictionary #2' started.
[2] (00:00.00)      3 examples      Total Terms: 2
[2] 'Building term dictionary #2' finished in 00:00:00.0019938.
[3] 'LBFGS data prep' started.
[3] 'LBFGS data prep' finished in 00:00:00.0099745.
[4] 'LBFGS Optimizer' started.
[4] (00:00.03)      0 iterations    Loss: 0.6931471824646
[4] (00:00.05)      1 iterations    Loss: 0.646595060825348 Improvement: 0.04655
[4] (00:00.05)      2 iterations    Loss: 0.638094425201416 Improvement: 0.01611
[4] (00:00.05)      3 iterations    Loss: 0.63709282875061  Improvement: 0.004599
[4] (00:00.05)      4 iterations    Loss: 0.636658787727356 Improvement: 0.001463
[4] (00:00.05)      5 iterations    Loss: 0.63655686378479  Improvement: 0.0004412
[4] (00:00.05)      6 iterations    Loss: 0.636525630950928 Improvement: 0.0001337
[4] (00:00.05)      7 iterations    Loss: 0.636517405509949 Improvement: 3.958E-05
[4] (00:00.05)      8 iterations    Loss: 0.636515080928802 Improvement: 1.164E-05
[4] (00:00.05)      9 iterations    Loss: 0.63651442527771  Improvement: 3.401E-06
[4] (00:00.05)      10 iterations   Loss: 0.636514246463776 Improvement: 9.843E-07
[4] (00:00.05)      11 iterations   Loss: 0.636514246463776 Improvement: 2.461E-07
[4] (00:00.05)      12 iterations   Loss: 0.636514186859131 Improvement: 1.062E-07
[4] (00:00.05)      13 iterations   Loss: 0.636514186859131 Improvement: 2.656E-08
[4] 'LBFGS Optimizer' finished in 00:00:00.0548519.
[5] 'Saving model' started.
Physical memory usage(MB): 838
Virtual memory usage(MB): 6362
14/11/2018 18:29:25  Time elapsed(s): 0.612
=====================================================================================
Executing: saveonnx{in = ft_example.zip
    onnx = ft_example.onnx
    domain = ai.onnx.ml
    idrop = Label}
=====================================================================================
=====================================================================================
Executed 2 commands in 00:00:00.9491737
=====================================================================================
[5] 'Saving model' finished in 00:00:00.0997894.
-
-----
-
Warning: Too few instances to use 4 threads, decreasing to 1 thread(s)
import onnx
model_onnx = onnx.load("ft_example.onnx")
print(str(model_onnx)[:400] + "\n...")
ir_version: 3
producer_name: "ML.NET"
producer_version: "0.8.27213.0"
domain: "ai.onnx.ml"
graph {
  node {
    input: "catA"
    output: "catAs"
    name: "LabelEncoder"
    op_type: "LabelEncoder"
    attribute {
      name: "classes_strings"
      strings: "A"
      strings: "B"
      type: STRINGS
    }
    attribute {
      name: "default_int64"
      i: -1
      type: INT
    }
    attribute
...

ONNX runtime: compute predictions

import onnxruntime

sess = onnxruntime.InferenceSession("ft_example.onnx")

for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='catA', type='tensor(string)', shape=[1, 1])
Input: NodeArg(name='catB', type='tensor(string)', shape=[1, 1])
Output: NodeArg(name='catA0', type='tensor(string)', shape=[1, 1])
Output: NodeArg(name='catB0', type='tensor(string)', shape=[1, 1])
Output: NodeArg(name='catAs1', type='tensor(float)', shape=[1, 2])
Output: NodeArg(name='catBs1', type='tensor(float)', shape=[1, 2])
Output: NodeArg(name='Features0', type='tensor(float)', shape=[1, 4])
Output: NodeArg(name='PredictedLabel0', type='tensor(float)', shape=[1, 1])
Output: NodeArg(name='Score0', type='tensor(float)', shape=[1, 1])
Output: NodeArg(name='Probability0', type='tensor(float)', shape=[1, 1])
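
Running the model on one row then looks like the following sketch (assuming the runtime accepts string inputs as numpy arrays of objects):

import numpy
# Feed one categorical row and retrieve the prediction outputs.
res = sess.run(['PredictedLabel0', 'Probability0'],
               {'catA': numpy.array([['A']], dtype=object),
                'catB': numpy.array([['D']], dtype=object)})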

ONNX demo on random forest

rf
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

Conversion to ONNX

onnxmltools

from onnxmltools import convert_sklearn
from onnxmltools.convert.common.data_types import FloatTensorType
model_onnx = convert_sklearn(rf, "rf_diabetes",
                             [('input', FloatTensorType([1, 10]))])
print(str(model_onnx)[:450] + "\n...")
ir_version: 3
producer_name: "OnnxMLTools"
producer_version: "1.3.0.1000"
domain: "onnxml"
model_version: 0
doc_string: ""
graph {
  node {
    input: "input"
    output: "variable"
    name: "TreeEnsembleRegressor"
    op_type: "TreeEnsembleRegressor"
    attribute {
      name: "n_targets"
      i: 1
      type: INT
    }
    attribute {
      name: "nodes_falsenodeids"
      ints: 238
      ints: 191
      ints: 188
      ints: 115
      ints:
...

Save the model

from onnxmltools.utils import save_model
save_model(model_onnx, 'rf_sklearn.onnx')
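
As an optional sanity check (a sketch, not in the original flow), the file can be reloaded with onnx and its graph inspected:

import onnx
# Reload the serialized model and list the graph inputs and outputs.
model_check = onnx.load('rf_sklearn.onnx')
print([i.name for i in model_check.graph.input],
      [o.name for o in model_check.graph.output])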

Compute predictions

import onnxruntime

sess = onnxruntime.InferenceSession("rf_sklearn.onnx")

for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='input', type='tensor(float)', shape=[1, 10])
Output: NodeArg(name='variable', type='tensor(float)', shape=[1, 1])
import numpy

def predict_onnxrt(x):
    return sess.run(["variable"], {'input': x})

print("Prediction:", predict_onnxrt(diabetes_X_test[:1].astype(numpy.float32)))
Prediction: [array([[198.5]], dtype=float32)]
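
A quick sanity check (added for illustration): the ONNX prediction should match scikit-learn's up to float32 rounding.

# Compare the ONNX runtime output with the original scikit-learn model.
expected = rf.predict(diabetes_X_test[:1])
got = predict_onnxrt(diabetes_X_test[:1].astype(numpy.float32))[0]
assert numpy.allclose(expected, got.ravel(), rtol=1e-3)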
measures_rf += [timeexec("onnx", "predict_onnxrt(diabetes_X_test[:1].astype(numpy.float32))",
                         context=globals())]
Average: 27.00 µs deviation 10.18 µs (with 50 runs) in [18.12 µs, 47.92 µs]
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df = pandas.DataFrame(data=measures_rf)
df = df.set_index("legend").sort_values("average")
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nRandom Forest (10 trees)");
../_images/onnx_deploy_114_0.png

Deep learning

  • transfer learning with keras
  • other converters: pytorch, caffe2, …
measures_dl = []
from keras.applications.mobilenetv2 import MobileNetV2
model = MobileNetV2(input_shape=None, alpha=1.0, depth_multiplier=1,
                    include_top=True,
                    weights='imagenet', input_tensor=None,
                    pooling=None, classes=1000)
model
<keras.engine.training.Model at 0x246047db5c0>
from pyensae.datasource import download_data
import os
if not os.path.exists("simages/noclass"):
    os.makedirs("simages/noclass")
images = download_data("dog-cat-pixabay.zip",
                       whereTo="simages/noclass")
from mlinsights.plotting import plot_gallery_images
plot_gallery_images(images[:7]);
../_images/onnx_deploy_119_0.png
from keras.preprocessing.image import ImageDataGenerator
import numpy
params = dict(rescale=1./255)
augmenting_datagen = ImageDataGenerator(**params)
flow = augmenting_datagen.flow_from_directory('simages', batch_size=1, target_size=(224, 224),
                                              classes=['noclass'], shuffle=False)
imgs = [img[0][0] for i, img in zip(range(0,31), flow)]
Found 31 images belonging to 1 classes.
array_images = [im[numpy.newaxis, :, :, :] for im in imgs]
array_images[0].shape
(1, 224, 224, 3)
outputs = [model.predict(im) for im in array_images]
outputs[0].shape
(1, 1000)
outputs[0].ravel()[:10]
array([3.5999357e-04, 1.2039350e-03, 1.2471760e-04, 6.1937186e-05,
       1.1310327e-03, 1.7601112e-04, 1.9819068e-04, 1.4307768e-04,
       5.5190694e-04, 1.7074044e-04], dtype=float32)
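
The 1000 values are ImageNet class probabilities. The most likely classes can be recovered with argsort (a sketch; the index-to-label mapping is not loaded here):

# Indices of the five most probable ImageNet classes for the first image.
top5 = outputs[0].ravel().argsort()[-5:][::-1]
print(top5)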

Let’s measure time.

from jupytalk.benchmark import timeexec
measures_dl += [timeexec("keras.mobilenet", "model.predict(array_images[0])",
                         context=globals(), repeat=3, number=10)]
Average: 119.88 ms deviation 7.41 ms (with 10 runs) in [110.84 ms, 128.99 ms]
from onnxmltools import convert_keras
try:
    konnx = convert_keras(model, "mobilev2")
except ValueError as e:
    # keras updated its API and the converter has not caught up yet
    print(e)
Unsupported shape calculation for operator <class 'keras.layers.advanced_activations.ReLU'>

Keras has been updated, onnxmltools has not caught up yet…

I raised issue 165.

Let’s switch to pytorch.

import torchvision.models as models
modelt = models.squeezenet1_1(pretrained=True)
modelt.classifier
c:\python370_x64\lib\site-packages\torchvision\models\squeezenet.py:94: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
  init.kaiming_uniform(m.weight.data)
c:\python370_x64\lib\site-packages\torchvision\models\squeezenet.py:92: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
  init.normal(m.weight.data, mean=0.0, std=0.01)
Sequential(
  (0): Dropout(p=0.5)
  (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
  (2): ReLU(inplace)
  (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
)
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
trans = transforms.Compose([transforms.Resize((224, 224)),
                            transforms.CenterCrop(224),
                            transforms.ToTensor()])
imgs = datasets.ImageFolder("simages", trans)
dataloader = DataLoader(imgs, batch_size=1, shuffle=False, num_workers=1)
img_seq = iter(dataloader)
imgs = list(img[0] for img in img_seq)
all_outputs = [modelt.forward(img).detach().numpy().ravel() for img in imgs[:2]]
all_outputs[0].shape
(1000,)
measures_dl += [timeexec("pytorch.squeezenet", "modelt.forward(imgs[0]).detach().numpy().ravel()",
                         context=globals(), repeat=3, number=10)]
Average: 81.15 ms deviation 2.46 ms (with 10 runs) in [77.84 ms, 83.72 ms]

Let’s convert into ONNX.

import torch.onnx
from torch.autograd import Variable
input_names = [ "actual_input_1" ]
output_names = [ "output1" ]
dummy_input = Variable(torch.randn(10, 3, 224, 224))

try:
    torch.onnx.export(modelt, dummy_input, "resnet18.onnx", verbose=False,
                      input_names=input_names, output_names=output_names)
except Exception as e:
    print(str(e).split('\n')[0])
c:\python370_x64\lib\site-packages\torch\onnx\symbolic.py:69: UserWarning: ONNX export failed on max_pool2d_with_indices because ceil_mode not supported
  warnings.warn("ONNX export failed on " + op + " because " + msg + " not supported")
ONNX export failed: Couldn't export operator aten::max_pool2d_with_indices

Well… work in progress.
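
The export API itself works on models that avoid the unsupported operator. A toy sketch (not part of the original talk):

import torch
import torch.nn as nn

# A small model without the problematic pooling operator exports fine.
toy = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
dummy = torch.randn(1, 10)
torch.onnx.export(toy, dummy, "toy.onnx",
                  input_names=["input"], output_names=["output"])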

Model zoo

Converted Models

NbImage("zoo.png", width=800)
../_images/onnx_deploy_138_0.png

MobileNet and SqueezeNet

Download a pre-converted version of MobileNetv2

download_data("mobilenetv2-1.0.onnx",
              url="https://s3.amazonaws.com/onnx-model-zoo/mobilenet/mobilenetv2-1.0/")
'mobilenetv2-1.0.onnx'
sess = onnxruntime.InferenceSession("mobilenetv2-1.0.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='data', type='tensor(float)', shape=[1, 3, 224, 224])
Output: NodeArg(name='mobilenetv20_output_flatten0_reshape0', type='tensor(float)', shape=[1, 1000])
print(array_images[0].shape)
print(array_images[0].transpose((0, 3, 1, 2)).shape)
(1, 224, 224, 3)
(1, 3, 224, 224)
res = sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})
res[0].shape
(1, 1000)
measures_dl += [timeexec("onnx.mobile", "sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})",
                         context=globals(), repeat=3, number=10)]
Average: 35.17 ms deviation 1.32 ms (with 10 runs) in [33.70 ms, 36.90 ms]

Download a pre-converted version of SqueezeNet

download_data("squeezenet1.1.onnx",
              url="https://s3.amazonaws.com/onnx-model-zoo/squeezenet/squeezenet1.1/")
'squeezenet1.1.onnx'
sess = onnxruntime.InferenceSession("squeezenet1.1.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='data', type='tensor(float)', shape=[1, 3, 224, 224])
Output: NodeArg(name='squeezenet0_flatten0_reshape0', type='tensor(float)', shape=[1, 1000])
measures_dl += [timeexec("onnx.squeezenet", "sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})",
                         context=globals(), repeat=3, number=10)]
Average: 14.61 ms deviation 1.78 ms (with 10 runs) in [12.82 ms, 17.04 ms]
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df = pandas.DataFrame(data=measures_dl)
df = df.set_index("legend").sort_values("average")
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nDeep learning models 224x224x3 (ImageNet)");
../_images/onnx_deploy_151_0.png

Tiny yolo

Source: TinyYOLOv2 on onnx

download_data("tiny_yolov2.tar.gz",
              url="https://onnxzoo.blob.core.windows.net/models/opset_8/tiny_yolov2/")
['.\tiny_yolov2/model.onnx',
 '.\tiny_yolov2/test_data_set_0/input_0.pb',
 '.\tiny_yolov2/test_data_set_0/output_0.pb',
 '.\tiny_yolov2/test_data_set_1/input_0.pb',
 '.\tiny_yolov2/test_data_set_1/output_0.pb',
 '.\tiny_yolov2/test_data_set_2/input_0.pb',
 '.\tiny_yolov2/test_data_set_2/output_0.pb']
sess = onnxruntime.InferenceSession("tiny_yolov2/model.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='image', type='tensor(float)', shape=[None, 3, 416, 416])
Output: NodeArg(name='grid', type='tensor(float)', shape=[None, 125, 13, 13])
from PIL import Image,ImageDraw
img = Image.open('Au-Salon-de-l-agriculture-la-campagne-recrute.jpg')
img
../_images/onnx_deploy_155_0.png
img2 = img.resize((416, 416))
img2
../_images/onnx_deploy_156_0.png
X = numpy.asarray(img2)
X = X.transpose(2,0,1)
X = X.reshape(1,3,416,416)

out = sess.run(None, {'image': X.astype(numpy.float32)})
out = out[0][0]
def display_yolo(img, seuil):
    # seuil: detection confidence threshold
    import numpy as np
    numClasses = 20
    anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

    def sigmoid(x, derivative=False):
        return x*(1-x) if derivative else 1/(1+np.exp(-x))

    def softmax(x):
        scoreMatExp = np.exp(np.asarray(x))
        return scoreMatExp / scoreMatExp.sum(0)

    clut = [(0,0,0),(255,0,0),(255,0,255),(0,0,255),(0,255,0),(0,255,128),
            (128,255,0),(128,128,0),(0,128,255),(128,0,128),
            (255,0,128),(128,0,255),(255,128,128),(128,255,128),(255,255,0),
            (255,128,128),(128,128,255),(255,128,128),(128,255,128),(128,255,128)]
    label = ["aeroplane","bicycle","bird","boat","bottle",
             "bus","car","cat","chair","cow","diningtable",
             "dog","horse","motorbike","person","pottedplant",
             "sheep","sofa","train","tvmonitor"]

    draw = ImageDraw.Draw(img)
    for cy in range(0,13):
        for cx in range(0,13):
            for b in range(0,5):
                channel = b*(numClasses+5)
                tx = out[channel  ][cy][cx]
                ty = out[channel+1][cy][cx]
                tw = out[channel+2][cy][cx]
                th = out[channel+3][cy][cx]
                tc = out[channel+4][cy][cx]

                x = (float(cx) + sigmoid(tx))*32
                y = (float(cy) + sigmoid(ty))*32

                w = np.exp(tw) * 32 * anchors[2*b  ]
                h = np.exp(th) * 32 * anchors[2*b+1]

                confidence = sigmoid(tc)

                classes = np.zeros(numClasses)
                for c in range(0, numClasses):
                    classes[c] = out[channel + 5 + c][cy][cx]
                # apply softmax once, after all class scores are gathered
                classes = softmax(classes)
                detectedClass = classes.argmax()

                if seuil < classes[detectedClass]*confidence:
                    color =clut[detectedClass]
                    x = x - w/2
                    y = y - h/2
                    draw.line((x  ,y  ,x+w,y ),fill=color, width=3)
                    draw.line((x  ,y  ,x  ,y+h),fill=color, width=3)
                    draw.line((x+w,y  ,x+w,y+h),fill=color, width=3)
                    draw.line((x  ,y+h,x+w,y+h),fill=color, width=3)

    return img
img2 = img.resize((416, 416))
display_yolo(img2, 0.038)
../_images/onnx_deploy_159_0.png

Conclusion

  • ONNX is a work in progress, under active development
  • ONNX is open source
  • ONNX does not depend on the machine learning framework
  • ONNX provides dedicated runtimes
  • ONNX is fast, available in Python…

Metadata to trace deployed models

meta = sess.get_modelmeta()
meta.description
"The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242"
meta.producer_name, meta.version
('WinMLTools', 0)
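
Other fields exposed by the runtime's ModelMetadata can be logged the same way (a sketch; available fields may vary with the onnxruntime version):

meta = sess.get_modelmeta()
# graph_name and domain identify the graph inside the file
# (assumed available; fields vary with the runtime version).
print(meta.graph_name, meta.domain)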