Deploy machine learned models with ONNX


Xavier Dupré - Senior Data Scientist at Microsoft - Computer Science Teacher at ENSAE

Most machine learning libraries are optimized to train models, not necessarily to use them for fast predictions in online web services where low latency matters. ONNX, an open source initiative started last year by Microsoft and Facebook, is one answer to this problem. This talk illustrates the concept with a demo mixing deep learning, scikit-learn and ML.net, the open source machine learning library written in C# and developed by Microsoft.

from jyquickhelper import add_notebook_menu
add_notebook_menu(last_level=2)
from pyquickhelper.helpgen import NbImage

Open source tools in this talk

import keras, lightgbm, onnx, skl2onnx, onnxruntime, sklearn, torch, xgboost
mods = [keras, lightgbm, onnx, skl2onnx, onnxruntime, sklearn, torch, xgboost]
for m in mods:
    print(m.__name__, m.__version__)
Using TensorFlow backend.
keras 2.2.5
lightgbm 2.2.3
onnx 1.5.1
skl2onnx 1.5.9999
onnxruntime 0.5.0
sklearn 0.22.dev0
torch 1.1.0
xgboost 0.90

ML.net

  • Open source in 2018

  • ML.net

  • Machine learning library written in C#

  • Used in many places in Microsoft services (Bing, …)

  • Microsoft has been working on it for three years

NbImage("mlnet.png", width=500)
../_images/onnx_deploy_8_0.png

onnx

  • Serialization library specialized for machine learning, based on Google Protobuf

  • Open source in 2017

  • onnx

NbImage("onnx.png", width=500)
../_images/onnx_deploy_10_0.png

sklearn-onnx

  • Open source in 2018

  • Converters for scikit-learn models

  • sklearn-onnx

NbImage("sklearn-onnx.png")
../_images/onnx_deploy_12_0.png

onnxruntime

NbImage("onnxruntime.png", width=400)
../_images/onnx_deploy_14_0.png

The problem with deployment

Learn and predict

  • Two different purposes whose optimizations are not necessarily aligned

  • Learn: computation optimized for a large number of observations (batch prediction)

  • Predict: computation optimized for one observation (one-off prediction)

  • Machine learning libraries optimize the learn scenario.

Illustration with a linear regression

We consider a dataset available in scikit-learn: diabetes

measures_lr = []
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
diabetes_X_train = diabetes.data[:-20]
diabetes_X_test  = diabetes.data[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test  = diabetes.target[-20:]
diabetes_X_train[:1]
array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187235, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990842, -0.01764613]])

scikit-learn

from sklearn.linear_model import LinearRegression
clr = LinearRegression()
clr.fit(diabetes_X_train, diabetes_y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
clr.predict(diabetes_X_test[:1])
array([197.61846908])
from jupytalk.benchmark import timeexec
measures_lr += [timeexec("sklearn",
                         "clr.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 49.57 µs deviation 10.27 µs (with 50 runs) in [41.56 µs, 67.88 µs]

pure python

def python_prediction(X, coef, intercept):
    # dot product between one observation and the coefficients,
    # plus the intercept
    s = intercept
    for a, b in zip(X, coef):
        s += a * b
    return s

python_prediction(diabetes_X_test[0], clr.coef_, clr.intercept_)
197.61846907503298
measures_lr += [timeexec("python", "python_prediction(diabetes_X_test[0], clr.coef_, clr.intercept_)",
                         context=globals())]
Average: 7.74 µs deviation 2.85 µs (with 50 runs) in [5.91 µs, 14.29 µs]

Summary

import pandas
df = pandas.DataFrame(data=measures_lr)
df = df.set_index("legend").sort_values("average")
df
legend    average  deviation     first    first3     last3  repeat      min5      max5  code                                                run
python   0.000008   0.000003  0.000018  0.000011  0.000007     200  0.000006  0.000014  python_prediction(diabetes_X_test[0], clr.coef...   50
sklearn  0.000050   0.000010  0.000111  0.000081  0.000049     200  0.000042  0.000068  clr.predict(diabetes_X_test[:1])                     50
%matplotlib inline
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nLinear Regression");
../_images/onnx_deploy_30_0.png

Illustration with a random forest

measures_rf = []

scikit-learn

from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=10)
rf.fit(diabetes_X_train, diabetes_y_train)
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)
measures_rf += [timeexec("sklearn", "rf.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 722.81 µs deviation 70.12 µs (with 50 runs) in [659.65 µs, 863.66 µs]

XGBoost

from xgboost import XGBRegressor
xg = XGBRegressor(n_estimators=10)
xg.fit(diabetes_X_train, diabetes_y_train)
[18:06:37] WARNING: d:buildxgboostxgboost-0.90.gitsrcobjectiveregression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=10,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)
measures_rf += [timeexec("xgboost", "xg.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 75.20 µs deviation 21.65 µs (with 50 runs) in [63.63 µs, 102.74 µs]

LightGBM

from lightgbm import LGBMRegressor
lg = LGBMRegressor(n_estimators=10)
lg.fit(diabetes_X_train, diabetes_y_train)
LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=10, n_jobs=-1, num_leaves=31, objective=None,
              random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
measures_rf += [timeexec("lightgbm", "lg.predict(diabetes_X_test[:1])",
                         context=globals())]
Average: 115.30 µs deviation 22.88 µs (with 50 runs) in [101.76 µs, 138.91 µs]

pure python

This would require reimplementing the prediction function.
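To give an idea of the effort, here is a sketch of a pure-python traversal of a single tree, relying on scikit-learn's internal tree_ arrays (the helper name is ours); the full forest would average this over rf.estimators_.

def python_tree_predict(tree, x):
    # walk down from the root; children_left == -1 marks a leaf
    node = 0
    while tree.children_left[node] != -1:
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.children_left[node]
        else:
            node = tree.children_right[node]
    return tree.value[node][0][0]

python_tree_predict(rf.estimators_[0].tree_, diabetes_X_test[0])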

Summary

df = pandas.DataFrame(data=measures_rf)
df = df.set_index("legend").sort_values("average")
df
legend     average  deviation     first    first3     last3  repeat      min5      max5  code                              run
xgboost   0.000075   0.000022  0.000253  0.000181  0.000073     200  0.000064  0.000103  xg.predict(diabetes_X_test[:1])    50
lightgbm  0.000115   0.000023  0.000358  0.000228  0.000111     200  0.000102  0.000139  lg.predict(diabetes_X_test[:1])    50
sklearn   0.000723   0.000070  0.001144  0.000866  0.000697     200  0.000660  0.000864  rf.predict(diabetes_X_test[:1])    50
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nRandom Forest (10 trees)");
../_images/onnx_deploy_45_0.png

Keep in mind

  • The trained trees are not necessarily the same across libraries.

  • Predictive performance is not compared, only prediction speed.

  • Only the order of magnitude matters here.

What is batch prediction?

  • Instead of running 1 prediction N times,

  • we run N predictions once (see the sketch below).
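A minimal sketch of the difference, reusing the random forest trained above:

# one-off: N calls, one observation per call
one_off = [rf.predict(x.reshape(1, -1)) for x in diabetes_X_test]

# batch: a single call on the whole matrix
batch_pred = rf.predict(diabetes_X_test)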

import numpy
memo = []
batch = [1, 2, 5, 7, 8, 10, 100, 200, 500, 1000, 2000,
         3000, 4000, 5000, 10000, 20000, 50000,
         100000, 200000, 400000, ]

number = 10
repeat = 10
for i in batch:
    if i <= diabetes_X_test.shape[0]:
        mx = diabetes_X_test[:i]
    else:
        mxs = [diabetes_X_test] * (i // diabetes_X_test.shape[0] + 1)
        mx = numpy.vstack(mxs)
        mx = mx[:i]

    print("batch", "=", i)
    number = 10 if i <= 10000 else 2

    memo.append(timeexec("sklearn %d" % i, "rf.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "sklearn"

    memo.append(timeexec("xgboost %d" % i, "xg.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "xgboost"

    memo.append(timeexec("lightgbm %d" % i, "lg.predict(mx)",
                         context=globals(), number=number, repeat=repeat))
    memo[-1]["batch"] = i
    memo[-1]["lib"] = "lightgbm"
batch = 1
Average: 953.61 µs deviation 342.68 µs (with 10 runs) in [646.32 µs, 1.70 ms]
Average: 84.95 µs deviation 39.70 µs (with 10 runs) in [64.52 µs, 203.46 µs]
Average: 204.81 µs deviation 104.45 µs (with 10 runs) in [103.63 µs, 479.39 µs]
batch = 2
Average: 897.44 µs deviation 387.43 µs (with 10 runs) in [656.91 µs, 1.99 ms]
Average: 88.51 µs deviation 45.46 µs (with 10 runs) in [63.27 µs, 219.08 µs]
Average: 232.66 µs deviation 97.42 µs (with 10 runs) in [116.00 µs, 373.58 µs]
batch = 5
Average: 996.89 µs deviation 428.41 µs (with 10 runs) in [667.90 µs, 2.02 ms]
Average: 89.49 µs deviation 59.17 µs (with 10 runs) in [64.93 µs, 266.42 µs]
Average: 210.64 µs deviation 97.11 µs (with 10 runs) in [116.23 µs, 418.22 µs]
batch = 7
Average: 938.64 µs deviation 420.20 µs (with 10 runs) in [642.86 µs, 2.10 ms]
Average: 94.72 µs deviation 48.08 µs (with 10 runs) in [64.86 µs, 232.49 µs]
Average: 261.14 µs deviation 185.46 µs (with 10 runs) in [114.26 µs, 745.33 µs]
batch = 8
Average: 1.13 ms deviation 524.41 µs (with 10 runs) in [661.64 µs, 2.22 ms]
Average: 172.21 µs deviation 91.67 µs (with 10 runs) in [78.72 µs, 414.16 µs]
Average: 308.47 µs deviation 123.92 µs (with 10 runs) in [147.59 µs, 528.34 µs]
batch = 10
Average: 1.04 ms deviation 479.01 µs (with 10 runs) in [668.20 µs, 2.24 ms]
Average: 89.91 µs deviation 60.85 µs (with 10 runs) in [66.34 µs, 271.84 µs]
Average: 206.34 µs deviation 97.44 µs (with 10 runs) in [121.31 µs, 468.34 µs]
batch = 100
Average: 991.80 µs deviation 360.13 µs (with 10 runs) in [748.01 µs, 1.94 ms]
Average: 157.97 µs deviation 88.16 µs (with 10 runs) in [95.79 µs, 397.26 µs]
Average: 453.24 µs deviation 146.52 µs (with 10 runs) in [231.33 µs, 657.38 µs]
batch = 200
Average: 1.09 ms deviation 440.39 µs (with 10 runs) in [791.74 µs, 2.16 ms]
Average: 205.30 µs deviation 87.04 µs (with 10 runs) in [127.18 µs, 411.09 µs]
Average: 562.89 µs deviation 161.42 µs (with 10 runs) in [344.84 µs, 782.15 µs]
batch = 500
Average: 1.15 ms deviation 394.69 µs (with 10 runs) in [881.27 µs, 2.24 ms]
Average: 350.38 µs deviation 100.80 µs (with 10 runs) in [225.70 µs, 513.16 µs]
Average: 872.18 µs deviation 151.07 µs (with 10 runs) in [689.05 µs, 1.19 ms]
batch = 1000
Average: 1.34 ms deviation 437.25 µs (with 10 runs) in [1.11 ms, 2.61 ms]
Average: 490.41 µs deviation 95.86 µs (with 10 runs) in [370.00 µs, 681.28 µs]
Average: 1.50 ms deviation 335.43 µs (with 10 runs) in [1.27 ms, 2.46 ms]
batch = 2000
Average: 1.94 ms deviation 641.59 µs (with 10 runs) in [1.51 ms, 3.69 ms]
Average: 798.91 µs deviation 127.78 µs (with 10 runs) in [661.21 µs, 1.07 ms]
Average: 2.63 ms deviation 263.64 µs (with 10 runs) in [2.41 ms, 3.36 ms]
batch = 3000
Average: 2.23 ms deviation 698.40 µs (with 10 runs) in [1.88 ms, 4.27 ms]
Average: 1.13 ms deviation 123.08 µs (with 10 runs) in [978.34 µs, 1.31 ms]
Average: 3.81 ms deviation 263.33 µs (with 10 runs) in [3.60 ms, 4.55 ms]
batch = 4000
Average: 2.64 ms deviation 750.52 µs (with 10 runs) in [2.25 ms, 4.84 ms]
Average: 1.47 ms deviation 224.88 µs (with 10 runs) in [1.20 ms, 2.07 ms]
Average: 5.03 ms deviation 328.90 µs (with 10 runs) in [4.79 ms, 5.93 ms]
batch = 5000
Average: 2.99 ms deviation 578.87 µs (with 10 runs) in [2.65 ms, 4.70 ms]
Average: 1.93 ms deviation 256.40 µs (with 10 runs) in [1.70 ms, 2.54 ms]
Average: 6.27 ms deviation 442.01 µs (with 10 runs) in [5.92 ms, 7.37 ms]
batch = 10000
Average: 4.85 ms deviation 733.32 µs (with 10 runs) in [4.50 ms, 7.04 ms]
Average: 3.55 ms deviation 212.67 µs (with 10 runs) in [3.35 ms, 4.12 ms]
Average: 12.29 ms deviation 420.68 µs (with 10 runs) in [11.85 ms, 13.22 ms]
batch = 20000
Average: 9.18 ms deviation 1.51 ms (with 2 runs) in [8.16 ms, 12.96 ms]
Average: 7.97 ms deviation 813.84 µs (with 2 runs) in [7.30 ms, 9.61 ms]
Average: 24.65 ms deviation 1.19 ms (with 2 runs) in [23.15 ms, 27.56 ms]
batch = 50000
Average: 21.37 ms deviation 2.06 ms (with 2 runs) in [20.20 ms, 27.46 ms]
Average: 19.05 ms deviation 612.96 µs (with 2 runs) in [18.46 ms, 20.77 ms]
Average: 61.02 ms deviation 2.18 ms (with 2 runs) in [58.98 ms, 64.68 ms]
batch = 100000
Average: 41.20 ms deviation 2.00 ms (with 2 runs) in [39.74 ms, 46.70 ms]
Average: 39.38 ms deviation 1.30 ms (with 2 runs) in [38.42 ms, 43.20 ms]
Average: 119.68 ms deviation 3.57 ms (with 2 runs) in [116.67 ms, 129.01 ms]
batch = 200000
Average: 97.21 ms deviation 2.31 ms (with 2 runs) in [95.54 ms, 102.76 ms]
Average: 78.89 ms deviation 1.36 ms (with 2 runs) in [77.45 ms, 82.46 ms]
Average: 241.92 ms deviation 8.90 ms (with 2 runs) in [235.75 ms, 267.51 ms]
batch = 400000
Average: 198.69 ms deviation 9.44 ms (with 2 runs) in [191.57 ms, 224.58 ms]
Average: 159.20 ms deviation 2.85 ms (with 2 runs) in [156.50 ms, 165.55 ms]
Average: 483.20 ms deviation 17.04 ms (with 2 runs) in [473.16 ms, 533.38 ms]
dfb = pandas.DataFrame(memo)[["average", "lib", "batch"]]
piv = dfb.pivot("batch", "lib", "average")
for c in piv.columns:
    piv["ave_" + c] = piv[c] / piv.index
libs = list(c for c in piv.columns if "ave_" in c)
ax = piv.plot(y=libs, logy=True, logx=True, figsize=(10, 5))
ax.set_title("Computation time per observation when computed in a batch")
ax.set_ylabel("sec")
ax.set_xlabel("batch size")
ax.grid(True);
../_images/onnx_deploy_49_0.png

ONNX

ONNX = language to describe models

  • Standard format to describe machine learning models

  • Easier to exchange and export

ONNX = machine learning oriented

Can represent any mathematical function handling numerical and text features.
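To make this concrete, here is a minimal sketch building by hand the ONNX graph of a linear regression Y = XW + B with onnx.helper; all names and shapes here are illustrative.

import numpy
from onnx import helper, numpy_helper, TensorProto

# the two initializers (illustrative values)
W = numpy_helper.from_array(numpy.ones((10, 1), dtype=numpy.float32), name="W")
B = numpy_helper.from_array(numpy.zeros((1,), dtype=numpy.float32), name="B")

# the graph computes Y = MatMul(X, W) + B
node1 = helper.make_node("MatMul", ["X", "W"], ["XW"])
node2 = helper.make_node("Add", ["XW", "B"], ["Y"])
graph = helper.make_graph(
    [node1, node2], "linear_regression",
    [helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 10])],
    [helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 1])],
    [W, B])
model = helper.make_model(graph)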

NbImage("onnxop.png", width=600)
../_images/onnx_deploy_53_0.png

actively supported

  • Microsoft

  • Facebook

  • first created to deploy deep learning models

  • extended to other models

Train somewhere, predict somewhere else

Cannot optimize the code for both training and predicting.

Training         | Predicting
-----------------|-------------------
Batch prediction | One-off prediction
Huge memory      | Small memory
Huge data        | Small data
.                | High latency

Libraries for predictions

  • Optimized for predictions

  • Optimized for a device

ONNX Runtime

ONNX Runtime for inferencing machine learning models now in preview

Dedicated runtime for:

  • CPU

  • GPU

NbImage("onnxrt.png", width=800)
../_images/onnx_deploy_59_0.png

ONNX demo on random forest

rf
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=10,
                      n_jobs=None, oob_score=False, random_state=None,
                      verbose=0, warm_start=False)

Conversion to ONNX

onnxmltools

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# the input is declared as one observation with 10 float features
model_onnx = convert_sklearn(rf, "rf_diabetes",
                             [('input', FloatTensorType([1, 10]))])
The maximum opset needed by this model is only 1.
print(str(model_onnx)[:450] + "\n...")
ir_version: 6
producer_name: "skl2onnx"
producer_version: "1.5.9999"
domain: "ai.onnx"
model_version: 0
doc_string: ""
graph {
  node {
    input: "input"
    output: "variable"
    name: "TreeEnsembleRegressor"
    op_type: "TreeEnsembleRegressor"
    attribute {
      name: "n_targets"
      i: 1
      type: INT
    }
    attribute {
      name: "nodes_falsenodeids"
      ints: 240
      ints: 199
      ints: 104
      ints: 13
      ints: 10
...

Save the model

def save_model(model, filename):
    with open(filename, "wb") as f:
        f.write(model.SerializeToString())

save_model(model_onnx, 'rf_sklearn.onnx')
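The saved file can be loaded back and verified with onnx's checker, a small sanity step before shipping it:

import onnx
loaded = onnx.load("rf_sklearn.onnx")
onnx.checker.check_model(loaded)  # raises an exception if the model is malformed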

Compute predictions

import onnxruntime

sess = onnxruntime.InferenceSession("rf_sklearn.onnx")

for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='input', type='tensor(float)', shape=[1, 10])
Output: NodeArg(name='variable', type='tensor(float)', shape=[1, 1])
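The session can also report which execution provider serves it; get_providers may require a recent onnxruntime release:

# CPU or GPU, depending on how onnxruntime was built and installed
print(sess.get_providers())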
import numpy

def predict_onnxrt(x):
    return sess.run(["variable"], {'input': x})

print("Prediction:", predict_onnxrt(diabetes_X_test[:1].astype(numpy.float32)))
Prediction: [array([[222.40002]], dtype=float32)]
measures_rf += [timeexec("onnx", "predict_onnxrt(diabetes_X_test[:1].astype(numpy.float32))",
                         context=globals())]
Average: 15.52 µs deviation 4.74 µs (with 50 runs) in [12.86 µs, 24.94 µs]
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df = pandas.DataFrame(data=measures_rf)
df = df.set_index("legend").sort_values("average")
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nRandom Forest (10 trees)");
../_images/onnx_deploy_72_0.png

Deep learning

  • transfer learning with keras

  • other converters: pytorch, caffe2, …

measures_dl = []
from keras.applications.mobilenet_v2 import MobileNetV2
model = MobileNetV2(input_shape=None, alpha=1.0, include_top=True,
                    weights='imagenet', input_tensor=None,
                    pooling=None, classes=1000)
model
<keras.engine.training.Model at 0x2bd48e445c0>
from pyensae.datasource import download_data
import os
if not os.path.exists("simages/noclass"):
    os.makedirs("simages/noclass")
images = download_data("dog-cat-pixabay.zip",
                       whereTo="simages/noclass")
from mlinsights.plotting import plot_gallery_images
plot_gallery_images(images[:7]);
../_images/onnx_deploy_77_0.png
from keras.preprocessing.image import ImageDataGenerator
import numpy
params = dict(rescale=1./255)
augmenting_datagen = ImageDataGenerator(**params)
flow = augmenting_datagen.flow_from_directory('simages', batch_size=1, target_size=(224, 224),
                                              classes=['noclass'], shuffle=False)
imgs = [img[0][0] for i, img in zip(range(0,31), flow)]
Found 31 images belonging to 1 classes.
array_images = [im[numpy.newaxis, :, :, :] for im in imgs]
array_images[0].shape
(1, 224, 224, 3)
outputs = [model.predict(im) for im in array_images]
outputs[0].shape
(1, 1000)
outputs[0].ravel()[:10]
array([3.5999392e-04, 1.2039390e-03, 1.2471771e-04, 6.1937310e-05,
       1.1310327e-03, 1.7601105e-04, 1.9819096e-04, 1.4307754e-04,
       5.5190694e-04, 1.7074020e-04], dtype=float32)
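The 1000 scores map to the ImageNet classes; keras ships a helper to decode them (it downloads the class index on first use):

from keras.applications.mobilenet_v2 import decode_predictions
print(decode_predictions(outputs[0], top=3))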

Let’s measure time.

from jupytalk.benchmark import timeexec
measures_dl += [timeexec("keras.mobilenet", "model.predict(array_images[0])",
                         context=globals(), repeat=3, number=10)]
Average: 188.01 ms deviation 8.88 ms (with 10 runs) in [176.64 ms, 198.31 ms]
from keras2onnx import convert_keras
try:
    konnx = convert_keras(model, "mobilev2")
except ValueError as e:
    # the converter may not support the latest keras release yet
    print(e)

Let’s switch to pytorch.

import torchvision.models as models
modelt = models.squeezenet1_1(pretrained=True)
modelt.classifier
Sequential(
  (0): Dropout(p=0.5)
  (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
  (2): ReLU(inplace)
  (3): AdaptiveAvgPool2d(output_size=(1, 1))
)
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
trans = transforms.Compose([transforms.Resize((224, 224)),
                            transforms.CenterCrop(224),
                            transforms.ToTensor()])
imgs = datasets.ImageFolder("simages", trans)
dataloader = DataLoader(imgs, batch_size=1, shuffle=False, num_workers=1)
img_seq = iter(dataloader)
imgs = list(img[0] for img in img_seq)
all_outputs = [modelt.forward(img).detach().numpy().ravel() for img in imgs[:2]]
all_outputs[0].shape
(1000,)
measures_dl += [timeexec("pytorch.squeezenet", "modelt.forward(imgs[0]).detach().numpy().ravel()",
                         context=globals(), repeat=3, number=10)]
Average: 80.83 ms deviation 7.22 ms (with 10 runs) in [74.14 ms, 90.85 ms]

Let’s convert it to ONNX.

import torch.onnx
from torch.autograd import Variable
input_names = [ "actual_input_1" ]
output_names = [ "output1" ]
dummy_input = Variable(torch.randn(10, 3, 224, 224))

try:
    torch.onnx.export(modelt, dummy_input, "resnet18.onnx", verbose=False,
                      input_names=input_names, output_names=output_names)
except Exception as e:
    print(str(e).split('\n')[0])

Well… work in progress.

Model zoo

Converted Models

NbImage("zoo.png", width=800)
../_images/onnx_deploy_95_0.png

MobileNet and SqueezeNet

Download a pre-converted version of MobileNetV2

download_data("mobilenetv2-1.0.onnx",
              url="https://s3.amazonaws.com/onnx-model-zoo/mobilenet/mobilenetv2-1.0/")
'mobilenetv2-1.0.onnx'
sess = onnxruntime.InferenceSession("mobilenetv2-1.0.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='data', type='tensor(float)', shape=[1, 3, 224, 224])
Output: NodeArg(name='mobilenetv20_output_flatten0_reshape0', type='tensor(float)', shape=[1, 1000])
keras produces channels-last images (batch, height, width, channels) while this ONNX model expects channels-first (batch, channels, height, width), hence the transpose.

print(array_images[0].shape)
print(array_images[0].transpose((0, 3, 1, 2)).shape)
(1, 224, 224, 3)
(1, 3, 224, 224)
res = sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})
res[0].shape
(1, 1000)
measures_dl += [timeexec("onnx.mobile", "sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})",
                         context=globals(), repeat=3, number=10)]
Average: 35.27 ms deviation 1.72 ms (with 10 runs) in [33.71 ms, 37.66 ms]

Download a pre-converted version of SqueezeNet

download_data("squeezenet1.1.onnx",
              url="https://s3.amazonaws.com/onnx-model-zoo/squeezenet/squeezenet1.1/")
'squeezenet1.1.onnx'
sess = onnxruntime.InferenceSession("squeezenet1.1.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='data', type='tensor(float)', shape=[1, 3, 224, 224])
Output: NodeArg(name='squeezenet0_flatten0_reshape0', type='tensor(float)', shape=[1, 1000])
measures_dl += [timeexec("onnx.squeezenet", "sess.run(None, {'data': array_images[0].transpose((0, 3, 1, 2))})",
                         context=globals(), repeat=3, number=10)]
Average: 10.86 ms deviation 855.71 µs (with 10 runs) in [10.21 ms, 12.07 ms]
fig, ax = plt.subplots(1, 1, figsize=(10,3))
df = pandas.DataFrame(data=measures_dl)
df = df.set_index("legend").sort_values("average")
df[["average", "deviation"]].plot(kind="barh", logx=True, ax=ax, xerr="deviation",
                                  legend=False, fontsize=12, width=0.8)
ax.set_ylabel("")
ax.grid(b=True, which="major")
ax.grid(b=True, which="minor")
ax.set_title("Prediction time for one observation\nDeep learning models 224x224x3 (ImageNet)");
../_images/onnx_deploy_108_0.png

Tiny yolo

Source: TinyYOLOv2 on onnx

download_data("tiny_yolov2.tar.gz",
              url="https://onnxzoo.blob.core.windows.net/models/opset_8/tiny_yolov2/")
['.\tiny_yolov2/./Model.onnx',
 '.\tiny_yolov2/./test_data_set_2/input_0.pb',
 '.\tiny_yolov2/./test_data_set_2/output_0.pb',
 '.\tiny_yolov2/./test_data_set_1/input_0.pb',
 '.\tiny_yolov2/./test_data_set_1/output_0.pb',
 '.\tiny_yolov2/./test_data_set_0/input_0.pb',
 '.\tiny_yolov2/./test_data_set_0/output_0.pb']
sess = onnxruntime.InferenceSession("tiny_yolov2/Model.onnx")
for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)
Input: NodeArg(name='image', type='tensor(float)', shape=['None', 3, 416, 416])
Output: NodeArg(name='grid', type='tensor(float)', shape=['None', 125, 13, 13])
from PIL import Image, ImageDraw
img = Image.open('Au-Salon-de-l-agriculture-la-campagne-recrute.jpg')
img
../_images/onnx_deploy_112_0.png
img2 = img.resize((416, 416))
img2
../_images/onnx_deploy_113_0.png
X = numpy.asarray(img2)
X = X.transpose(2,0,1)
X = X.reshape(1,3,416,416)

out = sess.run(None, {'image': X.astype(numpy.float32)})
out = out[0][0]
def display_yolo(img, seuil):
    # Draws the boxes detected by Tiny YOLOv2; *seuil* is the score threshold.
    import numpy as np
    numClasses = 20
    anchors = [1.08, 1.19, 3.42, 4.41, 6.63, 11.38, 9.42, 5.11, 16.62, 10.52]

    def sigmoid(x, derivative=False):
        return x*(1-x) if derivative else 1/(1+np.exp(-x))

    def softmax(x):
        scoreMatExp = np.exp(np.asarray(x))
        return scoreMatExp / scoreMatExp.sum(0)

    clut = [(0,0,0),(255,0,0),(255,0,255),(0,0,255),(0,255,0),(0,255,128),
            (128,255,0),(128,128,0),(0,128,255),(128,0,128),
            (255,0,128),(128,0,255),(255,128,128),(128,255,128),(255,255,0),
            (255,128,128),(128,128,255),(255,128,128),(128,255,128),(128,255,128)]
    label = ["aeroplane","bicycle","bird","boat","bottle",
             "bus","car","cat","chair","cow","diningtable",
             "dog","horse","motorbike","person","pottedplant",
             "sheep","sofa","train","tvmonitor"]

    draw = ImageDraw.Draw(img)
    for cy in range(0,13):
        for cx in range(0,13):
            for b in range(0,5):
                channel = b*(numClasses+5)
                tx = out[channel  ][cy][cx]
                ty = out[channel+1][cy][cx]
                tw = out[channel+2][cy][cx]
                th = out[channel+3][cy][cx]
                tc = out[channel+4][cy][cx]

                x = (float(cx) + sigmoid(tx))*32
                y = (float(cy) + sigmoid(ty))*32

                w = np.exp(tw) * 32 * anchors[2*b  ]
                h = np.exp(th) * 32 * anchors[2*b+1]

                confidence = sigmoid(tc)

                classes = np.zeros(numClasses)
                for c in range(0, numClasses):
                    classes[c] = out[channel + 5 + c][cy][cx]
                # softmax once, after all class scores are gathered
                classes = softmax(classes)
                detectedClass = classes.argmax()

                if seuil < classes[detectedClass]*confidence:
                    color = clut[detectedClass]
                    x = x - w/2
                    y = y - h/2
                    draw.line((x  ,y  ,x+w,y ),fill=color, width=3)
                    draw.line((x  ,y  ,x  ,y+h),fill=color, width=3)
                    draw.line((x+w,y  ,x+w,y+h),fill=color, width=3)
                    draw.line((x  ,y+h,x+w,y+h),fill=color, width=3)

    return img
img2 = img.resize((416, 416))
display_yolo(img2, 0.038)
../_images/onnx_deploy_116_0.png

Conclusion

  • ONNX is a work in progress, under active development

  • ONNX is open source

  • ONNX does not depend on the machine learning framework

  • ONNX provides dedicated runtimes

  • ONNX is fast, available in Python…

Metadata to trace deployed models

meta = sess.get_modelmeta()
meta.description
"The Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242"
meta.producer_name, meta.version
('OnnxMLTools', 0)
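Metadata can also be written on the model before saving it; a small sketch reusing the converted random forest from above:

# doc_string and model_version are standard fields of the ONNX ModelProto
model_onnx.doc_string = "Random forest trained on the diabetes dataset"
model_onnx.model_version = 1
save_model(model_onnx, "rf_sklearn.onnx")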