and ONNX

The ONNX format provides a way to describe a machine learned model. Its main purpose is to deploy a model into production in a form that is optimized for computing predictions.

About ONNX

Every machine learned model can be described as a sequence of basic numerical operations: +, *, … Let’s see for example what it looks like for a linear regression. Let’s first train a model:


from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
diabetes_X_train = diabetes.data[:-20]
diabetes_X_test = diabetes.data[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

from sklearn.linear_model import LinearRegression
clr = LinearRegression()
clr.fit(diabetes_X_train, diabetes_y_train)


    LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,

The model is trained and we can display the coefficients.


print(clr.coef_)




    [ 3.03499549e-01 -2.37639315e+02  5.10530605e+02  3.27736980e+02
     -8.14131709e+02  4.92814588e+02  1.02848452e+02  1.84606489e+02
      7.43519617e+02  7.60951722e+01]

The model can be deployed as is with the module scikit-learn. It is simple but slow: more than 10 times slower than a pure Python implementation for this particular example.


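from textwrap import wrap
# Build the prediction formula as a plain Python expression on a vector x.
code = str(clr.intercept_) + " + " + \
    " + ".join("x[{0}]*({1})".format(i, c) for i, c in enumerate(clr.coef_))
print("\n".join(wrap(code)))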


    152.76430691633442 + x[0]*(0.3034995490660432) +
    x[1]*(-237.63931533353403) + x[2]*(510.5306054362253) +
    x[3]*(327.7369804093466) + x[4]*(-814.1317093725387) +
    x[5]*(492.81458798373217) + x[6]*(102.8484521916802) +
    x[7]*(184.60648905984) + x[8]*(743.519616750542) +

The next figure explores various rewritings of this linear model, including C++ ones with numba or cffi and AVX instructions.
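As a minimal sketch of the pure Python rewriting, the generated expression can be compiled once into a function and then called on a feature vector. The helper names `build_expression` and `compile_predictor` and the three coefficients below are made up for illustration; they are not the trained model.

```python
def build_expression(intercept, coefficients):
    # Same layout as the expression printed above: intercept + x[i]*(coef_i) + ...
    terms = " + ".join(
        "x[{0}]*({1!r})".format(i, c) for i, c in enumerate(coefficients))
    return "{0!r} + {1}".format(intercept, terms)


def compile_predictor(expression):
    # Compile the expression once into a lambda so that each call only
    # performs the additions and multiplications.
    code = compile("lambda x: " + expression, "<linear model>", "eval")
    return eval(code, {"__builtins__": {}})


# Hypothetical toy model, not the diabetes coefficients.
intercept = 1.5
coefficients = [2.0, -3.0, 0.5]
features = [1.0, 2.0, 4.0]

expression = build_expression(intercept, coefficients)
predict = compile_predictor(expression)
prediction = predict(features)

# Reference computation with an explicit loop.
reference = intercept + sum(c * v for c, v in zip(coefficients, features))
print(expression)
print(prediction, reference)
```

The compiled function only needs the Python interpreter, which is why this kind of rewriting reduces dependencies without removing the tie to Python.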

This solution is still tied to Python even though it reduces the number of dependencies. It is one option some followed when it was really needed. Linear models are easy to rewrite, decision trees and random forests a little less so, deep learning models even less. It is now a common need, and it would be worth having a common solution.

That’s where ONNX comes in. It provides a common way to describe machine learning models with high-level functions specialized for machine learning: the ONNX ML functions.

ONNX description of a linear model

Module onnxmltools implements converters for a subset of machine learned models from scikit-learn or lightgbm. The conversion requires the user to give a name and the input shape.


from onnxmltools import convert_sklearn
from onnxmltools.utils import save_model
from onnxmltools.convert.common.data_types import FloatTensorType

onnx_model = convert_sklearn(clr, 'linear regression',
                             [('input', FloatTensorType([1, 10]))])
save_model(onnx_model, 'lr_diabete.onnx')


    The maximum opset needed by this model is only 1.

Let’s see what the ONNX format looks like by using module onnx.


import onnx
model = onnx.load('lr_diabete.onnx')
print(model)

The result shows one main function, which is a linear regression. Every coefficient is converted into a float by default. ONNX assumes every machine learned model can be described by a set of these functions or, more precisely, a pipeline. It also describes the inputs and outputs.

ONNX conversion with a C# library

The library used in this section is a machine learning library written in C#. It implements many learners (see Components) which can be run from C# or from the command line. Let’s first split the dataset into train and test sets, then save them on disk.


from sklearn.datasets import load_diabetes
from pandas import DataFrame
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
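# One column per feature (F0, F1, ...) plus the target in column Label.
df = DataFrame(diabetes.data, columns=[
               "F%d" % i for i in range(diabetes.data.shape[1])])
df["Label"] = diabetes.target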
df_train, df_test = train_test_split(df)
df_train.to_csv("diabetes_train.csv", index=False)
df_test.to_csv("diabetes_test.csv", index=False)



The following command line trains a model, evaluates it on the test set, saves it in a zip file, and finally converts it into the ONNX format.



cmd = traintest{
    data = diabetes_train.csv
    test = diabetes_test.csv
    loader = text{col = Label: R4: 10 col = Features: R4: 0-9 header = + sep = , }
    tr = ols
    out =

cmd = saveonnx{
    in =
    onnx = lr_diabete_cs.onnx
    domain =
    idrop = Label
    odrop = Features1

Let’s display the outcome. Parameters idrop and odrop define which inputs and outputs are not necessary.


import onnx
model = onnx.load('lr_diabete_cs.onnx')
print(model)

Two different machine learning libraries produce similar models that end up described the same way. The second one also includes a Scaler transform.
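The two-step pipeline of the second model can be sketched with numpy. The sketch assumes the scaler follows the ONNX ML `Scaler` definition, `(x - offset) * scale`, followed by a linear regressor. The function names and all numeric values below are made up for illustration and are not read from the ONNX file.

```python
import numpy as np


def scaler(x, offset, scale):
    # Assumed ONNX ML Scaler semantics: shift, then rescale each feature.
    return (x - offset) * scale


def linear_regressor(x, coefficients, intercept):
    # One output per row: dot product with the coefficients plus intercept.
    return x @ coefficients.reshape(-1, 1) + intercept


# Hypothetical parameters for a model with three features.
offset = np.zeros(3, dtype=np.float32)
scale = np.array([0.5, 1.0, 2.0], dtype=np.float32)
coefficients = np.array([1.0, -2.0, 0.5], dtype=np.float32)
intercept = np.float32(3.0)

x = np.array([[2.0, 1.0, 4.0]], dtype=np.float32)
features1 = scaler(x, offset, scale)                            # scaled features
score0 = linear_regressor(features1, coefficients, intercept)   # prediction
print(features1, score0)
```

The intermediate array plays the role of the scaled feature output, and the final array the role of the score, each produced by one node of the pipeline.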

ONNX serialization

ONNX internally relies on Google Protobuf, which is used here as an efficient way to serialize the data. The outcome is compact and optimized for fast access.

ONNX runtime

Once the model is described with a common language, it becomes possible to separate training and prediction. Training still happens with a standard machine learning library, while the predictions are computed on a different machine with a dedicated runtime. onnxruntime is one such runtime and has a Python interface. The following example prints the inputs and outputs and then computes the predictions for the first example of the dataset.


import onnxruntime as rt
import numpy
from sklearn.datasets import load_diabetes

sess = rt.InferenceSession("lr_diabete_cs.onnx")

for i in sess.get_inputs():
    print('Input:', i)
for o in sess.get_outputs():
    print('Output:', o)

X = load_diabetes().data
x = X[:1].astype(numpy.float32)
# None requests every output of the graph.
res = sess.run(None, {'Features': x})
for o, r in zip(sess.get_outputs(), res):
    print(o, "=", r)


    Input: NodeArg(name='Features', type='tensor(float)', shape=[1, 10])
    Output: NodeArg(name='Features1', type='tensor(float)', shape=[1, 10])
    Output: NodeArg(name='Score0', type='tensor(float)', shape=[1, 1])
    NodeArg(name='Features1', type='tensor(float)', shape=[1, 10]) = [[ 0.3438729   1.          0.3617374   0.16564418 -0.28732657 -0.22337231
      -0.2395467  -0.01668718  0.14901626 -0.1301223 ]]
    NodeArg(name='Score0', type='tensor(float)', shape=[1, 1]) = [[210.23141]]

The last result is the expected one. The runtime does not depend on scikit-learn or on the C# library, and it runs on CPU or GPU. It is implemented in C++ and is optimized for deep learning.
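The example above casts the input to `numpy.float32` because the graph works with single precision floats. The following numpy sketch gives a rough idea of the rounding this introduces for a linear model. It reuses the rounded coefficients and the intercept printed earlier in this document, but the input vector is made up, so the numbers are only indicative.

```python
import numpy as np

# Coefficients as printed above (rounded), intercept from the expression.
coef64 = np.array([
    3.03499549e-01, -2.37639315e+02, 5.10530605e+02, 3.27736980e+02,
    -8.14131709e+02, 4.92814588e+02, 1.02848452e+02, 1.84606489e+02,
    7.43519617e+02, 7.60951722e+01])
intercept64 = 152.76430691633442

# Hypothetical input, not a row of the diabetes dataset.
x64 = np.linspace(-0.1, 0.1, 10).reshape(1, 10)

# Same computation in double and in single precision.
pred64 = x64 @ coef64 + intercept64
pred32 = (x64.astype(np.float32) @ coef64.astype(np.float32)
          + np.float32(intercept64))

gap = abs(float(pred64[0]) - float(pred32[0]))
print(pred64, pred32, gap)
```

A small gap of this kind is expected whenever double precision predictions are compared with the output of a single precision graph.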