1. Failed to load model with error: Unknown model file format version.

2. How to deal with a dataframe as input?

Failed to load model with error: Unknown model file format version.

onnxruntime (or runtime='onnxruntime1' with OnnxInference) sometimes fails to load a model and shows the following error message:

RuntimeError: Unable to create InferenceSession due to '[ONNXRuntimeError] :
2 : INVALID_ARGUMENT : Failed to load model with error: Unknown model file format version.'

The cause is the metadata field ir_version, which records the IR_VERSION (the ONNX version) the model was written with. When a machine-learned model is converted, it usually receives the default version (ir_version) returned by the onnx package. onnxruntime raises the error above when this ir_version is too recent for it. In that case, either update onnxruntime to the latest available version or change ir_version to a lower number. The function get_ir_version_from_onnx returns the latest version tested with mlprodict.


from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_iris
from mlprodict.onnxrt import OnnxInference
import numpy

iris = load_iris()
X = iris.data[:, :2]
y = iris.target
lr = LinearRegression()
lr.fit(X, y)

# Conversion into ONNX.
from mlprodict.onnx_conv import to_onnx
model_onnx = to_onnx(lr, X.astype(numpy.float32))
print("ir_version", model_onnx.ir_version)

# Change ir_version
model_onnx.ir_version = 6

# Predictions with onnxruntime
oinf = OnnxInference(model_onnx, runtime='onnxruntime1')
ypred = oinf.run({'X': X[:5].astype(numpy.float32)})
print("ONNX output:", ypred)

# To avoid hard-coding a version number, use the value
# returned by the function get_ir_version_from_onnx.
from mlprodict.tools import get_ir_version_from_onnx
model_onnx.ir_version = get_ir_version_from_onnx()
print("ir_version", model_onnx.ir_version)


    ir_version 6
    ONNX output: {'variable': array([[0.172],
           [0.034]], dtype=float32)}
    ir_version 7
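
The example above overwrites ir_version unconditionally. A minimal sketch of a more cautious variant, which lowers the value only when it is too recent, could look like the following. The helper name cap_ir_version and its parameter max_supported are hypothetical and not part of mlprodict or onnx. A plain object with an ir_version attribute stands in for the ONNX model so that the snippet runs without onnx installed.

```python
from types import SimpleNamespace


def cap_ir_version(model, max_supported):
    """Lower model.ir_version to max_supported if it is more recent.

    Hypothetical helper: it only relies on `model` exposing an integer
    attribute `ir_version`. Returns the previous value so the caller
    can log it or restore it later.
    """
    previous = model.ir_version
    if previous > max_supported:
        model.ir_version = max_supported
    return previous


# Stand-in for a converted model (assumption: only ir_version matters here).
model = SimpleNamespace(ir_version=8)
previous = cap_ir_version(model, max_supported=6)
print("ir_version", previous, "->", model.ir_version)
```

With a converted model, max_supported could be the value returned by get_ir_version_from_onnx(), so that the number is never written by hand.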

(original entry : asv_options_helper.py:docstring of mlprodict.tools.asv_options_helper.get_ir_version_from_onnx, line 8)

How to deal with a dataframe as input?

Each column of the dataframe is treated as a named input. The first step is to make sure that every column has the correct type. pandas tends to select the least generic type able to hold the content of a column, and ONNX does not automatically cast the data it receives. The data must therefore have the same types when the model is converted and when the converted model receives data to predict.
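
The type requirement can be checked before calling the converted model. The snippet below is a sketch using pandas only: the helper dtype_mismatches and the two small dataframes are made up for illustration. It compares the column types of new data with those of the dataframe used at conversion time, then casts the new data to match.

```python
import numpy
import pandas


def dtype_mismatches(reference, candidate):
    """Return (column, expected dtype, actual dtype) for every column
    of `reference` whose dtype differs in `candidate`."""
    return [
        (name, reference[name].dtype, candidate[name].dtype)
        for name in reference.columns
        if candidate[name].dtype != reference[name].dtype
    ]


# Dataframe used at conversion time: every column cast to float32.
train = pandas.DataFrame({"x": [1.5, 2.5], "n": [1.0, 2.0]}).astype(numpy.float32)

# New data: pandas picks its own column types when building the dataframe.
new = pandas.DataFrame({"x": [0.5, 3.5], "n": [3, 4]})

before = dtype_mismatches(train, new)
print(before)

# Cast every column to the type seen at conversion time.
new = new.astype(train.dtypes.to_dict())
after = dtype_mismatches(train, new)
print(after)
```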


from io import StringIO
from textwrap import dedent
import numpy
import pandas
from pyquickhelper.pycode import ExtTestCase
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
text = text.replace(

X_train = pandas.read_csv(StringIO(text))
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
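        ("others", "passthrough", numeric_features)
    ])),
])

# The pipeline must be fitted before it can transform or be converted.
pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)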

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}

onxp = oinf.run(inputs)
print(onxp)


    [[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01
      3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
     [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01
      6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
     [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01
      5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
     [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01
      6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
    {'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02,
            1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02,
            2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02,
            1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00,
            5.000e+00],
           [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02,
            1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00,
            6.000e+00]], dtype=float32)}
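
The conversion from a dataframe to a dictionary of inputs can be wrapped in a small function. The version below is a sketch, not an mlprodict API: the name dataframe_to_onnx_inputs and the sample dataframe are assumptions. As in the example above, it casts numeric columns to float32, leaves other columns untouched, and reshapes every column into a two-dimensional array with a single column.

```python
import numpy
import pandas


def dataframe_to_onnx_inputs(df, float_dtype=numpy.float32):
    """Return {column name: array of shape (n_rows, 1)}.

    Hypothetical helper: numeric columns are cast to `float_dtype`,
    other columns (strings for example) keep their values.
    """
    inputs = {}
    for name in df.columns:
        values = df[name].to_numpy()
        if numpy.issubdtype(values.dtype, numpy.number):
            values = values.astype(float_dtype)
        inputs[name] = values.reshape((-1, 1))
    return inputs


# Made-up data: one integer column, one float column, one string column.
df = pandas.DataFrame({
    "width": [1, 2, 3],
    "height": [0.5, 1.5, 2.5],
    "color": ["red", "blue", "red"],
})

inputs = dataframe_to_onnx_inputs(df)
for name, values in inputs.items():
    print(name, values.dtype, values.shape)
```

The resulting dictionary has the same layout as the inputs variable passed to oinf.run in the example above.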

(original entry : convert.py:docstring of mlprodict.onnx_conv.convert.to_onnx, line 41)