Precision loss due to float32 conversion with ONNX

The notebook studies the loss of precision when converting a non-continuous model into float32. It looks at the conversion of a GradientBoostingClassifier and then a DecisionTreeRegressor, for which a runtime supporting float64 was implemented.


We first train such a model on the Iris dataset.

We are interested in the probability of the last class.

Conversion to ONNX and comparison to original outputs

Let's extract the probability of the last class.

Let's compare both predictions.

The largest difference is quite high, but only one observation is affected.

Why this difference?

The function astype_range returns single-precision floats (float32) around the true value of the original features, which are stored as double-precision floats (float64).
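To make the idea concrete, here is a minimal sketch of what such a function could look like. The name astype_range_sketch, its signature, and the relative width of 1e-7 (borrowed from the interval discussed in this section) are assumptions made for illustration; the notebook's actual astype_range may be implemented differently.

```python
import numpy as np


def astype_range_sketch(arr, force=1.0):
    """Hypothetical stand-in for astype_range (not the notebook's code).

    Returns two float32 arrays (low, high) that bracket every double
    value of ``arr`` with a relative margin of ``force * 1e-7``.
    """
    arr = np.asarray(arr, dtype=np.float64)
    delta = np.abs(arr) * 1e-7 * force
    low = (arr - delta).astype(np.float32)
    high = (arr + delta).astype(np.float32)
    return low, high


# Illustrative double-precision features (made-up values).
X = np.array([[5.1, 3.5], [0.1, -2.3]])
low, high = astype_range_sketch(X)
X32 = X.astype(np.float32)
```

Because rounding to float32 is monotonic, the plain conversion X32 always lies between low and high. A feature equal to zero would produce an empty margin with this sketch, which is one reason a real implementation may need extra care.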

If a decision tree uses a threshold t such that float32(t) != t, it cannot be converted to single precision without discrepancies. The interval [float32(t - |t|*1e-7), float32(t + |t|*1e-7)] is close to all double values converted to the same float32, but every feature x in this interval satisfies float32(x) >= float32(t). This is not an issue for continuous machine-learned models, as the errors usually compensate. For non-continuous models, there might be some outliers. The next function considers all intervals of input features and randomly chooses one extremity for each of them.
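The effect of a threshold with float32(t) != t can be reproduced on a single comparison. The sketch below uses a made-up threshold of 0.1 and a feature value placed between t and float32(t); it is only an illustration of the mechanism, not the code of the notebook.

```python
import numpy as np

# Hypothetical threshold learned in double precision. 0.1 is not exactly
# representable in float32, so float32(t) != t.
t = np.float64(0.1)
t32 = np.float32(t)

# A double feature value lying strictly between t and float32(t).
x = (t + np.float64(t32)) / 2
x32 = np.float32(x)

# Tree test "x <= t" evaluated in double precision.
went_left_64 = bool(x <= t)
# Same test after both the feature and the threshold were cast to float32.
went_left_32 = bool(x32 <= t32)

print(float(t), float(t32), float(x))
print("float64 goes left:", went_left_64, "| float32 goes left:", went_left_32)
```

The two tests disagree: in double precision x is above the threshold, while after the cast x and t collapse to the same float32 value. The observation then follows a different branch of the tree, and the predicted value can change by an arbitrary amount.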

The function draws 100 input vectors, randomly choosing one extremity for each feature. It then sorts every row. The first column is the lower bound, the last column is the upper bound.
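The procedure described above can be sketched as follows. The names shaker_sketch and stump, the toy data, and the one-split model are all hypothetical and stand in for the notebook's own function and the ONNX runtime call.

```python
import numpy as np


def shaker_sketch(low, high, predict, n=100, seed=0):
    """Hypothetical sketch of the shaking procedure (not the notebook's code).

    For each of the ``n`` draws, every feature is set to one extremity of its
    [low, high] interval, chosen at random. Predictions are collected and each
    row is sorted: column 0 is the lower bound, column -1 the upper bound.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n):
        pick_high = rng.integers(0, 2, size=low.shape).astype(bool)
        X = np.where(pick_high, high, low)
        outputs.append(predict(X))
    res = np.stack(outputs, axis=1)  # shape (n_observations, n)
    res.sort(axis=1)
    return res


# Toy data: the first observation sits on the threshold, the second is far away.
X64 = np.array([[0.1], [5.1]])
delta = np.abs(X64) * 1e-7
low = (X64 - delta).astype(np.float32)
high = (X64 + delta).astype(np.float32)


def stump(X):
    # Stand-in for a float32 runtime: a single split on the first feature.
    return np.where(X[:, 0] <= np.float32(0.1), 0.0, 1.0)


shaken = shaker_sketch(low, high, stump)
spread = shaken[:, -1] - shaken[:, 0]  # upper bound minus lower bound
print(spread)
```

Only the observation whose feature is close to the threshold shows a nonzero spread, which mirrors the single outlier observed on the real model.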

We get the same value as before. At least one feature of one observation is really close to one threshold and changes the prediction.

Bigger datasets


This model is much simpler than the previous one, as it contains only one tree. We study it on the Boston dataset.

The last difference is quite big. Let's reuse the function onnx_shaker.

That's consistent. This function is a way to estimate the error due to the conversion into float32 without using the expected values.

Runtime supporting float64 for DecisionTreeRegressor

We proved, in a statistical way, that the conversion to float32 introduces discrepancies. But if the runtime supports float64 and not only float32, we should have absolutely no discrepancies. Let's verify that the errors disappear when the runtime supports an operator handling float64, which is the case of the python runtime for DecisionTreeRegressor.

The option rewrite_ops is needed to tell the function that the operator we need is not (yet) supported by the official ONNX specification. TreeEnsembleRegressor only allows float coefficients and we need double coefficients. That's why the function rewrites the converter of this operator and selects the appropriate runtime operator RuntimeTreeEnsembleRegressorDouble. It works as if the ONNX specification were extended to support an operator TreeEnsembleRegressorDouble, which behaves the same way but with doubles.

The runtime operator is accessible with the following path:

Different from this one:

And the highest absolute difference is now zero.


We may wonder whether we should extend the ONNX specification to support double for every operator. However, the fact that the model predicts a very different value for an observation indicates that the prediction cannot be trusted, as a very small modification of the input introduces a huge change in the output. I would use a different model. We may also wonder which prediction is the best one compared to the expected value...

In the end, it is only a matter of luck on that kind of example.