Discrepancies with ONNX

The notebook shows one example where the conversion leads to discrepancies if default options are used. It converts a pipeline with two steps, a scaler followed by a tree.

The bug this notebook is tracking does not always appear; it is more likely to happen with integer features, but that is not always the case. If it does not show up, the notebook must be run again.

Data and first model

We take a random dataset with mostly integer features.
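A minimal sketch of such a dataset and pipeline could look like the following (sizes, seed and variable names are illustrative, not the exact ones used in the notebook):

```python
import numpy
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = numpy.random.RandomState(0)
# mostly integer features, stored as doubles
X = rng.randint(0, 100, size=(1000, 10)).astype(numpy.float64)
y = X.sum(axis=1) + rng.randn(1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("scaler", StandardScaler()),
    ("tree", DecisionTreeRegressor(max_depth=5)),
])
model.fit(X_train, y_train)
```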

Other models:

Conversion to ONNX

The pipeline shows huge discrepancies. They appear for a pipeline StandardScaler + DecisionTreeRegressor applied to integer features. They disappear if floats are used, or if the scaler is removed. The bug also disappears if the tree is not deep enough (max_depth=4 instead of 5).
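A sketch of the default conversion and of the discrepancy measurement, reusing model, X_train and X_test from the previous sketch:

```python
import numpy
from onnxruntime import InferenceSession
from skl2onnx import to_onnx

# default conversion: single precision everywhere
onx = to_onnx(model, X_train[:1].astype(numpy.float32))
sess = InferenceSession(onx.SerializeToString(),
                        providers=["CPUExecutionProvider"])

expected = model.predict(X_test)
got = sess.run(None, {"X": X_test.astype(numpy.float32)})[0].ravel()

# maximum absolute difference, large when the bug shows up
print(numpy.abs(expected - got).max())
```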

Another way to convert

ONNX does not support double for TreeEnsembleRegressor, but a new operator TreeEnsembleRegressorDouble was implemented in mlprodict. We need to update the conversion.
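A possible sketch with mlprodict, assuming its to_onnx accepts rewrite_ops=True to register the double-aware converters (including TreeEnsembleRegressorDouble) and that OnnxInference is its Python runtime; the exact API may differ across versions:

```python
import numpy
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

# convert with a double sample so the whole graph stays in double precision
onx64 = to_onnx(model, X_train[:1].astype(numpy.float64), rewrite_ops=True)
print(set(node.op_type for node in onx64.graph.node))

oinf = OnnxInference(onx64)
got = list(oinf.run({"X": X_test.astype(numpy.float64)}).values())[0].ravel()
print(numpy.abs(model.predict(X_test) - got).max())
```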

We see that the use of double removes the discrepancies.

OnnxPipeline

Another way to reduce the number of discrepancies is to use a pipeline which converts every step into ONNX before training the next one. That way, every step is trained either on the original inputs or on the outputs produced by ONNX. Let's see how it works.
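A sketch of how this could look, assuming OnnxPipeline is available from mlprodict.sklapi:

```python
import numpy
from mlprodict.sklapi import OnnxPipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

model_onx = OnnxPipeline([
    ("scaler", StandardScaler()),
    ("tree", DecisionTreeRegressor(max_depth=5)),
])
model_onx.fit(X_train.astype(numpy.float32), y_train)
# the fitted scaler should now be wrapped in an OnnxTransformer
print(model_onx.steps)
```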

We see that the first step was replaced by an OnnxTransformer object which wraps an ONNX file into a transformer following the scikit-learn API. The initial steps are still available.

Training the next steps on the ONNX outputs is better, though not completely satisfactory... Let's check the accuracy.

Pretty close.

Final explanation: StandardScalerFloat

We proposed two ways to get an ONNX pipeline which produces the same predictions as scikit-learn. Let's now replace the StandardScaler by a new one which outputs float and not double. It turns out that class StandardScaler computes X /= self.scale_ but the ONNX graph multiplies by the precomputed inverse, X *= (1 / self.scale_). We need to implement this exact same operation with float32 to remove all discrepancies.
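A possible sketch of such a class (simplified, assuming the default with_mean and with_std settings):

```python
import numpy
from sklearn.preprocessing import StandardScaler

class StandardScalerFloat(StandardScaler):
    def transform(self, X):
        # same formula as StandardScaler but computed in single precision,
        # with a true division by scale_ (not a multiplication by 1/scale_)
        X = numpy.asarray(X, dtype=numpy.float32).copy()
        if self.with_mean:
            X -= self.mean_.astype(numpy.float32)
        if self.with_std:
            X /= self.scale_.astype(numpy.float32)
        return X
```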

We need to register a new converter so that sklearn-onnx knows how to convert the new scaler. We reuse the existing converters.
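The registration could look like the following sketch, reusing StandardScalerFloat from the previous block. Instead of literally calling the existing Scaler converter, this simplified variant writes Sub followed by Div with skl2onnx's algebra API, so that the ONNX graph divides by scale_ exactly as the Python code does:

```python
import numpy
from skl2onnx import update_registered_converter
from skl2onnx.algebra.onnx_ops import OnnxSub, OnnxDiv

def scaler_float_shape_calculator(operator):
    # the output keeps the type and shape of the input
    operator.outputs[0].type = operator.inputs[0].type.__class__(
        operator.inputs[0].type.shape)

def scaler_float_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    centered = OnnxSub(operator.inputs[0],
                       op.mean_.astype(numpy.float32), op_version=opv)
    scaled = OnnxDiv(centered, op.scale_.astype(numpy.float32),
                     op_version=opv, output_names=operator.outputs)
    scaled.add_to(scope, container)

update_registered_converter(
    StandardScalerFloat, "SklearnStandardScalerFloat",
    scaler_float_shape_calculator, scaler_float_converter)
```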

That means that the differences between float32(X / Y) and float32(X) * float32(1 / Y) are big enough to select a different path in the decision tree. float32(X) / float32(Y) and float32(X) * float32(1 / Y) also differ enough to trigger a different path. Let's illustrate that with an example:
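A small numpy experiment along those lines (not the exact one from the notebook):

```python
import numpy

rng = numpy.random.RandomState(0)
for _ in range(5):
    X = rng.randint(1, 1000, size=1000).astype(numpy.float64)
    Y = rng.uniform(0.1, 10.0)
    d1 = (X / Y).astype(numpy.float32)                      # float32(X / Y)
    d2 = X.astype(numpy.float32) * numpy.float32(1.0 / Y)   # float32(X) * float32(1 / Y)
    print(numpy.abs(d1 - d2).max())
```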

The last random set shows very big differences, obviously big enough to trigger a different path in the graph. The differences for double could probably be significant in some cases, but not on this example.

Change the conversion with option div

Option 'div' was added to the converter for StandardScaler to change the way the scaler is converted.

The ONNX graph is different and uses a division. Let's measure the discrepancies.
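A sketch of the conversion with this option and of the measurement, assuming the option is passed through the options argument keyed by the scaler class:

```python
import numpy
from onnxruntime import InferenceSession
from skl2onnx import to_onnx
from sklearn.preprocessing import StandardScaler

onx_div = to_onnx(model, X_train[:1].astype(numpy.float32),
                  options={StandardScaler: {"div": "div_cast"}})
sess = InferenceSession(onx_div.SerializeToString(),
                        providers=["CPUExecutionProvider"])
got = sess.run(None, {"X": X_test.astype(numpy.float32)})[0].ravel()
print(numpy.abs(model.predict(X_test) - got).max())
```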

The only combination which works is the model converted with option div_cast (division performed in double precision), a float input for ONNX, and a double input for scikit-learn.

Explanation in practice

Based on the previous sections, the following example builds a case where discrepancies are significant.
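A condensed, self-contained sketch of such a case (integer features, a scaler, a deep enough tree, default float conversion):

```python
import numpy
from onnxruntime import InferenceSession
from skl2onnx import to_onnx
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = numpy.random.RandomState(1)
Xi = rng.randint(0, 100, size=(1000, 10)).astype(numpy.float64)
yi = Xi.sum(axis=1) + rng.randn(1000)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("tree", DecisionTreeRegressor(max_depth=10))])
pipe.fit(Xi, yi)

onx = to_onnx(pipe, Xi[:1].astype(numpy.float32))
sess = InferenceSession(onx.SerializeToString(),
                        providers=["CPUExecutionProvider"])
got = sess.run(None, {"X": Xi.astype(numpy.float32)})[0].ravel()
print(numpy.abs(pipe.predict(Xi) - got).max())
```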

We try casting the floats into doubles before applying the normalization and casting back into single floats. It does not help much.
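Such a pipeline could be sketched with skl2onnx's CastTransformer, assuming it is available in skl2onnx.sklapi and accepts a dtype parameter:

```python
import numpy
from skl2onnx.sklapi import CastTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

pipe_cast = Pipeline([
    ("cast64", CastTransformer(dtype=numpy.float64)),  # up to double
    ("scaler", StandardScaler()),
    ("cast32", CastTransformer(dtype=numpy.float32)),  # back to single float
    ("tree", DecisionTreeRegressor(max_depth=10)),
])
pipe_cast.fit(Xi, yi)
```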

As a last experiment, we try to use double all along.

onnxruntime does not support this. Let's switch to mlprodict.
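A sketch of the runtime switch, converting the small pipeline above in double precision with mlprodict and running it with its Python runtime OnnxInference (same assumptions as before about rewrite_ops):

```python
import numpy
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

onx64 = to_onnx(pipe, Xi[:1].astype(numpy.float64), rewrite_ops=True)
oinf = OnnxInference(onx64)
got = list(oinf.run({"X": Xi.astype(numpy.float64)}).values())[0].ravel()
print(numpy.abs(pipe.predict(Xi) - got).max())
```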

Differences are lower if every operator is computed with double.

Conclusion

Maybe the best option is simply to introduce a transform which casts the inputs into floats.
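A sketch of that final pipeline, again assuming skl2onnx's CastTransformer and that skl2onnx knows how to convert its own CastTransformer:

```python
import numpy
from onnxruntime import InferenceSession
from skl2onnx import to_onnx
from skl2onnx.sklapi import CastTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

pipe_f32 = Pipeline([
    ("cast32", CastTransformer(dtype=numpy.float32)),  # cast first
    ("scaler", StandardScaler()),
    ("tree", DecisionTreeRegressor(max_depth=10)),
])
pipe_f32.fit(Xi, yi)

onx = to_onnx(pipe_f32, Xi[:1].astype(numpy.float32))
sess = InferenceSession(onx.SerializeToString(),
                        providers=["CPUExecutionProvider"])
got = sess.run(None, {"X": Xi.astype(numpy.float32)})[0].ravel()
print(numpy.abs(pipe_f32.predict(Xi) - got).max())
```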

It seems to work that way.