Tricky detail when converting a random forest from scikit-learn into ONNX

scikit-learn uses a specific comparison when computing the prediction of a decision tree: it evaluates (float)x <= threshold (see tree.pyx, method apply_dense). ONNX does not specify such a cast and compares x to threshold, both having the same type. What should the converter do then?
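The discrepancy already shows on a single value. The sketch below assumes a double threshold and a double feature, both equal to 0.1, and uses NumPy casts to mimic the three possible comparisons; the variable names are illustrative only.

```python
import numpy as np

threshold = np.float64(0.1)   # assumption: the threshold is stored as a double
x = np.float64(0.1)           # a double feature equal to the threshold

# scikit-learn style: cast the feature to float32, then compare as doubles
sklearn_like = bool(np.float64(np.float32(x)) <= threshold)
# naive conversions: both operands share the same type
as_double = bool(x <= threshold)
as_float = bool(np.float32(x) <= np.float32(threshold))

# float32(0.1) is slightly greater than the double 0.1, so the first test fails
print(sklearn_like, as_double, as_float)  # False True True
```

With these values, the scikit-learn style comparison sends the observation to the other branch than both naive conversions do.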

Conversion to float

Region where (float)x <= y

Let's see what the comparison (float)x <= y looks like.
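As a text-only sketch of that region, the block below evaluates the comparison on a small grid of doubles spanning four float32 steps just above 1.0 and prints it as characters. The grid size and the choice of 1.0 are arbitrary assumptions made for illustration.

```python
import numpy as np

eps = float(np.spacing(np.float32(1.0)))    # float32 step just above 1.0
grid = 1.0 + np.linspace(0, 4 * eps, 17)    # doubles, four per float32 step
x, y = np.meshgrid(grid, grid)              # x varies along columns, y along rows

# (float)x <= y, the comparison itself being done in double precision
region = x.astype(np.float32).astype(np.float64) <= y
# plain double comparison, for reference
plain = x <= y

# print with y increasing upwards; '#' marks the points where the test is true
for row in region[::-1]:
    print("".join("#" if v else "." for v in row))
```

The border of the printed region is a staircase rather than the diagonal obtained with the plain double comparison.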

Equivalent to (float)x <= (float)y ?

Applied to a decision tree, this does not mean that the condition of each node would be evaluated wrongly in 5.75% of the cases. It depends on how the thresholds are built, and the size of the region of errors depends on the numbers.
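A disagreement rate of this kind can be estimated by sampling. The sketch below is not the original experiment: it draws x and y uniformly within a few float32 steps around a value (the hypothetical parameter center), so the rate it returns depends on that sampling window and is not expected to reproduce the figure quoted above.

```python
import numpy as np

def disagreement_rate(center=1.0, n=100_000, seed=0):
    """Compare (float)x <= y with (float)x <= (float)y on random doubles."""
    rng = np.random.default_rng(seed)
    width = 4 * float(np.spacing(np.float32(center)))   # a few float32 steps
    x = center + rng.uniform(-1, 1, n) * width
    y = center + rng.uniform(-1, 1, n) * width
    xf = x.astype(np.float32)
    ref = xf.astype(np.float64) <= y          # scikit-learn style
    alt = xf <= y.astype(np.float32)          # both sides cast to float32
    differ = ref != alt
    # second value: True if every disagreement has alt True and ref False
    return float(differ.mean()), bool(np.all(alt[differ]))

rate, one_sided = disagreement_rate()
print(rate, one_sided)
```

The disagreements are one-sided: they occur when (float)y is rounded up above y and (float)x is equal to (float)y.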

Good threshold
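The idea is to find a float y' such that (float)x <= y is equivalent to (float)x <= y'. Since (float)x is a float, y' can be taken as the largest float which is not greater than y. A minimal sketch, in which float_threshold is a hypothetical helper name and not a function of scikit-learn or skl2onnx:

```python
import numpy as np

def float_threshold(y):
    """Largest float32 that is <= the double y."""
    y = np.float64(y)
    yf = np.float32(y)                    # rounds to the nearest float32
    if np.float64(yf) > y:                # rounded up: step one float32 down
        yf = np.nextafter(yf, np.float32(-np.inf))
    return yf

# check on doubles drawn very close to a threshold
rng = np.random.default_rng(0)
y = np.float64(0.1)
x = y + rng.uniform(-1, 1, 10_000) * 1e-7
xf = x.astype(np.float32)
ref = xf.astype(np.float64) <= y          # (float)x <= y, compared as doubles
new = xf <= float_threshold(y)            # float32 comparison with y'
print(int((ref != new).sum()))            # number of disagreements
```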

Let's draw the function:

That explains some tricky lines in the package skl2onnx. Let's check that it still works with negative values.
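A possible check on negative thresholds, as a sketch: the helper float_threshold (a hypothetical name, restated here so that the block runs on its own) rounds a double down to the largest float32 not greater than it, and the loop counts the disagreements with (float)x <= y.

```python
import numpy as np

def float_threshold(y):
    """Largest float32 that is <= the double y (hypothetical helper)."""
    yf = np.float32(y)
    if np.float64(yf) > np.float64(y):
        yf = np.nextafter(yf, np.float32(-np.inf))
    return yf

rng = np.random.default_rng(1)
failures = 0
for y in -rng.uniform(0.01, 100, 200):            # negative double thresholds
    step = float(np.spacing(np.float32(abs(y))))  # local float32 step
    x = y + rng.uniform(-1, 1, 200) * 4 * step    # doubles close to y
    xf = x.astype(np.float32)
    ref = xf.astype(np.float64) <= y
    new = xf <= float_threshold(y)
    failures += int((ref != new).sum())
print(failures)
```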

It works.

What about double double?

The probability it fails is lower than for floats but still significant.

Let's fix it in a similar way. Let's first define a function which finds the split double, the border between doubles: below it, doubles are rounded to one float; above it, they are rounded to the next float. And it is not always the middle of the two floats.
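One way to sketch such a function, assuming round-to-nearest-even conversions: the midpoint between a float and the next float is exactly representable as a double, and it is rounded to whichever of the two floats has an even mantissa. The name split_double is a hypothetical choice for this illustration.

```python
import numpy as np

def split_double(f):
    """Largest double which is still converted into the float32 f."""
    f = np.float32(f)
    up = np.nextafter(f, np.float32(np.inf))        # next float32 above f
    mid = (np.float64(f) + np.float64(up)) / 2      # exact in double precision
    if np.float32(mid) != f:                        # the tie went to `up`
        mid = np.nextafter(mid, np.float64(-np.inf))
    return mid

one = np.float32(1.0)
nxt = np.nextafter(one, np.float32(np.inf))
print(split_double(one) == 1 + 2.0 ** -24)           # the midpoint itself
print(split_double(nxt) < (1 + 1.5 * 2.0 ** -23))    # just below the midpoint
```

If f is the largest float not greater than a threshold y, then (float)x <= y should be equivalent to x <= split_double(f) for a double x.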

All doubles equivalent to the same float

We can use the previous code to determine a double interval in which every double is converted into the same float.
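A self-contained sketch of that interval, treating both ends symmetrically: each end is the midpoint with the neighbouring float, moved by one double towards f when the tie is not rounded to f. The name same_float_interval is hypothetical.

```python
import numpy as np

def same_float_interval(f):
    """Closed interval [lo, hi] of doubles converted into the float32 f."""
    f = np.float32(f)
    below = np.nextafter(f, np.float32(-np.inf))
    above = np.nextafter(f, np.float32(np.inf))
    lo = (np.float64(below) + np.float64(f)) / 2
    hi = (np.float64(f) + np.float64(above)) / 2
    if np.float32(lo) != f:            # tie rounded to `below`
        lo = np.nextafter(lo, np.float64(np.inf))
    if np.float32(hi) != f:            # tie rounded to `above`
        hi = np.nextafter(hi, np.float64(-np.inf))
    return lo, hi

lo, hi = same_float_interval(1.0)
print(lo == 1 - 2.0 ** -25, hi == 1 + 2.0 ** -24)

# every double drawn inside the interval is converted into the same float
sample = np.random.default_rng(0).uniform(lo, hi, 1000)
print(bool((sample.astype(np.float32) == np.float32(1.0)).all()))
```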

Verification

Let's check that the rule works for many random x.
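The verification can be sketched in a vectorised form. The function onnx_double_threshold below is a hypothetical helper which combines both steps (largest float not greater than y, then largest double converted into that float); the sampling scheme is an assumption made for this illustration.

```python
import numpy as np

def onnx_double_threshold(y):
    """Double y' such that (float)x <= y is equivalent to x <= y'."""
    y = np.asarray(y, dtype=np.float64)
    f = y.astype(np.float32)
    # largest float32 <= y
    f = np.where(f.astype(np.float64) > y,
                 np.nextafter(f, np.float32(-np.inf)), f)
    # largest double converted into f
    up = np.nextafter(f, np.float32(np.inf))
    mid = (f.astype(np.float64) + up.astype(np.float64)) / 2
    return np.where(mid.astype(np.float32) != f,
                    np.nextafter(mid, np.float64(-np.inf)), mid)

rng = np.random.default_rng(0)
y = rng.normal(size=2000)                            # double thresholds
yp = onnx_double_threshold(y)
step = np.spacing(np.abs(y).astype(np.float32)).astype(np.float64)
# 50 doubles per threshold, within a few float32 steps of it
x = y[:, None] + rng.uniform(-4, 4, size=(2000, 50)) * step[:, None]

ref = x.astype(np.float32).astype(np.float64) <= y[:, None]   # scikit-learn style
new = x <= yp[:, None]                                        # double comparison
mismatches = int((ref != new).sum())
print(mismatches)
```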