Fast TopK elements

Looking for the top k elements is something needed to implement a simple k nearest neighbors. The implementation scikit-learn is using relies on numpy: _kneighbors_reduce_func. mlprodict also contains a C++ implementation of the same function. Let's compare them.

Two implementations

We assume we are looking for the k nearest elements of every row of matrix X which is a dense matrix of doubles.

Now the implementation with mlprodict (C++) available at topk_element_min. It uses heap).

Speed comparison by size

Quite a lot faster on this simple example. Let's look for bigger matrices.

Speed comparison by k

The implementation is half faster in all cases and much more efficient for small values which is usually the case for the nearest neighbors. This implementation is using openmp, maybe that's why it gets 50% faster on this two cores machine.