.. _topkcpprst: ================== Fast TopK elements ================== .. only:: html **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/topk_cpp.ipynb|*` Looking for the top k elements is something needed to implement a simple k nearest neighbors. The implementation *scikit-learn* is using relies on *numpy*: `\_kneighbors_reduce_func `__. *mlprodict* also contains a C++ implementation of the same function. Let’s compare them. .. code:: ipython3 from jyquickhelper import add_notebook_menu add_notebook_menu() .. contents:: :local: .. code:: ipython3 %matplotlib inline Two implementations ------------------- We assume we are looking for the ``k`` nearest elements of every row of matrix ``X`` which is a dense matrix of doubles. .. code:: ipython3 import numpy.random as rnd from sklearn.neighbors._base import KNeighborsMixin mixin = KNeighborsMixin() .. code:: ipython3 def topk_sklearn(X, k): return mixin._kneighbors_reduce_func(X, 0, k, False) X = rnd.randn(5, 10) ind = topk_sklearn(X, 3) ind .. parsed-literal:: array([[2, 7, 3], [7, 0, 8], [1, 5, 6], [8, 9, 3], [4, 6, 5]], dtype=int64) Now the implementation with *mlprodict* (C++) available at `topk_element_min `__. It uses `heap `__. .. code:: ipython3 from mlprodict.onnxrt.ops_cpu._op_onnx_numpy import topk_element_min_double .. code:: ipython3 def topk_cpp(X, k): return topk_element_min_double(X, k, True, 50) ind = topk_cpp(X, 3) ind .. parsed-literal:: array([[2, 7, 3], [7, 0, 8], [1, 5, 6], [8, 9, 3], [4, 6, 5]], dtype=int64) Speed comparison by size ------------------------ .. code:: ipython3 %timeit topk_sklearn(X, 3) .. parsed-literal:: 21.7 µs ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each) .. code:: ipython3 %timeit topk_cpp(X, 3) .. parsed-literal:: 4.1 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) Quite a lot faster on this simple example. Let’s look for bigger matrices. .. code:: ipython3 X = rnd.randn(1000, 100) .. code:: ipython3 %timeit topk_sklearn(X, 10) .. parsed-literal:: 1.8 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) .. code:: ipython3 %timeit topk_cpp(X, 10) .. parsed-literal:: 786 µs ± 116 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) .. code:: ipython3 from cpyquickhelper.numbers import measure_time from tqdm import tqdm from pandas import DataFrame rows = [] for n in tqdm(range(1000, 10001, 1000)): X = rnd.randn(n, 1000) res = measure_time('topk_sklearn(X, 20)', {'X': X, 'topk_sklearn': topk_sklearn}, div_by_number=True, number=2, repeat=2) res["N"] = n res["name"] = 'topk_sklearn' rows.append(res) res = measure_time('topk_cpp(X, 20)', {'X': X, 'topk_cpp': topk_cpp}, div_by_number=True, number=4, repeat=4) res["N"] = n res["name"] = 'topk_cpp' rows.append(res) df = DataFrame(rows) df.head() .. parsed-literal:: 100%|██████████| 10/10 [00:08<00:00, 1.16it/s] .. raw:: html
average deviation min_exec max_exec repeat number context_size N name
0 0.016310 0.000260 0.016050 0.016571 2 2 240 1000 topk_sklearn
1 0.003872 0.000501 0.003335 0.004631 4 4 240 1000 topk_cpp
2 0.034684 0.001629 0.033055 0.036313 2 2 240 2000 topk_sklearn
3 0.006973 0.000558 0.006307 0.007756 4 4 240 2000 topk_cpp
4 0.051934 0.000851 0.051084 0.052785 2 2 240 3000 topk_sklearn
.. code:: ipython3 import matplotlib.pyplot as plt fig, ax = plt.subplots(1, 2, figsize=(10, 4)) piv = df.pivot("N", "name", "average") piv.plot(ax=ax[0], logy=True, logx=True) ax[0].set_title("top 20") piv["ratio"] = piv["topk_cpp"] / piv["topk_sklearn"] piv[["ratio"]].plot(ax=ax[1]) ax[1].set_title("ratio"); .. image:: topk_cpp_17_0.png Speed comparison by k --------------------- .. code:: ipython3 rows = [] X = rnd.randn(2000, 1000) for k in tqdm(list(range(1, 20)) + list(range(20, 1000, 20))): res = measure_time('topk_sklearn(X, k)', {'X': X, 'topk_sklearn': topk_sklearn, 'k': k}, div_by_number=True, number=2, repeat=2) res["k"] = k res["name"] = 'topk_sklearn' rows.append(res) res = measure_time('topk_cpp(X, k)', {'X': X, 'topk_cpp': topk_cpp, 'k': k}, div_by_number=True, number=2, repeat=2) res["k"] = k res["name"] = 'topk_cpp' rows.append(res) df = DataFrame(rows) df.head() .. parsed-literal:: 100%|██████████| 68/68 [00:34<00:00, 1.95it/s] .. raw:: html
average deviation min_exec max_exec repeat number context_size k name
0 0.009558 0.001392 0.008166 0.010949 2 2 240 1 topk_sklearn
1 0.002665 0.000571 0.002094 0.003236 2 2 240 1 topk_cpp
2 0.021933 0.000575 0.021358 0.022508 2 2 240 2 topk_sklearn
3 0.001000 0.000084 0.000917 0.001084 2 2 240 2 topk_cpp
4 0.025986 0.001411 0.024575 0.027398 2 2 240 3 topk_sklearn
.. code:: ipython3 fig, ax = plt.subplots(1, 2, figsize=(10, 4)) piv = df.pivot("k", "name", "average") piv.plot(ax=ax[0], logy=True, logx=True) ax[0].set_title("top k for 2000 rows") piv["ratio"] = piv["topk_cpp"] / piv["topk_sklearn"] piv[["ratio"]].plot(ax=ax[1]) ax[1].set_title("ratio") ax[0].set_xlabel("k") ax[1].set_xlabel("k"); .. image:: topk_cpp_20_0.png The implementation is half faster in all cases and much more efficient for small values which is usually the case for the nearest neighbors. This implementation is using *openmp*, maybe that’s why it gets 50% faster on this two cores machine.