.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_op_onnx_topk.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_op_onnx_topk.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_op_onnx_topk.py:


.. _onnxtopkrst:

TopK benchmark
==============

This example compares the :epkg:`onnxruntime` and :epkg:`mlprodict`
implementations of operator
`TopK <https://github.com/onnx/onnx/blob/main/docs/Operators.md#TopK>`_.
We measure the two runtimes by computing the ratio of their execution
times and display it with the kind of graph shown below.

.. contents::
    :local:

Graph to compare performance
++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 19-38

.. code-block:: default

    from numpy.random import randn
    import numpy
    import matplotlib.pyplot as plt
    from pandas import DataFrame
    from onnxruntime import InferenceSession, __version__ as ort_version
    from tqdm import tqdm
    from cpyquickhelper.numbers import measure_time
    from pyquickhelper.pycode.profiling import profile
    from skl2onnx.algebra.onnx_ops import OnnxTopK_11
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx.algebra.onnx_ops import OnnxTopK
    from mlprodict.onnxrt.validate.validate_benchmark import benchmark_fct
    from mlprodict.onnxrt import OnnxInference
    from mlprodict.onnxrt.ops_cpu.op_topk import (
        topk_sorted_implementation, topk_sorted_implementation_cpp)
    from mlprodict import __version__ as mlp_version
    from mlprodict.plotting.plotting import plot_benchmark_metrics

.. GENERATED FROM PYTHON SOURCE LINES 39-40

Available optimisations on this machine.

.. GENERATED FROM PYTHON SOURCE LINES 40-44

.. code-block:: default

    from mlprodict.testing.experimental_c_impl.experimental_c import code_optimisation
    print(code_optimisation())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AVX-omp=8

.. GENERATED FROM PYTHON SOURCE LINES 45-46

Graph.

.. GENERATED FROM PYTHON SOURCE LINES 46-64

.. code-block:: default

    def plot_metric(metric, ax=None, xlabel="N", ylabel="k", middle=1.,
                    transpose=False, shrink=1.0, title=None):
        ax, cbar = plot_benchmark_metrics(
            metric, ax=ax, xlabel=xlabel, ylabel=ylabel, middle=middle,
            transpose=transpose, cbar_kw={'shrink': shrink})
        if title is not None:
            ax.set_title(title)
        return ax


    data = {(1, 1): 0.1, (10, 1): 1, (1, 10): 2,
            (10, 10): 100, (100, 1): 100, (100, 10): 1000}

    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    plot_metric(data, ax[0], shrink=0.6)

.. image-sg:: /gyexamples/images/sphx_glr_plot_op_onnx_topk_001.png
   :alt: plot op onnx topk
   :srcset: /gyexamples/images/sphx_glr_plot_op_onnx_topk_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 66-70

.. code-block:: default

    plot_metric(data, ax[1], transpose=True)

.. GENERATED FROM PYTHON SOURCE LINES 71-77

TopK in ONNX
++++++++++++

The following lines create an ONNX graph using one TopK ONNX node.
The outcome is the ONNX graph converted into json.

.. GENERATED FROM PYTHON SOURCE LINES 77-93

.. code-block:: default

    X32 = randn(100000, 100).astype(numpy.float32)

    node = OnnxTopK_11('X', numpy.array([5], dtype=numpy.int64),
                       output_names=['dist', 'ind'])

    model_onnx = node.to_onnx(
        [('X', X32)], target_opset=12,
        # shape inference does not seem to work in onnxruntime
        # so we specify the output shape
        outputs=[('dist', X32[:1, :5]),
                 ('ind', X32[:1, :5].astype(numpy.int64))])
    model_onnx

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ir_version: 6
    producer_name: "skl2onnx"
    producer_version: "1.13.1"
    domain: "ai.onnx"
    model_version: 0
    graph {
      node {
        input: "X"
        input: "To_TopKcst"
        output: "dist"
        output: "ind"
        name: "To_TopK"
        op_type: "TopK"
        domain: ""
      }
      name: "OnnxTopK_11"
      initializer {
        dims: 1
        data_type: 7
        int64_data: 5
        name: "To_TopKcst"
      }
      input {
        name: "X"
        type {
          tensor_type {
            elem_type: 1
            shape {
              dim {
              }
              dim {
                dim_value: 100
              }
            }
          }
        }
      }
      output {
        name: "dist"
        type {
          tensor_type {
            elem_type: 1
            shape {
              dim {
              }
              dim {
                dim_value: 5
              }
            }
          }
        }
      }
      output {
        name: "ind"
        type {
          tensor_type {
            elem_type: 7
            shape {
              dim {
              }
              dim {
                dim_value: 5
              }
            }
          }
        }
      }
    }
    opset_import {
      domain: ""
      version: 11
    }

.. GENERATED FROM PYTHON SOURCE LINES 94-95

That gives...

.. GENERATED FROM PYTHON SOURCE LINES 95-102

.. code-block:: default

    oinf = OnnxInference(model_onnx, runtime="python")
    res = oinf.run({'X': X32})
    dist, ind = res['dist'], res['ind']
    dist[:2], ind[:2]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([[2.9598944, 2.1381311, 2.010453 , 1.9549123, 1.9296134],
           [2.168306 , 1.9842782, 1.9551469, 1.8612137, 1.6905841]],
          dtype=float32), array([[81,  7, 90, 49, 14],
           [48, 20, 77, 53, 93]]))

.. GENERATED FROM PYTHON SOURCE LINES 103-104

With onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 104-111

.. code-block:: default

    sess = InferenceSession(model_onnx.SerializeToString())
    dist, ind = sess.run(None, {'X': X32})
    dist[:2], ind[:2]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([[2.9598944, 2.1381311, 2.010453 , 1.9549123, 1.9296134],
           [2.168306 , 1.9842782, 1.9551469, 1.8612137, 1.6905841]],
          dtype=float32), array([[81,  7, 90, 49, 14],
           [48, 20, 77, 53, 93]], dtype=int64))
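Both runtimes return the same values and indices. As a point of
reference, the same result can be obtained directly with :epkg:`numpy`.
The sketch below is only an illustration: the helper ``numpy_topk`` is
not part of the example and neither runtime uses this code. It mirrors
what the TopK node computes with its default attributes
(``axis=-1``, ``largest=1``, ``sorted=1``).

.. code-block:: python

    import numpy


    def numpy_topk(mat, k):
        # Hypothetical helper, for illustration only.
        # argpartition gathers the indices of the k largest values
        # of each row, in no particular order.
        part = numpy.argpartition(mat, -k, axis=1)[:, -k:]
        vals = numpy.take_along_axis(mat, part, axis=1)
        # Sort these k columns in decreasing order.
        order = numpy.argsort(-vals, axis=1)
        ind = numpy.take_along_axis(part, order, axis=1)
        return numpy.take_along_axis(mat, ind, axis=1), ind


    vals, ind = numpy_topk(X32, 5)
    # vals[:2], ind[:2] should match the two outputs displayed above.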
.. GENERATED FROM PYTHON SOURCE LINES 112-113

Let's compare the two implementations.

.. GENERATED FROM PYTHON SOURCE LINES 113-153

.. code-block:: default

    def benchmark(X, fct1, fct2, N, repeat=10, number=10):

        def ti(n):
            if n <= 1:
                return 50
            if n <= 1000:
                return 2
            if n <= 10000:
                return 0.51
            return 0.11

        # to warm up the engine
        time_kwargs = {n: dict(repeat=10, number=10) for n in N[:2]}
        benchmark_fct(fct1, X, time_kwargs=time_kwargs, skip_long_test=False)
        benchmark_fct(fct2, X, time_kwargs=time_kwargs, skip_long_test=False)

        # real measure
        time_kwargs = {n: dict(repeat=int(repeat * ti(n)),
                               number=int(number * ti(n))) for n in N}
        res1 = benchmark_fct(fct1, X, time_kwargs=time_kwargs,
                             skip_long_test=False)
        res2 = benchmark_fct(fct2, X, time_kwargs=time_kwargs,
                             skip_long_test=False)

        res = {}
        for r in sorted(res1):
            r1 = res1[r]
            r2 = res2[r]
            ratio = r2['ttime'] / r1['ttime']
            res[r] = ratio
        return res


    N = [1, 10, 100, 1000, 10000, 100000]
    res = benchmark(X32, lambda x: sess.run(None, {'X': x}),
                    lambda x: oinf.run({'X': x}), N=N)
    res

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {1: 1.5083332913744698, 10: 1.4151957338951786, 100: 36.552590364056236, 1000: 14.071824402120802, 10000: 2.9472532147646975, 100000: 1.0841909476754}

.. GENERATED FROM PYTHON SOURCE LINES 154-165

Each value is the ratio ``time(mlprodict python runtime) /
time(onnxruntime)`` for a batch of *N* rows. The gap narrows at both
ends: for a single row the overhead of a call dominates, and when the
number of rows grows, the implementation in
`mlprodict <https://github.com/sdpython/mlprodict>`_ uses openmp
to parallelise over the rows.
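The ratios are easier to read on a graph. The small helper below is not
part of the original script; it simply draws the dictionary returned by
``benchmark`` on a log-log scale, with the parity line at 1.

.. code-block:: python

    import matplotlib.pyplot as plt


    def plot_ratios(ratios, label):
        # ratios: {N: time(fct2) / time(fct1)} as returned by benchmark.
        ns = sorted(ratios)
        fig, ax = plt.subplots(1, 1, figsize=(6, 3))
        ax.plot(ns, [ratios[n] for n in ns], 'o-', label=label)
        ax.axhline(1., color='black', lw=1)  # parity: both equally fast
        ax.set_xscale('log')
        ax.set_yscale('log')
        ax.set_xlabel('N')
        ax.set_ylabel('time ratio')
        ax.legend()
        return ax


    plot_ratios(res, 'python runtime / onnxruntime')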
C++ implementation vs numpy
+++++++++++++++++++++++++++

:epkg:`scikit-learn` uses :epkg:`numpy` to compute the top *k* elements.

.. GENERATED FROM PYTHON SOURCE LINES 165-172

.. code-block:: default

    res = benchmark(X32, lambda x: topk_sorted_implementation(x, 5, 1, 0),
                    lambda x: topk_sorted_implementation_cpp(x, 5, 1, 0), N=N)
    res

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {1: 0.3022641439658741, 10: 0.3051468584177248, 100: 20.282515692671545, 1000: 2.5687795181097637, 10000: 0.3324244877649639, 100000: 0.12383379717969736}

.. GENERATED FROM PYTHON SOURCE LINES 173-174

The C++ implementation seems faster too, except for the intermediate
sizes (100 and 1000 rows). Let's profile the numpy implementation.

.. GENERATED FROM PYTHON SOURCE LINES 174-181

.. code-block:: default

    xr = randn(1000000, 100)
    text = profile(lambda: topk_sorted_implementation(xr, 5, 1, 0),
                   pyinst_format='text')[1]
    print(text)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      _     ._   __/__   _ _  _  _ _/_   Recorded: 05:37:59 AM Samples:  5
     /_//_/// /_\ / //_// / //_'/ //     Duration: 3.054     CPU time: 3.046
    /   _/                      v4.4.0

    Program: somewhere/workspace/mlprodict/mlprodict_UT_39_std/_doc/examples/plot_op_onnx_topk.py

    3.053 profile  ../pycode/profiling.py:455
    `- 3.053 plot_op_onnx_topk.py:177
          [15 frames hidden]  plot_op_onnx_topk, mlprodict, <__arra...
             3.053 topk_sorted_implementation  mlprodict/onnxrt/ops_cpu/op_topk.py:18

.. GENERATED FROM PYTHON SOURCE LINES 182-187

Parallelisation
+++++++++++++++

We need to disable the parallelisation to really compare
both implementations.

.. GENERATED FROM PYTHON SOURCE LINES 187-215

.. code-block:: default

    def benchmark_test(X, fct1, fct2, N, K, repeat=10, number=10):
        res = {}
        for k in tqdm(K):
            def f1(x, k=k):
                return fct1(x, k=k)

            def f2(x, k=k):
                return fct2(x, k=k)

            r = benchmark(X, f1, f2, N=N, repeat=repeat, number=number)
            for n, v in r.items():
                res[n, k] = v
        return res


    K = [1, 2, 5, 10, 15]
    N = [1, 2, 3, 10, 100, 1000, 10000]

    bench_para = benchmark_test(
        X32, (lambda x, k: topk_sorted_implementation_cpp(
            x, k=k, axis=1, largest=0, th_para=100000000)),
        (lambda x, k: topk_sorted_implementation_cpp(
            x, k=k, axis=1, largest=0, th_para=1)),
        N=N, K=K)

    bench_para

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0/5 [00:00

.. GENERATED FROM PYTHON SOURCE LINES 223-232

This is somehow expected. The first column is close to 1 since the same
code is compared with itself. The next columns are red: the
parallelisation does not help there. It starts helping for bigger *N*,
at least above 100 rows.
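The following toy function illustrates the mechanism behind the
``th_para`` attribute. It is an assumption about the design, not the
actual C++ code: below the threshold, the rows are processed
sequentially because spawning threads would cost more than it saves.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor
    import numpy


    def topk_with_threshold(mat, k, th_para=100):
        # Toy sketch of th_para, not the real implementation.
        def topk_rows(rows):
            # Indices of the k largest values of each row, unordered.
            part = numpy.argpartition(-rows, k, axis=1)[:, :k]
            return numpy.take_along_axis(rows, part, axis=1)

        if mat.shape[0] < th_para:
            # Sequential path: no thread creation overhead.
            return topk_rows(mat)
        # Parallel path: split the rows into blocks.
        blocks = numpy.array_split(mat, 2)
        with ThreadPoolExecutor(2) as ex:
            return numpy.vstack(list(ex.map(topk_rows, blocks)))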
Parallelisation with ONNX
+++++++++++++++++++++++++

We replicate the same experiment with an ONNX graph.

.. GENERATED FROM PYTHON SOURCE LINES 232-246

.. code-block:: default

    k_ = numpy.array([3], dtype=numpy.int64)
    node = OnnxTopK_11('X', 'k',
                       output_names=['dist', 'ind'])

    model_onnx = node.to_onnx(
        [('X', X32), ('k', k_)], target_opset=12,
        # shape inference does not seem to work in onnxruntime
        # so we specify the output shape
        outputs=[('dist', X32[:1, :5]),
                 ('ind', X32[:1, :5].astype(numpy.int64))])

.. GENERATED FROM PYTHON SOURCE LINES 247-248

Test.

.. GENERATED FROM PYTHON SOURCE LINES 248-256

.. code-block:: default

    oinf_no_para = OnnxInference(model_onnx, runtime="python")

    res = oinf_no_para.run({'X': X32, 'k': k_})
    dist, ind = res['dist'], res['ind']
    dist[:2], ind[:2]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([[2.9598944, 2.1381311, 2.010453 ],
           [2.168306 , 1.9842782, 1.9551469]], dtype=float32),
     array([[81,  7, 90],
           [48, 20, 77]]))

.. GENERATED FROM PYTHON SOURCE LINES 257-258

Let's play with the thresholds triggering the parallelisation.

.. GENERATED FROM PYTHON SOURCE LINES 258-264

.. code-block:: default

    oinf_para = OnnxInference(model_onnx, runtime="python")
    oinf_no_para.sequence_[0].ops_.th_para = 100000000
    oinf_para.sequence_[0].ops_.th_para = 1

.. GENERATED FROM PYTHON SOURCE LINES 265-266

Results.

.. GENERATED FROM PYTHON SOURCE LINES 266-277

.. code-block:: default

    bench_onnx_para = benchmark_test(
        X32, (lambda x, k: oinf_no_para.run(
            {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        (lambda x, k: oinf_para.run(
            {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        N=N, K=K)
    bench_onnx_para

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0/5 [00:00

.. GENERATED FROM PYTHON SOURCE LINES 286-290

Pretty much the same results.

onnxruntime vs mlprodict (no parallelisation)
+++++++++++++++++++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 290-302

.. code-block:: default

    sess = InferenceSession(model_onnx.SerializeToString())

    bench_ort = benchmark_test(
        X32, (lambda x, k: sess.run(
            None, {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        (lambda x, k: oinf_no_para.run(
            {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        N=N, K=K)
    bench_ort

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0/5 [00:00

.. GENERATED FROM PYTHON SOURCE LINES 310-314

It seems the implementation of operator TopK in onnxruntime
(1.13.1 here) can still be improved.

Versions:

.. GENERATED FROM PYTHON SOURCE LINES 314-316

.. code-block:: default

    ort_version, mlp_version

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ('1.13.1', '0.9.1887')

.. GENERATED FROM PYTHON SOURCE LINES 317-318

And with parallelisation above 50 rows.

.. GENERATED FROM PYTHON SOURCE LINES 318-329

.. code-block:: default

    oinf_para.sequence_[0].ops_.th_para = 50

    bench_ort_para = benchmark_test(
        X32, (lambda x, k: sess.run(
            None, {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        (lambda x, k: oinf_para.run(
            {'X': x, 'k': numpy.array([k], dtype=numpy.int64)})),
        N=N, K=K)
    bench_ort_para

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0/5 [00:00

.. GENERATED FROM PYTHON SOURCE LINES 338-348

onnxruntime and mlprodict implement the same algorithm. The only
difference comes from the threshold which triggers the parallelisation;
it should depend on the machine. That explains the difference in time
for 100 observations.

Interesting...
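Since the best threshold depends on the machine, one possible way to
calibrate it (a suggestion, not part of the example) is to look for the
smallest *N* at which the parallel version wins for every *k* in a
benchmark such as ``bench_para``.

.. code-block:: python

    def estimate_threshold(bench):
        # bench: {(N, k): time(parallel) / time(sequential)},
        # the dictionary built by benchmark_test above.
        ns = sorted({n for n, _ in bench})
        for n in ns:
            ratios = [v for (bn, _), v in bench.items() if bn == n]
            if max(ratios) < 1.:
                # Parallelisation wins for every k at this size.
                return n
        return None


    estimate_threshold(bench_para)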
Comparison with onnxruntime
+++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 348-367

.. code-block:: default

    X = numpy.array([
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
    ], dtype=numpy.float32)

    K = numpy.array([3], dtype=numpy.int64)


    node = OnnxTopK('X', K, output_names=['values', 'indices'],
                    op_version=12)
    onx = node.to_onnx([('X', FloatTensorType())])

    py_topk = OnnxInference(onx, runtime="python_compiled")
    ort_topk = InferenceSession(onx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 368-369

Check the outputs.

.. GENERATED FROM PYTHON SOURCE LINES 369-375

.. code-block:: default

    r1 = py_topk.run({'X': X})
    r1

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'values': array([[ 3.,  2.,  1.],
           [ 7.,  6.,  5.],
           [11., 10.,  9.]], dtype=float32), 'indices': array([[3, 2, 1],
           [3, 2, 1],
           [3, 2, 1]])}

.. GENERATED FROM PYTHON SOURCE LINES 377-382

.. code-block:: default

    r2 = ort_topk.run(None, {'X': X})
    r2

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [array([[ 3.,  2.,  1.],
           [ 7.,  6.,  5.],
           [11., 10.,  9.]], dtype=float32), array([[3, 2, 1],
           [3, 2, 1],
           [3, 2, 1]], dtype=int64)]
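Both runtimes agree. A quick check, not in the original script, makes
that explicit before timing anything:

.. code-block:: python

    from numpy.testing import assert_almost_equal

    # r1 is a dictionary (python runtime), r2 a list (onnxruntime).
    assert_almost_equal(r1['values'], r2[0], decimal=5)
    assert_almost_equal(r1['indices'], r2[1])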
.. GENERATED FROM PYTHON SOURCE LINES 383-384

Some figures.

.. GENERATED FROM PYTHON SOURCE LINES 384-391

.. code-block:: default

    bs = []
    bs.append(measure_time(lambda: py_topk.run({'X': X}),
                           context=globals(), div_by_number=True))
    bs[-1]['c'] = 'py'
    bs[-1]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'average': 4.906759643927216e-05, 'deviation': 9.002059404027586e-07, 'min_exec': 4.850869998335838e-05, 'max_exec': 5.1643881015479564e-05, 'repeat': 10, 'number': 50, 'ttime': 0.0004906759643927216, 'context_size': 2272, 'c': 'py'}

.. GENERATED FROM PYTHON SOURCE LINES 393-399

.. code-block:: default

    bs.append(measure_time(lambda: ort_topk.run(None, {'X': X}),
                           context=globals(), div_by_number=True))
    bs[-1]['c'] = 'or'
    bs[-1]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'average': 4.793880949728191e-05, 'deviation': 5.419752606740932e-07, 'min_exec': 4.753009881824255e-05, 'max_exec': 4.9395898822695014e-05, 'repeat': 10, 'number': 50, 'ttime': 0.0004793880949728191, 'context_size': 2272, 'c': 'or'}

.. GENERATED FROM PYTHON SOURCE LINES 401-411

.. code-block:: default

    X = numpy.random.randn(10000, 100).astype(numpy.float32)

    bs.append(measure_time(lambda: py_topk.run({'X': X}),
                           context=globals(), div_by_number=True))
    bs[-1]['c'] = 'py-100'
    bs[-1]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'average': 0.007579522713785991, 'deviation': 0.00031576592226994387, 'min_exec': 0.006823993700090796, 'max_exec': 0.007875203518196941, 'repeat': 10, 'number': 50, 'ttime': 0.07579522713785991, 'context_size': 2272, 'c': 'py-100'}

.. GENERATED FROM PYTHON SOURCE LINES 413-420

.. code-block:: default

    bs.append(measure_time(lambda: ort_topk.run(None, {'X': X}),
                           context=globals(), div_by_number=True))
    bs[-1]['c'] = 'ort-100'
    bs[-1]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'average': 0.0020190685361158103, 'deviation': 1.0210657449903279e-05, 'min_exec': 0.002008609820622951, 'max_exec': 0.0020338385622017084, 'repeat': 10, 'number': 50, 'ttime': 0.020190685361158103, 'context_size': 2272, 'c': 'ort-100'}

.. GENERATED FROM PYTHON SOURCE LINES 422-425

.. code-block:: default

    df = DataFrame(bs)
    df

.. raw:: html

    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>average</th>
          <th>deviation</th>
          <th>min_exec</th>
          <th>max_exec</th>
          <th>repeat</th>
          <th>number</th>
          <th>ttime</th>
          <th>context_size</th>
          <th>c</th>
        </tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>0.000049</td><td>9.002059e-07</td><td>0.000049</td><td>0.000052</td><td>10</td><td>50</td><td>0.000491</td><td>2272</td><td>py</td></tr>
        <tr><th>1</th><td>0.000048</td><td>5.419753e-07</td><td>0.000048</td><td>0.000049</td><td>10</td><td>50</td><td>0.000479</td><td>2272</td><td>or</td></tr>
        <tr><th>2</th><td>0.007580</td><td>3.157659e-04</td><td>0.006824</td><td>0.007875</td><td>10</td><td>50</td><td>0.075795</td><td>2272</td><td>py-100</td></tr>
        <tr><th>3</th><td>0.002019</td><td>1.021066e-05</td><td>0.002009</td><td>0.002034</td><td>10</td><td>50</td><td>0.020191</td><td>2272</td><td>ort-100</td></tr>
      </tbody>
    </table>
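The same measures can be summarised graphically. This short snippet is
not part of the original script; it plots the average execution time of
each configuration stored in ``df``.

.. code-block:: python

    # Bar chart of the average execution times, log scale since the
    # small and large inputs differ by two orders of magnitude.
    ax = df.set_index('c')['average'].plot.bar(logy=True, rot=0,
                                               figsize=(6, 3))
    ax.set_ylabel('average time (s)')
    ax.set_title('TopK: python runtime vs onnxruntime')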
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 14 minutes  54.203 seconds)


.. _sphx_glr_download_gyexamples_plot_op_onnx_topk.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_op_onnx_topk.py <plot_op_onnx_topk.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_op_onnx_topk.ipynb <plot_op_onnx_topk.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_