.. _onnxnodetimerst: =============================================== Time processing for every ONNX nodes in a graph =============================================== .. only:: html **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/onnx_node_time.ipynb|*` The following notebook show how long the runtime spends in each node of an ONNX graph. .. code:: ipython3 from jyquickhelper import add_notebook_menu add_notebook_menu() .. contents:: :local: .. code:: ipython3 %load_ext mlprodict .. code:: ipython3 %matplotlib inline LogisticRegression ------------------ .. code:: ipython3 from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split(X, y) clr = LogisticRegression(solver='liblinear') clr.fit(X_train, y_train) .. parsed-literal:: LogisticRegression(solver='liblinear') .. code:: ipython3 import numpy from mlprodict.onnx_conv import to_onnx onx = to_onnx(clr, X_test.astype(numpy.float32)) with open("logreg_time.onnx", "wb") as f: f.write(onx.SerializeToString()) # add -l 1 if nothing shows up %onnxview onx .. raw:: html
.. code:: ipython3 from mlprodict.onnxrt import OnnxInference import pandas oinf = OnnxInference(onx) res = oinf.run({'X': X_test}, node_time=True) pandas.DataFrame(list(res[1])) .. raw:: html
i name op_type time
0 0 LinearClassifier LinearClassifier 0.199603
1 1 Normalizer Normalizer 0.000091
2 2 Cast Cast 0.000014
3 3 ZipMap ZipMap 0.000016
.. code:: ipython3 oinf.run({'X': X_test})['output_probability'][:5] .. parsed-literal:: {0: array([8.38235830e-01, 1.21554664e-03, 6.97352537e-04, 7.93823160e-01, 9.24825077e-01]), 1: array([0.16162989, 0.39692812, 0.25688601, 0.20607722, 0.07516498]), 2: array([1.34279470e-04, 6.01856333e-01, 7.42416637e-01, 9.96200831e-05, 9.94208860e-06])} Measure time spent in each node ------------------------------- With parameter ``node_time=True``, method *run* returns the output and time measurement. .. code:: ipython3 exe = oinf.run({'X': X_test}, node_time=True) exe[1] .. parsed-literal:: [{'i': 0, 'name': 'LinearClassifier', 'op_type': 'LinearClassifier', 'time': 0.00015699999999974068}, {'i': 1, 'name': 'Normalizer', 'op_type': 'Normalizer', 'time': 5.43000000003957e-05}, {'i': 2, 'name': 'Cast', 'op_type': 'Cast', 'time': 1.1699999999947863e-05}, {'i': 3, 'name': 'ZipMap', 'op_type': 'ZipMap', 'time': 1.940000000111297e-05}] .. code:: ipython3 import pandas pandas.DataFrame(exe[1]) .. raw:: html
i name op_type time
0 0 LinearClassifier LinearClassifier 0.000157
1 1 Normalizer Normalizer 0.000054
2 2 Cast Cast 0.000012
3 3 ZipMap ZipMap 0.000019
Logistic regression: python runtime vs onnxruntime -------------------------------------------------- Function `enumerate_validated_operator_opsets `__ implements automated tests for every model with artificial data. Option ``node_time`` automatically returns the time spent in each node and does it multiple time. .. code:: ipython3 from mlprodict.onnxrt.validate import enumerate_validated_operator_opsets res = list(enumerate_validated_operator_opsets( verbose=0, models={"LogisticRegression"}, opset_min=12, runtime='python', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-cl")) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) .. code:: ipython3 import pandas df = pandas.DataFrame(res[0]['bench-batch']) df['step'] = df.apply(lambda row: '{}-{}'.format(row['i'], row["name"]), axis=1) df .. raw:: html
i name op_type time N max_time min_time repeat number step
0 0 LinearClassifier LinearClassifier 0.000018 1 0.000033 0.000015 20 30 0-LinearClassifier
1 1 Normalizer Normalizer 0.000017 1 0.000069 0.000012 20 30 1-Normalizer
2 2 Cast Cast 0.000004 1 0.000009 0.000003 20 30 2-Cast
3 3 ZipMap ZipMap 0.000005 1 0.000007 0.000004 20 30 3-ZipMap
4 0 LinearClassifier LinearClassifier 0.000020 10 0.000052 0.000017 20 20 0-LinearClassifier
5 1 Normalizer Normalizer 0.000015 10 0.000035 0.000013 20 20 1-Normalizer
6 2 Cast Cast 0.000004 10 0.000013 0.000003 20 20 2-Cast
7 3 ZipMap ZipMap 0.000004 10 0.000008 0.000004 20 20 3-ZipMap
8 0 LinearClassifier LinearClassifier 0.000024 100 0.000036 0.000019 10 8 0-LinearClassifier
9 1 Normalizer Normalizer 0.000018 100 0.000023 0.000015 10 8 1-Normalizer
10 2 Cast Cast 0.000004 100 0.000006 0.000003 10 8 2-Cast
11 3 ZipMap ZipMap 0.000007 100 0.000005 0.000004 10 8 3-ZipMap
12 0 LinearClassifier LinearClassifier 0.000051 1000 0.000057 0.000047 5 5 0-LinearClassifier
13 1 Normalizer Normalizer 0.000041 1000 0.000045 0.000040 5 5 1-Normalizer
14 2 Cast Cast 0.000003 1000 0.000004 0.000003 5 5 2-Cast
15 3 ZipMap ZipMap 0.000004 1000 0.000004 0.000004 5 5 3-ZipMap
16 0 LinearClassifier LinearClassifier 0.000315 10000 0.000328 0.000315 3 3 0-LinearClassifier
17 1 Normalizer Normalizer 0.000272 10000 0.000284 0.000256 3 3 1-Normalizer
18 2 Cast Cast 0.000004 10000 0.000004 0.000004 3 3 2-Cast
19 3 ZipMap ZipMap 0.000004 10000 0.000004 0.000004 3 3 3-ZipMap
20 0 LinearClassifier LinearClassifier 0.005634 100000 0.005634 0.005634 1 2 0-LinearClassifier
21 1 Normalizer Normalizer 0.004671 100000 0.004671 0.004671 1 2 1-Normalizer
22 2 Cast Cast 0.000024 100000 0.000024 0.000024 1 2 2-Cast
23 3 ZipMap ZipMap 0.000013 100000 0.000013 0.000013 1 2 3-ZipMap
Following tables shows the time spent in each node, it is relative to the total time. For one observation, the runtime spends 10% of the time in ZipMap, it is only 1% or 2% with 10 observations. These proportions change due to the computing cost of each node. .. code:: ipython3 piv = df.pivot('step', 'N', 'time') total = piv.sum(axis=0) piv / total .. raw:: html
N 1 10 100 1000 10000 100000
step
0-LinearClassifier 0.410138 0.459103 0.450882 0.512622 0.530490 0.544785
1-Normalizer 0.390060 0.353622 0.350126 0.414227 0.456671 0.451642
2-Cast 0.095729 0.089857 0.074343 0.034398 0.006092 0.002306
3-ZipMap 0.104073 0.097418 0.124649 0.038753 0.006747 0.001267
The python implementation of *ZipMap* does not change the data but wraps in into a frozen class `ArrayZipMapDitionary `__ which mocks a list of dictionaries *pandas* can ingest to create a DataFrame. The cost is a fixed cost and does not depend on the number of processed rows. .. code:: ipython3 from pyquickhelper.pycode.profiling import profile bigX = numpy.random.randn(100000, X_test.shape[1]).astype(numpy.float32) print(profile(lambda: oinf.run({'X': bigX}), pyinst_format="text")[1]) .. parsed-literal:: _ ._ __/__ _ _ _ _ _/_ Recorded: 00:28:08 Samples: 4 /_//_/// /_\ / //_// / //_'/ // Duration: 0.009 CPU time: 0.031 / _/ v3.0.1 Program: c:\python372_x64\lib\site-packages\ipykernel_launcher.py -f C:\Users\xavie\AppData\Roaming\jupyter\runtime\kernel-287476aa-b8ba-4140-902a-b0aad833ffd0.json 0.008 profile pyquickhelper\pycode\profiling.py:49 `- 0.008 :3 `- 0.008 run mlprodict\onnxrt\onnx_inference.py:475 `- 0.008 _run_sequence_runtime mlprodict\onnxrt\onnx_inference.py:558 `- 0.008 run mlprodict\onnxrt\onnx_inference_node.py:141 |- 0.005 run mlprodict\onnxrt\ops_cpu\_op.py:417 | `- 0.005 run mlprodict\onnxrt\ops_cpu\_op.py:298 | `- 0.005 _run mlprodict\onnxrt\ops_cpu\op_linear_classifier.py:40 | |- 0.003 [self] | `- 0.002 argmax <__array_function__ internals>:2 | `- 0.002 argmax numpy\core\fromnumeric.py:1112 | [3 frames hidden] numpy | 0.002 _wrapfunc numpy\core\fromnumeric.py:55 `- 0.003 run mlprodict\onnxrt\ops_cpu\_op.py:383 `- 0.003 run mlprodict\onnxrt\ops_cpu\_op.py:298 `- 0.003 _run mlprodict\onnxrt\ops_cpu\op_normalizer.py:66 `- 0.003 norm_l1 mlprodict\onnxrt\ops_cpu\op_normalizer.py:42 `- 0.003 _norm_L1_inplace mlprodict\onnxrt\ops_cpu\op_normalizer.py:49 |- 0.002 [self] `- 0.002 _sum numpy\core\_methods.py:36 [2 frames hidden] numpy The class *ArrayZipMapDictionary* is fast to build but has an overhead after that because it builds data when needed. .. code:: ipython3 res = oinf.run({'X': bigX}) prob = res['output_probability'] type(prob) .. parsed-literal:: mlprodict.onnxrt.ops_cpu.op_zipmap.ArrayZipMapDictionary .. code:: ipython3 %timeit pandas.DataFrame(prob) .. parsed-literal:: 721 ms ± 54.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) .. code:: ipython3 list_of_dict = [v.asdict() for v in prob] %timeit pandas.DataFrame(list_of_dict) .. parsed-literal:: 108 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) But if you just need to do the following: .. code:: ipython3 %timeit pandas.DataFrame(prob).values .. parsed-literal:: 713 ms ± 56.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Then, you can just do that: .. code:: ipython3 print(prob.columns) %timeit prob.values .. parsed-literal:: [0, 1, 2] 390 ns ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) And then: .. code:: ipython3 %timeit -n 100 pandas.DataFrame(prob.values, columns=prob.columns) .. parsed-literal:: 215 µs ± 82.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) We can then compare to what *onnxruntime* would do when the runtime is called indenpently for each node. We use the runtime named `onnxruntime2 `__. Class *OnnxInference* splits the ONNX graph into multiple ONNX graphs, one for each node, and then calls *onnxruntime* for each of them indenpently. *Python* handles the graph logic. .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=0, models={"LogisticRegression"}, opset_min=12, runtime='onnxruntime2', debug=False, node_time=True)) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) .. code:: ipython3 res0 = None for i, r in enumerate(res): if "available-ERROR" in r: print(i, str(r['available-ERROR']).split("\n")[0]) elif res0 is None: res0 = r .. parsed-literal:: 0 Unable to load node 'ZipMap' (output type was inferred) 1 Unable to load node 'ZipMap' (output type was inferred) 4 Unable to load node 'LinearClassifier' (output type was guessed) 5 Unable to load node 'LinearClassifier' (output type was guessed) 6 Unable to load node 'LinearClassifier' (output type was guessed) 7 Unable to load node 'LinearClassifier' (output type was guessed) 8 Unable to load node 'ZipMap' (output type was inferred) 9 Unable to load node 'ZipMap' (output type was inferred) .. code:: ipython3 if '_6ort_run_batch_exc' in res[0]: m = "Something went wrong.", res[0]['_6ort_run_batch_exc'] else: df = pandas.DataFrame(res0['bench-batch']) print(df) df['step'] = df.apply(lambda row: '{}-{}'.format(row['i'], row["name"]), axis=1) piv = df.pivot('step', 'N', 'time') total = piv.sum(axis=0) m = piv / total m .. parsed-literal:: i name op_type time N max_time \ 0 0 LinearClassifier LinearClassifier 0.000052 1 0.000190 1 0 LinearClassifier LinearClassifier 0.000044 10 0.000070 2 0 LinearClassifier LinearClassifier 0.000071 100 0.000133 3 0 LinearClassifier LinearClassifier 0.000066 1000 0.000079 4 0 LinearClassifier LinearClassifier 0.000409 10000 0.000408 5 0 LinearClassifier LinearClassifier 0.003275 100000 0.003275 min_time repeat number 0 0.000028 20 30 1 0.000031 20 20 2 0.000046 10 8 3 0.000057 5 5 4 0.000365 3 3 5 0.003275 1 2 .. raw:: html
N 1 10 100 1000 10000 100000
step
0-LinearClassifier 1.0 1.0 1.0 1.0 1.0 1.0
*onnxruntime* creates a new container each time a ZipMap is executed. That’s whay it takes that much time and the ratio increases when the number of observations increases. GaussianProcessRegressor ------------------------ This operator is slow for small batches compare to scikit-learn but closes the gap as the batch size increases. Let’s see where the time goes. .. code:: ipython3 from onnx.defs import onnx_opset_version from mlprodict.tools.asv_options_helper import get_opset_number_from_onnx onnx_opset_version(), get_opset_number_from_onnx() .. parsed-literal:: (12, 12) .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=1, models={"GaussianProcessRegressor"}, opset_min=get_opset_number_from_onnx(), opset_max=get_opset_number_from_onnx(), runtime='python', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-reg")) .. parsed-literal:: [enumerate_validated_operator_opsets] opset in [12, 12]. .. parsed-literal:: GaussianProcessRegressor : 0%| | 0/1 [00:00
i name op_type time N max_time min_time repeat number step
0 0 kgpd_CDist CDist 0.000033 1 0.000045 0.000027 20 30 00-kgpd_CDist
1 1 kgpd_Div Div 0.000009 1 0.000016 0.000007 20 30 01-kgpd_Div
2 2 kgpd_Mul Mul 0.000006 1 0.000007 0.000005 20 30 02-kgpd_Mul
3 3 kgpd_Sin Sin 0.000007 1 0.000009 0.000006 20 30 03-kgpd_Sin
4 4 kgpd_Div1 Div 0.000007 1 0.000008 0.000005 20 30 04-kgpd_Div1
.. code:: ipython3 pivpy = df.pivot('step', 'N', 'time') total = pivpy.sum(axis=0) pivpy / total .. raw:: html
N 1 10 100 1000 10000 100000
step
00-kgpd_CDist 0.311496 0.300665 0.244035 0.227984 0.264447 0.288546
01-kgpd_Div 0.082535 0.067193 0.028348 0.011667 0.012230 0.015447
02-kgpd_Mul 0.059840 0.050664 0.018670 0.006959 0.010950 0.012468
03-kgpd_Sin 0.067037 0.086529 0.106165 0.113068 0.102102 0.107563
04-kgpd_Div1 0.061852 0.053088 0.025749 0.010935 0.009810 0.009875
05-kgpd_Pow 0.072520 0.166539 0.361318 0.438253 0.418169 0.404182
06-kgpd_Mul1 0.057508 0.050477 0.020386 0.010334 0.009079 0.010466
07-kgpd_Exp 0.067885 0.098850 0.150876 0.168079 0.165177 0.145106
08-gpr_MatMul 0.137546 0.064570 0.025069 0.009029 0.007134 0.006159
09-gpr_Add 0.081782 0.061424 0.019383 0.003692 0.000903 0.000190
.. code:: ipython3 ax = (pivpy / total).T.plot(logx=True, figsize=(14, 4)) ax.set_ylim([0,1]) ax.set_title("Time spent in each node relatively to the total time\npython runtime"); .. image:: onnx_node_time_40_0.png The operator *Scan* is clearly time consuming when the batch size is small. *onnxruntime* is more efficient for this one. .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=1, models={"GaussianProcessRegressor"}, opset_min=get_opset_number_from_onnx(), opset_max=get_opset_number_from_onnx(), runtime='onnxruntime2', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-reg")) .. parsed-literal:: [enumerate_validated_operator_opsets] opset in [12, 12]. .. parsed-literal:: GaussianProcessRegressor : 0%| | 0/1 [00:00
N 1 10 100 1000 10000 100000
step
00-kgpd_CDist 0.114001 0.120884 0.128845 0.148042 0.156776 0.180377
01-kgpd_Div 0.101792 0.098622 0.085983 0.085108 0.086603 0.084520
02-kgpd_Mul 0.099980 0.097547 0.084001 0.064706 0.072849 0.081023
03-kgpd_Sin 0.089632 0.103505 0.194194 0.301002 0.245769 0.260717
04-kgpd_Div1 0.099119 0.096737 0.088709 0.063237 0.095840 0.091635
05-kgpd_Pow 0.108045 0.098307 0.081161 0.064898 0.079015 0.076962
06-kgpd_Mul1 0.098561 0.098475 0.082770 0.063557 0.087732 0.076762
07-kgpd_Exp 0.090019 0.087015 0.086282 0.088542 0.103798 0.087690
08-gpr_MatMul 0.100426 0.102766 0.106220 0.102751 0.069157 0.059617
09-gpr_Add 0.098425 0.096143 0.061836 0.018155 0.002462 0.000696
.. code:: ipython3 if r is not None: ax = (pivort / total).T.plot(logx=True, figsize=(14, 4)) ax.set_ylim([0,1]) ax.set_title("Time spent in each node relatively to the total time\nonnxtunime"); .. image:: onnx_node_time_44_0.png The results are relative. Let’s see which runtime is best node by node. .. code:: ipython3 if r is not None: r = (pivort - pivpy) / pivpy r .. raw:: html
N 1 10 100 1000 10000 100000
step
00-kgpd_CDist -0.239113 -0.367743 -0.630420 -0.703226 -0.677041 -0.631775
01-kgpd_Div 1.564119 1.308106 1.123155 2.333824 2.857590 2.223117
02-kgpd_Mul 2.473648 2.027773 2.149388 3.249448 2.624165 2.828025
03-kgpd_Sin 1.779780 0.881090 0.280401 0.216680 0.311288 0.427768
04-kgpd_Div1 2.331710 1.865534 1.411570 1.642922 4.321977 4.466249
05-kgpd_Pow 2.097502 -0.071724 -0.842765 -0.932321 -0.897065 -0.887837
06-kgpd_Mul1 2.563218 2.067909 1.842112 1.811019 4.263897 3.320524
07-kgpd_Exp 1.756911 0.384288 -0.599693 -0.759241 -0.657668 -0.644029
08-gpr_MatMul 0.517953 1.502798 1.965953 4.201281 4.281286 4.701255
09-gpr_Add 1.502111 1.461452 1.233097 1.247736 0.486358 1.160395
Based on this, *onnxruntime* is faster for operators *Scan*, *Pow*, *Exp* and slower for all the others. Measuring the time with a custom dataset ---------------------------------------- We use the example `Comparison of kernel ridge and Gaussian process regression `__. .. code:: ipython3 import numpy import pandas import matplotlib.pyplot as plt from sklearn.kernel_ridge import KernelRidge from sklearn.model_selection import GridSearchCV from sklearn.gaussian_process import GaussianProcessRegressor from sklearn.gaussian_process.kernels import WhiteKernel, ExpSineSquared rng = numpy.random.RandomState(0) # Generate sample data X = 15 * rng.rand(100, 1) y = numpy.sin(X).ravel() y += 3 * (0.5 - rng.rand(X.shape[0])) # add noise gp_kernel = ExpSineSquared(1.0, 5.0, periodicity_bounds=(1e-2, 1e1)) gpr = GaussianProcessRegressor(kernel=gp_kernel) gpr.fit(X, y) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\gaussian_process\kernels.py:409: ConvergenceWarning: The optimal value found for dimension 0 of parameter length_scale is close to the specified lower bound 1e-05. Decreasing the bound and calling fit again may find a better value. ConvergenceWarning) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\gaussian_process\kernels.py:418: ConvergenceWarning: The optimal value found for dimension 0 of parameter periodicity is close to the specified upper bound 10.0. Increasing the bound and calling fit again may find a better value. ConvergenceWarning) .. parsed-literal:: GaussianProcessRegressor(kernel=ExpSineSquared(length_scale=1, periodicity=5)) .. code:: ipython3 onx = to_onnx(gpr, X_test.astype(numpy.float64)) with open("gpr_time.onnx", "wb") as f: f.write(onx.SerializeToString()) %onnxview onx -r 1 .. raw:: html
.. code:: ipython3 from mlprodict.tools import get_ir_version_from_onnx onx.ir_version = get_ir_version_from_onnx() .. code:: ipython3 oinfpy = OnnxInference(onx, runtime="python") oinfort = OnnxInference(onx, runtime="onnxruntime2") ``runtime==onnxruntime2`` tells the class ``OnnxInference`` to use *onnxruntime* for every node independently, there are as many calls as there are nodes in the graph. .. code:: ipython3 respy = oinfpy.run({'X': X_test}, node_time=True) try: resort = oinfort.run({'X': X_test}, node_time=True) except Exception as e: print(e) resort = None .. code:: ipython3 if resort is not None: df = pandas.DataFrame(respy[1]).merge(pandas.DataFrame(resort[1]), on=["i", "name", "op_type"], suffixes=("_py", "_ort")) df['delta'] = df.time_ort - df.time_py else: df = None df .. raw:: html
i name op_type time_py time_ort delta
0 0 Sc_Scan Scan 0.007998 0.005970 -0.002028
1 1 kgpd_Transpose Transpose 0.000032 0.000599 0.000567
2 2 kgpd_Sqrt Sqrt 0.000063 0.000112 0.000049
3 3 kgpd_Div Div 0.000143 0.000097 -0.000045
4 4 kgpd_Mul Mul 0.000038 0.000321 0.000283
5 5 kgpd_Sin Sin 0.000095 0.000146 0.000051
6 6 kgpd_Div1 Div 0.000027 0.000096 0.000069
7 7 kgpd_Pow Pow 0.000299 0.000104 -0.000196
8 8 kgpd_Mul1 Mul 0.000032 0.000097 0.000065
9 9 kgpd_Exp Exp 0.000383 0.000111 -0.000271
10 10 gpr_MatMul MatMul 0.000080 0.004359 0.004279
11 11 gpr_Add Add 0.000034 0.000165 0.000131
The following function runs multiple the same inference and aggregates the results node by node. .. code:: ipython3 from mlprodict.onnxrt.validate.validate import benchmark_fct res = benchmark_fct(lambda X: oinfpy.run({'X': X_test}, node_time=True), X_test, node_time=True) .. code:: ipython3 df = pandas.DataFrame(res) df[df.N == 100] .. raw:: html
i name op_type time N max_time min_time repeat number
24 0 Sc_Scan Scan 0.004154 100 0.004330 0.003843 10 8
25 1 kgpd_Transpose Transpose 0.000013 100 0.000019 0.000010 10 8
26 2 kgpd_Sqrt Sqrt 0.000018 100 0.000022 0.000015 10 8
27 3 kgpd_Div Div 0.000025 100 0.000092 0.000015 10 8
28 4 kgpd_Mul Mul 0.000012 100 0.000019 0.000009 10 8
29 5 kgpd_Sin Sin 0.000057 100 0.000070 0.000050 10 8
30 6 kgpd_Div1 Div 0.000014 100 0.000017 0.000011 10 8
31 7 kgpd_Pow Pow 0.000172 100 0.000198 0.000155 10 8
32 8 kgpd_Mul1 Mul 0.000020 100 0.000101 0.000009 10 8
33 9 kgpd_Exp Exp 0.000213 100 0.000249 0.000193 10 8
34 10 gpr_MatMul MatMul 0.000034 100 0.000047 0.000026 10 8
35 11 gpr_Add Add 0.000013 100 0.000019 0.000011 10 8
.. code:: ipython3 df100 = df[df.N == 100] .. code:: ipython3 %matplotlib inline .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(df100.i, df100.time, align='center', color='orange') ax.set_xticks(df100.i) ax.set_yscale('log') ax.set_xticklabels(df100.op_type) ax.errorbar(df100.i, df100.time, numpy.abs(df100[["min_time", "max_time"]].T.values - df100.time.values.ravel()), uplims=True, lolims=True, color='blue') ax.set_title("Time spent in each node for 100 observations\nGaussianProcess"); .. image:: onnx_node_time_61_0.png .. code:: ipython3 df100c = df100.cumsum() .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(df100.i, df100c.time, align='center', color='orange') ax.set_xticks(df100.i) #ax.set_yscale('log') ax.set_ylim([df100c.min_time.min(), df100c.max_time.max()]) ax.set_xticklabels(df100.op_type) ax.errorbar(df100.i, df100c.time, numpy.abs((df100c[["min_time", "max_time"]].T.values - df100c.time.values.ravel())), uplims=True, lolims=True) ax.set_title("Cumulated time spent in each node for 100 observations\nGaussianProcess"); .. image:: onnx_node_time_63_0.png onnxruntime2 / onnxruntime1 --------------------------- The runtime ``onnxruntime1`` uses *onnxruntime* for the whole ONNX graph. There is no way to get the computation time for each node except if we create a ONNX graph for each intermediate node. .. code:: ipython3 oinfort1 = OnnxInference(onx, runtime='onnxruntime1') .. code:: ipython3 split = oinfort1.build_intermediate() split .. parsed-literal:: OrderedDict([('scan0', OnnxInference(...)), ('scan1', OnnxInference(...)), ('kgpd_transposed0', OnnxInference(...)), ('kgpd_Y0', OnnxInference(...)), ('kgpd_C03', OnnxInference(...)), ('kgpd_C02', OnnxInference(...)), ('kgpd_output02', OnnxInference(...)), ('kgpd_C01', OnnxInference(...)), ('kgpd_Z0', OnnxInference(...)), ('kgpd_C0', OnnxInference(...)), ('kgpd_output01', OnnxInference(...)), ('gpr_Y0', OnnxInference(...)), ('GPmean', OnnxInference(...))]) .. code:: ipython3 dfs = [] for k, v in split.items(): print("node", k) res = benchmark_fct(lambda x: v.run({'X': x}), X_test) df = pandas.DataFrame(res) df['name'] = k dfs.append(df.reset_index(drop=False)) .. parsed-literal:: node scan0 node scan1 node kgpd_transposed0 node kgpd_Y0 node kgpd_C03 node kgpd_C02 node kgpd_output02 node kgpd_C01 node kgpd_Z0 node kgpd_C0 node kgpd_output01 node gpr_Y0 node GPmean .. code:: ipython3 df = pandas.concat(dfs) df.head() .. raw:: html
index 1 10 100 1000 10000 100000 name
0 average 0.000623 0.000592 0.000754 0.002202 0.017529 0.201192 scan0
1 deviation 0.000115 0.000030 0.000034 0.000026 0.000976 0.000000 scan0
2 min_exec 0.000541 0.000537 0.000657 0.002169 0.016677 0.201192 scan0
3 max_exec 0.000980 0.000639 0.000780 0.002239 0.018896 0.201192 scan0
4 repeat 20.000000 20.000000 10.000000 5.000000 3.000000 1.000000 scan0
.. code:: ipython3 df100c = df[df['index'] == "average"] df100c_min = df[df['index'] == "min_exec"] df100c_max = df[df['index'] == "max_exec"] ave = df100c.iloc[:, 4] ave_min = df100c_min.iloc[:, 4] ave_max = df100c_max.iloc[:, 4] ave.shape, ave_min.shape, ave_max.shape index = numpy.arange(ave.shape[0]) .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(index, ave, align='center', color='orange') ax.set_xticks(index) ax.set_xticklabels(df100c.name) for tick in ax.get_xticklabels(): tick.set_rotation(20) ax.errorbar(index, ave, numpy.abs((numpy.vstack([ave_min.values, ave_max.values]) - ave.values.ravel())), uplims=True, lolims=True) ax.set_title("Cumulated time spent in each node for 100 " "observations\nGaussianProcess and onnxruntime1"); .. image:: onnx_node_time_70_0.png The visual graph helps matching the output names with the operator type. The curve is not monotononic because each experiment computes every output from the start. The number of repetitions should be increased. Documentation of function `benchmark_fct `__ tells how to do it.