.. _onnxnodetimerst: =============================================== Time processing for every ONNX nodes in a graph =============================================== .. only:: html **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/onnx_node_time.ipynb|*` The following notebook show how long the runtime spends in each node of an ONNX graph. .. code:: ipython3 from jyquickhelper import add_notebook_menu add_notebook_menu() .. contents:: :local: .. code:: ipython3 %load_ext mlprodict .. code:: ipython3 %matplotlib inline LogisticRegression ------------------ .. code:: ipython3 from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression iris = load_iris() X, y = iris.data, iris.target X_train, X_test, y_train, y_test = train_test_split(X, y) clr = LogisticRegression(solver='liblinear') clr.fit(X_train, y_train) .. parsed-literal:: LogisticRegression(solver='liblinear') .. code:: ipython3 import numpy from mlprodict.onnx_conv import to_onnx onx = to_onnx(clr, X_test.astype(numpy.float32)) with open("logreg_time.onnx", "wb") as f: f.write(onx.SerializeToString()) # add -l 1 if nothing shows up %onnxview onx .. raw:: html

.. code:: ipython3 from mlprodict.onnxrt import OnnxInference import pandas oinf = OnnxInference(onx) res = oinf.run({'X': X_test}, node_time=True) pandas.DataFrame(list(res[1])) .. raw:: html

	i	name	op_type	time
0	0	LinearClassifier	LinearClassifier	0.199603
1	1	Normalizer	Normalizer	0.000091
2	2	Cast	Cast	0.000014
3	3	ZipMap	ZipMap	0.000016

.. code:: ipython3 oinf.run({'X': X_test})['output_probability'][:5] .. parsed-literal:: {0: array([8.38235830e-01, 1.21554664e-03, 6.97352537e-04, 7.93823160e-01, 9.24825077e-01]), 1: array([0.16162989, 0.39692812, 0.25688601, 0.20607722, 0.07516498]), 2: array([1.34279470e-04, 6.01856333e-01, 7.42416637e-01, 9.96200831e-05, 9.94208860e-06])} Measure time spent in each node ------------------------------- With parameter ``node_time=True``, method *run* returns the output and time measurement. .. code:: ipython3 exe = oinf.run({'X': X_test}, node_time=True) exe[1] .. parsed-literal:: [{'i': 0, 'name': 'LinearClassifier', 'op_type': 'LinearClassifier', 'time': 0.00015699999999974068}, {'i': 1, 'name': 'Normalizer', 'op_type': 'Normalizer', 'time': 5.43000000003957e-05}, {'i': 2, 'name': 'Cast', 'op_type': 'Cast', 'time': 1.1699999999947863e-05}, {'i': 3, 'name': 'ZipMap', 'op_type': 'ZipMap', 'time': 1.940000000111297e-05}] .. code:: ipython3 import pandas pandas.DataFrame(exe[1]) .. raw:: html

	i	name	op_type	time
0	0	LinearClassifier	LinearClassifier	0.000157
1	1	Normalizer	Normalizer	0.000054
2	2	Cast	Cast	0.000012
3	3	ZipMap	ZipMap	0.000019

Logistic regression: python runtime vs onnxruntime -------------------------------------------------- Function `enumerate_validated_operator_opsets `__ implements automated tests for every model with artificial data. Option ``node_time`` automatically returns the time spent in each node and does it multiple time. .. code:: ipython3 from mlprodict.onnxrt.validate import enumerate_validated_operator_opsets res = list(enumerate_validated_operator_opsets( verbose=0, models={"LogisticRegression"}, opset_min=12, runtime='python', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-cl")) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) .. code:: ipython3 import pandas df = pandas.DataFrame(res[0]['bench-batch']) df['step'] = df.apply(lambda row: '{}-{}'.format(row['i'], row["name"]), axis=1) df .. raw:: html

	i	name	op_type	time	N	max_time	min_time	repeat	number	step
0	0	LinearClassifier	LinearClassifier	0.000018	1	0.000033	0.000015	20	30	0-LinearClassifier
1	1	Normalizer	Normalizer	0.000017	1	0.000069	0.000012	20	30	1-Normalizer
2	2	Cast	Cast	0.000004	1	0.000009	0.000003	20	30	2-Cast
3	3	ZipMap	ZipMap	0.000005	1	0.000007	0.000004	20	30	3-ZipMap
4	0	LinearClassifier	LinearClassifier	0.000020	10	0.000052	0.000017	20	20	0-LinearClassifier
5	1	Normalizer	Normalizer	0.000015	10	0.000035	0.000013	20	20	1-Normalizer
6	2	Cast	Cast	0.000004	10	0.000013	0.000003	20	20	2-Cast
7	3	ZipMap	ZipMap	0.000004	10	0.000008	0.000004	20	20	3-ZipMap
8	0	LinearClassifier	LinearClassifier	0.000024	100	0.000036	0.000019	10	8	0-LinearClassifier
9	1	Normalizer	Normalizer	0.000018	100	0.000023	0.000015	10	8	1-Normalizer
10	2	Cast	Cast	0.000004	100	0.000006	0.000003	10	8	2-Cast
11	3	ZipMap	ZipMap	0.000007	100	0.000005	0.000004	10	8	3-ZipMap
12	0	LinearClassifier	LinearClassifier	0.000051	1000	0.000057	0.000047	5	5	0-LinearClassifier
13	1	Normalizer	Normalizer	0.000041	1000	0.000045	0.000040	5	5	1-Normalizer
14	2	Cast	Cast	0.000003	1000	0.000004	0.000003	5	5	2-Cast
15	3	ZipMap	ZipMap	0.000004	1000	0.000004	0.000004	5	5	3-ZipMap
16	0	LinearClassifier	LinearClassifier	0.000315	10000	0.000328	0.000315	3	3	0-LinearClassifier
17	1	Normalizer	Normalizer	0.000272	10000	0.000284	0.000256	3	3	1-Normalizer
18	2	Cast	Cast	0.000004	10000	0.000004	0.000004	3	3	2-Cast
19	3	ZipMap	ZipMap	0.000004	10000	0.000004	0.000004	3	3	3-ZipMap
20	0	LinearClassifier	LinearClassifier	0.005634	100000	0.005634	0.005634	1	2	0-LinearClassifier
21	1	Normalizer	Normalizer	0.004671	100000	0.004671	0.004671	1	2	1-Normalizer
22	2	Cast	Cast	0.000024	100000	0.000024	0.000024	1	2	2-Cast
23	3	ZipMap	ZipMap	0.000013	100000	0.000013	0.000013	1	2	3-ZipMap

Following tables shows the time spent in each node, it is relative to the total time. For one observation, the runtime spends 10% of the time in ZipMap, it is only 1% or 2% with 10 observations. These proportions change due to the computing cost of each node. .. code:: ipython3 piv = df.pivot('step', 'N', 'time') total = piv.sum(axis=0) piv / total .. raw:: html

N	1	10	100	1000	10000	100000
step
0-LinearClassifier	0.410138	0.459103	0.450882	0.512622	0.530490	0.544785
1-Normalizer	0.390060	0.353622	0.350126	0.414227	0.456671	0.451642
2-Cast	0.095729	0.089857	0.074343	0.034398	0.006092	0.002306
3-ZipMap	0.104073	0.097418	0.124649	0.038753	0.006747	0.001267

The python implementation of *ZipMap* does not change the data but wraps in into a frozen class `ArrayZipMapDitionary `__ which mocks a list of dictionaries *pandas* can ingest to create a DataFrame. The cost is a fixed cost and does not depend on the number of processed rows. .. code:: ipython3 from pyquickhelper.pycode.profiling import profile bigX = numpy.random.randn(100000, X_test.shape[1]).astype(numpy.float32) print(profile(lambda: oinf.run({'X': bigX}), pyinst_format="text")[1]) .. parsed-literal:: _ ._ __/__ _ _ _ _ _/_ Recorded: 00:28:08 Samples: 4 /_//_/// /_\ / //_// / //_'/ // Duration: 0.009 CPU time: 0.031 / _/ v3.0.1 Program: c:\python372_x64\lib\site-packages\ipykernel_launcher.py -f C:\Users\xavie\AppData\Roaming\jupyter\runtime\kernel-287476aa-b8ba-4140-902a-b0aad833ffd0.json 0.008 profile pyquickhelper\pycode\profiling.py:49 `- 0.008

:3 `- 0.008 run mlprodict\onnxrt\onnx_inference.py:475 `- 0.008 _run_sequence_runtime mlprodict\onnxrt\onnx_inference.py:558 `- 0.008 run mlprodict\onnxrt\onnx_inference_node.py:141 |- 0.005 run mlprodict\onnxrt\ops_cpu\_op.py:417 | `- 0.005 run mlprodict\onnxrt\ops_cpu\_op.py:298 | `- 0.005 _run mlprodict\onnxrt\ops_cpu\op_linear_classifier.py:40 | |- 0.003 [self] | `- 0.002 argmax <__array_function__ internals>:2 | `- 0.002 argmax numpy\core\fromnumeric.py:1112 | [3 frames hidden] numpy | 0.002 _wrapfunc numpy\core\fromnumeric.py:55 `- 0.003 run mlprodict\onnxrt\ops_cpu\_op.py:383 `- 0.003 run mlprodict\onnxrt\ops_cpu\_op.py:298 `- 0.003 _run mlprodict\onnxrt\ops_cpu\op_normalizer.py:66 `- 0.003 norm_l1 mlprodict\onnxrt\ops_cpu\op_normalizer.py:42 `- 0.003 _norm_L1_inplace mlprodict\onnxrt\ops_cpu\op_normalizer.py:49 |- 0.002 [self] `- 0.002 _sum numpy\core\_methods.py:36 [2 frames hidden] numpy The class *ArrayZipMapDictionary* is fast to build but has an overhead after that because it builds data when needed. .. code:: ipython3 res = oinf.run({'X': bigX}) prob = res['output_probability'] type(prob) .. parsed-literal:: mlprodict.onnxrt.ops_cpu.op_zipmap.ArrayZipMapDictionary .. code:: ipython3 %timeit pandas.DataFrame(prob) .. parsed-literal:: 721 ms ± 54.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) .. code:: ipython3 list_of_dict = [v.asdict() for v in prob] %timeit pandas.DataFrame(list_of_dict) .. parsed-literal:: 108 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) But if you just need to do the following: .. code:: ipython3 %timeit pandas.DataFrame(prob).values .. parsed-literal:: 713 ms ± 56.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Then, you can just do that: .. code:: ipython3 print(prob.columns) %timeit prob.values .. parsed-literal:: [0, 1, 2] 390 ns ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) And then: .. code:: ipython3 %timeit -n 100 pandas.DataFrame(prob.values, columns=prob.columns) .. parsed-literal:: 215 µs ± 82.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) We can then compare to what *onnxruntime* would do when the runtime is called indenpently for each node. We use the runtime named `onnxruntime2 `__. Class *OnnxInference* splits the ONNX graph into multiple ONNX graphs, one for each node, and then calls *onnxruntime* for each of them indenpently. *Python* handles the graph logic. .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=0, models={"LogisticRegression"}, opset_min=12, runtime='onnxruntime2', debug=False, node_time=True)) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:1356: UserWarning: 'n_jobs' > 1 does not have any effect when 'solver' is set to 'liblinear'. Got 'n_jobs' = 4. " = {}.".format(effective_n_jobs(self.n_jobs))) .. code:: ipython3 res0 = None for i, r in enumerate(res): if "available-ERROR" in r: print(i, str(r['available-ERROR']).split("\n")[0]) elif res0 is None: res0 = r .. parsed-literal:: 0 Unable to load node 'ZipMap' (output type was inferred) 1 Unable to load node 'ZipMap' (output type was inferred) 4 Unable to load node 'LinearClassifier' (output type was guessed) 5 Unable to load node 'LinearClassifier' (output type was guessed) 6 Unable to load node 'LinearClassifier' (output type was guessed) 7 Unable to load node 'LinearClassifier' (output type was guessed) 8 Unable to load node 'ZipMap' (output type was inferred) 9 Unable to load node 'ZipMap' (output type was inferred) .. code:: ipython3 if '_6ort_run_batch_exc' in res[0]: m = "Something went wrong.", res[0]['_6ort_run_batch_exc'] else: df = pandas.DataFrame(res0['bench-batch']) print(df) df['step'] = df.apply(lambda row: '{}-{}'.format(row['i'], row["name"]), axis=1) piv = df.pivot('step', 'N', 'time') total = piv.sum(axis=0) m = piv / total m .. parsed-literal:: i name op_type time N max_time \ 0 0 LinearClassifier LinearClassifier 0.000052 1 0.000190 1 0 LinearClassifier LinearClassifier 0.000044 10 0.000070 2 0 LinearClassifier LinearClassifier 0.000071 100 0.000133 3 0 LinearClassifier LinearClassifier 0.000066 1000 0.000079 4 0 LinearClassifier LinearClassifier 0.000409 10000 0.000408 5 0 LinearClassifier LinearClassifier 0.003275 100000 0.003275 min_time repeat number 0 0.000028 20 30 1 0.000031 20 20 2 0.000046 10 8 3 0.000057 5 5 4 0.000365 3 3 5 0.003275 1 2 .. raw:: html

N	1	10	100	1000	10000	100000
step
0-LinearClassifier	1.0	1.0	1.0	1.0	1.0	1.0

*onnxruntime* creates a new container each time a ZipMap is executed. That’s whay it takes that much time and the ratio increases when the number of observations increases. GaussianProcessRegressor ------------------------ This operator is slow for small batches compare to scikit-learn but closes the gap as the batch size increases. Let’s see where the time goes. .. code:: ipython3 from onnx.defs import onnx_opset_version from mlprodict.tools.asv_options_helper import get_opset_number_from_onnx onnx_opset_version(), get_opset_number_from_onnx() .. parsed-literal:: (12, 12) .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=1, models={"GaussianProcessRegressor"}, opset_min=get_opset_number_from_onnx(), opset_max=get_opset_number_from_onnx(), runtime='python', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-reg")) .. parsed-literal:: [enumerate_validated_operator_opsets] opset in [12, 12]. .. parsed-literal:: GaussianProcessRegressor : 0%| | 0/1 [00:00

	i	name	op_type	time	N	max_time	min_time	repeat	number	step
0	0	kgpd_CDist	CDist	0.000033	1	0.000045	0.000027	20	30	00-kgpd_CDist
1	1	kgpd_Div	Div	0.000009	1	0.000016	0.000007	20	30	01-kgpd_Div
2	2	kgpd_Mul	Mul	0.000006	1	0.000007	0.000005	20	30	02-kgpd_Mul
3	3	kgpd_Sin	Sin	0.000007	1	0.000009	0.000006	20	30	03-kgpd_Sin
4	4	kgpd_Div1	Div	0.000007	1	0.000008	0.000005	20	30	04-kgpd_Div1

.. code:: ipython3 pivpy = df.pivot('step', 'N', 'time') total = pivpy.sum(axis=0) pivpy / total .. raw:: html

N	1	10	100	1000	10000	100000
step
00-kgpd_CDist	0.311496	0.300665	0.244035	0.227984	0.264447	0.288546
01-kgpd_Div	0.082535	0.067193	0.028348	0.011667	0.012230	0.015447
02-kgpd_Mul	0.059840	0.050664	0.018670	0.006959	0.010950	0.012468
03-kgpd_Sin	0.067037	0.086529	0.106165	0.113068	0.102102	0.107563
04-kgpd_Div1	0.061852	0.053088	0.025749	0.010935	0.009810	0.009875
05-kgpd_Pow	0.072520	0.166539	0.361318	0.438253	0.418169	0.404182
06-kgpd_Mul1	0.057508	0.050477	0.020386	0.010334	0.009079	0.010466
07-kgpd_Exp	0.067885	0.098850	0.150876	0.168079	0.165177	0.145106
08-gpr_MatMul	0.137546	0.064570	0.025069	0.009029	0.007134	0.006159
09-gpr_Add	0.081782	0.061424	0.019383	0.003692	0.000903	0.000190

.. code:: ipython3 ax = (pivpy / total).T.plot(logx=True, figsize=(14, 4)) ax.set_ylim([0,1]) ax.set_title("Time spent in each node relatively to the total time\npython runtime"); .. image:: onnx_node_time_40_0.png The operator *Scan* is clearly time consuming when the batch size is small. *onnxruntime* is more efficient for this one. .. code:: ipython3 res = list(enumerate_validated_operator_opsets( verbose=1, models={"GaussianProcessRegressor"}, opset_min=get_opset_number_from_onnx(), opset_max=get_opset_number_from_onnx(), runtime='onnxruntime2', debug=False, node_time=True, filter_exp=lambda m, p: p == "b-reg")) .. parsed-literal:: [enumerate_validated_operator_opsets] opset in [12, 12]. .. parsed-literal:: GaussianProcessRegressor : 0%| | 0/1 [00:00

N	1	10	100	1000	10000	100000
step
00-kgpd_CDist	0.114001	0.120884	0.128845	0.148042	0.156776	0.180377
01-kgpd_Div	0.101792	0.098622	0.085983	0.085108	0.086603	0.084520
02-kgpd_Mul	0.099980	0.097547	0.084001	0.064706	0.072849	0.081023
03-kgpd_Sin	0.089632	0.103505	0.194194	0.301002	0.245769	0.260717
04-kgpd_Div1	0.099119	0.096737	0.088709	0.063237	0.095840	0.091635
05-kgpd_Pow	0.108045	0.098307	0.081161	0.064898	0.079015	0.076962
06-kgpd_Mul1	0.098561	0.098475	0.082770	0.063557	0.087732	0.076762
07-kgpd_Exp	0.090019	0.087015	0.086282	0.088542	0.103798	0.087690
08-gpr_MatMul	0.100426	0.102766	0.106220	0.102751	0.069157	0.059617
09-gpr_Add	0.098425	0.096143	0.061836	0.018155	0.002462	0.000696

.. code:: ipython3 if r is not None: ax = (pivort / total).T.plot(logx=True, figsize=(14, 4)) ax.set_ylim([0,1]) ax.set_title("Time spent in each node relatively to the total time\nonnxtunime"); .. image:: onnx_node_time_44_0.png The results are relative. Let’s see which runtime is best node by node. .. code:: ipython3 if r is not None: r = (pivort - pivpy) / pivpy r .. raw:: html

N	1	10	100	1000	10000	100000
step
00-kgpd_CDist	-0.239113	-0.367743	-0.630420	-0.703226	-0.677041	-0.631775
01-kgpd_Div	1.564119	1.308106	1.123155	2.333824	2.857590	2.223117
02-kgpd_Mul	2.473648	2.027773	2.149388	3.249448	2.624165	2.828025
03-kgpd_Sin	1.779780	0.881090	0.280401	0.216680	0.311288	0.427768
04-kgpd_Div1	2.331710	1.865534	1.411570	1.642922	4.321977	4.466249
05-kgpd_Pow	2.097502	-0.071724	-0.842765	-0.932321	-0.897065	-0.887837
06-kgpd_Mul1	2.563218	2.067909	1.842112	1.811019	4.263897	3.320524
07-kgpd_Exp	1.756911	0.384288	-0.599693	-0.759241	-0.657668	-0.644029
08-gpr_MatMul	0.517953	1.502798	1.965953	4.201281	4.281286	4.701255
09-gpr_Add	1.502111	1.461452	1.233097	1.247736	0.486358	1.160395

Based on this, *onnxruntime* is faster for operators *Scan*, *Pow*, *Exp* and slower for all the others. Measuring the time with a custom dataset ---------------------------------------- We use the example `Comparison of kernel ridge and Gaussian process regression `__. .. code:: ipython3 import numpy import pandas import matplotlib.pyplot as plt from sklearn.kernel_ridge import KernelRidge from sklearn.model_selection import GridSearchCV from sklearn.gaussian_process import GaussianProcessRegressor from sklearn.gaussian_process.kernels import WhiteKernel, ExpSineSquared rng = numpy.random.RandomState(0) # Generate sample data X = 15 * rng.rand(100, 1) y = numpy.sin(X).ravel() y += 3 * (0.5 - rng.rand(X.shape[0])) # add noise gp_kernel = ExpSineSquared(1.0, 5.0, periodicity_bounds=(1e-2, 1e1)) gpr = GaussianProcessRegressor(kernel=gp_kernel) gpr.fit(X, y) .. parsed-literal:: C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\gaussian_process\kernels.py:409: ConvergenceWarning: The optimal value found for dimension 0 of parameter length_scale is close to the specified lower bound 1e-05. Decreasing the bound and calling fit again may find a better value. ConvergenceWarning) C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\gaussian_process\kernels.py:418: ConvergenceWarning: The optimal value found for dimension 0 of parameter periodicity is close to the specified upper bound 10.0. Increasing the bound and calling fit again may find a better value. ConvergenceWarning) .. parsed-literal:: GaussianProcessRegressor(kernel=ExpSineSquared(length_scale=1, periodicity=5)) .. code:: ipython3 onx = to_onnx(gpr, X_test.astype(numpy.float64)) with open("gpr_time.onnx", "wb") as f: f.write(onx.SerializeToString()) %onnxview onx -r 1 .. raw:: html

.. code:: ipython3 from mlprodict.tools import get_ir_version_from_onnx onx.ir_version = get_ir_version_from_onnx() .. code:: ipython3 oinfpy = OnnxInference(onx, runtime="python") oinfort = OnnxInference(onx, runtime="onnxruntime2") ``runtime==onnxruntime2`` tells the class ``OnnxInference`` to use *onnxruntime* for every node independently, there are as many calls as there are nodes in the graph. .. code:: ipython3 respy = oinfpy.run({'X': X_test}, node_time=True) try: resort = oinfort.run({'X': X_test}, node_time=True) except Exception as e: print(e) resort = None .. code:: ipython3 if resort is not None: df = pandas.DataFrame(respy[1]).merge(pandas.DataFrame(resort[1]), on=["i", "name", "op_type"], suffixes=("_py", "_ort")) df['delta'] = df.time_ort - df.time_py else: df = None df .. raw:: html

	i	name	op_type	time_py	time_ort	delta
0	0	Sc_Scan	Scan	0.007998	0.005970	-0.002028
1	1	kgpd_Transpose	Transpose	0.000032	0.000599	0.000567
2	2	kgpd_Sqrt	Sqrt	0.000063	0.000112	0.000049
3	3	kgpd_Div	Div	0.000143	0.000097	-0.000045
4	4	kgpd_Mul	Mul	0.000038	0.000321	0.000283
5	5	kgpd_Sin	Sin	0.000095	0.000146	0.000051
6	6	kgpd_Div1	Div	0.000027	0.000096	0.000069
7	7	kgpd_Pow	Pow	0.000299	0.000104	-0.000196
8	8	kgpd_Mul1	Mul	0.000032	0.000097	0.000065
9	9	kgpd_Exp	Exp	0.000383	0.000111	-0.000271
10	10	gpr_MatMul	MatMul	0.000080	0.004359	0.004279
11	11	gpr_Add	Add	0.000034	0.000165	0.000131

The following function runs multiple the same inference and aggregates the results node by node. .. code:: ipython3 from mlprodict.onnxrt.validate.validate import benchmark_fct res = benchmark_fct(lambda X: oinfpy.run({'X': X_test}, node_time=True), X_test, node_time=True) .. code:: ipython3 df = pandas.DataFrame(res) df[df.N == 100] .. raw:: html

	i	name	op_type	time	N	max_time	min_time	repeat	number
24	0	Sc_Scan	Scan	0.004154	100	0.004330	0.003843	10	8
25	1	kgpd_Transpose	Transpose	0.000013	100	0.000019	0.000010	10	8
26	2	kgpd_Sqrt	Sqrt	0.000018	100	0.000022	0.000015	10	8
27	3	kgpd_Div	Div	0.000025	100	0.000092	0.000015	10	8
28	4	kgpd_Mul	Mul	0.000012	100	0.000019	0.000009	10	8
29	5	kgpd_Sin	Sin	0.000057	100	0.000070	0.000050	10	8
30	6	kgpd_Div1	Div	0.000014	100	0.000017	0.000011	10	8
31	7	kgpd_Pow	Pow	0.000172	100	0.000198	0.000155	10	8
32	8	kgpd_Mul1	Mul	0.000020	100	0.000101	0.000009	10	8
33	9	kgpd_Exp	Exp	0.000213	100	0.000249	0.000193	10	8
34	10	gpr_MatMul	MatMul	0.000034	100	0.000047	0.000026	10	8
35	11	gpr_Add	Add	0.000013	100	0.000019	0.000011	10	8

.. code:: ipython3 df100 = df[df.N == 100] .. code:: ipython3 %matplotlib inline .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(df100.i, df100.time, align='center', color='orange') ax.set_xticks(df100.i) ax.set_yscale('log') ax.set_xticklabels(df100.op_type) ax.errorbar(df100.i, df100.time, numpy.abs(df100[["min_time", "max_time"]].T.values - df100.time.values.ravel()), uplims=True, lolims=True, color='blue') ax.set_title("Time spent in each node for 100 observations\nGaussianProcess"); .. image:: onnx_node_time_61_0.png .. code:: ipython3 df100c = df100.cumsum() .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(df100.i, df100c.time, align='center', color='orange') ax.set_xticks(df100.i) #ax.set_yscale('log') ax.set_ylim([df100c.min_time.min(), df100c.max_time.max()]) ax.set_xticklabels(df100.op_type) ax.errorbar(df100.i, df100c.time, numpy.abs((df100c[["min_time", "max_time"]].T.values - df100c.time.values.ravel())), uplims=True, lolims=True) ax.set_title("Cumulated time spent in each node for 100 observations\nGaussianProcess"); .. image:: onnx_node_time_63_0.png onnxruntime2 / onnxruntime1 --------------------------- The runtime ``onnxruntime1`` uses *onnxruntime* for the whole ONNX graph. There is no way to get the computation time for each node except if we create a ONNX graph for each intermediate node. .. code:: ipython3 oinfort1 = OnnxInference(onx, runtime='onnxruntime1') .. code:: ipython3 split = oinfort1.build_intermediate() split .. parsed-literal:: OrderedDict([('scan0', OnnxInference(...)), ('scan1', OnnxInference(...)), ('kgpd_transposed0', OnnxInference(...)), ('kgpd_Y0', OnnxInference(...)), ('kgpd_C03', OnnxInference(...)), ('kgpd_C02', OnnxInference(...)), ('kgpd_output02', OnnxInference(...)), ('kgpd_C01', OnnxInference(...)), ('kgpd_Z0', OnnxInference(...)), ('kgpd_C0', OnnxInference(...)), ('kgpd_output01', OnnxInference(...)), ('gpr_Y0', OnnxInference(...)), ('GPmean', OnnxInference(...))]) .. code:: ipython3 dfs = [] for k, v in split.items(): print("node", k) res = benchmark_fct(lambda x: v.run({'X': x}), X_test) df = pandas.DataFrame(res) df['name'] = k dfs.append(df.reset_index(drop=False)) .. parsed-literal:: node scan0 node scan1 node kgpd_transposed0 node kgpd_Y0 node kgpd_C03 node kgpd_C02 node kgpd_output02 node kgpd_C01 node kgpd_Z0 node kgpd_C0 node kgpd_output01 node gpr_Y0 node GPmean .. code:: ipython3 df = pandas.concat(dfs) df.head() .. raw:: html

	index	1	10	100	1000	10000	100000	name
0	average	0.000623	0.000592	0.000754	0.002202	0.017529	0.201192	scan0
1	deviation	0.000115	0.000030	0.000034	0.000026	0.000976	0.000000	scan0
2	min_exec	0.000541	0.000537	0.000657	0.002169	0.016677	0.201192	scan0
3	max_exec	0.000980	0.000639	0.000780	0.002239	0.018896	0.201192	scan0
4	repeat	20.000000	20.000000	10.000000	5.000000	3.000000	1.000000	scan0

.. code:: ipython3 df100c = df[df['index'] == "average"] df100c_min = df[df['index'] == "min_exec"] df100c_max = df[df['index'] == "max_exec"] ave = df100c.iloc[:, 4] ave_min = df100c_min.iloc[:, 4] ave_max = df100c_max.iloc[:, 4] ave.shape, ave_min.shape, ave_max.shape index = numpy.arange(ave.shape[0]) .. code:: ipython3 fig, ax = plt.subplots(1, 1, figsize=(14, 4)) ax.bar(index, ave, align='center', color='orange') ax.set_xticks(index) ax.set_xticklabels(df100c.name) for tick in ax.get_xticklabels(): tick.set_rotation(20) ax.errorbar(index, ave, numpy.abs((numpy.vstack([ave_min.values, ave_max.values]) - ave.values.ravel())), uplims=True, lolims=True) ax.set_title("Cumulated time spent in each node for 100 " "observations\nGaussianProcess and onnxruntime1"); .. image:: onnx_node_time_70_0.png The visual graph helps matching the output names with the operator type. The curve is not monotononic because each experiment computes every output from the start. The number of repetitions should be increased. Documentation of function `benchmark_fct `__ tells how to do it.