.. _l-onnx-tutorial-benchmark-orts:

Compare two different onnxruntime
=================================

The following section uses what is introduced in
:ref:`l-benchmark-onnxruntime-skl-regular` to compare two different
versions of :epkg:`onnxruntime` on a given list of models.

Bash script
+++++++++++

The following script compares *onnxruntime 1.1.2* to a local version of
*onnxruntime* installed through a local *pypi server* available at
`http://localhost:8067/`.

::

    export models="LinearRegression,LogisticRegression,RandomForestRegressor,RandomForestClassifier,SVR,SVC"
    export NOW=$(date +"%Y%m%d")
    export suffix="LRSW-"$NOW

    echo --ORT112-ENV--
    export vers="112"
    python -m virtualenv ort112 || exit 1
    cd ort112
    ./bin/python -m pip install -U pip
    ./bin/pip install numpy scikit-learn onnx pyquickhelper matplotlib threadpoolctl lightgbm xgboost || exit 1
    ./bin/pip uninstall -y onnxruntime
    ./bin/pip install onnxruntime==1.1.2 || exit 1
    ./bin/pip install --no-cache-dir --no-deps --index http://localhost:8067/simple/ onnx onnxconverter-common skl2onnx || exit 1
    ./bin/pip install --no-cache-dir --no-deps --index http://localhost:8067/simple/ mlprodict || exit 1
    ./bin/pip freeze || exit 1

    echo --ORT112-BENCH--
    ./bin/python -m mlprodict validate_runtime --n_features 4,50 -nu 3 -re 3 -o 11 -op 11 -v 1 --out_raw data$vers$suffix.csv --out_summary summary$vers$suffix.csv -b 1 --dump_folder dump_errors --runtime python_compiled,onnxruntime1 --models $models --out_graph bench_png$vers$suffix --dtype 32 || exit 1
    echo --ORT112-END--
    cd ..

    echo --NEW-ENV--
    export vers="GIT"
    python -m virtualenv ortgit || exit 1
    cd ortgit
    ./bin/python -m pip install -U pip
    ./bin/pip install numpy scikit-learn onnx pyquickhelper matplotlib threadpoolctl lightgbm xgboost || exit 1
    ./bin/pip uninstall -y onnxruntime
    ./bin/pip uninstall -y onnxruntime-dnnl
    ./bin/pip install --no-cache-dir --no-deps --index http://localhost:8067/simple/ onnxruntime || exit 1
    ./bin/pip install --no-cache-dir --no-deps --index http://localhost:8067/simple/ onnx onnxconverter-common skl2onnx || exit 1
    ./bin/pip install --no-cache-dir --no-deps --index http://localhost:8067/simple/ mlprodict || exit 1
    ./bin/pip freeze || exit 1

    echo --NEW-BENCH--
    ./bin/python -m mlprodict validate_runtime --n_features 4,50 -nu 3 -re 3 -o 11 -op 11 -v 1 --out_raw data$vers$suffix.csv --out_summary summary$vers$suffix.csv -b 1 --dump_folder dump_errors --runtime python_compiled,onnxruntime1 --models $models --out_graph bench_png$vers$suffix --dtype 32 || exit 1
    echo --NEW-END--
    cd ..

    echo --END--
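Before launching the full benchmark, it may be worth checking that each
environment really picks up the intended *onnxruntime* build. The lines
below are only a sketch: ``ort112`` and ``ortgit`` are the folder names
created by the script above and the check simply prints the installed
version from each virtual environment.

::

    import subprocess

    # Prints the onnxruntime version installed in each virtual
    # environment created by the bash script above. The folder names
    # 'ort112' and 'ortgit' are the ones used in that script.
    for env in ['ort112', 'ortgit']:
        out = subprocess.check_output(
            ['%s/bin/python' % env, '-c',
             'import onnxruntime;print(onnxruntime.__version__)'])
        print(env, out.decode('ascii').strip())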
Merge results
+++++++++++++

It produces two files: ``data112LRSW-20200311.csv`` and
``dataGITLRSW-20200311.csv``. The following script merges them and
computes a speed-up between the two versions and with
:epkg:`scikit-learn`.

::

    from pprint import pprint
    import pandas
    from mlprodict.onnxrt.validate.validate_summary import (
        merge_benchmark, summary_report)

    names = {'ort112-': 'data112LRSW-20200311.csv',
             'ortgit-': 'dataGITLRSW-20200311.csv'}
    dfs = {k: pandas.read_csv(v) for k, v in names.items()}

    # Merges the two benchmarks, onnxruntime 1.1.2 is the baseline.
    merged = merge_benchmark(dfs, baseline="ort112-onnxruntime1")
    print('runtimes')
    pprint(set(merged['runtime']))

    add_cols = list(
        sorted(c for c in merged.columns if c.endswith('-base')))
    add_cols += ['ort_version']
    pprint(add_cols)
    suma = summary_report(merged, add_cols=add_cols,
                          add_index=['ort_version'])
    pprint(suma.columns)

    keep = [
        'name', 'problem', 'scenario', 'optim', 'n_features',
        'runtime', 'skl_version', 'opset11',
        'RT/SKL-N=1', 'RT/SKL-N=1-base',
        'N=10000', 'N=10000-base',
    ]
    suma = suma[keep].copy()

    def replace(x):
        # Shortens the text describing the applied optimisations.
        if not isinstance(x, str):
            return x
        return x.replace(
            "'zipmap': False", "NOZIPMAP").replace(
            "'raw_scores': True", "RAW")

    # Converts ratios into speed-ups.
    suma['ORT ?x SKL ONE'] = 1. / suma["RT/SKL-N=1"]
    suma['ORT ?x SKL BATCH'] = 1. / suma["N=10000"]
    suma['NEW ?x ORT ONE'] = 1. / suma["RT/SKL-N=1-base"]
    suma['NEW ?x ORT BATCH'] = 1. / suma["N=10000-base"]
    suma['optim'] = suma['optim'].apply(replace)
    suma = suma.drop(['RT/SKL-N=1', 'N=10000',
                      'RT/SKL-N=1-base', 'N=10000-base'], axis=1)

    # Writes the table into a spreadsheet, colours highlight
    # significant speed-ups and slow-downs.
    writer = pandas.ExcelWriter('merged.xlsx', engine='xlsxwriter')
    suma.to_excel(writer, index=False, float_format="%1.3f",
                  freeze_panes=(1, 1))
    workbook = writer.book
    format0 = workbook.add_format({'bg_color': '#FF777E'})
    format1 = workbook.add_format({'bg_color': '#FFC7CE'})
    format2 = workbook.add_format({'bg_color': '#E6EFEE'})
    format3 = workbook.add_format({'bg_color': '#C6DFCE'})
    worksheet = writer.sheets['Sheet1']
    pl = 'I2:L{}'.format(suma.shape[0] + 1)
    worksheet.conditional_format(
        pl, {'type': 'cell', 'criteria': '<', 'value': 0.5,
             'format': format0})
    worksheet.conditional_format(
        pl, {'type': 'cell', 'criteria': '<', 'value': 0.8,
             'format': format1})
    worksheet.conditional_format(
        pl, {'type': 'cell', 'criteria': '>=', 'value': 2.,
             'format': format3})
    worksheet.conditional_format(
        pl, {'type': 'cell', 'criteria': '>=', 'value': 1.2,
             'format': format2})
    writer.save()

The outcome is a spreadsheet which looks like this:

.. image:: bort112.png

Notes
+++++

The script could be improved to measure a confidence interval; that is
left for later. The speed-up computation is not entirely accurate
either: each benchmark runs independently and measures its own
*scikit-learn* baseline, so the two runtimes are not compared against
exactly the same reference. It assumes every run of the same model
returns similar timings. For a better metric, the ONNX models should be
generated first and only then should the runtimes be compared, but the
current numbers at least give an order of magnitude.
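To get a quick overview without opening the spreadsheet, the speed-up
columns can be aggregated per model with *pandas*. The following lines
are only a sketch: they assume ``merged.xlsx`` produced above is
available and that the column names are the ones created by the
merging script.

::

    import pandas

    # Loads the spreadsheet written by the merging script and computes
    # the median speed-up per model. The column names below are the
    # ones created above.
    suma = pandas.read_excel('merged.xlsx')
    cols = ['ORT ?x SKL ONE', 'ORT ?x SKL BATCH',
            'NEW ?x ORT ONE', 'NEW ?x ORT BATCH']
    agg = suma.groupby('name')[cols].median()
    print(agg.sort_values('NEW ?x ORT BATCH', ascending=False))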