.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorial/plot_dbegin_options_zipmap.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorial_plot_dbegin_options_zipmap.py:

.. _l-tutorial-example-zipmap:

Choose appropriate output of a classifier
=========================================

A scikit-learn classifier usually returns a matrix of probabilities.
By default, *sklearn-onnx* converts that matrix into a list of
dictionaries where each probability is mapped to its class id or name.
That mechanism retains the class names but is slower.
Let's see what other options are available.

.. contents::
    :local:

Train a model and convert it
++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 23-47

.. code-block:: default


    from timeit import repeat
    import numpy
    import sklearn
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import onnxruntime as rt
    import onnx
    import skl2onnx
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import to_onnx
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import MultiOutputClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = X.astype(numpy.float32)
    y = y * 2 + 10  # to get labels different from [0, 1, 2]
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = LogisticRegression(max_iter=500)
    clr.fit(X_train, y_train)
    print(clr)

    onx = to_onnx(clr, X_train, target_opset=12)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LogisticRegression(max_iter=500)

.. GENERATED FROM PYTHON SOURCE LINES 48-53

Default behaviour: zipmap=True
++++++++++++++++++++++++++++++

The output type for the probabilities is a list of
dictionaries.

.. GENERATED FROM PYTHON SOURCE LINES 53-60

.. 
code-block:: default


    sess = rt.InferenceSession(onx.SerializeToString())
    res = sess.run(None, {'X': X_test})
    print(res[1][:2])
    print("probabilities type:", type(res[1]))
    print("type for the first observations:", type(res[1][0]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [{10: 0.9537464380264282, 12: 0.046253304928541183, 14: 2.646289090080245e-07}, {10: 0.0001454185403417796, 12: 0.25882941484451294, 14: 0.7410251498222351}]
    probabilities type:
    type for the first observations:

.. GENERATED FROM PYTHON SOURCE LINES 61-65

Option zipmap=False
+++++++++++++++++++

Probabilities are now a matrix.

.. GENERATED FROM PYTHON SOURCE LINES 65-76

.. code-block:: default


    initial_type = [('float_input', FloatTensorType([None, 4]))]
    options = {id(clr): {'zipmap': False}}
    onx2 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess2 = rt.InferenceSession(onx2.SerializeToString())
    res2 = sess2.run(None, {'X': X_test})
    print(res2[1][:2])
    print("probabilities type:", type(res2[1]))
    print("type for the first observations:", type(res2[1][0]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[9.5374644e-01 4.6253305e-02 2.6462891e-07]
     [1.4541854e-04 2.5882941e-01 7.4102515e-01]]
    probabilities type:
    type for the first observations:

.. GENERATED FROM PYTHON SOURCE LINES 77-83

Option zipmap='columns'
+++++++++++++++++++++++

This option removes the final ZipMap operator and splits
the probabilities into columns. The final model produces
one output for the label, and one output per class.

.. GENERATED FROM PYTHON SOURCE LINES 83-94

.. code-block:: default


    options = {id(clr): {'zipmap': 'columns'}}
    onx3 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess3 = rt.InferenceSession(onx3.SerializeToString())
    res3 = sess3.run(None, {'X': X_test})
    for i, out in enumerate(sess3.get_outputs()):
        print("output: '{}' shape={} values={}...".format(
            out.name, res3[i].shape, res3[i][:2]))

.. rst-class:: sphx-glr-script-out

.. 
code-block:: none

    output: 'output_label' shape=(38,) values=[10 14]...
    output: 'i10' shape=(38,) values=[9.5374644e-01 1.4541854e-04]...
    output: 'i12' shape=(38,) values=[0.0462533 0.2588294]...
    output: 'i14' shape=(38,) values=[2.6462891e-07 7.4102515e-01]...

.. GENERATED FROM PYTHON SOURCE LINES 95-97

Let's compare prediction time
+++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 97-117

.. code-block:: default


    print("Average time with ZipMap:")
    print(sum(repeat(lambda: sess.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    print("Average time without ZipMap:")
    print(sum(repeat(lambda: sess2.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    print("Average time without ZipMap but with columns:")
    print(sum(repeat(lambda: sess3.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    # The prediction is much faster without ZipMap
    # on this example.
    # The gain is even larger when the classes
    # are described with strings and not integers
    # as the final result (list of dictionaries) may copy
    # the same information many times with onnxruntime.

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time with ZipMap:
    0.02033220697194338
    Average time without ZipMap:
    0.011903474852442742
    Average time without ZipMap but with columns:
    0.018549243127927183

.. GENERATED FROM PYTHON SOURCE LINES 118-125

Option zipmap=False and output_class_labels=True
++++++++++++++++++++++++++++++++++++++++++++++++

Option `zipmap=False` seems a better choice because it is
faster (about 1.7 times in the run above), but labels are lost
in the process. Option `output_class_labels` can be used
to expose the labels as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 125-136

.. 
code-block:: default


    initial_type = [('float_input', FloatTensorType([None, 4]))]
    options = {id(clr): {'zipmap': False, 'output_class_labels': True}}
    onx4 = to_onnx(clr, X_train, options=options, target_opset=12)

    sess4 = rt.InferenceSession(onx4.SerializeToString())
    res4 = sess4.run(None, {'X': X_test})
    print(res4[1][:2])
    print("probabilities type:", type(res4[1]))
    print("class labels:", res4[2])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[9.5374644e-01 4.6253305e-02 2.6462891e-07]
     [1.4541854e-04 2.5882941e-01 7.4102515e-01]]
    probabilities type:
    class labels: [10 12 14]

.. GENERATED FROM PYTHON SOURCE LINES 137-138

Let's measure the processing time.

.. GENERATED FROM PYTHON SOURCE LINES 138-143

.. code-block:: default


    print("Average time without ZipMap but with output_class_labels:")
    print(sum(repeat(lambda: sess4.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time without ZipMap but with output_class_labels:
    0.01294846422970295

.. GENERATED FROM PYTHON SOURCE LINES 144-151

MultiOutputClassifier
+++++++++++++++++++++

This model is equivalent to several classifiers, one for every label
to predict. Instead of returning a matrix of probabilities, it returns
a sequence of matrices. Let's first modify the labels to get
a problem for a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 151-156

.. code-block:: default


    y = numpy.vstack([y, y + 100]).T
    y[::5, 1] = 1000  # Add a fourth class.
    print(y[:5])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[  10 1000]
     [  10  110]
     [  10  110]
     [  10  110]
     [  10  110]]

.. GENERATED FROM PYTHON SOURCE LINES 157-158

Let's train a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 158-170

.. 
code-block:: default


    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = MultiOutputClassifier(LogisticRegression(max_iter=500))
    clr.fit(X_train, y_train)
    print(clr)

    onx5 = to_onnx(clr, X_train, target_opset=12)

    sess5 = rt.InferenceSession(onx5.SerializeToString())
    res5 = sess5.run(None, {'X': X_test[:3]})
    print(res5)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MultiOutputClassifier(estimator=LogisticRegression(max_iter=500))
    somewheresklearn-onnx-jenkins_39_std/sklearn-onnx/skl2onnx/_parse.py:529: UserWarning: Option zipmap is ignored for model . Set option zipmap to False to remove this message.
      warnings.warn(
    [array([[ 14, 114],
           [ 12, 112],
           [ 14, 114]], dtype=int64), [array([[4.8395174e-04, 2.4657920e-01, 7.5293684e-01],
           [2.6832024e-02, 8.9546174e-01, 7.7706158e-02],
           [4.5859913e-05, 9.4109066e-02, 9.0584505e-01]], dtype=float32), array([[1.8320645e-03, 2.2418490e-01, 4.2191991e-01, 3.5206315e-01],
           [1.7106442e-02, 6.5702021e-01, 1.5892446e-01, 1.6694888e-01],
           [4.4455085e-04, 2.0774904e-01, 5.7789004e-01, 2.1391642e-01]], dtype=float32)]]

.. GENERATED FROM PYTHON SOURCE LINES 171-173

Option zipmap is ignored for this model, as the warning above says.
Labels are missing but they can be added back as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 173-184

.. code-block:: default


    onx6 = to_onnx(clr, X_train, target_opset=12,
                   options={'zipmap': False, 'output_class_labels': True})

    sess6 = rt.InferenceSession(onx6.SerializeToString())
    res6 = sess6.run(None, {'X': X_test[:3]})
    print("predicted labels", res6[0])
    print("predicted probabilies", res6[1])
    print("class labels", res6[2])

.. rst-class:: sphx-glr-script-out

.. 
code-block:: none

    predicted labels [[ 14 114]
     [ 12 112]
     [ 14 114]]
    predicted probabilies [array([[4.8395174e-04, 2.4657920e-01, 7.5293684e-01],
           [2.6832024e-02, 8.9546174e-01, 7.7706158e-02],
           [4.5859913e-05, 9.4109066e-02, 9.0584505e-01]], dtype=float32), array([[1.8320645e-03, 2.2418490e-01, 4.2191991e-01, 3.5206315e-01],
           [1.7106442e-02, 6.5702021e-01, 1.5892446e-01, 1.6694888e-01],
           [4.4455085e-04, 2.0774904e-01, 5.7789004e-01, 2.1391642e-01]], dtype=float32)]
    class labels [array([10, 12, 14], dtype=int64), array([ 110,  112,  114, 1000], dtype=int64)]

.. GENERATED FROM PYTHON SOURCE LINES 185-186

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 186-192

.. code-block:: default


    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    numpy: 1.23.5
    scikit-learn: 1.2.2
    onnx:  1.13.1
    onnxruntime:  1.14.1
    skl2onnx:  1.14.0

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.458 seconds)

.. _sphx_glr_download_auto_tutorial_plot_dbegin_options_zipmap.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dbegin_options_zipmap.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dbegin_options_zipmap.ipynb `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
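
The options `zipmap=False` and `output_class_labels=True` described above
return the probabilities as a matrix and the class labels as a separate array.
When dictionaries are needed for only a few observations, they can be rebuilt
on the Python side. The following is a minimal sketch and not part of
*sklearn-onnx*: `rebuild_dictionaries` is a hypothetical helper, and the two
arrays are made-up values standing in for `res4[1]` and `res4[2]`.

.. code-block:: python

    import numpy


    def rebuild_dictionaries(probabilities, class_labels):
        """Map every row of probabilities to a {label: probability} dict.

        Hypothetical helper: it mimics the shape of the ZipMap output
        from a probability matrix and an array of class labels.
        """
        labels = class_labels.tolist()
        return [dict(zip(labels, row.tolist())) for row in probabilities]


    # Made-up values standing in for res4[1] (probabilities)
    # and res4[2] (class labels).
    probabilities = numpy.array(
        [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]], dtype=numpy.float32)
    class_labels = numpy.array([10, 12, 14], dtype=numpy.int64)

    rows = rebuild_dictionaries(probabilities, class_labels)
    print(rows[0])

The conversion then only costs time for the rows that actually need a
dictionary, while the ONNX model keeps returning a plain matrix.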