.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_dbegin_options.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_dbegin_options.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_dbegin_options.py:


One model, many possible conversions with options
=================================================

.. index:: options

There is not only one way to convert a model. A newer version of :epkg:`ONNX`
may add an operator that speeds up the converted model. The rational choice
would be to use this new operator, but only if the targeted runtime
implements it. What if two different users need two different
conversions of the same model?
Let's see how this can be done.

.. contents::
    :local:


Option *zipmap*
+++++++++++++++

Every classifier is by design converted into an ONNX graph which outputs
two results: the predicted label and the probability of
every label. By default, the labels are integers and the
probabilities are stored in dictionaries, one per observation. That is the purpose
of operator *ZipMap* added at the end of the following graph.
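
The sketch below only mimics, in plain Python, the kind of output *ZipMap*
produces: one dictionary per row mapping every label to its probability.
Function ``zipmap_like`` is hypothetical, it is not part of :epkg:`sklearn-onnx`
or of any runtime, and the probabilities are made-up values.

.. code-block:: python

    import numpy


    def zipmap_like(labels, probabilities):
        """Hypothetical helper: turns a (N, C) probability matrix into
        a list of N dictionaries {label: probability}, as ZipMap does."""
        return [dict(zip(labels, row.tolist())) for row in probabilities]


    labels = [0, 1, 2]
    probabilities = numpy.array([[0.75, 0.20, 0.05],
                                 [0.10, 0.60, 0.30]], dtype=numpy.float32)
    rows = zipmap_like(labels, probabilities)
    print(rows[0])
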

.. gdot::
    :script: DOT-SECTION

    import numpy
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from skl2onnx import to_onnx
    from mlprodict.onnxrt import OnnxInference

    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, _, y_train, __ = train_test_split(X, y, random_state=11)
    clr = LogisticRegression()
    clr.fit(X_train, y_train)

    model_def = to_onnx(clr, X_train.astype(numpy.float32))
    oinf = OnnxInference(model_def)
    print("DOT-SECTION", oinf.to_dot())

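This operator is not really efficient as it copies every probability and
label into a different container. The time spent in this copy is usually
significant for small classifiers, so it makes sense to remove it. It can be
disabled for every *LogisticRegression* with option ``{'zipmap': False}``.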
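
Going the other way shows why this copy is wasteful: a user who needs a matrix
again has to walk through every dictionary. The following sketch rebuilds the
matrix from hypothetical data in the format *ZipMap* returns, using only
*numpy* and plain Python.

.. code-block:: python

    import numpy

    # Hypothetical output in the ZipMap format: one dictionary per row.
    zipmap_output = [{0: 0.75, 1: 0.20, 2: 0.05},
                     {0: 0.10, 1: 0.60, 2: 0.30}]
    labels = sorted(zipmap_output[0])

    # Every value is visited and copied once more to get a dense matrix back.
    matrix = numpy.array([[row[label] for label in labels]
                          for row in zipmap_output], dtype=numpy.float32)
    print(matrix.shape)  # (2, 3)
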

.. gdot::
    :script: DOT-SECTION

    import numpy
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from skl2onnx import to_onnx
    from mlprodict.onnxrt import OnnxInference

    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, _, y_train, __ = train_test_split(X, y, random_state=11)
    clr = LogisticRegression()
    clr.fit(X_train, y_train)

    model_def = to_onnx(clr, X_train.astype(numpy.float32),
                        options={LogisticRegression: {'zipmap': False}})
    oinf = OnnxInference(model_def)
    print("DOT-SECTION", oinf.to_dot())

A graph may contain many classifiers, so it is important to have
a way to specify which classifier should keep its *ZipMap*
and which should not. That is why options can also be specified by id,
the value returned by the builtin function ``id`` for the model instance.
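
The difference between a key based on the class and a key based on the
instance can be sketched without any converter. ``HypotheticalClassifier``
stands in for a fitted model and ``lookup_options`` is a hypothetical
resolution rule (instance id first, then class); it is an assumption made
for illustration, not the way :epkg:`sklearn-onnx` resolves options internally.

.. code-block:: python

    class HypotheticalClassifier:
        """Stand-in for a fitted scikit-learn classifier."""


    def lookup_options(model, options):
        """Hypothetical rule: an entry keyed by the instance id wins,
        otherwise fall back on an entry keyed by the model class."""
        if id(model) in options:
            return options[id(model)]
        return options.get(type(model), {})


    first, second = HypotheticalClassifier(), HypotheticalClassifier()

    # A class key targets every instance, an id key targets one instance.
    by_class = {HypotheticalClassifier: {'zipmap': False}}
    by_id = {id(first): {'zipmap': False}}

    print(lookup_options(first, by_class), lookup_options(second, by_class))
    print(lookup_options(first, by_id), lookup_options(second, by_id))
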

.. GENERATED FROM PYTHON SOURCE LINES 77-101

.. code-block:: default


    from pyquickhelper.helpgen.graphviz_helper import plot_graphviz
    from pprint import pformat
    from skl2onnx.common._registration import _converter_pool
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.pipeline import Pipeline
    import numpy
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from skl2onnx import to_onnx
    from mlprodict.onnxrt import OnnxInference

    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, _, y_train, __ = train_test_split(X, y, random_state=11)
    clr = LogisticRegression()
    clr.fit(X_train, y_train)

    # Options keyed by id(clr) only apply to this specific instance.
    model_def = to_onnx(clr, X_train.astype(numpy.float32),
                        options={id(clr): {'zipmap': False}})
    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    somewhereonnxcustom_39_std/_venv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

    Increase the number of iterations (max_iter) or scale the data as shown in:
        https://scikit-learn.org/stable/modules/preprocessing.html
    Please also refer to the documentation for alternative solver options:
        https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
      n_iter_i = _check_optimize_result(
    OnnxInference(...)
        def compiled_run(dict_inputs):
            # inputs
            X = dict_inputs['X']
            (label, probability_tensor, ) = n0_linearclassifier(X)
            (probabilities, ) = n1_normalizer(probability_tensor)
            return {
                'label': label,
                'probabilities': probabilities,
            }


.. GENERATED FROM PYTHON SOURCE LINES 102-103

Visually.

.. GENERATED FROM PYTHON SOURCE LINES 103-109

.. code-block:: default


    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)


.. image:: /auto_examples/images/sphx_glr_plot_dbegin_options_001.png
    :alt: plot dbegin options
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 110-112

Let's compare this graph to the one obtained
when operator *ZipMap* is kept (the default).

.. GENERATED FROM PYTHON SOURCE LINES 112-117

.. code-block:: default


    model_def = to_onnx(clr, X_train.astype(numpy.float32))
    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    OnnxInference(...)
        def compiled_run(dict_inputs):
            # inputs
            X = dict_inputs['X']
            (label, probability_tensor, ) = n0_linearclassifier(X)
            (probabilities, ) = n1_normalizer(probability_tensor)
            (output_label, ) = n2_cast(label)
            (output_probability, ) = n3_zipmap(probabilities)
            return {
                'output_label': output_label,
                'output_probability': output_probability,
            }


.. GENERATED FROM PYTHON SOURCE LINES 118-119

Visually.

.. GENERATED FROM PYTHON SOURCE LINES 119-125

.. code-block:: default


    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)


.. image:: /auto_examples/images/sphx_glr_plot_dbegin_options_002.png
    :alt: plot dbegin options
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 126-128

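Using function *id* has one flaw: an id is only meaningful within the
current process, so options keyed by id do not survive pickling.
Strings are easier to use: here option ``'zipmap'`` is given by its name only.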
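
The following sketch illustrates the problem with the standard module
*pickle*. A plain dictionary stands in for the fitted model (an assumption,
any picklable object behaves the same): once restored, the model is a new
object with a new id, while a string key is unaffected.

.. code-block:: python

    import pickle

    # A plain dictionary stands in for a fitted model.
    model = {'coef': [0.5, -1.0]}
    options_by_id = {id(model): {'zipmap': False}}
    options_by_name = {'zipmap': False}

    # Simulate saving everything to disk and loading it back.
    restored_model, restored_by_id, restored_by_name = pickle.loads(
        pickle.dumps((model, options_by_id, options_by_name)))

    # The restored model has a new id, the key no longer matches it.
    print(id(restored_model) in restored_by_id)   # False
    # A string key does not depend on object identity.
    print(restored_by_name == options_by_name)    # True
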

.. GENERATED FROM PYTHON SOURCE LINES 128-135

.. code-block:: default


    model_def = to_onnx(clr, X_train.astype(numpy.float32),
                        options={'zipmap': False})
    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    OnnxInference(...)
        def compiled_run(dict_inputs):
            # inputs
            X = dict_inputs['X']
            (label, probability_tensor, ) = n0_linearclassifier(X)
            (probabilities, ) = n1_normalizer(probability_tensor)
            return {
                'label': label,
                'probabilities': probabilities,
            }


.. GENERATED FROM PYTHON SOURCE LINES 136-137

Visually.

.. GENERATED FROM PYTHON SOURCE LINES 137-143

.. code-block:: default


    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)


.. image:: /auto_examples/images/sphx_glr_plot_dbegin_options_003.png
    :alt: plot dbegin options
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 144-149

Option in a pipeline
++++++++++++++++++++

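In a pipeline, :epkg:`sklearn-onnx` uses the same naming convention
as *scikit-learn* does for the parameters of a pipeline: the step name,
two underscores, then the option name, ``'clr__zipmap'`` for example.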
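
A minimal sketch of how such a key splits into a step path and an option
name. Function ``split_option_key`` is hypothetical and is not the parser
used by :epkg:`sklearn-onnx`; it only shows the convention, splitting on the
last double underscore so that option names with a single underscore, such
as ``raw_scores``, stay intact.

.. code-block:: python

    def split_option_key(key):
        """Hypothetical parser: 'clr__zipmap' -> ('clr', 'zipmap').
        A key without '__' gets an empty step path."""
        step, _, option = key.rpartition('__')
        return step, option


    print(split_option_key('clr__zipmap'))      # ('clr', 'zipmap')
    print(split_option_key('clr__raw_scores'))  # ('clr', 'raw_scores')
    print(split_option_key('zipmap'))           # ('', 'zipmap')
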

.. GENERATED FROM PYTHON SOURCE LINES 149-162

.. code-block:: default


    pipe = Pipeline([
        ('norm', MinMaxScaler()),
        ('clr', LogisticRegression())
    ])
    pipe.fit(X_train, y_train)

    model_def = to_onnx(pipe, X_train.astype(numpy.float32),
                        options={'clr__zipmap': False})
    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    OnnxInference(...)
        def compiled_run(dict_inputs):
            # inputs
            X = dict_inputs['X']
            (variable, ) = n0_scaler(X)
            (label, probability_tensor, ) = n1_linearclassifier(variable)
            (probabilities, ) = n2_normalizer(probability_tensor)
            return {
                'label': label,
                'probabilities': probabilities,
            }


.. GENERATED FROM PYTHON SOURCE LINES 163-164

Visually.

.. GENERATED FROM PYTHON SOURCE LINES 164-170

.. code-block:: default


    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)


.. image:: /auto_examples/images/sphx_glr_plot_dbegin_options_004.png
    :alt: plot dbegin options
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 171-178

Option *raw_scores*
+++++++++++++++++++

By default, every classifier is converted into a graph which
returns probabilities. But many models first compute unscaled
*raw scores* and then normalize them. Option *raw_scores*
makes the graph return those scores instead.
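
For a multinomial linear classifier, the raw scores are the linear
function of the features, and a softmax turns every row of scores into
probabilities. The coefficients below are made up; the sketch only shows
the relation between both outputs and why the predicted label does not
change.

.. code-block:: python

    import numpy

    # Hypothetical coefficients for 3 classes and 2 features.
    coef = numpy.array([[1.0, -0.5], [0.2, 0.3], [-1.2, 0.2]])
    intercept = numpy.array([0.1, 0.0, -0.1])
    X = numpy.array([[2.0, 1.0], [0.0, 3.0]])

    # Raw scores: unscaled output of the linear model, possibly negative.
    scores = X @ coef.T + intercept

    # Softmax turns the scores of each row into probabilities summing to 1.
    exp = numpy.exp(scores - scores.max(axis=1, keepdims=True))
    probabilities = exp / exp.sum(axis=1, keepdims=True)

    # The normalization preserves the order within a row: same labels.
    print(scores.argmax(axis=1), probabilities.argmax(axis=1))
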
First, with probabilities:

.. GENERATED FROM PYTHON SOURCE LINES 178-194

.. code-block:: default


    pipe = Pipeline([
        ('norm', MinMaxScaler()),
        ('clr', LogisticRegression())
    ])
    pipe.fit(X_train, y_train)

    model_def = to_onnx(
        pipe, X_train.astype(numpy.float32),
        options={id(pipe): {'zipmap': False}})

    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf.run({'X': X.astype(numpy.float32)[:5]}))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'label': array([0, 0, 0, 0, 0]), 'probabilities': array([[0.88268614, 0.10948392, 0.00782984],
           [0.7944385 , 0.19728662, 0.00827491],
           [0.85557765, 0.1379205 , 0.00650185],
           [0.8262804 , 0.16634218, 0.00737738],
           [0.9005015 , 0.092388  , 0.00711049]], dtype=float32)}


.. GENERATED FROM PYTHON SOURCE LINES 195-196

Then with raw scores:

.. GENERATED FROM PYTHON SOURCE LINES 196-205

.. code-block:: default


    model_def = to_onnx(
        pipe, X_train.astype(numpy.float32),
        options={id(pipe): {'raw_scores': True, 'zipmap': False}})

    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf.run({'X': X.astype(numpy.float32)[:5]}))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'label': array([0, 0, 0, 0, 0]), 'probabilities': array([[0.88268614, 0.10948392, 0.00782984],
           [0.7944385 , 0.19728662, 0.00827491],
           [0.85557765, 0.1379205 , 0.00650185],
           [0.8262804 , 0.16634218, 0.00737738],
           [0.9005015 , 0.092388  , 0.00711049]], dtype=float32)}


.. GENERATED FROM PYTHON SOURCE LINES 206-209

It did not seem to work: the outputs are still probabilities. The option
must be attached to the classifier inside the pipeline, here
``pipe.steps[1][1]``, and not to the whole pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 209-217

.. code-block:: default


    # Target the classifier, the second step of the pipeline.
    model_def = to_onnx(
        pipe, X_train.astype(numpy.float32),
        options={id(pipe.steps[1][1]): {'raw_scores': True, 'zipmap': False}})

    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf.run({'X': X.astype(numpy.float32)[:5]}))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'label': array([0, 0, 0, 0, 0]), 'probabilities': array([[ 2.2707398 ,  0.18354774, -2.4542873 ],
           [ 1.9857951 ,  0.5928171 , -2.5786123 ],
           [ 2.2349298 ,  0.4098304 , -2.6447601 ],
           [ 2.1071343 ,  0.50424725, -2.6113815 ],
           [ 2.3727787 ,  0.095824  , -2.4686024 ]], dtype=float32)}


.. GENERATED FROM PYTHON SOURCE LINES 218-220

There are negative values, so these are raw scores and not probabilities:
the option was applied. Strings are still easier to use.

.. GENERATED FROM PYTHON SOURCE LINES 220-229

.. code-block:: default


    # '<step name>__<option name>' only targets step 'clr'.
    model_def = to_onnx(
        pipe, X_train.astype(numpy.float32),
        options={'clr__raw_scores': True, 'clr__zipmap': False})

    oinf = OnnxInference(model_def, runtime='python_compiled')
    print(oinf.run({'X': X.astype(numpy.float32)[:5]}))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'label': array([0, 0, 0, 0, 0]), 'probabilities': array([[ 2.2707398 ,  0.18354774, -2.4542873 ],
           [ 1.9857951 ,  0.5928171 , -2.5786123 ],
           [ 2.2349298 ,  0.4098304 , -2.6447601 ],
           [ 2.1071343 ,  0.50424725, -2.6113815 ],
           [ 2.3727787 ,  0.095824  , -2.4686024 ]], dtype=float32)}


.. GENERATED FROM PYTHON SOURCE LINES 230-231

Negative values again: we still get raw scores.
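
As a sanity check, a row-wise softmax applied to the raw scores printed
above gives back the probabilities printed earlier, up to float32 rounding.
This assumes the final normalization of this multiclass logistic regression
is a softmax, which these numbers are consistent with; the values are copied
from the two outputs of this example.

.. code-block:: python

    import numpy

    # Raw scores obtained with option raw_scores=True (copied from above).
    raw_scores = numpy.array([
        [2.2707398, 0.18354774, -2.4542873],
        [1.9857951, 0.5928171, -2.5786123],
        [2.2349298, 0.4098304, -2.6447601],
        [2.1071343, 0.50424725, -2.6113815],
        [2.3727787, 0.095824, -2.4686024]])

    # Probabilities obtained by default (copied from the first output).
    expected = numpy.array([
        [0.88268614, 0.10948392, 0.00782984],
        [0.7944385, 0.19728662, 0.00827491],
        [0.85557765, 0.1379205, 0.00650185],
        [0.8262804, 0.16634218, 0.00737738],
        [0.9005015, 0.092388, 0.00711049]])

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    exp = numpy.exp(raw_scores - raw_scores.max(axis=1, keepdims=True))
    probabilities = exp / exp.sum(axis=1, keepdims=True)
    print(numpy.abs(probabilities - expected).max())
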

.. GENERATED FROM PYTHON SOURCE LINES 233-238

List of available options
+++++++++++++++++++++++++

Every converter registers the options it supports, so that any supported
option can be detected while running the conversion. The following code
lists them.
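
Such a registry makes it possible to reject an unsupported option before
converting. The sketch below is hypothetical: ``ALLOWED_OPTIONS`` copies two
entries of the listing printed by the next code block, ``check_options`` is
not an :epkg:`sklearn-onnx` function, and reading ``None`` as "any value is
accepted" is an assumption.

.. code-block:: python

    # Two entries copied from the listing of registered options.
    ALLOWED_OPTIONS = {
        'LinearClassifier': {'zipmap': [True, False, 'columns'],
                             'nocl': [True, False],
                             'raw_scores': [True, False]},
        'CountVectorizer': {'tokenexp': None, 'separators': None,
                            'nan': [True, False]},
    }


    def check_options(converter_name, options):
        """Hypothetical check: raises ValueError for an option the converter
        does not declare or for a value outside the declared list.
        A declared value of None is read as 'any value is accepted'."""
        allowed = ALLOWED_OPTIONS[converter_name]
        for name, value in options.items():
            if name not in allowed:
                raise ValueError('%s does not support option %r.' % (
                    converter_name, name))
            values = allowed[name]
            if values is not None and value not in values:
                raise ValueError('Option %r cannot be %r for %s.' % (
                    name, value, converter_name))


    check_options('LinearClassifier', {'zipmap': False, 'raw_scores': True})
    try:
        check_options('LinearClassifier', {'gemm': True})
    except ValueError as e:
        print(e)
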

.. GENERATED FROM PYTHON SOURCE LINES 238-251

.. code-block:: default


    all_opts = set()
    for k, v in sorted(_converter_pool.items()):
        opts = v.get_allowed_options()
        if not isinstance(opts, dict):
            continue
        name = k.replace('Sklearn', '')
        print('%s%s %r' % (name, " " * (30 - len(name)), opts))
        for o in opts:
            all_opts.add(o)

    print('all options:', pformat(list(sorted(all_opts))))


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    LightGbmLGBMClassifier         {'nocl': [True, False], 'zipmap': [True, False, 'columns']}
    AdaBoostClassifier             {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    BaggingClassifier              {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    BayesianGaussianMixture        {'score_samples': [True, False]}
    BayesianRidge                  {'return_std': [True, False]}
    BernoulliNB                    {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    CalibratedClassifierCV         {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    CategoricalNB                  {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    ComplementNB                   {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    CountVectorizer                {'tokenexp': None, 'separators': None, 'nan': [True, False]}
    DecisionTreeClassifier         {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'decision_path': [True, False], 'decision_leaf': [True, False]}
    DecisionTreeRegressor          {'decision_path': [True, False], 'decision_leaf': [True, False]}
    ExtraTreeClassifier            {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'decision_path': [True, False], 'decision_leaf': [True, False]}
    ExtraTreeRegressor             {'decision_path': [True, False], 'decision_leaf': [True, False]}
    ExtraTreesClassifier           {'zipmap': [True, False, 'columns'], 'raw_scores': [True, False], 'nocl': [True, False], 'decision_path': [True, False], 'decision_leaf': [True, False]}
    ExtraTreesRegressor            {'decision_path': [True, False], 'decision_leaf': [True, False]}
    GaussianMixture                {'score_samples': [True, False]}
    GaussianNB                     {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    GaussianProcessClassifier      {'optim': [None, 'cdist'], 'nocl': [False, True], 'zipmap': [False, True]}
    GaussianProcessRegressor       {'return_cov': [False, True], 'return_std': [False, True], 'optim': [None, 'cdist']}
    GradientBoostingClassifier     {'zipmap': [True, False, 'columns'], 'raw_scores': [True, False], 'nocl': [True, False]}
    HistGradientBoostingClassifier {'zipmap': [True, False, 'columns'], 'raw_scores': [True, False], 'nocl': [True, False]}
    HistGradientBoostingRegressor  {'zipmap': [True, False, 'columns'], 'raw_scores': [True, False], 'nocl': [True, False]}
    IsolationForest                {'score_samples': [True, False]}
    KMeans                         {'gemm': [True, False]}
    KNNImputer                     {'optim': [None, 'cdist']}
    KNeighborsClassifier           {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False], 'optim': [None, 'cdist']}
    KNeighborsRegressor            {'optim': [None, 'cdist']}
    KNeighborsTransformer          {'optim': [None, 'cdist']}
    KernelPCA                      {'optim': [None, 'cdist']}
    LinearClassifier               {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    LinearSVC                      {'nocl': [True, False], 'raw_scores': [True, False]}
    LocalOutlierFactor             {'score_samples': [True, False], 'optim': [None, 'cdist']}
    MLPClassifier                  {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    MaxAbsScaler                   {'div': ['std', 'div', 'div_cast']}
    MiniBatchKMeans                {'gemm': [True, False]}
    MultiOutputClassifier          {'nocl': [False, True]}
    MultinomialNB                  {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    NearestNeighbors               {'optim': [None, 'cdist']}
    OneVsRestClassifier            {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    RadiusNeighborsClassifier      {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False], 'optim': [None, 'cdist']}
    RadiusNeighborsRegressor       {'optim': [None, 'cdist']}
    RandomForestClassifier         {'zipmap': [True, False, 'columns'], 'raw_scores': [True, False], 'nocl': [True, False], 'decision_path': [True, False], 'decision_leaf': [True, False]}
    RandomForestRegressor          {'decision_path': [True, False], 'decision_leaf': [True, False]}
    RobustScaler                   {'div': ['std', 'div', 'div_cast']}
    SGDClassifier                  {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    SVC                            {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    Scaler                         {'div': ['std', 'div', 'div_cast']}
    StackingClassifier             {'zipmap': [True, False, 'columns'], 'nocl': [True, False], 'raw_scores': [True, False]}
    TfidfTransformer               {'nan': [True, False]}
    TfidfVectorizer                {'tokenexp': None, 'separators': None, 'nan': [True, False]}
    VotingClassifier               {'zipmap': [True, False, 'columns'], 'nocl': [True, False]}
    all options: ['decision_leaf',
     'decision_path',
     'div',
     'gemm',
     'nan',
     'nocl',
     'optim',
     'raw_scores',
     'return_cov',
     'return_std',
     'score_samples',
     'separators',
     'tokenexp',
     'zipmap']


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  5.057 seconds)


.. _sphx_glr_download_auto_examples_plot_dbegin_options.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: images/binder_badge_logo.svg
      :target: https://mybinder.org/v2/gh/sdpython/onnxcustom/master?urlpath=lab/tree/notebooks/auto_examples/plot_dbegin_options.ipynb
      :alt: Launch binder
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_dbegin_options.py <plot_dbegin_options.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_dbegin_options.ipynb <plot_dbegin_options.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_