.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_gexternal_xgboost.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_gexternal_xgboost.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_gexternal_xgboost.py:


.. _example-xgboost:

Convert a pipeline with an XGBoost model
=========================================

.. index:: XGBoost

:epkg:`sklearn-onnx` only converts :epkg:`scikit-learn` models into
:epkg:`ONNX`, but many libraries implement the :epkg:`scikit-learn` API
so that their models can be included in a :epkg:`scikit-learn` pipeline.
This example considers a pipeline including an :epkg:`XGBoost` model.
:epkg:`sklearn-onnx` can convert the whole pipeline as long as it knows
the converter associated with an *XGBClassifier*. Let's see how to do it.

.. contents::
    :local:

Train an XGBoost classifier
+++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 23-72

.. code-block:: default

    from pyquickhelper.helpgen.graphviz_helper import plot_graphviz
    from mlprodict.onnxrt import OnnxInference
    import numpy
    import onnxruntime as rt
    from sklearn.datasets import load_iris, load_diabetes, make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier, XGBRegressor, DMatrix, train as train_xgb
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn, to_onnx, update_registered_converter
    from skl2onnx.common.shape_calculator import (
        calculate_linear_classifier_output_shapes,
        calculate_linear_regressor_output_shapes)
    from onnxmltools.convert.xgboost.operator_converters.XGBoost import (
        convert_xgboost)
    from onnxmltools.convert import convert_xgboost as convert_xgboost_booster


    data = load_iris()
    X = data.data[:, :2]
    y = data.target

    ind = numpy.arange(X.shape[0])
    numpy.random.shuffle(ind)
    X = X[ind, :].copy()
    y = y[ind].copy()

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBClassifier(n_estimators=3))])
    pipe.fit(X, y)

    # The conversion fails but it is expected.
    try:
        convert_sklearn(pipe, 'pipeline_xgboost',
                        [('input', FloatTensorType([None, 2]))],
                        target_opset=12)
    except Exception as e:
        print(e)

    # The error message tells us that no converter was found
    # for :epkg:`XGBoost` models. By default, :epkg:`sklearn-onnx`
    # only handles models from :epkg:`scikit-learn`, but it can
    # be extended to any model following the :epkg:`scikit-learn`
    # API as long as the module knows there exists a converter
    # for every model used in a pipeline. That's why
    # we need to register a converter.

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /usr/local/lib/python3.9/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
      warnings.warn(label_encoder_deprecation_msg, UserWarning)
    [14:18:17] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'.
    It usually means the pipeline being converted contains a
    transformer or a predictor with no corresponding converter
    implemented in sklearn-onnx. If the converted is implemented
    in another library, you need to register
    the converted so that it can be used by sklearn-onnx (function
    update_registered_converter). If the model is not yet covered
    by sklearn-onnx, you may raise an issue to
    https://github.com/onnx/sklearn-onnx/issues
    to get the converter implemented or even contribute to the
    project. If the model is a custom model, a new converter must
    be implemented. Examples can be found in the gallery.

.. GENERATED FROM PYTHON SOURCE LINES 73-84

Register the converter for XGBClassifier
++++++++++++++++++++++++++++++++++++++++

The converter is implemented in :epkg:`onnxmltools`:
`onnxmltools...XGBoost.py
<https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/xgboost/operator_converters/XGBoost.py>`_,
and the shape calculator in
`onnxmltools...Classifier.py
<https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/xgboost/shape_calculators/Classifier.py>`_.

.. GENERATED FROM PYTHON SOURCE LINES 84-90

.. code-block:: default

    update_registered_converter(
        XGBClassifier, 'XGBoostXGBClassifier',
        calculate_linear_classifier_output_shapes, convert_xgboost,
        options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

.. GENERATED FROM PYTHON SOURCE LINES 91-93

Convert again
+++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 93-103

.. code-block:: default

    model_onnx = convert_sklearn(
        pipe, 'pipeline_xgboost',
        [('input', FloatTensorType([None, 2]))],
        target_opset=12)

    # And save.
    with open("pipeline_xgboost.onnx", "wb") as f:
        f.write(model_onnx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 104-108

Compare the predictions
+++++++++++++++++++++++

Predictions with XGBoost.

.. GENERATED FROM PYTHON SOURCE LINES 108-112

.. code-block:: default

    print("predict", pipe.predict(X[:5]))
    print("predict_proba", pipe.predict_proba(X[:1]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [0 2 1 2 0]
    predict_proba [[0.69600695 0.1526681  0.15132491]]

.. GENERATED FROM PYTHON SOURCE LINES 113-114

Predictions with onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 114-120

.. code-block:: default

    sess = rt.InferenceSession("pipeline_xgboost.onnx")
    pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
    print("predict", pred_onx[0])
    print("predict_proba", pred_onx[1][:1])

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [0 2 1 2 0]
    predict_proba [{0: 0.6960069537162781, 1: 0.15266810357570648, 2: 0.15132491290569305}]
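The probabilities come back as a list of dictionaries, one per observation,
because the converted classifier ends with a *ZipMap* operator by default.
The ``zipmap`` option declared above when registering the converter makes it
possible to turn that operator off and retrieve a plain probability matrix
instead. Below is a minimal sketch, assuming the instance-id form of the
``options`` argument (``{id(model): {...}}``) documented by
:epkg:`sklearn-onnx`; the variable names are new and only used here.

.. code-block:: default

    # Sketch: disable the final ZipMap operator so that predict_proba
    # is returned as a 2D float array rather than a list of dictionaries.
    # The options dictionary is keyed by id(model); the 'zipmap' option
    # itself was declared when the converter was registered above.
    model_onnx_nozipmap = convert_sklearn(
        pipe, 'pipeline_xgboost_nozipmap',
        [('input', FloatTensorType([None, 2]))],
        target_opset=12,
        options={id(pipe.steps[-1][1]): {'zipmap': False}})

    sess_nozipmap = rt.InferenceSession(model_onnx_nozipmap.SerializeToString())
    pred_nozipmap = sess_nozipmap.run(
        None, {"input": X[:5].astype(numpy.float32)})
    # A numpy array of shape (n, 3), not a list of dictionaries.
    print("predict_proba", pred_nozipmap[1][:1])

Disabling *ZipMap* is usually also slightly faster at inference time since
:epkg:`onnxruntime` does not have to build the list of dictionaries.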
.. GENERATED FROM PYTHON SOURCE LINES 121-123

Final graph
+++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 123-131

.. code-block:: default

    oinf = OnnxInference(model_onnx)
    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

.. image:: /auto_examples/images/sphx_glr_plot_gexternal_xgboost_001.png
    :alt: plot gexternal xgboost
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 132-134

Same example with XGBRegressor
++++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 134-151

.. code-block:: default

    update_registered_converter(
        XGBRegressor, 'XGBoostXGBRegressor',
        calculate_linear_regressor_output_shapes, convert_xgboost)

    data = load_diabetes()
    x = data.data
    y = data.target
    X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5)

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBRegressor(n_estimators=3))])
    pipe.fit(X_train, y_train)

    print("predict", pipe.predict(X_test[:5]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [ 39.68634  118.4799   176.94164   50.84716   51.449596]

.. GENERATED FROM PYTHON SOURCE LINES 152-153

ONNX

.. GENERATED FROM PYTHON SOURCE LINES 153-160

.. code-block:: default

    onx = to_onnx(pipe, X_train.astype(numpy.float32))

    sess = rt.InferenceSession(onx.SerializeToString())
    pred_onx = sess.run(None, {"X": X_test[:5].astype(numpy.float32)})
    print("predict", pred_onx[0].ravel())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [ 39.68634  118.4799   176.94164   50.84716   51.449596]

.. GENERATED FROM PYTHON SOURCE LINES 161-163

Some discrepancies may appear. In that case,
you should read :ref:`l-example-discrepencies-float-double`.
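Such discrepancies usually come from the fact that the ONNX graph computes
with single-precision floats while :epkg:`scikit-learn` and :epkg:`XGBoost`
rely on doubles, so tree thresholds may round differently. A minimal sketch
to quantify the difference on the whole test set, reusing ``pipe`` and
``sess`` defined just above (plain :epkg:`numpy`, nothing specific to this
example; the variable names are new):

.. code-block:: default

    # Sketch: quantify the float32/float64 discrepancy between the
    # scikit-learn pipeline and the onnxruntime session defined above.
    expected = pipe.predict(X_test)
    got = sess.run(None, {"X": X_test.astype(numpy.float32)})[0].ravel()
    diff = numpy.abs(expected - got)
    print("max absolute difference:", diff.max())
    # Relative difference; assumes no expected value is exactly zero,
    # which holds for the diabetes targets used here.
    print("max relative difference:", (diff / numpy.abs(expected)).max())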
.. GENERATED FROM PYTHON SOURCE LINES 165-171

Same with a Booster
+++++++++++++++++++

A booster cannot be inserted in a pipeline. It requires
a different conversion function because it does not
follow the :epkg:`scikit-learn` API.

.. GENERATED FROM PYTHON SOURCE LINES 171-192

.. code-block:: default

    x, y = make_classification(n_classes=2, n_features=5,
                               n_samples=100,
                               random_state=42, n_informative=3)
    X_train, X_test, y_train, _ = train_test_split(x, y,
                                                   test_size=0.5,
                                                   random_state=42)

    dtrain = DMatrix(X_train, label=y_train)
    param = {'objective': 'multi:softmax', 'num_class': 3}
    bst = train_xgb(param, dtrain, 10)
    initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
    onx = convert_xgboost_booster(bst, "name", initial_types=initial_type)

    sess = rt.InferenceSession(onx.SerializeToString())
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    pred_onx = sess.run(
        [label_name], {input_name: X_test.astype(numpy.float32)})[0]
    print(pred_onx)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [14:18:20] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softmax' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    [0 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1
     1 1 0 1 1 1 0 0 1 1 0 0 0 1 0]

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.907 seconds)


.. _sphx_glr_download_auto_examples_plot_gexternal_xgboost.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: images/binder_badge_logo.svg
      :target: https://mybinder.org/v2/gh/sdpython/onnxcustom/master?urlpath=lab/tree/notebooks/auto_examples/plot_gexternal_xgboost.ipynb
      :alt: Launch binder
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_gexternal_xgboost.py <plot_gexternal_xgboost.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_gexternal_xgboost.ipynb <plot_gexternal_xgboost.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_