.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_pipeline_xgboost.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_pipeline_xgboost.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_pipeline_xgboost.py:

.. _example-xgboost:

Convert a pipeline with an XGBoost model
========================================

.. index:: XGBoost

*sklearn-onnx* only converts *scikit-learn* models into *ONNX*,
but many libraries implement the *scikit-learn* API so that their models
can be included in a *scikit-learn* pipeline. This example considers
a pipeline including an *XGBoost* model. *sklearn-onnx* can convert
the whole pipeline as long as it knows the converter associated with
*XGBClassifier*. Let's see how to do it.

.. contents::
    :local:

Train an XGBoost classifier
+++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 25-75

.. code-block:: default

    import os
    import numpy
    import matplotlib.pyplot as plt
    import onnx
    from onnx.tools.net_drawer import GetPydotGraph, GetOpNodeProducer
    import onnxruntime as rt
    import sklearn
    from sklearn.datasets import load_iris
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    import xgboost
    from xgboost import XGBClassifier
    import skl2onnx
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn, update_registered_converter
    from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes  # noqa
    import onnxmltools
    from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost  # noqa
    import onnxmltools.convert.common.data_types

    # Keep only the first two features of the iris dataset.
    data = load_iris()
    X = data.data[:, :2]
    y = data.target

    # Shuffle the rows.
    ind = numpy.arange(X.shape[0])
    numpy.random.shuffle(ind)
    X = X[ind, :].copy()
    y = y[ind].copy()

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBClassifier(n_estimators=3))])
    pipe.fit(X, y)

    # The conversion fails, which is expected at this point.

    try:
        convert_sklearn(pipe, 'pipeline_xgboost',
                        [('input', FloatTensorType([None, 2]))],
                        target_opset={'': 12, 'ai.onnx.ml': 2})
    except Exception as e:
        print(e)

    # The error message says that no converter was found
    # for XGBoost models. By default, *sklearn-onnx*
    # only handles models from *scikit-learn*, but it can
    # be extended to any model following the *scikit-learn*
    # API as long as the module knows that a converter exists
    # for every model used in the pipeline. That's why
    # we need to register a converter.

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'.
    It usually means the pipeline being converted contains a
    transformer or a predictor with no corresponding converter
    implemented in sklearn-onnx. If the converted is implemented
    in another library, you need to register
    the converted so that it can be used by sklearn-onnx (function
    update_registered_converter). If the model is not yet covered
    by sklearn-onnx, you may raise an issue to
    https://github.com/onnx/sklearn-onnx/issues
    to get the converter implemented or even contribute to the
    project. If the model is a custom model, a new converter must
    be implemented. Examples can be found in the gallery.

.. GENERATED FROM PYTHON SOURCE LINES 76-87

Register the converter for XGBClassifier
++++++++++++++++++++++++++++++++++++++++

The converter is implemented in *onnxmltools*, in
``onnxmltools...XGBoost.py``, and the shape calculator in
``onnxmltools...Classifier.py``.

.. GENERATED FROM PYTHON SOURCE LINES 89-90

The converter (``convert_xgboost``) and the shape calculator
(``calculate_linear_classifier_output_shapes``) were already imported
at the top of this example.

.. GENERATED FROM PYTHON SOURCE LINES 92-93

Let's register the new converter.

.. GENERATED FROM PYTHON SOURCE LINES 93-98

.. code-block:: default

    update_registered_converter(
        XGBClassifier, 'XGBoostXGBClassifier',
        calculate_linear_classifier_output_shapes, convert_xgboost,
        options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

.. GENERATED FROM PYTHON SOURCE LINES 99-101

Convert again
+++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 101-111

.. code-block:: default

    model_onnx = convert_sklearn(
        pipe, 'pipeline_xgboost',
        [('input', FloatTensorType([None, 2]))],
        target_opset={'': 12, 'ai.onnx.ml': 2})

    # And save.
    with open("pipeline_xgboost.onnx", "wb") as f:
        f.write(model_onnx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 112-116

Compare the predictions
+++++++++++++++++++++++

Predictions with XGBoost.

.. GENERATED FROM PYTHON SOURCE LINES 116-120

.. code-block:: default

    print("predict", pipe.predict(X[:5]))
    print("predict_proba", pipe.predict_proba(X[:1]))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [1 2 0 1 2]
    predict_proba [[0.1893399 0.46317056 0.3474895 ]]

.. GENERATED FROM PYTHON SOURCE LINES 121-122

Predictions with onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 122-128

.. code-block:: default

    sess = rt.InferenceSession("pipeline_xgboost.onnx")
    pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
    print("predict", pred_onx[0])
    print("predict_proba", pred_onx[1][:1])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predict [1 2 0 1 2]
    predict_proba [{0: 0.18933990597724915, 1: 0.4631705582141876, 2: 0.34748950600624084}]

.. GENERATED FROM PYTHON SOURCE LINES 129-131

Display the ONNX graph
++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 131-146

.. code-block:: default

    pydot_graph = GetPydotGraph(
        model_onnx.graph, name=model_onnx.graph.name, rankdir="TB",
        node_producer=GetOpNodeProducer(
            "docstring", color="yellow", fillcolor="yellow", style="filled"))
    pydot_graph.write_dot("pipeline.dot")

    # Render the graph with Graphviz, then display the resulting image.
    os.system('dot -O -Gdpi=300 -Tpng pipeline.dot')

    image = plt.imread("pipeline.dot.png")
    fig, ax = plt.subplots(figsize=(40, 20))
    ax.imshow(image)
    ax.axis('off')

.. image-sg:: /auto_examples/images/sphx_glr_plot_pipeline_xgboost_001.png
   :alt: plot pipeline xgboost
   :srcset: /auto_examples/images/sphx_glr_plot_pipeline_xgboost_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (-0.5, 1947.5, 2558.5, -0.5)

.. GENERATED FROM PYTHON SOURCE LINES 147-148

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 148-156

.. code-block:: default

    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)
    print("onnxmltools: ", onnxmltools.__version__)
    print("xgboost: ", xgboost.__version__)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    numpy: 1.23.5
    scikit-learn: 1.2.2
    onnx:  1.13.1
    onnxruntime:  1.14.1
    skl2onnx:  1.14.0
    onnxmltools:  1.11.1
    xgboost:  1.6.2

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 7.668 seconds)

.. _sphx_glr_download_auto_examples_plot_pipeline_xgboost.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pipeline_xgboost.py <plot_pipeline_xgboost.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pipeline_xgboost.ipynb <plot_pipeline_xgboost.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
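
A quick numerical check can complement the visual comparison made in
*Compare the predictions*. The sketch below is an illustration rather than
part of the original example: it assumes the ZipMap-style output shown there
(a list of ``{label: probability}`` dictionaries) and reuses the probabilities
printed in that section as literal values, so it runs without *xgboost* or
*onnxruntime*. The helper names ``zipmap_to_rows`` and ``max_abs_diff`` are
hypothetical, and the ``1e-6`` tolerance is an assumption.

.. code-block:: python

    def zipmap_to_rows(zipmap_output, labels):
        """Turn a list of {label: probability} dicts into a list of rows."""
        return [[row[label] for label in labels] for row in zipmap_output]


    def max_abs_diff(expected, got):
        """Largest absolute element-wise difference between two 2D lists."""
        return max(abs(e - g)
                   for row_e, row_g in zip(expected, got)
                   for e, g in zip(row_e, row_g))


    # Values copied from the outputs printed in "Compare the predictions".
    skl_proba = [[0.1893399, 0.46317056, 0.3474895]]
    onx_proba = [{0: 0.18933990597724915,
                  1: 0.4631705582141876,
                  2: 0.34748950600624084}]

    rows = zipmap_to_rows(onx_proba, labels=[0, 1, 2])
    diff = max_abs_diff(skl_proba, rows)
    print("max abs difference:", diff)

    # Assumed tolerance; float32 inference may call for a looser bound.
    assert diff < 1e-6

With a live session, ``pred_onx[1]`` and ``pipe.predict_proba(X[:5]).tolist()``
could presumably be passed in place of the literals, with the labels taken
from ``pipe.classes_``.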