.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_gexternal_xgboost.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_gexternal_xgboost.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_gexternal_xgboost.py:


.. _example-xgboost:

Convert a pipeline with an XGBoost model
=========================================

.. index:: XGBoost

:epkg:`sklearn-onnx` only converts :epkg:`scikit-learn` models into
:epkg:`ONNX`, but many libraries implement the :epkg:`scikit-learn` API
so that their models can be included in a :epkg:`scikit-learn` pipeline.
This example considers a pipeline including an :epkg:`XGBoost` model.
:epkg:`sklearn-onnx` can convert the whole pipeline as long as it knows
the converter associated with an *XGBClassifier*. Let's see how to do it.

.. contents::
    :local:

Train an XGBoost classifier
+++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 23-72

.. code-block:: default

    from pyquickhelper.helpgen.graphviz_helper import plot_graphviz
    from mlprodict.onnxrt import OnnxInference
    import numpy
    import onnxruntime as rt
    from sklearn.datasets import load_iris, load_diabetes, make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier, XGBRegressor, DMatrix, train as train_xgb
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import convert_sklearn, to_onnx, update_registered_converter
    from skl2onnx.common.shape_calculator import (
        calculate_linear_classifier_output_shapes,
        calculate_linear_regressor_output_shapes)
    from onnxmltools.convert.xgboost.operator_converters.XGBoost import (
        convert_xgboost)
    from onnxmltools.convert import convert_xgboost as convert_xgboost_booster


    data = load_iris()
    X = data.data[:, :2]
    y = data.target

    ind = numpy.arange(X.shape[0])
    numpy.random.shuffle(ind)
    X = X[ind, :].copy()
    y = y[ind].copy()

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBClassifier(n_estimators=3))])
    pipe.fit(X, y)

    # The conversion fails but it is expected.
    try:
        convert_sklearn(pipe, 'pipeline_xgboost',
                        [('input', FloatTensorType([None, 2]))],
                        target_opset=12)
    except Exception as e:
        print(e)

    # The error message tells us that no converter was found
    # for :epkg:`XGBoost` models. By default, :epkg:`sklearn-onnx`
    # only handles models from :epkg:`scikit-learn`, but it can
    # be extended to any model following the :epkg:`scikit-learn`
    # API as long as the module knows there exists a converter
    # for every model used in a pipeline. That's why
    # we need to register a converter.

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /usr/local/lib/python3.9/site-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
      warnings.warn(label_encoder_deprecation_msg, UserWarning)
    [14:18:17] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    Unable to find a shape calculator for type '<class 'xgboost.sklearn.XGBClassifier'>'.
    It usually means the pipeline being converted contains a
    transformer or a predictor with no corresponding converter
    implemented in sklearn-onnx. If the converted is implemented
    in another library, you need to register
    the converted so that it can be used by sklearn-onnx (function
    update_registered_converter). If the model is not yet covered
    by sklearn-onnx, you may raise an issue to
    https://github.com/onnx/sklearn-onnx/issues
    to get the converter implemented or even contribute to the
    project. If the model is a custom model, a new converter must
    be implemented. Examples can be found in the gallery.

.. GENERATED FROM PYTHON SOURCE LINES 73-84

Register the converter for XGBClassifier
++++++++++++++++++++++++++++++++++++++++

The converter is implemented in :epkg:`onnxmltools`:
`onnxmltools...XGBoost.py
<https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/xgboost/operator_converters/XGBoost.py>`_,
and the shape calculator in
`onnxmltools...Classifier.py
<https://github.com/onnx/onnxmltools/blob/master/onnxmltools/convert/xgboost/shape_calculators/Classifier.py>`_.

.. GENERATED FROM PYTHON SOURCE LINES 84-90

.. code-block:: default

    update_registered_converter(
        XGBClassifier, 'XGBoostXGBClassifier',
        calculate_linear_classifier_output_shapes, convert_xgboost,
        options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

.. GENERATED FROM PYTHON SOURCE LINES 91-93

Convert again
+++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 93-103

.. code-block:: default

    model_onnx = convert_sklearn(
        pipe, 'pipeline_xgboost',
        [('input', FloatTensorType([None, 2]))],
        target_opset=12)

    # And save.
    with open("pipeline_xgboost.onnx", "wb") as f:
        f.write(model_onnx.SerializeToString())

.. GENERATED FROM PYTHON SOURCE LINES 104-108

Compare the predictions
+++++++++++++++++++++++

Predictions with XGBoost.

.. GENERATED FROM PYTHON SOURCE LINES 108-112

.. code-block:: default

    print("predict", pipe.predict(X[:5]))
    print("predict_proba", pipe.predict_proba(X[:1]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [0 2 1 2 0]
    predict_proba [[0.69600695 0.1526681  0.15132491]]

.. GENERATED FROM PYTHON SOURCE LINES 113-114

Predictions with onnxruntime.

.. GENERATED FROM PYTHON SOURCE LINES 114-120

.. code-block:: default

    sess = rt.InferenceSession("pipeline_xgboost.onnx")
    pred_onx = sess.run(None, {"input": X[:5].astype(numpy.float32)})
    print("predict", pred_onx[0])
    print("predict_proba", pred_onx[1][:1])

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [0 2 1 2 0]
    predict_proba [{0: 0.6960069537162781, 1: 0.15266810357570648, 2: 0.15132491290569305}]
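The probabilities come back as a list of dictionaries, one per observation,
because the converted classifier ends with a *ZipMap* operator by default.
The ``zipmap`` option declared above when registering the converter makes it
possible to turn that operator off and retrieve a plain probability matrix
instead. Below is a minimal sketch, assuming the instance-id form of the
``options`` argument (``{id(model): {...}}``) documented by
:epkg:`sklearn-onnx`; the variable names are new and only used here.

.. code-block:: default

    # Sketch: disable the final ZipMap operator so that predict_proba
    # is returned as a 2D float array rather than a list of dictionaries.
    # The options dictionary is keyed by id(model); the 'zipmap' option
    # itself was declared when the converter was registered above.
    model_onnx_nozipmap = convert_sklearn(
        pipe, 'pipeline_xgboost_nozipmap',
        [('input', FloatTensorType([None, 2]))],
        target_opset=12,
        options={id(pipe.steps[-1][1]): {'zipmap': False}})

    sess_nozipmap = rt.InferenceSession(model_onnx_nozipmap.SerializeToString())
    pred_nozipmap = sess_nozipmap.run(
        None, {"input": X[:5].astype(numpy.float32)})
    # A numpy array of shape (n, 3), not a list of dictionaries.
    print("predict_proba", pred_nozipmap[1][:1])

Disabling *ZipMap* is usually also slightly faster at inference time since
:epkg:`onnxruntime` does not have to build the list of dictionaries.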
.. GENERATED FROM PYTHON SOURCE LINES 121-123

Final graph
+++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 123-131

.. code-block:: default

    oinf = OnnxInference(model_onnx)
    ax = plot_graphviz(oinf.to_dot())
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

.. image:: /auto_examples/images/sphx_glr_plot_gexternal_xgboost_001.png
    :alt: plot gexternal xgboost
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 132-134

Same example with XGBRegressor
++++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 134-151

.. code-block:: default

    update_registered_converter(
        XGBRegressor, 'XGBoostXGBRegressor',
        calculate_linear_regressor_output_shapes, convert_xgboost)

    data = load_diabetes()
    x = data.data
    y = data.target
    X_train, X_test, y_train, _ = train_test_split(x, y, test_size=0.5)

    pipe = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBRegressor(n_estimators=3))])
    pipe.fit(X_train, y_train)

    print("predict", pipe.predict(X_test[:5]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [ 39.68634  118.4799   176.94164   50.84716   51.449596]

.. GENERATED FROM PYTHON SOURCE LINES 152-153

ONNX

.. GENERATED FROM PYTHON SOURCE LINES 153-160

.. code-block:: default

    onx = to_onnx(pipe, X_train.astype(numpy.float32))

    sess = rt.InferenceSession(onx.SerializeToString())
    pred_onx = sess.run(None, {"X": X_test[:5].astype(numpy.float32)})
    print("predict", pred_onx[0].ravel())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    predict [ 39.68634  118.4799   176.94164   50.84716   51.449596]

.. GENERATED FROM PYTHON SOURCE LINES 161-163

Some discrepancies may appear. In that case,
you should read :ref:`l-example-discrepencies-float-double`.
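Such discrepancies usually come from the fact that the ONNX graph computes
with single-precision floats while :epkg:`scikit-learn` and :epkg:`XGBoost`
rely on doubles, so tree thresholds may round differently. A minimal sketch
to quantify the difference on the whole test set, reusing ``pipe`` and
``sess`` defined just above (plain :epkg:`numpy`, nothing specific to this
example; the variable names are new):

.. code-block:: default

    # Sketch: quantify the float32/float64 discrepancy between the
    # scikit-learn pipeline and the onnxruntime session defined above.
    expected = pipe.predict(X_test)
    got = sess.run(None, {"X": X_test.astype(numpy.float32)})[0].ravel()
    diff = numpy.abs(expected - got)
    print("max absolute difference:", diff.max())
    # Relative difference; assumes no expected value is exactly zero,
    # which holds for the diabetes targets used here.
    print("max relative difference:", (diff / numpy.abs(expected)).max())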
.. GENERATED FROM PYTHON SOURCE LINES 165-171

Same with a Booster
+++++++++++++++++++

A booster cannot be inserted in a pipeline. It requires
a different conversion function because it does not
follow the :epkg:`scikit-learn` API.

.. GENERATED FROM PYTHON SOURCE LINES 171-192

.. code-block:: default

    x, y = make_classification(n_classes=2, n_features=5,
                               n_samples=100,
                               random_state=42, n_informative=3)
    X_train, X_test, y_train, _ = train_test_split(x, y,
                                                   test_size=0.5,
                                                   random_state=42)

    dtrain = DMatrix(X_train, label=y_train)
    param = {'objective': 'multi:softmax', 'num_class': 3}
    bst = train_xgb(param, dtrain, 10)
    initial_type = [('float_input', FloatTensorType([None, X_train.shape[1]]))]
    onx = convert_xgboost_booster(bst, "name", initial_types=initial_type)

    sess = rt.InferenceSession(onx.SerializeToString())
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name
    pred_onx = sess.run(
        [label_name], {input_name: X_test.astype(numpy.float32)})[0]
    print(pred_onx)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [14:18:20] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softmax' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
    [0 0 1 1 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 1
     1 1 0 1 1 1 0 0 1 1 0 0 0 1 0]

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.907 seconds)


.. _sphx_glr_download_auto_examples_plot_gexternal_xgboost.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: binder-badge

    .. image:: images/binder_badge_logo.svg
      :target: https://mybinder.org/v2/gh/sdpython/onnxcustom/master?urlpath=lab/tree/notebooks/auto_examples/plot_gexternal_xgboost.ipynb
      :alt: Launch binder
      :width: 150 px


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_gexternal_xgboost.py <plot_gexternal_xgboost.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_gexternal_xgboost.ipynb <plot_gexternal_xgboost.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_