.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorial/plot_wext_pyod_forest.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorial_plot_wext_pyod_forest.py:


.. _example-pyod-iforest:

Converter for pyod.models.iforest.IForest
=========================================

.. index:: pyod, iforest

This example answers issue `685 `_.
It implements a custom converter for the model
`pyod.models.iforest.IForest `_.
This example uses :ref:`l-plot-custom-converter` as a starting point.

.. contents::
    :local:

Trains a model
++++++++++++++

All imports come first. The script then builds a small dataset,
one-hot encodes and scales it, and trains the model.

.. GENERATED FROM PYTHON SOURCE LINES 27-63

.. code-block:: default

    import numpy as np
    import pandas as pd
    from onnxruntime import InferenceSession
    from sklearn.preprocessing import MinMaxScaler
    from skl2onnx.proto import onnx_proto
    from skl2onnx.common.data_types import (
        FloatTensorType, Int64TensorType, guess_numpy_type)
    from skl2onnx import to_onnx, update_registered_converter, get_model_alias
    from skl2onnx.algebra.onnx_ops import (
        OnnxIdentity, OnnxMul, OnnxLess, OnnxConcat, OnnxCast, OnnxAdd,
        OnnxClip)
    from skl2onnx.algebra.onnx_operator import OnnxSubEstimator
    try:
        from pyod.models.iforest import IForest
    except (ValueError, ImportError) as e:
        print("Unable to import pyod:", e)
        IForest = None

    if IForest is not None:
        data1 = {'First': [500, 500, 400, 100, 200, 300, 100],
                 'Second': ['a', 'b', 'a', 'b', 'a', 'b', 'c']}

        df1 = pd.DataFrame(data1, columns=['First', 'Second'])
        dumdf1 = pd.get_dummies(df1)
        scaler = MinMaxScaler()
        scaler.partial_fit(dumdf1)
        sc_data = scaler.transform(dumdf1)
        model1 = IForest(n_estimators=10, bootstrap=True, behaviour='new',
                         contamination=0.1,
                         random_state=np.random.RandomState(42),
                         verbose=1, n_jobs=-1).fit(sc_data)
        feature_names2 = dumdf1.columns

        initial_type = [('float_input',
                         FloatTensorType([None, len(feature_names2)]))]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
    [Parallel(n_jobs=8)]: Done 2 out of 8 | elapsed: 0.1s remaining: 0.3s
    [Parallel(n_jobs=8)]: Done 8 out of 8 | elapsed: 0.1s finished


.. GENERATED FROM PYTHON SOURCE LINES 64-65

We check that the conversion fails as expected, because no converter
is registered for ``IForest`` yet.

.. GENERATED FROM PYTHON SOURCE LINES 65-73

.. code-block:: default

    if IForest is not None:
        try:
            to_onnx(model1, initial_types=initial_type)
        except Exception as e:
            print(e)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Unable to find a shape calculator for type ''.
    It usually means the pipeline being converted contains a
    transformer or a predictor with no corresponding converter
    implemented in sklearn-onnx. If the converted is implemented
    in another library, you need to register
    the converted so that it can be used by sklearn-onnx (function
    update_registered_converter). If the model is not yet covered
    by sklearn-onnx, you may raise an issue to
    https://github.com/onnx/sklearn-onnx/issues
    to get the converter implemented or even contribute to the
    project. If the model is a custom model, a new converter must
    be implemented. Examples can be found in the gallery.


.. GENERATED FROM PYTHON SOURCE LINES 74-80

Custom converter
++++++++++++++++

First the parser and the shape calculator.
The parser defines the number of outputs and their type.
The shape calculator defines their dimensions.

.. GENERATED FROM PYTHON SOURCE LINES 80-104

.. code-block:: default

    def pyod_iforest_parser(scope, model, inputs, custom_parsers=None):
        alias = get_model_alias(type(model))
        this_operator = scope.declare_local_operator(alias, model)

        # inputs
        this_operator.inputs.append(inputs[0])

        # outputs: the probabilities have the same tensor type as the input
        cls_type = inputs[0].type.__class__
        val_y1 = scope.declare_local_variable('label', Int64TensorType())
        val_y2 = scope.declare_local_variable('probability', cls_type())
        this_operator.outputs.append(val_y1)
        this_operator.outputs.append(val_y2)

        # end
        return this_operator.outputs


    def pyod_iforest_shape_calculator(operator):
        N = operator.inputs[0].get_first_dimension()
        operator.outputs[0].type.shape = [N, 1]
        operator.outputs[1].type.shape = [N, 2]


.. GENERATED FROM PYTHON SOURCE LINES 105-106

Then the converter.

.. GENERATED FROM PYTHON SOURCE LINES 106-159

.. code-block:: default

    def pyod_iforest_converter(scope, operator, container):
        op = operator.raw_operator
        opv = container.target_opset
        out = operator.outputs

        # We retrieve the unique input.
        X = operator.inputs[0]

        # In most cases, computations happen in floats,
        # but the input may also be double. ONNX is very strict
        # about types: every constant should have the same
        # type as the input.
        dtype = guess_numpy_type(X.type)

        detector = op.detector_  # Should be IForest from scikit-learn.
        lab_pred = OnnxSubEstimator(detector, X, op_version=opv)
        scores = OnnxIdentity(lab_pred[1], op_version=opv)

        # labels
        threshold = op.threshold_
        above = OnnxLess(scores, np.array([threshold], dtype=dtype),
                         op_version=opv)
        labels = OnnxCast(above, op_version=opv,
                          to=onnx_proto.TensorProto.INT64,
                          output_names=out[:1])

        # probabilities
        train_scores = op.decision_scores_
        scaler = MinMaxScaler().fit(train_scores.reshape(-1, 1))
        scores_ = OnnxMul(scores, np.array([-1], dtype=dtype),
                          op_version=opv)
        print(scaler.min_)
        print(scaler.scale_)

        scaled = OnnxMul(scores_, scaler.scale_.astype(dtype),
                         op_version=opv)
        scaled_centered = OnnxAdd(scaled, scaler.min_.astype(dtype),
                                  op_version=opv)
        clipped = OnnxClip(scaled_centered, np.array([0], dtype=dtype),
                           np.array([1], dtype=dtype),
                           op_version=opv)
        # clipped_ = 1 - clipped
        clipped_ = OnnxAdd(
            OnnxMul(clipped, np.array([-1], dtype=dtype),
                    op_version=opv),
            np.array([1], dtype=dtype),
            op_version=opv)

        scores_2d = OnnxConcat(clipped_, clipped, axis=1, op_version=opv,
                               output_names=out[1:])

        labels.add_to(scope, container)
        scores_2d.add_to(scope, container)


.. GENERATED FROM PYTHON SOURCE LINES 160-161

Finally the registration.

.. GENERATED FROM PYTHON SOURCE LINES 161-170

.. code-block:: default

    if IForest is not None:
        update_registered_converter(
            IForest, "PyodIForest",
            pyod_iforest_shape_calculator,
            pyod_iforest_converter,
            parser=pyod_iforest_parser)


.. GENERATED FROM PYTHON SOURCE LINES 171-172

And the conversion.

.. GENERATED FROM PYTHON SOURCE LINES 172-177

.. code-block:: default

    if IForest is not None:
        onx = to_onnx(model1, initial_types=initial_type,
                      target_opset={'': 14, 'ai.onnx.ml': 2})


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [0.75171798]
    [13.95064645]


.. GENERATED FROM PYTHON SOURCE LINES 178-180

Checking discrepancies
++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 180-200

.. code-block:: default

    if IForest is not None:
        data = sc_data.astype(np.float32)

        expected_labels = model1.predict(data)
        expected_proba = model1.predict_proba(data)

        sess = InferenceSession(onx.SerializeToString())
        res = sess.run(None, {'float_input': data})

        onx_labels = res[0]
        onx_proba = res[1]

        diff_labels = np.abs(onx_labels.ravel() - expected_labels.ravel()).max()
        diff_proba = np.abs(onx_proba.ravel() - expected_proba.ravel()).max()

        print("discrepancies:", diff_labels, diff_proba)

        print("ONNX labels", onx_labels)
        print("ONNX probabilities", onx_proba)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    discrepancies: 0 8.684300415451318e-07
    ONNX labels [[0]
     [0]
     [0]
     [0]
     [0]
     [0]
     [1]]
    ONNX probabilities [[1.         0.        ]
     [0.809063   0.19093698]
     [1.         0.        ]
     [0.41380423 0.58619577]
     [0.61369824 0.38630173]
     [0.809063   0.19093698]
     [0.         1.        ]]


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 2.549 seconds)


.. _sphx_glr_download_auto_tutorial_plot_wext_pyod_forest.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_wext_pyod_forest.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_wext_pyod_forest.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
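
The probability branch of the converter (multiply, add, clip, concatenate) can be reproduced in plain NumPy to make the arithmetic easier to follow. The sketch below is only an illustration under stated assumptions: ``outlier_probabilities`` is a hypothetical helper, not part of pyod or skl2onnx, and the scores are made up. It expects scores that already follow the pyod convention (larger means more abnormal), which is why the converter first multiplies the scikit-learn scores by -1.

```python
import numpy as np


def outlier_probabilities(train_scores, scores):
    """Mimic the scaling, clipping and concatenation done by the converter.

    Hypothetical helper for illustration only. Both arguments hold outlier
    scores where a larger value means a more abnormal sample.
    """
    train_scores = np.asarray(train_scores, dtype=np.float64).reshape(-1, 1)
    scores = np.asarray(scores, dtype=np.float64).reshape(-1, 1)
    # Same quantities as MinMaxScaler.scale_ and MinMaxScaler.min_
    # for the default feature range (0, 1).
    scale = 1.0 / (train_scores.max() - train_scores.min())
    offset = -train_scores.min() * scale
    # Equivalent of OnnxMul, OnnxAdd, then OnnxClip between 0 and 1.
    p_outlier = np.clip(scores * scale + offset, 0.0, 1.0)
    # Equivalent of OnnxConcat: column 0 is P(inlier), column 1 is P(outlier).
    return np.hstack([1.0 - p_outlier, p_outlier])


# Made-up training scores and new scores, two of them outside the
# training range so that the clipping step is exercised.
train = np.array([-0.05, -0.02, 0.0, 0.03, 0.05])
proba = outlier_probabilities(train, [-0.10, 0.0, 0.05, 0.20])
print(proba)
```

Scores outside the range seen during training are clipped, so every row stays a valid pair of probabilities that sums to one.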