.. blogpost::
    :title: Numpy API for ONNX and scikit-learn (part II)
    :keywords: ONNX, API, numpy, scikit-learn
    :date: 2021-05-05
    :categories: API
    :lid: blog-onnx-api-part2

    This follows blog post
    :ref:`Numpy API for ONNX and scikit-learn (part I)`.
    It demonstrated how to insert a custom function in a pipeline
    and still be able to convert that pipeline into ONNX.
    This blog post shows how to implement a custom transformer.

    This time, we need to implement not a function but the method
    `transform` of a custom transformer. The design is the same and
    relies on a decorator placed before the class declaration.
    In the following example, a method `onnx_transform` implements
    the method `transform` with the API mentioned in the first part:
    :ref:`f-numpyonnximpl`.

    The decorator `onnxsklearn_class` detects that the decorated class
    is a transformer. It then assumes that method `onnx_transform`
    contains the ONNX implementation of method `transform`.
    The decorator adds an implementation for method `transform`.
    It behaves like the custom function described in part I.
    Once called, this method detects the input type, generates the
    ONNX graph if it is not available yet and executes it with a runtime.
    That explains why the first call is much slower.

    .. runpython::
        :showcode:
        :process:

        import numpy
        from pandas import DataFrame
        from sklearn.base import TransformerMixin, BaseEstimator
        from sklearn.decomposition import PCA
        from sklearn.model_selection import train_test_split
        from sklearn.datasets import make_classification
        from mlprodict.npy import onnxsklearn_class
        from mlprodict.onnx_conv import to_onnx
        from mlprodict.plotting.text_plot import onnx_simple_text_plot
        import mlprodict.npy.numpy_onnx_impl as nxnp
        import mlprodict.npy.numpy_onnx_impl_skl as nxnpskl

        X, y = make_classification(
            200, n_classes=2, n_features=2, n_informative=2,
            n_redundant=0, n_clusters_per_class=2, hypercube=False)
        X_train, X_test, y_train, y_test = train_test_split(X, y)


        @onnxsklearn_class("onnx_transform", op_version=14)  # opset=13, 14, ...
        class DecorrelateTransformerOnnx(TransformerMixin, BaseEstimator):

            def __init__(self, alpha=0.):
                BaseEstimator.__init__(self)
                TransformerMixin.__init__(self)
                self.alpha = alpha

            def fit(self, X, y=None, sample_weights=None):
                self.pca_ = PCA(X.shape[1])  # pylint: disable=W0201
                self.pca_.fit(X)
                return self

            def onnx_transform(self, X):
                if X.dtype is None:
                    raise AssertionError("X.dtype cannot be None.")
                mean = self.pca_.mean_.astype(X.dtype)
                cmp = self.pca_.components_.T.astype(X.dtype)
                return (X - mean) @ cmp


        model = DecorrelateTransformerOnnx()
        model.fit(X_train)
        print(model.transform(X_test[:5]))

        onx = to_onnx(model, X_test[:5], target_opset=14)  # opset=13, 14, ...
        print()
        print(onnx_simple_text_plot(onx))
        print()
        print(onx)

    The tutorial :ref:`l-numpy-api-for-onnx` extends this example to
    regressors or classifiers. It also mentions a couple of frequent
    errors that may appear along the way.
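
    As a quick check, the converted graph can also be executed directly
    and compared against the output of `model.transform`. The snippet
    below is a minimal sketch, not part of the original example: it
    reuses `onx` and `X_test` from above, relies on mlprodict's
    `OnnxInference` runtime, and assumes the converter named the graph
    input ``'X'``.

    ::

        # Hypothetical verification step: run the ONNX graph produced by
        # to_onnx with mlprodict's Python runtime and inspect the outputs.
        from mlprodict.onnxrt import OnnxInference

        oinf = OnnxInference(onx)
        # the input name 'X' is an assumption based on the default naming
        res = oinf.run({'X': X_test[:5]})
        print(res)  # dictionary mapping output names to numpy arrays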