.. _l-onnx-tutorial-optim: Converters with options ======================= Some converters have options to change the way a specific operator is converted. The whole list is described at :epkg:`Converters with options`. .. contents:: :local: Option cdist for GaussianProcessRegressor +++++++++++++++++++++++++++++++++++++++++ Notebooks :ref:`onnxpdistrst` shows how much slower an :epkg:`ONNX` implementation of function :epkg:`cdist`, from 3 to 10 times slower. One way to optimize the converted model is to create dedicated operators such as the one for function :epkg:`cdist`. The first example shows how to convert a :epkg:`GaussianProcessRegressor` into standard :epkg:`ONNX` (see also :class:`CDist `). .. gdot:: :script: DOT-SECTION import numpy from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.gaussian_process import GaussianProcessRegressor from sklearn.gaussian_process.kernels import ExpSineSquared from mlprodict.onnx_conv import to_onnx from mlprodict.onnxrt import OnnxInference iris = load_iris() X, y = iris.data, iris.target X_train, _, y_train, __ = train_test_split(X, y, random_state=11) clr = GaussianProcessRegressor(ExpSineSquared(), alpha=20.) clr.fit(X_train, y_train) model_def = to_onnx(clr, X_train) oinf = OnnxInference(model_def) print("DOT-SECTION", oinf.to_dot()) Now the new model with the operator `CDist`. .. gdot:: :script: DOT-SECTION import numpy from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.gaussian_process import GaussianProcessRegressor from sklearn.gaussian_process.kernels import ExpSineSquared from mlprodict.onnx_conv import to_onnx from mlprodict.onnxrt import OnnxInference iris = load_iris() X, y = iris.data, iris.target X_train, _, y_train, __ = train_test_split(X, y, random_state=11) clr = GaussianProcessRegressor(ExpSineSquared(), alpha=20.) clr.fit(X_train, y_train) model_def = to_onnx(clr, X_train, options={GaussianProcessRegressor: {'optim': 'cdist'}}) oinf = OnnxInference(model_def) print("DOT-SECTION", oinf.to_dot()) The only change is parameter *options* set to ``options={GaussianProcessRegressor: {'optim': 'cdist'}}``. It tells the conversion fonction that every every model :epkg:`sklearn:gaussian_process:GaussianProcessRegressor` must be converted with the option ``optim='cdist'``. The converter of this model checks that that options and uses custom operator `CDist` instead of its standard implementation based on operator `Scan `_. Section :ref:`lpy-GaussianProcess` shows how much the gain is depending on the number of observations for this example. Other model supported cdist +++++++++++++++++++++++++++ Pairwise distances are also is all nearest neighbours models. That same *cdist* option is also supported for these models. Option *zipmap* for classifiers +++++++++++++++++++++++++++++++ By default, the library *sklearn-onnx* produces a list of dictionaries ``{label: prediction}`` but this data structure takes a significant time to be build. The converted model can stick to matrices by removing operator *ZipMap*. This is done by using option ``{'zipmap': False}``. .. gdot:: :script: DOT-SECTION import numpy from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression from mlprodict.onnx_conv import to_onnx from mlprodict.onnxrt import OnnxInference iris = load_iris() X, y = iris.data, iris.target X_train, _, y_train, __ = train_test_split(X, y, random_state=11) clr = LogisticRegression() clr.fit(X_train, y_train) model_def = to_onnx(clr, X_train, options={LogisticRegression: {'zipmap': False}}) oinf = OnnxInference(model_def) print("DOT-SECTION", oinf.to_dot()) Option *raw_scores* for classifiers +++++++++++++++++++++++++++++++++++ By default, the library *sklearn-onnx* produces probabilities whenever it is possible for a classifier. Raw scores can usually be still obtained by using option ``{'raw_scores': True}``. .. gdot:: :script: DOT-SECTION import numpy from sklearn.datasets import load_iris from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression from mlprodict.onnx_conv import to_onnx from mlprodict.onnxrt import OnnxInference iris = load_iris() X, y = iris.data, iris.target X_train, _, y_train, __ = train_test_split(X, y, random_state=11) clr = LogisticRegression() clr.fit(X_train, y_train) model_def = to_onnx(clr, X_train, options={LogisticRegression: { 'zipmap': False, 'raw_scores': True}}) oinf = OnnxInference(model_def) print("DOT-SECTION", oinf.to_dot()) Pickability and Pipeline ++++++++++++++++++++++++ The proposed way to specify options is not always pickable. Function ``id(model)`` depends on the execution and map an option to one class may be not enough to customize the conversion. However, it is possible to specify an option the same way parameters are referenced in a *scikit-learn* pipeline with method `get_params `_. Following syntax are supported: :: pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())]) options = {'classifier': {'zipmap': False}} Or :: options = {'classifier__zipmap': False} Options applied to one model, not a pipeline as the converter replaces the pipeline structure by a single onnx graph. Following that rule, option *zipmap* would not have any impact if applied to a pipeline and to the last step of the pipeline. However, because there is no ambiguity about what the conversion should be, for options *zipmap* and *nocl*, the following options would have the same effect: :: pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())]) options = {id(pipe.steps[-1][1]): {'zipmap': False}} options = {id(pipe): {'zipmap': False}} options = {'classifier': {'zipmap': False}} options = {'classifier__zipmap': False}