Converters with options#

Some converters have options to change the way a specific operator is converted. The whole list is described at Converters with options.

Option cdist for GaussianProcessRegressor#

The notebook Pairwise distances with ONNX (pdist) shows that an ONNX implementation of function cdist is noticeably slower, from 3 to 10 times. One way to optimize the converted model is to create dedicated operators such as the one for function cdist. The first example shows how to convert a GaussianProcessRegressor into standard ONNX (see also CDist).

Now the new model with the operator CDist.

The only change is the parameter options, set to options={GaussianProcessRegressor: {'optim': 'cdist'}}. It tells the conversion function that every model sklearn.gaussian_process.GaussianProcessRegressor must be converted with the option optim='cdist'. The converter of this model checks that option and uses the custom operator CDist instead of its standard implementation based on operator Scan. Section GaussianProcess shows how large the gain is, depending on the number of observations, for this example.
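To illustrate why a class used as a key affects every instance while a key built from id(model) affects a single object, here is a minimal plain-Python sketch. The names FakeRegressor and resolve_options are hypothetical and are not part of sklearn-onnx; the library's actual lookup may differ.

```python
class FakeRegressor:
    """Hypothetical stand-in for an estimator class (not a scikit-learn model)."""


def resolve_options(model, options):
    """Illustrative lookup: collect the options that apply to `model`."""
    resolved = {}
    # Options keyed by the class apply to every instance of that class.
    resolved.update(options.get(type(model), {}))
    # Options keyed by id(model) apply to that single object only.
    resolved.update(options.get(id(model), {}))
    return resolved


first, second = FakeRegressor(), FakeRegressor()

by_class = {FakeRegressor: {"optim": "cdist"}}
by_instance = {id(first): {"optim": "cdist"}}

options_first_by_class = resolve_options(first, by_class)
options_second_by_class = resolve_options(second, by_class)
options_first_by_instance = resolve_options(first, by_instance)
options_second_by_instance = resolve_options(second, by_instance)
```

With the class as key, both objects receive optim='cdist'. With id(first) as key, only first does.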

Other models supporting cdist#

Pairwise distances are also used in all nearest neighbours models. The same cdist option is supported for these models.
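The link between nearest neighbours and pairwise distances can be sketched in plain Python. The cdist function below is a simplified stand-in written for this illustration (Euclidean metric only). It is not the ONNX operator CDist, and the data is made up.

```python
import math


def cdist(xa, xb):
    """Euclidean distance between every row of xa and every row of xb."""
    return [[math.dist(a, b) for b in xb] for a in xa]


# Made-up training points and query points.
training = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
queries = [[0.0, 1.0], [6.0, 7.0]]

# One row per query, one column per training point.
distances = cdist(queries, training)

# The nearest neighbour of a query is the column with the smallest distance.
nearest = [min(range(len(row)), key=row.__getitem__) for row in distances]
```

The whole search reduces to one distance matrix followed by a minimum per row, which is why a faster distance operator benefits these models too.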

Option zipmap for classifiers#

By default, the library sklearn-onnx produces a list of dictionaries {label: prediction}, but this data structure takes significant time to build. The converted model can stick to matrices by removing operator ZipMap. This is done by using option {'zipmap': False}.
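The difference between the two output shapes can be sketched in plain Python. The function zipmap below is a hypothetical helper that mimics the kind of structure described above. It does not call sklearn-onnx, and the labels and probabilities are made up.

```python
# Made-up class labels and a probability matrix (one row per observation).
labels = ["setosa", "versicolor", "virginica"]
probability_matrix = [
    [0.90, 0.08, 0.02],
    [0.10, 0.70, 0.20],
]


def zipmap(labels, matrix):
    """Turn each row of the matrix into a dictionary {label: prediction}."""
    return [dict(zip(labels, row)) for row in matrix]


# Default-style output: one dictionary per observation.
as_dictionaries = zipmap(labels, probability_matrix)

# Output with the dictionaries skipped: the matrix is used as is.
as_matrix = probability_matrix
```

Building one dictionary per observation is extra work on top of the matrix, which is the cost the option avoids.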

Option raw_scores for classifiers#

By default, the library sklearn-onnx produces probabilities whenever it is possible for a classifier. Raw scores can usually still be obtained by using option {'raw_scores': True}.
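As a reminder of how raw scores relate to probabilities, here is a minimal sketch. It assumes a binary logistic regression, where the probability of the positive class is the sigmoid of the raw score. Other classifiers use other transformations, and the scores are made up.

```python
import math


def sigmoid(score):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Made-up raw scores, one per observation.
raw_scores = [-2.0, 0.0, 3.0]

# Probabilities for the negative and the positive class.
probabilities = [[1.0 - sigmoid(s), sigmoid(s)] for s in raw_scores]
```

A raw score of 0 sits on the decision boundary and gives a probability of 0.5 for each class.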

Picklability and Pipeline#

The proposed way to specify options is not always picklable. The value of id(model) depends on the execution, and mapping an option to one class may not be enough to customize the conversion. However, it is possible to specify an option the same way parameters are referenced in a scikit-learn pipeline with method get_params. The following syntaxes are supported:

pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())])

options = {'classifier': {'zipmap': False}}

Or

options = {'classifier__zipmap': False}
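The problem with id(model) can be reproduced with the standard library alone. In the sketch below, the SimpleNamespace object is a hypothetical stand-in for a fitted estimator, and nest_options is a hypothetical helper that shows how the two name-based syntaxes relate. Neither comes from sklearn-onnx.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for a fitted estimator (not a scikit-learn object).
model = SimpleNamespace(name="classifier", coef=[0.5, -1.2])

options_by_id = {id(model): {"zipmap": False}}
options_by_name = {"classifier": {"zipmap": False}}

# Round trip through pickle, as when a model and its options are saved.
restored_model = pickle.loads(pickle.dumps(model))
restored_by_id = pickle.loads(pickle.dumps(options_by_id))
restored_by_name = pickle.loads(pickle.dumps(options_by_name))

# The restored object is a new object, so the stored id no longer matches it.
id_key_still_matches = id(restored_model) in restored_by_id
# A step name is plain data and survives the round trip unchanged.
name_key_still_matches = "classifier" in restored_by_name


def nest_options(flat):
    """Turn {'step__option': value} into {'step': {'option': value}}."""
    nested = {}
    for key, value in flat.items():
        step, _, option = key.rpartition("__")
        nested.setdefault(step, {})[option] = value
    return nested


nested = nest_options({"classifier__zipmap": False})
```

Keys based on step names carry the same information before and after pickling, which is what makes them usable across executions.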

Options are applied to one model, not to a pipeline, because the converter replaces the pipeline structure with a single ONNX graph. Following that rule, option zipmap would have no effect if it were applied to the pipeline instead of its last step. However, because there is no ambiguity about what the conversion should be, for options zipmap and nocl, the following options have the same effect:

pipe = Pipeline([('pca', PCA()), ('classifier', LogisticRegression())])

options = {id(pipe.steps[-1][1]): {'zipmap': False}}
options = {id(pipe): {'zipmap': False}}
options = {'classifier': {'zipmap': False}}
options = {'classifier__zipmap': False}