{"cells": [{"cell_type": "markdown", "id": "78f74622", "metadata": {}, "source": ["# Use functions when converting into ONNX\n", "\n", "Once a scikit-learn model is converted into ONNX, there is no easy way to retrieve the original scikit-learn model. This notebook explores an alternative way to convert a model into ONNX by using functions. With this method, every piece of a pipeline becomes a function."]}, {"cell_type": "code", "execution_count": 1, "id": "29fac993", "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "code", "execution_count": 2, "id": "f16158a4", "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 3, "id": "e41ab68c", "metadata": {}, "outputs": [], "source": ["%load_ext mlprodict"]}, {"cell_type": "markdown", "id": "0e7d5c44", "metadata": {}, "source": ["## A pipeline"]}, {"cell_type": "code", "execution_count": 4, "id": "2298a80e", "metadata": {}, "outputs": [{"data": {"text/html": ["
Pipeline(steps=[('preprocessing', StandardScaler()),\n", "                ('classifier',\n", "                 LogisticRegression(penalty='l1', solver='liblinear'))])
"], "text/plain": ["Pipeline(steps=[('preprocessing', StandardScaler()),\n", " ('classifier',\n", " LogisticRegression(penalty='l1', solver='liblinear'))])"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.pipeline import Pipeline\n", "from sklearn.datasets import load_iris\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn import set_config\n", "set_config(display=\"diagram\")\n", "\n", "data = load_iris()\n", "X, y = data.data, data.target\n", "steps = [\n", " (\"preprocessing\", StandardScaler()),\n", " (\"classifier\", LogisticRegression(penalty='l1', solver=\"liblinear\"))]\n", "pipe = Pipeline(steps)\n", "pipe.fit(X, y)"]}, {"cell_type": "markdown", "id": "c63a1d2a", "metadata": {}, "source": ["## Its conversion into ONNX"]}, {"cell_type": "markdown", "id": "d240cac4", "metadata": {}, "source": ["### Without functions"]}, {"cell_type": "code", "execution_count": 5, "id": "0eb53ecd", "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["opset: domain='' version=14\n", "opset: domain='ai.onnx.ml' version=1\n", "input: name='X' type=dtype('float64') shape=[None, 4]\n", "init: name='Su_Subcst' type=dtype('float64') shape=(4,) -- array([5.84333333, 3.05733333, 3.758 , 1.19933333])\n", "init: name='Di_Divcst' type=dtype('float64') shape=(4,) -- array([0.82530129, 0.43441097, 1.75940407, 0.75969263])\n", "init: name='coef' type=dtype('float64') shape=(12,)\n", "init: name='intercept' type=dtype('float64') shape=(3,) -- array([-1.86506089, -0.89658497, -4.56614529])\n", "init: name='classes' type=dtype('int32') shape=(3,) -- array([0, 1, 2])\n", "init: name='shape_tensor' type=dtype('int64') shape=(1,) -- array([-1], dtype=int64)\n", "init: name='axis' type=dtype('int64') shape=(1,) -- array([1], dtype=int64)\n", "Sub(X, Su_Subcst) -> Su_C0\n", " Div(Su_C0, Di_Divcst) -> variable\n", " 
MatMul(variable, coef) -> multiplied\n", " Add(multiplied, intercept) -> raw_scores\n", " Sigmoid(raw_scores) -> raw_scoressig\n", " Abs(raw_scoressig) -> norm_abs\n", " ReduceSum(norm_abs, axis, keepdims=1) -> norm\n", " Div(raw_scoressig, norm) -> probabilities\n", " ArgMax(raw_scores, axis=1) -> label1\n", " ArrayFeatureExtractor(classes, label1) -> array_feature_extractor_result\n", " Cast(array_feature_extractor_result, to=11) -> cast2_result\n", " Reshape(cast2_result, shape_tensor) -> reshaped_result\n", " Cast(reshaped_result, to=7) -> label\n", "output: name='label' type=dtype('int64') shape=[None]\n", "output: name='probabilities' type=dtype('float64') shape=[None, 3]\n"]}], "source": ["from mlprodict.plotting.text_plot import onnx_simple_text_plot\n", "from mlprodict.onnx_conv import to_onnx\n", "\n", "onx = to_onnx(pipe, X, options={'zipmap': False})\n", "print(onnx_simple_text_plot(onx))"]}, {"cell_type": "code", "execution_count": 6, "id": "adbaf06d", "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview onx"]}, {"cell_type": "markdown", "id": "4868a3a9", "metadata": {}, "source": ["### With functions"]}, {"cell_type": "code", "execution_count": 7, "id": "9953bddb", "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["No CUDA runtime is found, using CUDA_HOME='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.5'\n", "opset: domain='' version=15\n", "opset: domain='sklearn' version=1\n", "input: name='X' type=dtype('float64') shape=[None, 4]\n", "main___Pipeline_1734459081968[sklearn](X) -> main_classifier_label, main_classifier_probabilities\n", "output: name='main_classifier_label' type=dtype('int64') shape=[None]\n", "output: name='main_classifier_probabilities' type=dtype('float64') shape=[None, 3]\n", "----- function name=main__preprocessing___StandardScaler_1734202136896 domain=sklearn\n", "----- doc_string: HYPER:{\"StandardScaler\":{\"copy\": true, \"with_mean\": true, \"with_std\": true}}\n", "opset: domain='' version=14\n", "input: 'X'\n", "Constant(value=[5.8433333...) -> Su_Subcst\n", " Sub(X, Su_Subcst) -> Su_C0\n", "Constant(value=[0.8253012...) -> Di_Divcst\n", " Div(Su_C0, Di_Divcst) -> variable\n", "output: name='variable' type=? shape=?\n", "----- function name=main__classifier___LogisticRegression_1734202137184 domain=sklearn\n", "----- doc_string: HYPER:{\"LogisticRegression\":{\"C\": 1.0, \"class_weight\": null, \"dual\": false, \"fit_intercept\": true, \"intercept_scaling\": 1, \"l1_ratio\": null, \"max_iter\": 100, \"multi_class\": \"auto\", \"n_jobs\": null, \"penalty\": \"l1\", \"random_state\": null, \"solver\": \"liblinear\", \"tol\": 0.0001, \"verbose\": 0, \"warm_start\": false}}\n", "opset: domain='' version=13\n", "opset: domain='ai.onnx.ml' version=1\n", "input: 'X0'\n", "Constant(value=[[0.0, 0.0...) 
-> coef\n", " MatMul(X0, coef) -> multiplied\n", "Constant(value=[[-1.86506...) -> intercept\n", " Add(multiplied, intercept) -> raw_scores\n", " ArgMax(raw_scores, axis=1) -> label1\n", "Constant(value=[0, 1, 2]) -> classes\n", " ArrayFeatureExtractor(classes, label1) -> array_feature_extractor_result\n", " Cast(array_feature_extractor_result, to=11) -> cast2_result\n", "Constant(value=[-1]) -> shape_tensor\n", " Reshape(cast2_result, shape_tensor) -> reshaped_result\n", " Cast(reshaped_result, to=7) -> label\n", "Constant(value=[1]) -> axis\n", "Sigmoid(raw_scores) -> raw_scoressig\n", " Abs(raw_scoressig) -> norm_abs\n", " ReduceSum(norm_abs, axis, keepdims=1) -> norm\n", " Div(raw_scoressig, norm) -> probabilities\n", "output: name='label' type=? shape=?\n", "output: name='probabilities' type=? shape=?\n", "----- function name=main___Pipeline_1734459081968 domain=sklearn\n", "----- doc_string: HYPER:{\"Pipeline\":{\"memory\": null, \"steps\": [[\"preprocessing\", \"{\\\"classname\\\": \\\"StandardScaler\\\", \\\"EXC\\\": \\\"Object of type StandardScaler is not JSON serializable\\\"}\"], [\"classifier\", \"{\\\"classname\\\": \\\"LogisticRegression\\\", \\\"EXC\\\": \\\"Object of type LogisticRegression is not JSON serializable\\\"}\"]], \"verbose\": false}}\n", "opset: domain='' version=15\n", "opset: domain='sklearn' version=1\n", "input: 'X'\n", "main__preprocessing___StandardScaler_1734202136896[sklearn](X) -> preprocessing_variable\n", " main__classifier___LogisticRegression_1734202137184[sklearn](preprocessing_variable) -> classifier_label, classifier_probabilities\n", "output: name='classifier_label' type=? shape=?\n", "output: name='classifier_probabilities' type=? shape=?\n"]}], "source": ["onxf = to_onnx(pipe, X, as_function=True, options={'zipmap': False})\n", "print(onnx_simple_text_plot(onxf))"]}, {"cell_type": "code", "execution_count": 8, "id": "ad103436", "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 9, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview onxf"]}, {"cell_type": "markdown", "id": "3b2023f7", "metadata": {}, "source": ["Based on that, it should be possible to rebuild the original scikit-learn pipeline. Each function stores the hyperparameters of the estimator it was converted from in its attribute `doc_string`, as a JSON string prefixed with `HYPER:`. Nested estimators are not JSON serializable, so a container such as the pipeline only records their class names."]}, {"cell_type": "markdown", "id": "76f005df", "metadata": {}, "source": ["## A more complex pipeline"]}, {"cell_type": "code", "execution_count": 9, "id": "fb333f4f", "metadata": {}, "outputs": [{"data": {"text/html": ["
Pipeline(steps=[('preprocessing',\n", "                 ColumnTransformer(transformers=[('A', StandardScaler(),\n", "                                                  [0, 1]),\n", "                                                 ('B', MinMaxScaler(),\n", "                                                  [2, 3])])),\n", "                ('classifier',\n", "                 LogisticRegression(penalty='l1', solver='liblinear'))])
"], "text/plain": ["Pipeline(steps=[('preprocessing',\n", " ColumnTransformer(transformers=[('A', StandardScaler(),\n", " [0, 1]),\n", " ('B', MinMaxScaler(),\n", " [2, 3])])),\n", " ('classifier',\n", " LogisticRegression(penalty='l1', solver='liblinear'))])"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.compose import ColumnTransformer\n", "from sklearn.preprocessing import MinMaxScaler\n", "\n", "data = load_iris()\n", "X, y = data.data, data.target\n", "steps = [\n", " (\"preprocessing\", ColumnTransformer([\n", " ('A', StandardScaler(), [0, 1]),\n", " ('B', MinMaxScaler(), [2, 3])])),\n", " (\"classifier\", LogisticRegression(penalty='l1', solver=\"liblinear\"))]\n", "pipe = Pipeline(steps)\n", "pipe.fit(X, y)"]}, {"cell_type": "code", "execution_count": 10, "id": "5406593d", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["opset: domain='' version=15\n", "opset: domain='sklearn' version=1\n", "input: name='X' type=dtype('float64') shape=[None, 4]\n", "main___Pipeline_1734198554880[sklearn](X) -> main_classifier_label, main_classifier_probabilities\n", "output: name='main_classifier_label' type=dtype('int64') shape=[None]\n", "output: name='main_classifier_probabilities' type=dtype('float64') shape=[None, 3]\n", "----- function name=main__preprocessing__B___MinMaxScaler_1734196938256 domain=sklearn\n", "----- doc_string: HYPER:{\"MinMaxScaler\":{\"clip\": false, \"copy\": true, \"feature_range\": [0, 1]}}\n", "opset: domain='' version=14\n", "input: 'X'\n", "Cast(X, to=11) -> Ca_output0\n", "Constant(value=[0.1694915...) -> Mu_Mulcst\n", " Mul(Ca_output0, Mu_Mulcst) -> Mu_C0\n", "Constant(value=[-0.169491...) -> Ad_Addcst\n", " Add(Mu_C0, Ad_Addcst) -> variable\n", "output: name='variable' type=? 
shape=?\n", "----- function name=main__preprocessing__A___StandardScaler_1734196937584 domain=sklearn\n", "----- doc_string: HYPER:{\"StandardScaler\":{\"copy\": true, \"with_mean\": true, \"with_std\": true}}\n", "opset: domain='' version=14\n", "input: 'X'\n", "Constant(value=[5.8433333...) -> Su_Subcst\n", " Sub(X, Su_Subcst) -> Su_C0\n", "Constant(value=[0.8253012...) -> Di_Divcst\n", " Div(Su_C0, Di_Divcst) -> variable\n", "output: name='variable' type=? shape=?\n", "----- function name=main__preprocessing___ColumnTransformer_1734520793072 domain=sklearn\n", "----- doc_string: HYPER:{\"ColumnTransformer\":{\"n_jobs\": null, \"remainder\": \"drop\", \"sparse_threshold\": 0.3, \"transformer_weights\": null, \"transformers\": [[\"A\", \"{\\\"classname\\\": \\\"StandardScaler\\\", \\\"EXC\\\": \\\"Object of type StandardScaler is not JSON serializable\\\"}\", [0, 1]], [\"B\", \"{\\\"classname\\\": \\\"MinMaxScaler\\\", \\\"EXC\\\": \\\"Object of type MinMaxScaler is not JSON serializable\\\"}\", [2, 3]]], \"verbose\": false, \"verbose_feature_names_out\": true}}\n", "opset: domain='' version=15\n", "opset: domain='sklearn' version=1\n", "input: 'X'\n", "Constant(value=[2]) -> init\n", "Constant(value=[4]) -> init_1\n", "Constant(value=[1]) -> init_2\n", " Slice(X, init, init_1, init_2) -> out_sli_0\n", " main__preprocessing__B___MinMaxScaler_1734196938256[sklearn](out_sli_0) -> B_variable\n", "Constant(value=[0]) -> init_3\n", " Slice(X, init_3, init, init_2) -> out_sli_0_1\n", " main__preprocessing__A___StandardScaler_1734196937584[sklearn](out_sli_0_1) -> A_variable\n", " Concat(A_variable, B_variable, axis=1) -> out_con_0\n", "output: name='out_con_0' type=? 
shape=?\n", "----- function name=main__classifier___LogisticRegression_1734520717568 domain=sklearn\n", "----- doc_string: HYPER:{\"LogisticRegression\":{\"C\": 1.0, \"class_weight\": null, \"dual\": false, \"fit_intercept\": true, \"intercept_scaling\": 1, \"l1_ratio\": null, \"max_iter\": 100, \"multi_class\": \"auto\", \"n_jobs\": null, \"penalty\": \"l1\", \"random_state\": null, \"solver\": \"liblinear\", \"tol\": 0.0001, \"verbose\": 0, \"warm_start\": false}}\n", "opset: domain='' version=13\n", "opset: domain='ai.onnx.ml' version=1\n", "input: 'X0'\n", "Constant(value=[[-2.74108...) -> coef\n", " MatMul(X0, coef) -> multiplied\n", "Constant(value=[[0.0, -0....) -> intercept\n", " Add(multiplied, intercept) -> raw_scores\n", " ArgMax(raw_scores, axis=1) -> label1\n", "Constant(value=[0, 1, 2]) -> classes\n", " ArrayFeatureExtractor(classes, label1) -> array_feature_extractor_result\n", " Cast(array_feature_extractor_result, to=11) -> cast2_result\n", "Constant(value=[-1]) -> shape_tensor\n", " Reshape(cast2_result, shape_tensor) -> reshaped_result\n", " Cast(reshaped_result, to=7) -> label\n", "Constant(value=[1]) -> axis\n", "Sigmoid(raw_scores) -> raw_scoressig\n", " Abs(raw_scoressig) -> norm_abs\n", " ReduceSum(norm_abs, axis, keepdims=1) -> norm\n", " Div(raw_scoressig, norm) -> probabilities\n", "output: name='label' type=? shape=?\n", "output: name='probabilities' type=? 
shape=?\n", "----- function name=main___Pipeline_1734198554880 domain=sklearn\n", "----- doc_string: HYPER:{\"Pipeline\":{\"memory\": null, \"steps\": [[\"preprocessing\", \"{\\\"classname\\\": \\\"ColumnTransformer\\\", \\\"EXC\\\": \\\"Object of type ColumnTransformer is not JSON serializable\\\"}\"], [\"classifier\", \"{\\\"classname\\\": \\\"LogisticRegression\\\", \\\"EXC\\\": \\\"Object of type LogisticRegression is not JSON serializable\\\"}\"]], \"verbose\": false}}\n", "opset: domain='' version=15\n", "opset: domain='sklearn' version=1\n", "input: 'X'\n", "main__preprocessing___ColumnTransformer_1734520793072[sklearn](X) -> preprocessing_out_con_0\n", " main__classifier___LogisticRegression_1734520717568[sklearn](preprocessing_out_con_0) -> classifier_label, classifier_probabilities\n", "output: name='classifier_label' type=? shape=?\n", "output: name='classifier_probabilities' type=? shape=?\n"]}], "source": ["onxf = to_onnx(pipe, X, as_function=True, options={'zipmap': False})\n", "print(onnx_simple_text_plot(onxf))"]}, {"cell_type": "code", "execution_count": 11, "id": "699e4d25", "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview onxf"]}, {"cell_type": "code", "execution_count": 12, "id": "507cef55", "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5"}}, "nbformat": 4, "nbformat_minor": 5}