{"cells": [{"cell_type": "markdown", "id": "aeee7e44", "metadata": {}, "source": ["# Convolution and Matrix Multiplication\n", "\n", "The [convolution](https://en.wikipedia.org/wiki/Kernel_(image_processing)) is a well-known image transformation. It can be used to blur an image or to compute its gradient in one direction, and it is widely used in deep neural networks. Having a fast implementation is important."]}, {"cell_type": "code", "execution_count": 1, "id": "3bb4f0a5", "metadata": {}, "outputs": [{"data": {"text/html": ["<div id=\"my_id_menu_nb\">run previous cell, wait for 2 seconds</div>\n", "<script>\n", "function repeat_indent_string(n){\n", "    var a = \"\" ;\n", "    for ( ; n > 0 ; --n)\n", "        a += \"    \";\n", "    return a;\n", "}\n", "// look up into all sections and builds an automated menu //\n", "var update_menu_string = function(begin, lfirst, llast, sformat, send, keep_item, begin_format, end_format) {\n", "    var anchors = document.getElementsByClassName(\"section\");\n", "    if (anchors.length == 0) {\n", "        anchors = document.getElementsByClassName(\"text_cell_render rendered_html\");\n", "    }\n", "    var i,t;\n", "    var text_menu = begin;\n", "    var text_memo = \"<pre>\\nlength:\" + anchors.length + \"\\n\";\n", "    var ind = \"\";\n", "    var memo_level = 1;\n", "    var href;\n", "    var tags = [];\n", "    var main_item = 0;\n", "    var format_open = 0;\n", "    for (i = 0; i <= llast; i++)\n", "        tags.push(\"h\" + i);\n", "\n", "    for (i = 0; i < anchors.length; i++) {\n", "        text_memo += \"**\" + anchors[i].id + \"--\\n\";\n", "\n", "        var child = null;\n", "        for(t = 0; t < tags.length; t++) {\n", "            var r = anchors[i].getElementsByTagName(tags[t]);\n", "            if (r.length > 0) {\n", "child = r[0];\n", "break;\n", "            }\n", "        }\n", "        if (child == null) {\n", "            text_memo += \"null\\n\";\n", "            continue;\n", "  
      }\n", "        if (anchors[i].hasAttribute(\"id\")) {\n", "            // when converted in RST\n", "            href = anchors[i].id;\n", "            text_memo += \"#1-\" + href;\n", "            // passer \u00e0 child suivant (le chercher)\n", "        }\n", "        else if (child.hasAttribute(\"id\")) {\n", "            // in a notebook\n", "            href = child.id;\n", "            text_memo += \"#2-\" + href;\n", "        }\n", "        else {\n", "            text_memo += \"#3-\" + \"*\" + \"\\n\";\n", "            continue;\n", "        }\n", "        var title = child.textContent;\n", "        var level = parseInt(child.tagName.substring(1,2));\n", "\n", "        text_memo += \"--\" + level + \"?\" + lfirst + \"--\" + title + \"\\n\";\n", "\n", "        if ((level < lfirst) || (level > llast)) {\n", "            continue ;\n", "        }\n", "        if (title.endsWith('\u00b6')) {\n", "            title = title.substring(0,title.length-1).replace(\"<\", \"&lt;\")\n", "         .replace(\">\", \"&gt;\").replace(\"&\", \"&amp;\");\n", "        }\n", "        if (title.length == 0) {\n", "            continue;\n", "        }\n", "\n", "        while (level < memo_level) {\n", "            text_menu += end_format + \"</ul>\\n\";\n", "            format_open -= 1;\n", "            memo_level -= 1;\n", "        }\n", "        if (level == lfirst) {\n", "            main_item += 1;\n", "        }\n", "        if (keep_item != -1 && main_item != keep_item + 1) {\n", "            // alert(main_item + \" - \" + level + \" - \" + keep_item);\n", "            continue;\n", "        }\n", "        while (level > memo_level) {\n", "            text_menu += \"<ul>\\n\";\n", "            memo_level += 1;\n", "        }\n", "        text_menu += repeat_indent_string(level-2);\n", "        text_menu += begin_format + sformat.replace(\"__HREF__\", href).replace(\"__TITLE__\", title);\n", "        format_open += 1;\n", "    }\n", "    while (1 < memo_level) {\n", 
"        text_menu += end_format + \"</ul>\\n\";\n", "        memo_level -= 1;\n", "        format_open -= 1;\n", "    }\n", "    text_menu += send;\n", "    //text_menu += \"\\n\" + text_memo;\n", "\n", "    while (format_open > 0) {\n", "        text_menu += end_format;\n", "        format_open -= 1;\n", "    }\n", "    return text_menu;\n", "};\n", "var update_menu = function() {\n", "    var sbegin = \"\";\n", "    var sformat = '<a href=\"#__HREF__\">__TITLE__</a>';\n", "    var send = \"\";\n", "    var begin_format = '<li>';\n", "    var end_format = '</li>';\n", "    var keep_item = -1;\n", "    var text_menu = update_menu_string(sbegin, 2, 4, sformat, send, keep_item,\n", "       begin_format, end_format);\n", "    var menu = document.getElementById(\"my_id_menu_nb\");\n", "    menu.innerHTML=text_menu;\n", "};\n", "window.setTimeout(update_menu,2000);\n", "            </script>"], "text/plain": ["<IPython.core.display.HTML object>"]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "code", "execution_count": 2, "id": "15fdc3ed", "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 3, "id": "c21bc5ff", "metadata": {}, "outputs": [], "source": ["%load_ext mlprodict"]}, {"cell_type": "markdown", "id": "fa367092", "metadata": {}, "source": ["## numpy\n", "\n", "Image have often 4 dimensions (N, C, H, W) = (batch, channels, height, width). 
Let's start with a 2D image."]}, {"cell_type": "code", "execution_count": 4, "id": "73efa320", "metadata": {}, "outputs": [{"data": {"text/plain": ["(5, 7)"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "\n", "shape = (5, 7)\n", "N = numpy.prod(shape)\n", "data = numpy.arange(N).astype(numpy.float32).reshape(shape)\n", "# data[:, :] = 0\n", "# data[2, 3] = 1\n", "data.shape"]}, {"cell_type": "markdown", "id": "69ae57cd", "metadata": {}, "source": ["Let's define a 2D kernel the same way."]}, {"cell_type": "code", "execution_count": 5, "id": "26eb8bbe", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[1., 2., 3.],\n", "       [4., 5., 6.],\n", "       [7., 8., 9.]], dtype=float32)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["kernel = (numpy.arange(9) + 1).reshape(3, 3).astype(numpy.float32)\n", "kernel"]}, {"cell_type": "markdown", "id": "0997ee66", "metadata": {}, "source": ["### raw convolution\n", "\n", "A naive implementation of a 2D convolution with four nested loops. Positions that fall outside the image are skipped, which is equivalent to zero padding."]}, {"cell_type": "code", "execution_count": 6, "id": "6c7bf5e1", "metadata": {}, "outputs": [{"data": {"text/plain": ["(5, 7)"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["def raw_convolution(data, kernel):\n", "    # half-sizes of the kernel, used to center it on each pixel\n", "    rx = (kernel.shape[0] - 1) // 2\n", "    ry = (kernel.shape[1] - 1) // 2\n", "    res = numpy.zeros(data.shape, dtype=data.dtype)\n", "    for i in range(data.shape[0]):\n", "        for j in range(data.shape[1]):\n", "            for x in range(kernel.shape[0]):\n", "                for y in range(kernel.shape[1]):\n", "                    a = i + x - rx\n", "                    b = j + y - ry\n", "                    # skip positions outside the image (implicit zero padding)\n", "                    if a < 0 or b < 0 or a >= data.shape[0] or b >= data.shape[1]:\n", "                        continue\n", "                    res[i, j] += kernel[x, y] * data[a, b]\n", "    return res\n", "\n", "res = raw_convolution(data, kernel)\n", 
"res.shape"]}, {"cell_type": "code", "execution_count": 7, "id": "8d5d5ea8", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "       [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "       [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "       [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "       [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["res"]}, {"cell_type": "markdown", "id": "49ae0942", "metadata": {}, "source": ["### With pytorch\n", "\n", "*pytorch* is optimized for deep learning and prefers 4D tensors to represent batches of images. We add two dimensions of size 1 (batch and channel) to the previous example."]}, {"cell_type": "code", "execution_count": 8, "id": "38bbba0d", "metadata": {}, "outputs": [], "source": ["from torch import from_numpy\n", "from torch.nn.functional import conv2d"]}, {"cell_type": "code", "execution_count": 9, "id": "48afd3c7", "metadata": {}, "outputs": [{"data": {"text/plain": ["torch.Size([1, 1, 5, 7])"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["rest = conv2d(from_numpy(data[numpy.newaxis, numpy.newaxis, ...]), \n", "              from_numpy(kernel[numpy.newaxis, numpy.newaxis, ...]),\n", "              padding=(1, 1))\n", "rest.shape"]}, {"cell_type": "code", "execution_count": 10, "id": "1918ef44", "metadata": {}, "outputs": [{"data": {"text/plain": ["tensor([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "          [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "          [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "          [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "          [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]])"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["rest"]}, {"cell_type": "markdown", "id": 
"a0996bcc", "metadata": {}, "source": ["Both implementations return the same result."]}, {"cell_type": "code", "execution_count": 11, "id": "8be181a9", "metadata": {}, "outputs": [], "source": ["from numpy.testing import assert_almost_equal\n", "assert_almost_equal(res, rest[0, 0].numpy())"]}, {"cell_type": "markdown", "id": "6a095260", "metadata": {}, "source": ["### using Gemm?\n", "\n", "A fast implementation can reuse an operation that is already heavily optimized, such as a matrix multiplication. The goal is to transform the tensor `data` into a new matrix which can be multiplied with the flattened kernel and finally reshaped into the expected result. pytorch calls this function [Unfold](https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html). This function is also called [im2col](https://caffe.berkeleyvision.org/tutorial/layers/im2col.html)."]}, {"cell_type": "code", "execution_count": 12, "id": "f46791f5", "metadata": {}, "outputs": [{"data": {"text/plain": ["torch.Size([1, 9, 35])"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["from torch.nn import Unfold\n", "unfold = Unfold(kernel_size=(3, 3), padding=(1, 1))(from_numpy(data[numpy.newaxis, numpy.newaxis, ...]))\n", "unfold.shape"]}, {"cell_type": "markdown", "id": "d50aef0a", "metadata": {}, "source": ["We then multiply this matrix with the flattened kernel and reshape it."]}, {"cell_type": "code", "execution_count": 13, "id": "2acce304", "metadata": {}, "outputs": [{"data": {"text/plain": ["(5, 7)"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["impl = kernel.flatten() @ unfold.numpy()\n", "impl = impl.reshape(data.shape)\n", "impl.shape"]}, {"cell_type": "code", "execution_count": 14, "id": "32b8ead6", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "       [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "       [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "       
[ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "       [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)"]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["impl"]}, {"cell_type": "markdown", "id": "fd11c282", "metadata": {}, "source": ["Everything works as expected."]}, {"cell_type": "code", "execution_count": 15, "id": "bb016ae0", "metadata": {}, "outputs": [], "source": ["assert_almost_equal(res, impl)"]}, {"cell_type": "markdown", "id": "2bfd881b", "metadata": {}, "source": ["## What is ConvTranspose?\n", "\n", "Deep neural networks are trained with stochastic gradient descent. The gradient of every layer needs to be computed, including the gradient of a convolution transpose. That seems easier with the second expression of a convolution, which relies on a matrix multiplication and function `im2col`. `im2col` is just a new matrix built from `data` where every value was copied to up to 9=3x3 locations. The gradient with respect to an input value `data[i,j]` is a sum of up to 9=3x3 values from the output gradient, each weighted by a kernel coefficient. 
`im2col` only rearranges indices, so the gradient has to do the same rearrangement in the opposite direction."]}, {"cell_type": "code", "execution_count": 16, "id": "632e0cd8", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "       [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "       [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "       [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "       [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)"]}, "execution_count": 17, "metadata": {}, "output_type": "execute_result"}], "source": ["# impl[:, :] = 0\n", "# impl[2, 3] = 1\n", "impl"]}, {"cell_type": "code", "execution_count": 17, "id": "22fd7b7b", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],\n", "         [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],\n", "         [14868., 27315., 32400., 34425., 36450., 34191., 23922.],\n", "         [20039., 35544., 41283., 43164., 45045., 41508., 28325.],\n", "         [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]],\n", "      dtype=float32)"]}, "execution_count": 18, "metadata": {}, "output_type": "execute_result"}], "source": ["from torch.nn.functional import conv_transpose2d\n", "\n", "ct = conv_transpose2d(from_numpy(impl.reshape(data.shape)[numpy.newaxis, numpy.newaxis, ...]),\n", "                      from_numpy(kernel[numpy.newaxis, numpy.newaxis, ...]),\n", "                      padding=(1, 1)).numpy()\n", "ct"]}, {"cell_type": "markdown", "id": "53aab3cf", "metadata": {}, "source": ["And now the version with `col2im`, or [Fold](https://pytorch.org/docs/stable/generated/torch.nn.Fold.html#torch.nn.Fold), applied to the product of the output of `Conv` and the kernel: the output of `Conv` is multiplied by every coefficient of the kernel. 
Then all these matrices are concatenated to build a matrix of the same shape as `unfold`."]}, {"cell_type": "code", "execution_count": 18, "id": "05269f31", "metadata": {}, "outputs": [{"data": {"text/plain": ["(9, 35)"]}, "execution_count": 19, "metadata": {}, "output_type": "execute_result"}], "source": ["p = kernel.flatten().reshape((-1, 1)) @ impl.flatten().reshape((1, -1))\n", "p.shape"]}, {"cell_type": "code", "execution_count": 19, "id": "685ce888", "metadata": {}, "outputs": [{"data": {"text/plain": ["torch.Size([1, 1, 5, 7])"]}, "execution_count": 20, "metadata": {}, "output_type": "execute_result"}], "source": ["from torch.nn import Fold\n", "\n", "fold = Fold(kernel_size=(3, 3), output_size=(5, 7), padding=(1, 1))(from_numpy(p[numpy.newaxis, ...]))\n", "fold.shape"]}, {"cell_type": "code", "execution_count": 20, "id": "3e609aa2", "metadata": {}, "outputs": [{"data": {"text/plain": ["tensor([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],\n", "          [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],\n", "          [14868., 27315., 32400., 34425., 36450., 34191., 23922.],\n", "          [20039., 35544., 41283., 43164., 45045., 41508., 28325.],\n", "          [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]])"]}, "execution_count": 21, "metadata": {}, "output_type": "execute_result"}], "source": ["fold"]}, {"cell_type": "markdown", "id": "101b50ba", "metadata": {}, "source": ["## onnxruntime-training\n", "\n", "The following lines show how *onnxruntime* handles the gradient computation. 
This section still needs work."]}, {"cell_type": "markdown", "id": "bf4c3ada", "metadata": {}, "source": ["### Conv"]}, {"cell_type": "code", "execution_count": 21, "id": "8fb68ccd", "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["No CUDA runtime is found, using CUDA_HOME='C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.5'\n"]}, {"data": {"text/html": ["<div id=\"M6e4b3f5960a7493295700d3e0d745c22-cont\"><div id=\"M6e4b3f5960a7493295700d3e0d745c22\" style=\"width:;height:;\"></div></div>\n", "<script>\n", "\n", "require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = Viz(\"digraph{\\n  nodesep=0.05;\\n  orientation=portrait;\\n  size=7;\\n  ranksep=0.25;\\n\\n  X [shape=box color=red label=\\\"X\\nfloat(('?',))\\\" fontsize=10];\\n\\n  out_con_0 [shape=box color=green label=\\\"out_con_0\\nfloat(('?',))\\\" fontsize=10];\\n\\n  init [shape=box label=\\\"init\\nfloat32((1, 1, 3, 3))\\n[[[[1. 2. 3.]\\n   [4. 5. 6.]\\n   [7. 8. 
9.]]]]\\\" fontsize=10];\\n\\n  _conv [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"Conv\\n(_conv)\\npads=[1 1 1 1]\\\" fontsize=10];\\n  X -> _conv;\\n  init -> _conv;\\n  _conv -> out_con_0;\\n}\");\n", "document.getElementById('M6e4b3f5960a7493295700d3e0d745c22').innerHTML = svgGraph; });\n", "\n", "</script>"], "text/plain": ["<jyquickhelper.jspy.render_nb_js_dot.RenderJsDot at 0x1ab4521ae90>"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["from mlprodict.npy.xop import loadop\n", "OnnxConv = loadop(('', 'Conv'))\n", "node = OnnxConv('X', kernel[numpy.newaxis, numpy.newaxis, ...], pads=[1, 1, 1, 1])\n", "onx = node.to_onnx(numpy.float32, numpy.float32)\n", "%onnxview onx"]}, {"cell_type": "code", "execution_count": 22, "id": "2d34f521", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "         [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "         [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "         [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "         [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]],\n", "      dtype=float32)"]}, "execution_count": 23, "metadata": {}, "output_type": "execute_result"}], "source": ["from mlprodict.onnxrt import OnnxInference\n", "oinf = OnnxInference(onx, runtime='onnxruntime1')\n", "oinf.run({'X': data[numpy.newaxis, numpy.newaxis, ...]})['out_con_0']"]}, {"cell_type": "markdown", "id": "0098eb16", "metadata": {}, "source": ["It is the same."]}, {"cell_type": "code", "execution_count": 23, "id": "66a592f5", "metadata": {"scrolled": false}, "outputs": [], "source": ["from onnxcustom.training.grad_helper import onnx_derivative, DerivativeOptions\n", "grad = onnx_derivative(onx, options=DerivativeOptions.FillGrad | DerivativeOptions.KeepOutputs)"]}, {"cell_type": "code", "execution_count": 24, "id": "2b042ac4", "metadata": {}, "outputs": [{"data": {"text/html": ["<div 
id=\"M5cce4ce785834f039a973abe1c5ea317-cont\"><div id=\"M5cce4ce785834f039a973abe1c5ea317\" style=\"width:;height:;\"></div></div>\n", "<script>\n", "\n", "require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = Viz(\"digraph{\\n  nodesep=0.05;\\n  orientation=portrait;\\n  size=7;\\n  ranksep=0.25;\\n\\n  X [shape=box color=red label=\\\"X\\nfloat(('?',))\\\" fontsize=10];\\n  init [shape=box color=red label=\\\"init\\nfloat((1, 1, 3, 3))\\\" fontsize=10];\\n\\n  X_grad [shape=box color=green label=\\\"X_grad\\nfloat(('?',))\\\" fontsize=10];\\n  init_grad [shape=box color=green label=\\\"init_grad\\nfloat((1, 1, 3, 3))\\\" fontsize=10];\\n  out_con_0 [shape=box color=green label=\\\"out_con_0\\nfloat(('?',))\\\" fontsize=10];\\n\\n\\n  _conv [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"Conv\\n(_conv)\\nauto_pad=b'NOTSET'\\ngroup=1\\npads=[1 1 1 1]\\\" fontsize=10];\\n  X -> _conv;\\n  init -> _conv;\\n  _conv -> out_con_0;\\n\\n  out_con_0_shape [shape=box label=\\\"out_con_0_shape\\\" fontsize=10];\\n  Shape [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"Shape\\n(Shape)\\\" fontsize=10];\\n  out_con_0 -> Shape;\\n  Shape -> out_con_0_shape;\\n\\n  out_con_0_grad [shape=box label=\\\"out_con_0_grad\\\" fontsize=10];\\n  ConstantOfShape [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"ConstantOfShape\\n(ConstantOfShape)\\nvalue=[1.]\\\" fontsize=10];\\n  out_con_0_shape -> ConstantOfShape;\\n  ConstantOfShape -> out_con_0_grad;\\n\\n  _conv_Grad_ConvGrad_0 [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"ConvGrad\\n(_conv_Grad_ConvGrad_0)\\nauto_pad=b'NOTSET'\\ngroup=1\\npads=[1 1 1 1]\\\" fontsize=10];\\n  out_con_0_grad -> _conv_Grad_ConvGrad_0;\\n  X -> _conv_Grad_ConvGrad_0;\\n  init -> _conv_Grad_ConvGrad_0;\\n  _conv_Grad_ConvGrad_0 -> X_grad;\\n  _conv_Grad_ConvGrad_0 -> init_grad;\\n}\");\n", "document.getElementById('M5cce4ce785834f039a973abe1c5ea317').innerHTML = 
svgGraph; });\n", "\n", "</script>"], "text/plain": ["<jyquickhelper.jspy.render_nb_js_dot.RenderJsDot at 0x1ab45f003d0>"]}, "execution_count": 25, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview grad"]}, {"cell_type": "code", "execution_count": 25, "id": "372eb667", "metadata": {}, "outputs": [], "source": ["oinf = OnnxInference(grad, runtime='onnxruntime1')"]}, {"cell_type": "code", "execution_count": 26, "id": "6d416375", "metadata": {}, "outputs": [{"data": {"text/plain": ["{'X_grad': array([[[[12., 21., 21., 21., 21., 21., 16.],\n", "          [27., 45., 45., 45., 45., 45., 33.],\n", "          [27., 45., 45., 45., 45., 45., 33.],\n", "          [27., 45., 45., 45., 45., 45., 33.],\n", "          [24., 39., 39., 39., 39., 39., 28.]]]], dtype=float32),\n", " 'init_grad': array([[[[312., 378., 336.],\n", "          [495., 595., 525.],\n", "          [480., 574., 504.]]]], dtype=float32),\n", " 'out_con_0': array([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],\n", "          [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],\n", "          [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],\n", "          [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],\n", "          [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]],\n", "       dtype=float32)}"]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": ["res = oinf.run({'X': data[numpy.newaxis, numpy.newaxis, ...],\n", "                'init': kernel[numpy.newaxis, numpy.newaxis, ...]})\n", "res"]}, {"cell_type": "markdown", "id": "cfc7478f", "metadata": {}, "source": ["### ConvTranspose"]}, {"cell_type": "code", "execution_count": 27, "id": "9f1dedf0", "metadata": {}, "outputs": [{"data": {"text/html": ["<div id=\"M29d322ddfd174b3899c8b52a8e349de3-cont\"><div id=\"M29d322ddfd174b3899c8b52a8e349de3\" style=\"width:;height:;\"></div></div>\n", "<script>\n", "\n", "require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = 
Viz(\"digraph{\\n  nodesep=0.05;\\n  orientation=portrait;\\n  size=7;\\n  ranksep=0.25;\\n\\n  X [shape=box color=red label=\\\"X\\nfloat(('?',))\\\" fontsize=10];\\n\\n  out_con_0 [shape=box color=green label=\\\"out_con_0\\nfloat(('?',))\\\" fontsize=10];\\n\\n  init [shape=box label=\\\"init\\nfloat32((1, 1, 3, 3))\\n[[[[1. 2. 3.]\\n   [4. 5. 6.]\\n   [7. 8. 9.]]]]\\\" fontsize=10];\\n\\n  _convtranspose [shape=box style=\\\"filled,rounded\\\" color=orange label=\\\"ConvTranspose\\n(_convtranspose)\\npads=[1 1 1 1]\\\" fontsize=10];\\n  X -> _convtranspose;\\n  init -> _convtranspose;\\n  _convtranspose -> out_con_0;\\n}\");\n", "document.getElementById('M29d322ddfd174b3899c8b52a8e349de3').innerHTML = svgGraph; });\n", "\n", "</script>"], "text/plain": ["<jyquickhelper.jspy.render_nb_js_dot.RenderJsDot at 0x1ab45f00fd0>"]}, "execution_count": 28, "metadata": {}, "output_type": "execute_result"}], "source": ["from mlprodict.npy.xop import loadop\n", "\n", "OnnxConvTranspose = loadop('ConvTranspose')\n", "node = OnnxConvTranspose('X', kernel[numpy.newaxis, numpy.newaxis, ...], pads=[1, 1, 1, 1])\n", "onx = node.to_onnx(numpy.float32, numpy.float32)\n", "%onnxview onx"]}, {"cell_type": "code", "execution_count": 28, "id": "3b069a8e", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],\n", "         [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],\n", "         [14868., 27315., 32400., 34425., 36450., 34191., 23922.],\n", "         [20039., 35544., 41283., 43164., 45045., 41508., 28325.],\n", "         [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]],\n", "      dtype=float32)"]}, "execution_count": 29, "metadata": {}, "output_type": "execute_result"}], "source": ["oinf = OnnxInference(onx, runtime='onnxruntime1')\n", "ct = oinf.run({'X': impl[numpy.newaxis, numpy.newaxis, ...]})['out_con_0']\n", "ct"]}, {"cell_type": "markdown", "id": "41509e74", "metadata": {}, "source": ["## 
im2col and col2im\n", "\n", "Function `im2col` transforms an image so that the convolution of this image can be expressed as a matrix multiplication. It takes the image and the kernel shape."]}, {"cell_type": "code", "execution_count": 29, "id": "d6af2597", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[0., 0., 1.],\n", "       [0., 1., 2.],\n", "       [1., 2., 3.],\n", "       [2., 3., 4.],\n", "       [3., 4., 0.]], dtype=float32)"]}, "execution_count": 30, "metadata": {}, "output_type": "execute_result"}], "source": ["from mlprodict.onnxrt.ops_cpu.op_conv_helper import im2col\n", "\n", "v = numpy.arange(5).astype(numpy.float32)\n", "w = im2col(v, (3, ))\n", "w"]}, {"cell_type": "code", "execution_count": 30, "id": "bb460e99", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([1., 3., 6., 9., 7.], dtype=float32)"]}, "execution_count": 31, "metadata": {}, "output_type": "execute_result"}], "source": ["k = numpy.array([1, 1, 1], dtype=numpy.float32)\n", "conv = w @ k\n", "conv"]}, {"cell_type": "markdown", "id": "b30a6c2c", "metadata": {}, "source": ["Let's compare with the numpy function. The kernel `k` is symmetric, so the flip applied by `numpy.convolve` does not change the result."]}, {"cell_type": "code", "execution_count": 31, "id": "4929e54e", "metadata": {}, "outputs": [{"data": {"text/plain": ["array([1., 3., 6., 9., 7.], dtype=float32)"]}, "execution_count": 32, "metadata": {}, "output_type": "execute_result"}], "source": ["numpy.convolve(v, k, mode='same')"]}, {"cell_type": "markdown", "id": "39cdf0b8", "metadata": {}, "source": ["$conv(v, k) = im2col(v, shape(k)) \\; k = w \\; k$ where $w = im2col(v, shape(k))$. \n", "\n", "In a deep neural network, the gradient is propagated from the last layer to the first one. At some point, the backpropagation produces the gradient $\\frac{d(E)}{d(conv)}$, the gradient of the error with respect to the outputs of the convolution layer. 
Then $\\frac{d(E)}{d(v)} = \\frac{d(E)}{d(conv(v, k))}\\frac{d(conv(v, k))}{d(v)}$.\n", "\n", "We need to compute $\\frac{d(conv(v, k))}{d(v)} = \\frac{d(conv(v, k))}{d(w)}\\frac{d(w)}{d(v)}$.\n", "\n", "We can say that $\\frac{d(conv(v, k))}{d(w)} = k$.\n", "\n", "That leaves $\\frac{d(w)}{d(v)} = \\frac{d(im2col(v, shape(k)))}{d(v)}$. This last term is equal to $im2col(m, shape(k))$ where $m$ is a matrix identical to $v$ except that all non-null coefficients are replaced by 1. To summarize: $\\frac{d(im2col(v, shape(k)))}{d(v)} = im2col(v \\neq 0, shape(k))$.\n", "\n", "Finally, $\\frac{d(E)}{d(v)} = \\frac{d(E)}{d(conv(v, k))}\\frac{d(conv(v, k))}{d(v)} = \\frac{d(E)}{d(conv(v, k))} \\; k \\; im2col(v \\neq 0, shape(k))$.\n", "\n", "Now, $im2col(v \\neq 0, shape(k))$ is a very simple matrix with only ones or zeros. Is there a way to avoid the matrix multiplication and simply add terms? That's the purpose of function $col2im$, defined so that:\n", "\n", "$\\frac{d(E)}{d(v)} = \\frac{d(E)}{d(conv(v, k))} \\; k \\; im2col(v \\neq 0, shape(k)) = col2im\\left(\\frac{d(E)}{d(conv(v, k))} \\; k, shape(k) \\right)$\n", "\n"]}, {"cell_type": "markdown", "id": "88f1c481", "metadata": {}, "source": []}, {"cell_type": "code", "execution_count": 32, "id": "fe19fd03", "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.5"}}, "nbformat": 4, "nbformat_minor": 5}