.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_op_einsum.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_op_einsum.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_op_einsum.py:


.. _l-einsum:

Compares implementations of Einsum
==================================

This example benchmarks several equations with function :epkg:`numpy:einsum`
as the baseline. It compares the *numpy* implementation to a custom
implementation, the :epkg:`onnxruntime` implementation and the
:epkg:`opt-einsum` optimisation.
If available, :epkg:`tensorflow` and :epkg:`pytorch` are included as well.
The custom implementation does not do any transpose.
It uses parallelisation and SIMD optimisation when the summation
happens on the last axis of both matrices. It only implements
matrix multiplication. We also measure the improvement made with
function :func:`einsum <mlprodict.testing.einsum.einsum_fct.einsum>`.

.. contents::
    :local:

Available optimisation
++++++++++++++++++++++

The code prints which optimisation the custom implementation
uses, *AVX* or *SSE*, and the number of available processors,
which is also the default number of threads used for parallelisation.

.. GENERATED FROM PYTHON SOURCE LINES 27-41

.. code-block:: default

    import numpy
    import pandas
    import matplotlib.pyplot as plt
    from onnxruntime import InferenceSession
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx.algebra.onnx_ops import OnnxEinsum
    from cpyquickhelper.numbers import measure_time
    from tqdm import tqdm
    from opt_einsum import contract
    from mlprodict.testing.experimental_c_impl.experimental_c import (
        custom_einsum_float, code_optimisation)
    from mlprodict.testing.einsum.einsum_fct import _einsum
    print(code_optimisation())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AVX-omp=8


.. GENERATED FROM PYTHON SOURCE LINES 42-44

Einsum: common code
+++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 44-226

.. code-block:: default


    try:
        from tensorflow import einsum as tf_einsum, convert_to_tensor
    except ImportError:
        tf_einsum = None
    try:
        from torch import einsum as torch_einsum, from_numpy
    except ImportError:
        torch_einsum = None


    def build_ort_einsum(equation, op_version=14):  # opset=13, 14, ...
        node = OnnxEinsum('x', 'y', equation=equation,
                          op_version=op_version,
                          output_names=['z'])
        onx = node.to_onnx(inputs=[('x', FloatTensorType()),
                                   ('y', FloatTensorType())],
                           target_opset=op_version)
        sess = InferenceSession(onx.SerializeToString())
        return lambda x, y: sess.run(None, {'x': x, 'y': y})


    def build_ort_decomposed(equation, op_version=14):  # opset=13, 14, ...
        cache = _einsum(equation, numpy.float32, opset=op_version,
                        optimize=True, verbose=True, runtime="python")
        if not hasattr(cache, 'onnx_'):
            cache.build()
        sess = InferenceSession(cache.onnx_.SerializeToString())
        return lambda x, y: sess.run(None, {'X0': x, 'X1': y})


    def loop_einsum_eq(fct, equation, xs, ys):
        for x, y in zip(xs, ys):
            fct(equation, x, y)


    def loop_einsum_eq_th(fct, equation, xs, ys):
        for x, y in zip(xs, ys):
            fct(equation, x, y, nthread=-1)


    def loop_einsum(fct, xs, ys):
        for x, y in zip(xs, ys):
            fct(x, y)


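    def custom_einsum_float_tr(eq, x, y):
        # Transposes both inputs so that the summation axis h comes last,
        # then calls the custom implementation on the first equation.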
        if eq == "bshn,bthn->bnts":
            x = x.transpose((0, 1, 3, 2))
            y = y.transpose((0, 1, 3, 2))
            return custom_einsum_float("bsnh,btnh->bnts", x, y, nthread=-1)
        if eq == "bhsn,bhtn->bnts":
            x = x.transpose((0, 2, 3, 1))
            y = y.transpose((0, 2, 3, 1))
            return custom_einsum_float("bsnh,btnh->bnts", x, y, nthread=-1)
        return custom_einsum_float(eq, x, y, nthread=-1)


    def benchmark_equation(equation):
        # equations
        ort_einsum = build_ort_einsum(equation)
        ort_einsum_decomposed = build_ort_decomposed(equation)
        res = []
        for dim in tqdm([8, 16, 32, 64, 100, 128, 200,
                         256, 500, 512]):
            xs = [numpy.random.rand(2, dim, 12, 64).astype(numpy.float32)
                  for _ in range(5)]
            ys = [numpy.random.rand(2, dim, 12, 64).astype(numpy.float32)
                  for _ in range(5)]

            # numpy
            ctx = dict(equation=equation, xs=xs, ys=ys, einsum=numpy.einsum,
                       loop_einsum=loop_einsum, loop_einsum_eq=loop_einsum_eq,
                       loop_einsum_eq_th=loop_einsum_eq_th)
            obs = measure_time(
                "loop_einsum_eq(einsum, equation, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'numpy.einsum'
            res.append(obs)

            # opt-einsum
            ctx['einsum'] = contract
            obs = measure_time(
                "loop_einsum_eq(einsum, equation, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'opt-einsum'
            res.append(obs)

            # onnxruntime
            ctx['einsum'] = ort_einsum
            obs = measure_time(
                "loop_einsum(einsum, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'ort_einsum'
            res.append(obs)

            # onnxruntime decomposed
            ctx['einsum'] = ort_einsum_decomposed
            obs = measure_time(
                "loop_einsum(einsum, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'ort_dec'
            res.append(obs)

            # custom implementation
            ctx['einsum'] = custom_einsum_float
            obs = measure_time(
                "loop_einsum_eq_th(einsum, equation, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'c_einsum'
            res.append(obs)

            # transpose + custom implementation
            ctx['einsum'] = custom_einsum_float_tr
            obs = measure_time(
                "loop_einsum_eq(einsum, equation, xs, ys)",
                div_by_number=True, context=ctx, repeat=5, number=1)
            obs['dim'] = dim
            obs['fct'] = 'c_einsum_tr'
            res.append(obs)

            if tf_einsum is not None:
                # tensorflow
                ctx['einsum'] = tf_einsum
                ctx['xs'] = [convert_to_tensor(x) for x in xs]
                ctx['ys'] = [convert_to_tensor(y) for y in ys]
                obs = measure_time(
                    "loop_einsum_eq(einsum, equation, xs, ys)",
                    div_by_number=True, context=ctx, repeat=5, number=1)
                obs['dim'] = dim
                obs['fct'] = 'tf_einsum'
                res.append(obs)

            if torch_einsum is not None:
                # torch
                ctx['einsum'] = torch_einsum
                ctx['xs'] = [from_numpy(x) for x in xs]
                ctx['ys'] = [from_numpy(y) for y in ys]
                obs = measure_time(
                    "loop_einsum_eq(einsum, equation, xs, ys)",
                    div_by_number=True, context=ctx, repeat=5, number=1)
                obs['dim'] = dim
                obs['fct'] = 'torch_einsum'
                res.append(obs)

        # Dataframes
        df = pandas.DataFrame(res)
        piv = df.pivot(index='dim', columns='fct', values='average')

        rs = piv.copy()
        rs['c_einsum'] = rs['numpy.einsum'] / rs['c_einsum']
        rs['ort_einsum'] = rs['numpy.einsum'] / rs['ort_einsum']
        rs['ort_dec'] = rs['numpy.einsum'] / rs['ort_dec']
        rs['opt-einsum'] = rs['numpy.einsum'] / rs['opt-einsum']
        if 'c_einsum_tr' in rs.columns:
            rs['c_einsum_tr'] = rs['numpy.einsum'] / rs['c_einsum_tr']
        if 'tf_einsum' in rs.columns:
            rs['tf_einsum'] = rs['numpy.einsum'] / rs['tf_einsum']
        if 'torch_einsum' in rs.columns:
            rs['torch_einsum'] = rs['numpy.einsum'] / rs['torch_einsum']
        rs['numpy.einsum'] = 1.

        # Graphs.
        fig, ax = plt.subplots(1, 2, figsize=(14, 5))
        piv.plot(logx=True, logy=True, ax=ax[0],
                 title=f"Einsum benchmark\n{equation} -- (2, N, 12, 64) lower better")
        ax[0].legend(prop={"size": 9})
        rs.plot(logx=True, logy=True, ax=ax[1],
                title="Einsum Speedup, baseline=numpy\n%s -- (2, N, 12, 64)"
                      " higher better" % equation)
        ax[1].plot([min(rs.index), max(rs.index)], [0.5, 0.5], 'g--')
        ax[1].plot([min(rs.index), max(rs.index)], [2., 2.], 'g--')
        ax[1].legend(prop={"size": 9})

        return df, rs, ax
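
The speedup table built at the end of ``benchmark_equation`` divides the
*numpy* timings by the timings of every other implementation. The sketch
below reproduces that step on a tiny table. The timings are made up for
illustration and are not benchmark results. The names ``rows`` and
``speedup`` are hypothetical too.

.. code-block:: python

    import pandas

    # Hypothetical timings in seconds, not taken from the benchmark.
    rows = [
        {"dim": 8, "fct": "numpy.einsum", "average": 0.004},
        {"dim": 8, "fct": "ort_einsum", "average": 0.002},
        {"dim": 16, "fct": "numpy.einsum", "average": 0.010},
        {"dim": 16, "fct": "ort_einsum", "average": 0.020},
    ]
    df = pandas.DataFrame(rows)

    # One row per dimension, one column per implementation.
    piv = df.pivot(index="dim", columns="fct", values="average")

    # Ratio baseline / implementation for every column.
    speedup = piv.copy()
    for col in piv.columns:
        speedup[col] = piv["numpy.einsum"] / piv[col]

    print(speedup)

A ratio above 1 means the implementation is faster than *numpy* for that
dimension, a ratio below 1 means it is slower. The baseline column is
always equal to 1.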


.. GENERATED FROM PYTHON SOURCE LINES 227-240

First equation: bsnh,btnh->bnts
+++++++++++++++++++++++++++++++

The decomposition of this equation into a graph of operators
other than einsum gives the following.

 .. gdot::
      :script:

      from mlprodict.testing.einsum import decompose_einsum_equation
      dec = decompose_einsum_equation(
          'bsnh,btnh->bnts', strategy='numpy', clean=True)
      print(dec.to_dot())
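
The equation ``bsnh,btnh->bnts`` sums over ``h``, the last axis of both
inputs, so it can be written as a batched matrix product. The following
sketch checks this with *numpy* only. The small shapes and the variable
names are illustrative choices, not those of the benchmark.

.. code-block:: python

    import numpy

    rng = numpy.random.default_rng(0)
    x = rng.random((2, 8, 12, 64), dtype=numpy.float32)  # b, s, n, h
    y = rng.random((2, 5, 12, 64), dtype=numpy.float32)  # b, t, n, h

    expected = numpy.einsum("bsnh,btnh->bnts", x, y)

    # Same computation as a batched matrix product:
    # y -> (b, n, t, h), x -> (b, n, h, s), product -> (b, n, t, s).
    got = numpy.matmul(y.transpose(0, 2, 1, 3), x.transpose(0, 2, 3, 1))

    print(expected.shape)
    print(numpy.allclose(expected, got, rtol=1e-4))

Both inputs are read along ``h`` when the products are accumulated. This
is the configuration the custom implementation is written for.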

.. GENERATED FROM PYTHON SOURCE LINES 240-247

.. code-block:: default


    dfs = []
    equation = "bsnh,btnh->bnts"
    df, piv, ax = benchmark_equation(equation)
    df.pivot(index="fct", columns="dim", values="average")
    dfs.append(df)


.. image-sg:: /gyexamples/images/sphx_glr_plot_op_einsum_001.png
   :alt: Einsum benchmark bsnh,btnh->bnts -- (2, N, 12, 64) lower better, Einsum Speedup, baseline=numpy bsnh,btnh->bnts -- (2, N, 12, 64) higher better
   :srcset: /gyexamples/images/sphx_glr_plot_op_einsum_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.027 rtbest='snbh,stbh->sbtn': 100%|##########| 121/121 [00:05<00:00, 20.50it/s]
    100%|##########| 10/10 [01:46<00:00, 10.61s/it]


.. GENERATED FROM PYTHON SOURCE LINES 248-264

Second equation: bshn,bthn->bnts
++++++++++++++++++++++++++++++++

The summation does not happen on the last axis but
on the previous one.
Is it worth transposing before doing the summation?
The decomposition of this equation into a graph of operators
other than einsum gives the following.

 .. gdot::
      :script:

      from mlprodict.testing.einsum import decompose_einsum_equation
      dec = decompose_einsum_equation(
          'bshn,bthn->bnts', strategy='numpy', clean=True)
      print(dec.to_dot())
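
Swapping the last two axes of both inputs moves the summation axis ``h``
to the last position, after which the first equation applies. This is what
``custom_einsum_float_tr`` does for this equation. The sketch below checks
the equivalence with *numpy* only. Shapes and variable names are
illustrative.

.. code-block:: python

    import numpy

    rng = numpy.random.default_rng(0)
    x = rng.random((2, 8, 64, 12), dtype=numpy.float32)  # b, s, h, n
    y = rng.random((2, 5, 64, 12), dtype=numpy.float32)  # b, t, h, n

    direct = numpy.einsum("bshn,bthn->bnts", x, y)

    # Swap the last two axes so that h, the summed axis, becomes the last one,
    # then reuse the first equation.
    xt = x.transpose(0, 1, 3, 2)  # b, s, n, h
    yt = y.transpose(0, 1, 3, 2)  # b, t, n, h
    transposed = numpy.einsum("bsnh,btnh->bnts", xt, yt)

    print(numpy.allclose(direct, transposed, rtol=1e-4))

The check only establishes that both forms compute the same tensor.
Whether the transposed form is faster is what the benchmark measures.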

.. GENERATED FROM PYTHON SOURCE LINES 264-270

.. code-block:: default


    equation = "bshn,bthn->bnts"
    df, piv, ax = benchmark_equation(equation)
    df.pivot(index="fct", columns="dim", values="average")
    dfs.append(df)


.. image-sg:: /gyexamples/images/sphx_glr_plot_op_einsum_002.png
   :alt: Einsum benchmark bshn,bthn->bnts -- (2, N, 12, 64) lower better, Einsum Speedup, baseline=numpy bshn,bthn->bnts -- (2, N, 12, 64) higher better
   :srcset: /gyexamples/images/sphx_glr_plot_op_einsum_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.025 rtbest='bshn,bthn->bnts': 100%|##########| 121/121 [00:05<00:00, 21.25it/s]
    100%|##########| 10/10 [04:24<00:00, 26.41s/it]


.. GENERATED FROM PYTHON SOURCE LINES 271-286

Third equation: bhsn,bhtn->bnts
+++++++++++++++++++++++++++++++

The summation does not happen on the last axis but
on the second one. It is worth transposing before multiplying.
The decomposition of this equation into a graph of operators
other than einsum gives the following.

 .. gdot::
      :script:

      from mlprodict.testing.einsum import decompose_einsum_equation
      dec = decompose_einsum_equation(
          'bhsn,bhtn->bnts', strategy='numpy', clean=True)
      print(dec.to_dot())
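
For this equation, the permutation ``(0, 2, 3, 1)`` brings ``h`` from the
second axis to the last one. In *numpy*, ``transpose`` only returns a view
with different strides. The data moves when a contiguous copy is requested.
The sketch below shows both steps. It uses *numpy* only and makes no claim
about how the custom C function handles non-contiguous inputs. Shapes and
variable names are illustrative.

.. code-block:: python

    import numpy

    rng = numpy.random.default_rng(0)
    x = rng.random((2, 64, 8, 12), dtype=numpy.float32)  # b, h, s, n
    y = rng.random((2, 64, 5, 12), dtype=numpy.float32)  # b, h, t, n

    direct = numpy.einsum("bhsn,bhtn->bnts", x, y)

    # transpose returns a view: no data is moved, only the strides change.
    xt = x.transpose(0, 2, 3, 1)  # b, s, n, h
    yt = y.transpose(0, 2, 3, 1)  # b, t, n, h
    print(numpy.shares_memory(x, xt), xt.flags["C_CONTIGUOUS"])

    # An explicit copy stores h contiguously in memory.
    xc = numpy.ascontiguousarray(xt)
    yc = numpy.ascontiguousarray(yt)
    moved = numpy.einsum("bsnh,btnh->bnts", xc, yc)

    print(numpy.allclose(direct, moved, rtol=1e-4))

The copy is the actual cost of transposing. It pays off only if the
computation on contiguous data saves more time than the copy takes.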

.. GENERATED FROM PYTHON SOURCE LINES 286-292

.. code-block:: default


    equation = "bhsn,bhtn->bnts"
    df, piv, ax = benchmark_equation(equation)
    df.pivot(index="fct", columns="dim", values="average")
    dfs.append(df)


.. image-sg:: /gyexamples/images/sphx_glr_plot_op_einsum_003.png
   :alt: Einsum benchmark bhsn,bhtn->bnts -- (2, N, 12, 64) lower better, Einsum Speedup, baseline=numpy bhsn,bhtn->bnts -- (2, N, 12, 64) higher better
   :srcset: /gyexamples/images/sphx_glr_plot_op_einsum_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    0.027 rtbest='bnhs,bnts->bsth': 100%|##########| 121/121 [00:05<00:00, 20.82it/s]
    100%|##########| 10/10 [00:36<00:00,  3.70s/it]


.. GENERATED FROM PYTHON SOURCE LINES 293-299

Conclusion
++++++++++

:epkg:`pytorch` seems quite efficient on these examples.
The custom implementation was a way to investigate
how einsum is implemented and to find ways to optimize it.

.. GENERATED FROM PYTHON SOURCE LINES 299-307

.. code-block:: default


    merged = pandas.concat(dfs)
    name = "einsum"
    merged.to_csv(f"plot_{name}.csv", index=False)
    merged.to_excel(f"plot_{name}.xlsx", index=False)
    plt.savefig(f"plot_{name}.png")

    plt.show()


.. image-sg:: /gyexamples/images/sphx_glr_plot_op_einsum_004.png
   :alt: plot op einsum
   :srcset: /gyexamples/images/sphx_glr_plot_op_einsum_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 7 minutes  12.055 seconds)


.. _sphx_glr_download_gyexamples_plot_op_einsum.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_op_einsum.py <plot_op_einsum.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_op_einsum.ipynb <plot_op_einsum.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_