module testing.einsum.einsum_fct

Inheritance diagram of mlprodict.testing.einsum.einsum_fct

Short summary

module mlprodict.testing.einsum.einsum_fct

Main functions decomposing einsum computation into simpler functions.


Classes

class           truncated documentation
CachedEinsum    Stores all the necessary information to cache the preprocessing of an einsum equation.

Functions

function                              truncated documentation
_einsum
einsum                                Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right …
enumerate_cached_einsum               Enumerates all cached einsum functions.
optimize_decompose_einsum_equation    Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right …

Static Methods

staticmethod    truncated documentation
build_einsum    Creates an instance of CachedEinsum.

Methods

method                truncated documentation
__call__              Calls the runtime self.runtime_.
__init__
__repr__              usual
_build_optimize
_build_optimize_ml
build                 Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
build_onnx_einsum     Builds an ONNX graph with a single einsum operator.
build_runtime         Builds the runtime associated with the equation self.equation_.
default_inputs        Returns default inputs (reshaped numpy.arange + 0.7i).

Documentation

Main functions decomposing einsum computation into simpler functions.


class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)

Bases: object

Stores all the necessary information to cache the preprocessing of an einsum equation.

Parameters:
  • equation – numpy equation

  • runtime – see einsum

  • opset – ONNX opset

  • optimize – finds the best letter permutation

  • dtype – dtype

  • decompose – to decompose Einsum operator or to keep it as is

  • key – key used to cache this class

  • strategy – optimization strategy

  • verbose – displays progress information

The class creates the following attributes:

  • equation_: the best equivalent equation

  • graph_: the corresponding graph returned by function decompose_einsum_equation (mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation)

  • onnx_: if a conversion to onnx is used, stores the onnx graph

  • runtime_: a function used by __call__, calls the runtime
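The split between a one-time preprocessing step and a cheap call can be sketched without mlprodict. In the block below, `TinyCachedEinsum` is a hypothetical class, not part of the library, and numpy's contraction-path search (`numpy.einsum_path`) is swapped in for mlprodict's decomposition: `build` stores `equation_` and `runtime_` once, and `__call__` only delegates to `runtime_`.

```python
import numpy


class TinyCachedEinsum:
    """Hypothetical, simplified analogue of CachedEinsum: preprocessing runs
    once in build(), then every call goes through runtime_."""

    def __init__(self, equation):
        self.equation = equation

    def build(self, *example_inputs):
        # Preprocessing: numpy's contraction-path search stands in for the
        # decomposition mlprodict performs; it is computed a single time.
        path, _ = numpy.einsum_path(
            self.equation, *example_inputs, optimize="greedy")
        self.equation_ = self.equation  # no letter permutation in this sketch
        self.runtime_ = lambda *inputs: numpy.einsum(
            self.equation_, *inputs, optimize=path)
        return self

    def __call__(self, *inputs):
        # Same role as CachedEinsum.__call__: delegates to runtime_.
        return self.runtime_(*inputs)


m1, m2 = numpy.random.randn(2, 3, 4), numpy.random.randn(4, 5)
op = TinyCachedEinsum("abc,cd->abd").build(m1, m2)
print(op(m1, m2).shape)  # (2, 3, 5)
```

The path is computed from example inputs, so this sketch assumes later calls use inputs with compatible shapes.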


__call__(*inputs)

Calls the runtime self.runtime_.


__init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)
__repr__()

usual

_build_optimize()
_build_optimize_ml()
build()

Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.


static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)

Creates an instance of CachedEinsum.


build_onnx_einsum(input_names)

Builds an ONNX graph with a single einsum operator.


build_runtime()

Builds the runtime associated with the equation self.equation_.


default_inputs(N=None)

Returns default inputs (reshaped numpy.arange + 0.7i).

Parameters:

N – size of every dimension (all dimensions have the same size)

If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.
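A minimal sketch of such default inputs, assuming `0.7i` means that the i-th input is shifted by `0.7 * i`. `default_inputs_sketch` is a hypothetical name and, unlike the method, it takes N explicitly instead of choosing it from the number of letters.

```python
import numpy


def default_inputs_sketch(equation, N, dtype=numpy.float64):
    """Hypothetical re-implementation of default_inputs: one array per input,
    every axis of size N, filled with arange values shifted by 0.7 * i."""
    terms = equation.split("->")[0].split(",")
    inputs = []
    for i, term in enumerate(terms):
        shape = (N,) * len(term)  # one axis per letter of the term
        values = numpy.arange(N ** len(term)).reshape(shape) + 0.7 * i
        inputs.append(values.astype(dtype))
    return inputs


m1, m2 = default_inputs_sketch("abc,cd->abd", N=3)
print(m1.shape, m2.shape)  # (3, 3, 3) (3, 3)
```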


mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)
mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right member (the output subscripts after ->).
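Both restrictions can be checked before any preprocessing. The helper below is hypothetical (it is not part of mlprodict) and only illustrates which equations are rejected: those using the ellipsis and those without an explicit right member.

```python
def check_equation(equation):
    """Hypothetical helper: rejects equations this einsum does not support
    and returns the input terms and the output term."""
    if "..." in equation:
        # broadcasting through an ellipsis is not supported
        raise ValueError(f"Ellipsis '...' is not supported: {equation!r}.")
    if "->" not in equation:
        # the right member (output subscripts) must be written explicitly
        raise ValueError(f"Equation {equation!r} has no right member '->'.")
    left, right = equation.split("->")
    return left.split(","), right


print(check_equation("abc,cd->abd"))  # (['abc', 'cd'], 'abd')
```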

Parameters:
  • equation – einsum equation

  • inputs – inputs

  • optimize – permutes all letters to find the best permutation

  • runtime – runtime used to compute the results once the computation graph is produced (see below)

  • cache – if True, the function stores the preprocessing done for a specific equation; a second call with the same equation is much faster

  • opset – ONNX opset to use for some runtimes

  • decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.

  • strategy – optimisation strategy (see below)

  • verbose – display progress if optimize is True

Returns:

einsum result

The available runtimes are:

  • batch_dot: the runtime is apply_einsum_sequence,

  • python: one ONNX graph executed with a python runtime,

  • onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

  • None: the same runtime is used to find the best permutation of letters

  • ‘ml’: a machine-learned model is used to predict the best permutation of letters; this model comes from the notebook Infer operator computation cost.
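Both strategies explore the same search space: equations obtained by consistently renaming the letters, all of which compute the same result (for instance `cab,cd->ad` and `bad,bc->ac`). The sketch below is not mlprodict's implementation: `rename_letters`, `enumerate_permutations`, `best_permutation` and `toy_cost` are hypothetical names, and `toy_cost` is only a placeholder for the measured time (strategy None) or the model prediction (strategy 'ml').

```python
import itertools

import numpy


def rename_letters(equation, mapping):
    """Renames every letter of an einsum equation, keeping ',' and '->'."""
    return "".join(mapping.get(c, c) for c in equation)


def enumerate_permutations(equation):
    """Yields the equation rewritten with every permutation of its letters."""
    letters = sorted({c for c in equation if c.isalpha()})
    for perm in itertools.permutations(letters):
        yield rename_letters(equation, dict(zip(letters, perm)))


def best_permutation(equation, cost):
    """Returns the candidate with the lowest cost; `cost` stands for either a
    measured execution time (strategy None) or a model prediction ('ml')."""
    return min(enumerate_permutations(equation), key=cost)


def toy_cost(eq):
    # Toy cost, for illustration only: counts input terms whose letters are
    # not already sorted (each one would likely need a transpose).
    terms = eq.split("->")[0].split(",")
    return sum(term != "".join(sorted(term)) for term in terms)


candidates = list(enumerate_permutations("cab,cd->ad"))
print(len(candidates))  # 24
print(best_permutation("cab,cd->ad", toy_cost))

# Renaming letters never changes the result.
m1, m2 = numpy.random.randn(5, 5, 5), numpy.random.randn(5, 5)
assert numpy.allclose(numpy.einsum("cab,cd->ad", m1, m2),
                      numpy.einsum("bad,bc->ac", m1, m2))
```

A four-letter equation yields 4! = 24 candidates and a five-letter one 120, so the search grows quickly with the number of letters.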

The function works in two steps:

  • the first step analyses the equation to produce a computation graph, which can also be converted into ONNX,

  • the second step runs the graph with the selected runtime.
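The two steps can be illustrated on a simplified case: a two-input equation turned into a plan (step one) that is then executed with a transpose, a reshape and a single batched matrix multiplication (step two), close in spirit to the batch_dot runtime. This is a sketch under assumptions, not mlprodict's algorithm, which handles more inputs and builds a full graph of operators; `analyse` and `run` are hypothetical names, and the sketch assumes exactly two inputs, no letter repeated within a term and no ellipsis.

```python
import numpy


def analyse(equation):
    """Step 1: classifies the letters of a two-input equation. The plan
    depends only on the equation, so it could be computed once and cached."""
    left, out = equation.split("->")
    a, b = left.split(",")
    return {
        "a": a, "b": b, "out": out,
        # axes summed before the product: in one input only, not in the output
        "sum_a": tuple(i for i, c in enumerate(a) if c not in b and c not in out),
        "sum_b": tuple(i for i, c in enumerate(b) if c not in a and c not in out),
        "batch": [c for c in a if c in b and c in out],
        "contract": [c for c in a if c in b and c not in out],
        "keep_a": [c for c in a if c not in b and c in out],
        "keep_b": [c for c in b if c not in a and c in out],
    }


def run(plan, x, y):
    """Step 2: executes the plan with transpose, reshape and matmul."""
    x = x.sum(axis=plan["sum_a"]) if plan["sum_a"] else x
    y = y.sum(axis=plan["sum_b"]) if plan["sum_b"] else y
    a = [c for i, c in enumerate(plan["a"]) if i not in plan["sum_a"]]
    b = [c for i, c in enumerate(plan["b"]) if i not in plan["sum_b"]]
    dims = dict(zip(a, x.shape))
    dims.update(zip(b, y.shape))
    bt, ct = plan["batch"], plan["contract"]
    ka, kb = plan["keep_a"], plan["keep_b"]

    def size(letters):
        return int(numpy.prod([dims[c] for c in letters]))

    # Lay out x as (batch, keep_a, contract) and y as (batch, contract, keep_b).
    x = x.transpose([a.index(c) for c in bt + ka + ct])
    y = y.transpose([b.index(c) for c in bt + ct + kb])
    x = x.reshape(size(bt), size(ka), size(ct))
    y = y.reshape(size(bt), size(ct), size(kb))
    z = x @ y  # one batched matrix multiplication does all the products
    z = z.reshape([dims[c] for c in bt + ka + kb])
    order = bt + ka + kb
    return z.transpose([order.index(c) for c in plan["out"]])


m1, m2 = numpy.random.randn(2, 3, 4), numpy.random.randn(4, 5)
plan = analyse("abc,cd->abd")
print(plan["batch"], plan["contract"], plan["keep_a"], plan["keep_b"])  # [] ['c'] ['a', 'b'] ['d']
print(numpy.allclose(run(plan, m1, m2), numpy.einsum("abc,cd->abd", m1, m2)))  # True
```

Because `analyse` only looks at the letters, its result is what a cache would store, while `run` is the part repeated at every call.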

Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:

<<<

import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"

m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)

print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)

>>>

    numpy.einsum
    [[[ 0.089 -0.631]
      [ 0.377  0.048]]
    
     [[ 0.311 -0.185]
      [-1.063  1.014]]]
    mlprodict.testing.einsum
    [[[ 0.089 -0.631]
      [ 0.377  0.048]]
    
     [[ 0.311 -0.185]
      [-1.063  1.014]]]

In some cases, the einsum implementation can be made faster by looping over all possible permutations of letters:

<<<

import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"

m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum',
      timeit.timeit('numpy.einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2)
print('einsum',
      timeit.timeit('einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python',
      timeit.timeit('einsum(equation, m1, m2, runtime="python")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1-optimized',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
                    number=200,
                    globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)

>>>

    numpy.einsum 0.1292743009980768
    einsum 0.1383741779718548
    einsum-python 0.23377681203419343
    einsum-onnxruntime1 0.4136144199874252
    einsum-onnxruntime1-optimized 0.6790399909950793
    list of cached einsum equations
    ('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad bad,bc->ac
    [runpythonerror]
    0.015 rtbest='bad,bc->ac': 100%|██████████| 25/25 [00:01<00:00, 21.99it/s]

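The last example profiles the time spent in every function, first with cache=False, where the preprocessing is repeated at every call, then with cache=True, where only the execution remains: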

<<<

import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"

m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)


def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])


def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)


print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))


def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')


print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))


def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)


einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))


def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')


einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))


def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')


einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))

>>>

    Profile cache with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             133202 function calls (133002 primitive calls) in 0.524 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.524    0.524 <stdin>:27(fct1)
          100    0.002    0.000    0.523    0.005 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.369    0.004 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.000    0.000    0.369    0.004 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.001    0.000    0.368    0.004 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    0.367    0.004 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.001    0.000    0.366    0.004 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.003    0.000    0.365    0.004 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
          100    0.046    0.000    0.317    0.003 mlprodict/testing/einsum/einsum_impl.py:411(_decompose_einsum_equation_simple)
          100    0.000    0.000    0.152    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.151    0.002 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.151    0.002 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.007    0.000    0.150    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
         1200    0.008    0.000    0.142    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
         1200    0.017    0.000    0.124    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:329(compute_output_row)
         1600    0.007    0.000    0.076    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
         4800    0.014    0.000    0.067    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:22(single_axes)
         1900    0.059    0.000    0.059    0.000 {method 'reduce' of 'numpy.ufunc' objects}
         3800    0.053    0.000    0.053    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:38(<listcomp>)
          100    0.008    0.000    0.053    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
          500    0.006    0.000    0.047    0.000 site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.030    0.000    0.042    0.000 mlprodict/testing/einsum/einsum_impl.py:227(_apply_transpose_reshape)
          100    0.002    0.000    0.035    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
          100    0.001    0.000    0.032    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.030    0.000 site-packages/numpy/core/fromnumeric.py:2162(sum)
    Profile cache with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             974879 function calls (965064 primitive calls) in 3.648 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    3.661    3.661 <stdin>:36(fct2)
          100    0.002    0.000    3.660    0.037 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    3.407    0.034 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.001    0.000    3.407    0.034 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.001    0.000    3.406    0.034 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    3.405    0.034 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.018    0.000    3.404    0.034 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.003    0.000    2.145    0.021 mlprodict/onnxrt/onnx_inference.py:101(__init__)
          100    0.057    0.001    2.141    0.021 mlprodict/onnxrt/onnx_inference.py:178(_init)
         2800    0.051    0.000    1.210    0.000 mlprodict/onnxrt/onnx_inference_node.py:166(setup_runtime)
         2800    0.037    0.000    1.106    0.000 mlprodict/onnxrt/ops.py:9(load_op)
          100    0.030    0.000    0.873    0.009 mlprodict/testing/einsum/einsum_impl_classes.py:1464(to_onnx)
        201/1    0.003    0.000    0.801    0.801 <frozen importlib._bootstrap>:1002(_find_and_load)
        201/1    0.002    0.000    0.801    0.801 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
        201/1    0.003    0.000    0.801    0.801 <frozen importlib._bootstrap>:659(_load_unlocked)
        185/1    0.001    0.000    0.801    0.801 <frozen importlib._bootstrap_external>:784(exec_module)
        218/1    0.000    0.000    0.800    0.800 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
        186/1    0.001    0.000    0.800    0.800 {built-in method builtins.exec}
            1    0.000    0.000    0.800    0.800 mlprodict/onnxrt/ops_cpu/__init__.py:2(<module>)
            1    0.006    0.006    0.755    0.755 mlprodict/onnxrt/ops_cpu/_op_list.py:3(<module>)
          100    0.162    0.002    0.567    0.006 mlprodict/onnxrt/onnx_inference.py:511(to_sequence)
    9496/8351    0.035    0.000    0.462    0.000 {method 'join' of 'str' objects}
          181    0.007    0.000    0.441    0.002 mlprodict/onnxrt/doc/doc_helper.py:152(get_rst_doc)
          181    0.003    0.000    0.432    0.002 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/jinja2/environment.py:1256(render)
      200/100    0.107    0.001    0.404    0.004 mlprodict/onnx_tools/optim/onnx_optimisation_unused.py:52(onnx_remove_node_unused)
    Profile execution with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             35402 function calls in 0.151 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.151    0.151 <stdin>:46(fct3)
          100    0.002    0.000    0.151    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.148    0.001 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.147    0.001 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.147    0.001 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.007    0.000    0.146    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
         1200    0.007    0.000    0.138    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
         1400    0.005    0.000    0.073    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          100    0.008    0.000    0.052    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
          500    0.006    0.000    0.048    0.000 site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.040    0.000    0.040    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          100    0.001    0.000    0.036    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
          100    0.001    0.000    0.033    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.032    0.000 site-packages/numpy/core/fromnumeric.py:2162(sum)
          400    0.001    0.000    0.022    0.000 <__array_function__ internals>:177(prod)
          400    0.002    0.000    0.020    0.000 site-packages/numpy/core/fromnumeric.py:2927(prod)
          200    0.003    0.000    0.019    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:411(_apply_expand_dims)
          100    0.013    0.000    0.015    0.000 mlprodict/testing/einsum/blas_lapack.py:93(gemm_dot)
          300    0.001    0.000    0.015    0.000 <__array_function__ internals>:177(expand_dims)
          400    0.004    0.000    0.014    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:423(_apply_transpose)
          300    0.005    0.000    0.012    0.000 site-packages/numpy/lib/shape_base.py:512(expand_dims)
          400    0.001    0.000    0.006    0.000 <__array_function__ internals>:177(transpose)
         1300    0.003    0.000    0.005    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:374(_get_data)
          100    0.001    0.000    0.005    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:588(_apply_squeeze)
         2000    0.005    0.000    0.005    0.000 {built-in method builtins.getattr}
    Profile execution with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             34102 function calls in 0.243 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.243    0.243 <stdin>:58(fct4)
          100    0.002    0.000    0.242    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.239    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.238    0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.237    0.002 mlprodict/onnxrt/onnx_inference.py:781(run)
          100    0.002    0.000    0.236    0.002 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
          100    0.012    0.000    0.234    0.002 <string>:1(compiled_run)
         2100    0.022    0.000    0.086    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          600    0.021    0.000    0.046    0.000 mlprodict/onnxrt/ops_cpu/op_gather.py:28(_run)
          100    0.002    0.000    0.037    0.000 mlprodict/onnxrt/ops_cpu/op_reduce_sum.py:64(_run)
          100    0.001    0.000    0.034    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.032    0.000 site-packages/numpy/core/fromnumeric.py:2162(sum)
          100    0.001    0.000    0.031    0.000 site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          400    0.003    0.000    0.031    0.000 mlprodict/onnxrt/ops_cpu/op_identity.py:18(_run)
          300    0.001    0.000    0.031    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:37(_run)
          300    0.011    0.000    0.030    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:14(reshape_reference_implementation)
          100    0.030    0.000    0.030    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          400    0.028    0.000    0.028    0.000 {method 'copy' of 'numpy.ndarray' objects}
          600    0.007    0.000    0.025    0.000 site-packages/numpy/core/_dtype.py:34(__str__)
          200    0.008    0.000    0.025    0.000 mlprodict/onnxrt/ops_cpu/op_unsqueeze.py:54(_run)
          600    0.006    0.000    0.018    0.000 site-packages/numpy/core/_dtype.py:344(_name_get)
          200    0.002    0.000    0.016    0.000 <__array_function__ internals>:177(expand_dims)
          400    0.003    0.000    0.015    0.000 mlprodict/onnxrt/ops_cpu/op_transpose.py:23(_run)
          100    0.000    0.000    0.015    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:57(_run)
          100    0.000    0.000    0.014    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:27(<lambda>)
    Profile execution with runtime='onnxruntime1'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             2202 function calls in 0.234 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.234    0.234 <stdin>:69(fct5)
          100    0.002    0.000    0.233    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.229    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.229    0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.227    0.002 mlprodict/onnxrt/onnx_inference.py:781(run)
          100    0.002    0.000    0.227    0.002 mlprodict/onnxrt/onnx_inference.py:1313(_run_whole_runtime)
          100    0.224    0.002    0.224    0.002 mlprodict/onnxrt/ops_whole/session.py:97(run)
          100    0.000    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.000    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.000    0.000    0.000    0.000 mlprodict/onnxrt/onnx_inference.py:1386(<dictcomp>)
          300    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:655(<genexpr>)
          100    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
          200    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
          100    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:304(<dictcomp>)
          200    0.000    0.000    0.000    0.000 {built-in method builtins.len}
          100    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.next}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()

Enumerates all cached einsum functions.
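The cache can be pictured as a dictionary keyed by the call arguments. The keys printed in the example of einsum appear to hold the equation, runtime, opset, optimize, dtype, decompose and strategy; the sketch below keeps only three of them. `_CACHE`, `get_cached`, `enumerate_cached` and `build` are hypothetical names, not mlprodict's code.

```python
import numpy

_CACHE = {}  # hypothetical module-level cache: key -> preprocessed object


def get_cached(equation, build, runtime="batch_dot", dtype=numpy.float64):
    """Returns the preprocessed object for this key, building it only once."""
    key = (equation, runtime, numpy.dtype(dtype))
    if key not in _CACHE:
        _CACHE[key] = build(equation)  # expensive step, done on first use
    return _CACHE[key]


def enumerate_cached():
    """Yields (key, cached object) pairs, in insertion order."""
    yield from _CACHE.items()


calls = []


def build(equation):
    calls.append(equation)  # records every preprocessing actually run
    return {"equation": equation}


get_cached("cab,cd->ad", build)
get_cached("cab,cd->ad", build)  # served from the cache
get_cached("cab,cd->ad", build, runtime="python")  # new key, new build
print(len(calls))  # 2
for key, value in enumerate_cached():
    print(key, value["equation"])
```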


mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis (...) and expects a right member (the output subscripts after ->).

Parameters:
  • equation – einsum equation

  • optimize – permutes all letters to find the best permutation

  • runtime – runtime used to compute the results once the computation graph is produced (see below)

  • cache – if True, the function stores the preprocessing done for a specific equation; a second call with the same equation is much faster

  • opset – ONNX opset to use for some runtimes

  • decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.

  • strategy – optimisation strategy (see below)

  • verbose – display progress if optimize is True

Returns:

an instance of CachedEinsum (see below)

The available runtimes are:

  • batch_dot: the runtime is apply_einsum_sequence,

  • python: one ONNX graph executed with a python runtime,

  • onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

  • None: the same runtime is used to find the best permutation of letters

  • ‘ml’: a machine-learned model is used to predict the best permutation of letters; this model comes from the notebook Infer operator computation cost.

The function works in two steps:

  • the first step analyses the equation to produce a computation graph, which can also be converted into ONNX,

  • the second step runs the graph with the selected runtime.

The function returns an object of type CachedEinsum which has the following members after optimization:

  • equation_: the best equivalent equation

  • graph_: the corresponding graph returned by function decompose_einsum_equation (mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation)

  • onnx_: if a conversion to onnx is used, stores the onnx graph

  • runtime_: a function used by __call__, calls the runtime

  • oinf_: an object of type OnnxInference

  • timed_permutations_: memorizes the results of the optimization

<<<

import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)

print("best equation:", seq_opt.equation_)

>>>

    
    4.5 mlbest='bhts,bnts->btnh': 100%|##########| 121/121 [00:03<00:00, 37.35it/s]
    best equation: bhts,bnts->btnh
