module `testing.einsum.einsum_fct`#

Short summary#

module mlprodict.testing.einsum.einsum_fct

Main functions decomposing an einsum computation into simpler functions.

Classes#

class	truncated documentation
`CachedEinsum`	Stores all the necessary information to cache the preprocessing of an einsum equation.

Functions#

function	truncated documentation
`_einsum`
`einsum`	Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …
`enumerate_cached_einsum`	Enumerates all cached einsum functions.
`optimize_decompose_einsum_equation`	Proposes a new implementation of numpy.einsum. It does not allow expression using … and expects a right …

Static Methods#

staticmethod	truncated documentation
`build_einsum`	Creates an instance of CachedEinsum.

Methods#

method	truncated documentation
`__call__`	Calls the runtime self.runtime_.
`__init__`
`__repr__`	usual
`_build_optimize`
`_build_optimize_ml`
`build`	Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
`build_onnx_einsum`	Builds an ONNX graph with a single einsum operator.
`build_runtime`	Builds the runtime associated to the equation self.equation_.
`default_inputs`	Returns default inputs (reshaped numpy.arange + 0.7i).

Documentation#

Main functions decomposing an einsum computation into simpler functions.


class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#

Bases: object

Stores all the necessary information to cache the preprocessing of an einsum equation.

Parameters:

equation – numpy equation
runtime – see einsum
opset – ONNX opset
optimize – finds the best letter permutation
dtype – dtype
decompose – to decompose Einsum operator or to keep it as is
key – key used to cache this class
strategy – optimization strategy
verbose – displays progress information

The class creates the following attributes:

equation_: the best equivalent equation
graph_: the corresponding graph returned by function decompose_einsum_equation (mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation)
onnx_: if a conversion to onnx is used, stores the onnx graph
runtime_: a function used by __call__, calls the runtime
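
The "preprocess once, call many times" pattern behind these attributes can be illustrated with a miniature class. This is only a sketch: the class name SketchCachedEinsum is hypothetical, numpy.einsum stands in for the real runtimes, and no optimization or ONNX conversion takes place.

```python
import numpy


class SketchCachedEinsum:
    """Hypothetical miniature: preprocess once, then call many times."""

    def __init__(self, equation):
        self.equation = equation

    def build(self):
        # No optimization in this sketch: the "best" equation
        # is simply the original one.
        self.equation_ = self.equation
        # The runtime is a plain function of the inputs.
        self.runtime_ = lambda *inputs: numpy.einsum(self.equation_, *inputs)
        return self

    def __call__(self, *inputs):
        if not hasattr(self, "runtime_"):
            raise RuntimeError("build() must be called first.")
        return self.runtime_(*inputs)


m1 = numpy.arange(8.0).reshape(2, 2, 2)
m2 = numpy.arange(4.0).reshape(2, 2)
fct = SketchCachedEinsum("abc,cd->abd").build()
print(fct(m1, m2).shape)  # (2, 2, 2)
```

Keeping the runtime as an attribute lets __call__ stay identical whichever backend build() selected.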


__call__(*inputs)#

Calls the runtime self.runtime_.


__init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#

__repr__()#: usual

_build_optimize()#

_build_optimize_ml()#

build()#

Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.


static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)#

Creates an instance of CachedEinsum.


build_onnx_einsum(input_names)#

Builds an ONNX graph with a single einsum operator.


build_runtime()#

Builds the runtime associated to the equation self.equation_.


default_inputs(N=None)#

Returns default inputs (reshaped numpy.arange + 0.7i).

Parameters:

N – dimension (all dimensions have the same size)

If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.
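
A rough sketch of how such inputs could be built is given below. It is not the library's code: the helper name sketch_default_inputs is hypothetical, and it assumes that the "i" in "0.7i" denotes the index of the input, so that each input differs from the others.

```python
import numpy


def sketch_default_inputs(equation, N=3, dtype=numpy.float64):
    """Builds one array per input term of ``equation``.

    Hypothetical helper: every axis has size N, the values come from a
    reshaped numpy.arange, and 0.7 * (input index) is added
    (assumed meaning of "0.7i").
    """
    left = equation.split("->")[0]
    terms = [term.strip() for term in left.split(",")]
    inputs = []
    for index, term in enumerate(terms):
        ndim = len(term)
        values = numpy.arange(N ** ndim).reshape((N,) * ndim)
        inputs.append((values + 0.7 * index).astype(dtype))
    return inputs


inputs = sketch_default_inputs("abc,cd->abd", N=2)
print([x.shape for x in inputs])  # [(2, 2, 2), (2, 2)]
```

Small values of N keep each trial cheap when many candidate equations have to be evaluated.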


mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#

mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis … and expects a right member (an explicit output after ->).

Parameters:

equation – einsum equation
inputs – inputs
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation; a second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into simpler operators, but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True

Returns:

einsum result

The available runtimes are:

batch_dot: the runtime is apply_einsum_sequence,
python: one ONNX graph executed with a python runtime,
onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

None: the same runtime is used to find the best permutation of letters
‘ml’: a machine learned model is used to predict the best permutation of letters, this model comes from notebook Infer operator computation cost.
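
Renaming the letters of an equation consistently does not change what it computes, which is what makes a search over permutations possible. The sketch below only enumerates the candidates; the helper name permuted_equations is hypothetical, and the way the library orders, filters or scores candidates may differ.

```python
import itertools


def permuted_equations(equation):
    """Yields every equation obtained by renaming the letters of ``equation``."""
    letters = sorted(set(c for c in equation if c.isalpha()))
    for perm in itertools.permutations(letters):
        mapping = dict(zip(letters, perm))
        # Commas and the arrow are kept, letters are renamed.
        yield "".join(mapping.get(c, c) for c in equation)


candidates = list(permuted_equations("cab,cd->ad"))
print(len(candidates))  # 24
print("acd,ab->cb" in candidates)  # True
```

The number of candidates grows as the factorial of the number of distinct letters, hence the interest of a cheap cost model for longer equations.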

The function works in two steps:

first step analyses the equation to produce a computation graph, this graph can also be converted into ONNX,
second step runs the graph whatever the graph is.
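
The two steps can be sketched with numpy alone. This is an illustration under simplifying assumptions, not the library's decomposition: the function names analyse and run are hypothetical, no letter may be repeated inside a term, and a broadcasted product followed by a sum replaces the batch dot product.

```python
import numpy


def analyse(equation):
    """Step 1: parse the equation once and return a small plan."""
    left, right = equation.split("->")
    terms = left.split(",")
    letters = sorted(set("".join(terms)))
    # Letters missing from the output are summed over.
    reduced = tuple(i for i, c in enumerate(letters) if c not in right)
    kept = [c for c in letters if c in right]
    # Final transpose from alphabetical order to the requested order.
    final_perm = tuple(kept.index(c) for c in right)
    return dict(terms=terms, letters=letters, reduced=reduced,
                final_perm=final_perm)


def run(plan, *inputs):
    """Step 2: apply the plan with transpose, reshape, multiply, sum."""
    letters = plan["letters"]
    product = None
    for term, value in zip(plan["terms"], inputs):
        # Reorder the axes alphabetically, then insert axes of size 1
        # for the letters this input does not use.
        order = sorted(range(len(term)), key=lambda i: term[i])
        value = numpy.transpose(value, order)
        shape = [1] * len(letters)
        for size, c in zip(value.shape, sorted(term)):
            shape[letters.index(c)] = size
        value = value.reshape(shape)
        product = value if product is None else product * value
    result = product.sum(axis=plan["reduced"])
    return numpy.transpose(result, plan["final_perm"])


rng = numpy.random.default_rng(0)
m1 = rng.standard_normal((4, 2, 3))
m2 = rng.standard_normal((4, 5))
plan = analyse("cab,cd->ad")
print(numpy.allclose(run(plan, m1, m2),
                     numpy.einsum("cab,cd->ad", m1, m2)))  # True
```

The plan depends only on the equation, so it can be computed once and reused for every call with new inputs. The full broadcasted product is memory hungry, which is one reason a real implementation would rather rely on matrix products.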

Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:

<<<

import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"

m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)

print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)

>>>

    numpy.einsum
    [[[-2.188  0.692]
      [-1.017  0.352]]
    
     [[-1.125 -0.248]
      [-0.167  0.136]]]
    mlprodict.testing.einsum
    [[[-2.188  0.692]
      [-1.017  0.352]]
    
     [[-1.125 -0.248]
      [-0.167  0.136]]]

In some cases, the einsum implementation can be optimized by looping over the possible letter permutations:

<<<

import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"

m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum',
      timeit.timeit('numpy.einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2)
print('einsum',
      timeit.timeit('einsum(equation, m1, m2)',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python',
      timeit.timeit('einsum(equation, m1, m2, runtime="python")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1")',
                    number=200,
                    globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
                    number=200,
                    globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)

>>>

    numpy.einsum 0.13381517003290355
    einsum 0.1363776430953294
    einsum-python 0.23073153698351234
    einsum-onnxruntime1 0.33073955099098384
    einsum-onnxruntime1 0.32155248697381467
    list of cached einsum equations
    ('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
    ('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad acd,ab->cb
    [runpythonerror]
    0.015 rtbest='acd,ab->cb': 100%|██████████| 25/25 [00:01<00:00, 22.42it/s]

The last example shows the time taken by every function:

<<<

import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"

m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)


def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])


def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)


print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))


def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')


print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))


def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)


einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))


def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')


einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))


def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')


einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))

>>>

    Profile cache with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             133202 function calls (133002 primitive calls) in 0.517 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.517    0.517 <stdin>:27(fct1)
          100    0.002    0.000    0.516    0.005 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.366    0.004 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.000    0.000    0.366    0.004 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.001    0.000    0.365    0.004 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    0.364    0.004 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.001    0.000    0.363    0.004 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.003    0.000    0.362    0.004 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
          100    0.045    0.000    0.315    0.003 mlprodict/testing/einsum/einsum_impl.py:411(_decompose_einsum_equation_simple)
          100    0.000    0.000    0.148    0.001 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.148    0.001 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.147    0.001 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.007    0.000    0.146    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
         1200    0.008    0.000    0.139    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
         1200    0.017    0.000    0.124    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:329(compute_output_row)
         1600    0.007    0.000    0.074    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
         4800    0.015    0.000    0.068    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:22(single_axes)
         1900    0.059    0.000    0.059    0.000 {method 'reduce' of 'numpy.ufunc' objects}
         3800    0.053    0.000    0.053    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:38(<listcomp>)
          100    0.008    0.000    0.053    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
          500    0.006    0.000    0.047    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.030    0.000    0.041    0.000 mlprodict/testing/einsum/einsum_impl.py:227(_apply_transpose_reshape)
          100    0.001    0.000    0.034    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
          100    0.001    0.000    0.031    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.030    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)
    Profile cache with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             1025199 function calls (1014148 primitive calls) in 3.956 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    3.969    3.969 <stdin>:36(fct2)
          100    0.002    0.000    3.968    0.040 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    3.716    0.037 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.001    0.000    3.716    0.037 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.001    0.000    3.715    0.037 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
          100    0.001    0.000    3.714    0.037 mlprodict/testing/einsum/einsum_fct.py:109(build)
          100    0.018    0.000    3.713    0.037 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
          100    0.003    0.000    2.458    0.025 mlprodict/onnxrt/onnx_inference.py:101(__init__)
          100    0.064    0.001    2.455    0.025 mlprodict/onnxrt/onnx_inference.py:178(_init)
         2800    0.050    0.000    1.498    0.001 mlprodict/onnxrt/onnx_inference_node.py:165(setup_runtime)
         2800    0.040    0.000    1.417    0.001 mlprodict/onnxrt/ops.py:9(load_op)
        391/1    0.006    0.000    1.047    1.047 <frozen importlib._bootstrap>:1002(_find_and_load)
        391/1    0.005    0.000    1.047    1.047 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
        391/1    0.005    0.000    1.046    1.046 <frozen importlib._bootstrap>:659(_load_unlocked)
        374/1    0.003    0.000    1.046    1.046 <frozen importlib._bootstrap_external>:784(exec_module)
        410/1    0.001    0.000    1.046    1.046 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
        375/1    0.002    0.000    1.046    1.046 {built-in method builtins.exec}
            1    0.000    0.000    1.046    1.046 mlprodict/onnxrt/ops_cpu/__init__.py:2(<module>)
          100    0.029    0.000    0.870    0.009 mlprodict/testing/einsum/einsum_impl_classes.py:1464(to_onnx)
            1    0.006    0.006    0.831    0.831 mlprodict/onnxrt/ops_cpu/_op_list.py:3(<module>)
          100    0.164    0.002    0.594    0.006 mlprodict/onnxrt/onnx_inference.py:524(to_sequence)
    11036/9867    0.037    0.000    0.468    0.000 {method 'join' of 'str' objects}
          182    0.007    0.000    0.445    0.002 mlprodict/onnxrt/doc/doc_helper.py:152(get_rst_doc)
          182    0.003    0.000    0.436    0.002 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/jinja2/environment.py:1256(render)
        15947    0.053    0.000    0.403    0.000 <template>:5(root)
    Profile execution with default runtime.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             35402 function calls in 0.150 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.150    0.150 <stdin>:46(fct3)
          100    0.002    0.000    0.149    0.001 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.146    0.001 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.146    0.001 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
          100    0.001    0.000    0.145    0.001 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
          100    0.007    0.000    0.144    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
         1200    0.007    0.000    0.136    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
         1400    0.005    0.000    0.072    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          100    0.008    0.000    0.052    0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
          500    0.006    0.000    0.048    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          500    0.039    0.000    0.039    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          100    0.001    0.000    0.035    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
          100    0.001    0.000    0.032    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.031    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)
          400    0.001    0.000    0.022    0.000 <__array_function__ internals>:177(prod)
          400    0.002    0.000    0.020    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2927(prod)
          200    0.003    0.000    0.019    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:411(_apply_expand_dims)
          100    0.013    0.000    0.015    0.000 mlprodict/testing/einsum/blas_lapack.py:93(gemm_dot)
          300    0.001    0.000    0.014    0.000 <__array_function__ internals>:177(expand_dims)
          400    0.004    0.000    0.014    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:423(_apply_transpose)
          300    0.005    0.000    0.012    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/lib/shape_base.py:512(expand_dims)
         1300    0.003    0.000    0.006    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:374(_get_data)
          400    0.001    0.000    0.006    0.000 <__array_function__ internals>:177(transpose)
         2000    0.005    0.000    0.005    0.000 {built-in method builtins.getattr}
          100    0.001    0.000    0.005    0.000 mlprodict/testing/einsum/einsum_impl_classes.py:588(_apply_squeeze)
    Profile execution with runtime='python'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             34102 function calls in 0.241 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.241    0.241 <stdin>:58(fct4)
          100    0.002    0.000    0.240    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.237    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.237    0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.235    0.002 mlprodict/onnxrt/onnx_inference.py:797(run)
          100    0.002    0.000    0.234    0.002 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
          100    0.012    0.000    0.233    0.002 <string>:1(compiled_run)
         2100    0.022    0.000    0.086    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
          600    0.021    0.000    0.046    0.000 mlprodict/onnxrt/ops_cpu/op_gather.py:28(_run)
          100    0.002    0.000    0.036    0.000 mlprodict/onnxrt/ops_cpu/op_reduce_sum.py:64(_run)
          100    0.001    0.000    0.033    0.000 <__array_function__ internals>:177(sum)
          100    0.001    0.000    0.032    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)
          100    0.001    0.000    0.031    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
          300    0.001    0.000    0.030    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:37(_run)
          400    0.003    0.000    0.030    0.000 mlprodict/onnxrt/ops_cpu/op_identity.py:17(_run)
          100    0.029    0.000    0.029    0.000 {method 'reduce' of 'numpy.ufunc' objects}
          300    0.010    0.000    0.029    0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:14(reshape_reference_implementation)
          400    0.027    0.000    0.027    0.000 {method 'copy' of 'numpy.ndarray' objects}
          600    0.006    0.000    0.025    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:34(__str__)
          200    0.008    0.000    0.025    0.000 mlprodict/onnxrt/ops_cpu/op_unsqueeze.py:54(_run)
          600    0.006    0.000    0.018    0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:344(_name_get)
          200    0.002    0.000    0.016    0.000 <__array_function__ internals>:177(expand_dims)
          400    0.003    0.000    0.015    0.000 mlprodict/onnxrt/ops_cpu/op_transpose.py:23(_run)
          100    0.000    0.000    0.015    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:57(_run)
          100    0.000    0.000    0.014    0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:27(<lambda>)
    Profile execution with runtime='onnxruntime1'.
    /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
             2202 function calls in 0.242 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.001    0.001    0.242    0.242 <stdin>:69(fct5)
          100    0.002    0.000    0.241    0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
          100    0.000    0.000    0.238    0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
          100    0.001    0.000    0.237    0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
          100    0.001    0.000    0.236    0.002 mlprodict/onnxrt/onnx_inference.py:797(run)
          100    0.002    0.000    0.235    0.002 mlprodict/onnxrt/onnx_inference.py:1329(_run_whole_runtime)
          100    0.233    0.002    0.233    0.002 mlprodict/onnxrt/ops_whole/session.py:97(run)
          100    0.000    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
          100    0.000    0.000    0.001    0.000 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
          100    0.000    0.000    0.000    0.000 mlprodict/onnxrt/onnx_inference.py:1402(<dictcomp>)
          300    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:655(<genexpr>)
          100    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
          100    0.000    0.000    0.000    0.000 mlprodict/testing/einsum/einsum_fct.py:304(<dictcomp>)
          200    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
          200    0.000    0.000    0.000    0.000 {built-in method builtins.len}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
          100    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
          100    0.000    0.000    0.000    0.000 {built-in method builtins.next}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()#

Enumerates all cached einsum functions.
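
A dictionary-based cache with an enumerating function could look like the sketch below. All names here (get_cached, enumerate_sketch_cache, build) are hypothetical; the key layout mirrors the tuples printed in the einsum example (equation, runtime, opset, optimize, dtype, decompose, strategy), but the library's actual storage is not shown in this page.

```python
_sketch_cache = {}
build_calls = []


def build(equation, runtime):
    """Stand-in for the expensive preprocessing; records each call."""
    build_calls.append((equation, runtime))
    return {"equation": equation, "runtime": runtime}


def get_cached(equation, runtime="batch_dot", opset=None, optimize=False,
               dtype=float, decompose=True, strategy=None):
    """Returns the preprocessed object, building it only once per key."""
    key = (equation, runtime, opset, optimize, dtype, decompose, strategy)
    if key not in _sketch_cache:
        _sketch_cache[key] = build(equation, runtime)
    return _sketch_cache[key]


def enumerate_sketch_cache():
    """Yields (key, value) pairs for every cached entry."""
    yield from _sketch_cache.items()


get_cached("cab,cd->ad")
get_cached("cab,cd->ad")  # served from the cache
get_cached("cab,cd->ad", runtime="python")
for key, value in enumerate_sketch_cache():
    print(key[:2], value["equation"])
print(len(build_calls))  # 2
```

Every option that changes the preprocessing belongs in the key; otherwise two different configurations would share one cached entry.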


mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#

Proposes a new implementation of numpy.einsum. It does not allow expressions using the ellipsis … and expects a right member (an explicit output after ->).

Parameters:

equation – einsum equation
dtype – dtype
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation; a second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into simpler operators, but it can keep the original ONNX einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True

Returns:

an object of type CachedEinsum (see below)

The available runtimes are:

batch_dot: the runtime is apply_einsum_sequence,
python: one ONNX graph executed with a python runtime,
onnxruntime1: one ONNX graph executed with onnxruntime.

The optimisation strategy can be:

None: the same runtime is used to find the best permutation of letters
‘ml’: a machine learned model is used to predict the best permutation of letters, this model comes from notebook Infer operator computation cost.

The function works in two steps:

first step analyses the equation to produce a computation graph, this graph can also be converted into ONNX,
second step runs the graph whatever the graph is.

The function returns an object of type CachedEinsum which has the following members after optimization:

equation_: the best equivalent equation
graph_: the corresponding graph returned by function decompose_einsum_equation (mlprodict.testing.einsum.einsum_impl.decompose_einsum_equation)
onnx_: if a conversion to onnx is used, stores the onnx graph
runtime_: a function used by __call__, calls the runtime
oinf_: an object of type OnnxInference
timed_permutations_: memorizes the results of the optimization

<<<

import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)

print("best equation:", seq_opt.equation_)

>>>

    
    5 mlbest='bhts,bnts->btnh': 100%|##########| 121/121 [00:03<00:00, 36.59it/s]
    best equation: bhts,bnts->btnh
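
The equation reported above is the original one with its letters renamed (s→h, n→t, h→s, t→n), so both describe the same computation. A quick check with numpy.einsum, used here instead of the mlprodict runtime and with arbitrary input shapes chosen for the illustration:

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5))  # axes b, s, n, h in the original equation
y = rng.standard_normal((2, 6, 4, 5))  # axes b, t, n, h in the original equation

original = numpy.einsum("bsnh,btnh->bnts", x, y)
renamed = numpy.einsum("bhts,bnts->btnh", x, y)

print(original.shape)  # (2, 4, 6, 3)
print(numpy.allclose(original, renamed))  # True
```

The renaming therefore only matters for implementations whose decomposition depends on the order of the letters.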

