Experimental implementations#

Helpers #

mlprodict.testing.experimental_c_impl.experimental_c.code_optimisation ()

code_optimisation() -> str

Returns a string giving some insights about optimisations.

Implementation of ONNX operators #

Experimental implementations of algorithms.

Conv #

Function im2col transforms an image in order to replace a convolution by a matrix multiplication.

mlprodict.onnxrt.ops_cpu.op_conv_helper.im2col (data, kernel_shape = None, fill_value = 0)

Returns the result of im2col on an image NHCW where N is 1. The function is equivalent to torch.nn.Unfold() (but with padding=1 on all dimensions).
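The idea of replacing a convolution by a matrix multiplication can be sketched in pure Python. The helpers below (`im2col_2d`, `conv2d_via_im2col`) are hypothetical names, not part of mlprodict. They assume a single-channel 2D image, an odd kernel size, and a padding of half the kernel size (which equals 1 for a 3x3 kernel). The layout of the returned patches may differ from the one produced by the library.

```python
def im2col_2d(image, kernel_shape, fill_value=0):
    """Hypothetical sketch: one row per pixel, one column per kernel cell."""
    h, w = len(image), len(image[0])
    kh, kw = kernel_shape
    ph, pw = kh // 2, kw // 2  # assumed padding (1 for a 3x3 kernel)
    rows = []
    for i in range(h):
        for j in range(w):
            patch = []
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    inside = 0 <= y < h and 0 <= x < w
                    # cells outside the image take the fill value
                    patch.append(image[y][x] if inside else fill_value)
            rows.append(patch)
    return rows


def conv2d_via_im2col(image, kernel):
    """Convolution (cross-correlation) as a product patches x flat kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col_2d(image, (kh, kw))
    flat = [sum(a * b for a, b in zip(p, flat_kernel)) for p in patches]
    w = len(image[0])
    # reshape the flat result back to the image shape
    return [flat[r * w:(r + 1) * w] for r in range(len(image))]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
same = conv2d_via_im2col(image, identity)
summed = conv2d_via_im2col(image, box)
```

With the identity kernel the image comes back unchanged, which is a quick sanity check of the patch layout.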

mlprodict.onnxrt.ops_cpu.op_conv_helper.im2col_naive_implementation (data, kernel_shape, fill_value = 0)

Naive implementation for im2col or torch.nn.Unfold() (but with padding=1).

mlprodict.onnxrt.ops_cpu.op_conv_helper.im2col_nn (res)

Functions nn_im2col_2d and im2col() return the same results but with different shapes. This function converts a result from nn_im2col_2d into the same shape as a result from im2col().

mlprodict.onnxrt.ops_cpu.op_conv_helper.im2col_recursive (data, kernel_shape, fill_value = 0, fall_back_dim = 2)

Recursive implementation, falls back to im2col_naive_implementation for dimension <= fall_back_dim. The function is equivalent to torch.nn.Unfold() (but with padding=1 on all dimensions).

mlprodict.onnxrt.ops_cpu.op_conv_helper.nn_im2col_2d (data, kernel_shape, dilations, padding, fill_value = 0)

C++ implementation for im2col or torch.nn.Unfold().

mlprodict.onnxrt.ops_cpu.op_conv_helper.nn_col2im_2d (data, output_shape, kernel_shape, dilations, padding)

C++ implementation for col2im or torch.nn.Fold().
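The inverse operation, col2im, adds every patch back at its position in the image, so overlapping cells are summed. The function below is a minimal sketch under stated assumptions: `col2im_2d` is a hypothetical pure-Python helper, it handles a single channel, no padding and no dilation, and it stores one patch per row, which is not necessarily the layout used by nn_col2im_2d or torch.nn.Fold().

```python
def col2im_2d(cols, output_shape, kernel_shape):
    """Hypothetical sketch: sums patches back into an image.

    cols[p] is the flattened patch whose top-left corner is the p-th
    window position in row-major order (no padding, no dilation).
    """
    h, w = output_shape
    kh, kw = kernel_shape
    out_w = w - kw + 1  # number of window positions per image row
    image = [[0] * w for _ in range(h)]
    for p, patch in enumerate(cols):
        i, j = divmod(p, out_w)
        for di in range(kh):
            for dj in range(kw):
                # overlapping contributions accumulate
                image[i + di][j + dj] += patch[di * kw + dj]
    return image


# Four 2x2 patches of ones folded into a 3x3 image:
# every cell counts the number of windows covering it.
counts = col2im_2d([[1, 1, 1, 1]] * 4, (3, 3), (2, 2))
```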

Einsum #

mlprodict.testing.einsum (equation, inputs, optimize = False, runtime = 'batch_dot', cache = True, opset = None, decompose = True, strategy = None, verbose = None)

Proposes a new implementation of numpy.einsum. It does not allow expressions using an ellipsis (...) and expects an explicit right member (the output term after ->).

mlprodict.testing.einsum.einsum_fct.CachedEinsum (self, equation, runtime = 'batch_dot', opset = None, optimize = False, dtype = <class 'numpy.float64'>, decompose = True, strategy = None, verbose = None, key = None)

Stores all the necessary information to cache the preprocessing of an einsum equation.

mlprodict.testing.einsum.optimize_decompose_einsum_equation (equation, dtype, optimize = False, runtime = 'batch_dot', cache = True, opset = None, decompose = True, strategy = None, verbose = None)

Proposes a new implementation of numpy.einsum. It does not allow expressions using an ellipsis (...) and expects an explicit right member (the output term after ->).

mlprodict.testing.einsum.einsum_impl.analyse_einsum_equation (equation)

Analyses an einsum equation.
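As an illustration of what analysing an equation can involve, the sketch below lists the letters of an equation and, for every term (inputs then output), the position of each letter or -1 when it is absent. `analyse_equation` is a hypothetical helper written for this page; the values actually returned by analyse_einsum_equation may be organised differently.

```python
def analyse_equation(equation):
    """Hypothetical sketch: letters and their positions in every term."""
    if "..." in equation or "\u2026" in equation:
        raise ValueError("Ellipsis is not supported.")
    if "->" not in equation:
        raise ValueError("The equation must have a right member.")
    left, right = equation.replace(" ", "").split("->")
    terms = left.split(",") + [right]
    letters = "".join(sorted(set("".join(terms))))
    # str.find returns -1 when the letter is not in the term
    positions = [[term.find(c) for c in letters] for term in terms]
    return letters, positions


letters, positions = analyse_equation("abc,cd->abd")
```

Here the letter `c` appears in both inputs but not in the output, which identifies it as the axis to sum over.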

mlprodict.testing.einsum.einsum_impl.apply_sequence

mlprodict.testing.einsum.decompose_einsum_equation (equation, shapes, strategy = 'simple', clean = False, verbose = False)

Decomposes an equation used in numpy.einsum knowing the input shapes. It returns a sequence of operations to do to compute the results.

mlprodict.testing.experimental_c_impl.experimental_c.custom_einsum_float (equation, x, y, nthread = 0)

custom_einsum_float(equation: str, x: numpy.ndarray[numpy.float32], y: numpy.ndarray[numpy.float32], nthread: int = 0) -> numpy.ndarray[numpy.float32]

Custom C++ implementation of operator einsum with float. The function only works with contiguous arrays. It does not perform any explicit transpose. It does not support the diagonal operator (repetition of the same letter). See the Python version, custom_einsum.

mlprodict.testing.experimental_c_impl.experimental_c.custom_einsum_double (equation, x, y, nthread = 0)

custom_einsum_double(equation: str, x: numpy.ndarray[numpy.float64], y: numpy.ndarray[numpy.float64], nthread: int = 0) -> numpy.ndarray[numpy.float64]

Custom C++ implementation of operator einsum with double. The function only works with contiguous arrays. It does not perform any explicit transpose. It does not support the diagonal operator (repetition of the same letter). See the Python version, custom_einsum.
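A two-input einsum on contiguous buffers can be computed without any transpose by walking every combination of indices and converting it into flat offsets with row-major strides. The code below is only a sketch of that idea, not the C++ implementation: `strides_for` and `einsum_two` are hypothetical names, the inputs are flat Python lists with explicit shapes, and no threading is done.

```python
from itertools import product
from math import prod


def strides_for(shape):
    """Row-major (C-contiguous) strides, expressed in elements."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return strides[::-1]


def einsum_two(equation, x, x_shape, y, y_shape):
    """Hypothetical sketch of a two-input einsum on flat buffers."""
    left, out = equation.split("->")
    tx, ty = left.split(",")
    for term in (tx, ty, out):
        if len(set(term)) != len(term):
            raise ValueError("Repeated letters (diagonal) are not supported.")
    sizes = {}
    for term, shape in ((tx, x_shape), (ty, y_shape)):
        for letter, dim in zip(term, shape):
            if sizes.setdefault(letter, dim) != dim:
                raise ValueError("Inconsistent size for %r." % letter)
    letters = sorted(sizes)
    out_shape = [sizes[c] for c in out]
    sx, sy, so = strides_for(x_shape), strides_for(y_shape), strides_for(out_shape)
    result = [0.0] * prod(out_shape)
    for index in product(*(range(sizes[c]) for c in letters)):
        pos = dict(zip(letters, index))
        # flat offsets: no transpose, only stride arithmetic
        ix = sum(pos[c] * s for c, s in zip(tx, sx))
        iy = sum(pos[c] * s for c, s in zip(ty, sy))
        io = sum(pos[c] * s for c, s in zip(out, so))
        result[io] += x[ix] * y[iy]
    return result, out_shape


# Matrix product of [[1, 2], [3, 4]] and [[5, 6], [7, 8]].
res, shape = einsum_two("ij,jk->ik", [1, 2, 3, 4], (2, 2), [5, 6, 7, 8], (2, 2))
```

Letters missing from the output term are summed over because every matching index combination accumulates into the same output offset.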

mlprodict.testing.einsum.einsum_benchmark (equation = 'abc,cd->abd', shape = 30, perm = False, runtime = 'python', use_tqdm = False, number = 5, repeat = 5, opset = 17)

Investigates whether decomposing an einsum equation is faster.

mlprodict.testing.einsum.numpy_diagonal (m, axis, axes)

Extracts diagonal coefficients from an array.

mlprodict.testing.einsum.numpy_extended_dot (m1, m2, axes, left, right, verbose = False)

Extended version of a matrix multiplication (numpy.dot) with two matrices m1, m2 of the same dimensions. It loops over left axes for m1 and right axes for m2; the summation is done over axes. Other axes must be empty. This multiplication combines matrix multiplication (dot) and broadcasted term-by-term multiplication.

mlprodict.testing.einsum.numpy_extended_dot_python (m1, m2, axes, left, right, verbose = False)

Implementation of numpy_extended_dot in pure python. This implementation is not efficient but shows how to implement this operation without numpy.einsum.

mlprodict.testing.einsum.numpy_extended_dot_matrix (m1, m2, axes, left, right, verbose = False)

Implementation of numpy_extended_dot using dot product, multiplication, transpose and reduction but not a custom python implementation like numpy_extended_dot_python.

mlprodict.testing.einsum.einsum_impl_ext.numpy_extended_dot_ouput_shape (m1, m2, axes, left, right)

Computes the output shape of results produced by function numpy_extended_dot or numpy_extended_dot_python.

Pad #

mlprodict.testing.experimental.custom_pad (arr, paddings, constant = 0, verbose = False)

Implements function pad in Python; only the constant mode is supported.
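Constant padding amounts to surrounding the array with a fixed value. The sketch below shows it for a 2D list of lists. `pad_constant_2d` is a hypothetical helper, and the format of `paddings` (one `(before, after)` pair per axis, as in numpy.pad) is an assumption; custom_pad may expect a different layout.

```python
def pad_constant_2d(arr, paddings, constant=0):
    """Hypothetical sketch of constant padding for a 2D list of lists.

    paddings is assumed to be ((top, bottom), (left, right)).
    """
    (top, bottom), (left, right) = paddings
    width = len(arr[0]) + left + right
    # rows added above the data
    padded = [[constant] * width for _ in range(top)]
    for row in arr:
        padded.append([constant] * left + list(row) + [constant] * right)
    # rows added below the data
    padded.extend([constant] * width for _ in range(bottom))
    return padded


padded = pad_constant_2d([[1, 2], [3, 4]], ((1, 0), (0, 2)), constant=9)
```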

ReduceSum #

mlprodict.testing.experimental_c_impl.experimental_c.custom_reducesum_rk_double (x, nthread = 0)

custom_reducesum_rk_double(x: numpy.ndarray[numpy.float64], nthread: int = 0) -> numpy.ndarray[numpy.float64]

Custom C++ implementation of operator ReduceSum with double when the reduced matrix has two dimensions and the reduced axis is the first one. x is the reduced matrix. nthread specifies the number of threads used to distribute the computation. A negative value means the OMP default value.

mlprodict.testing.experimental_c_impl.experimental_c.custom_reducesum_rk_float (x, nthread = 0)

custom_reducesum_rk_float(x: numpy.ndarray[numpy.float32], nthread: int = 0) -> numpy.ndarray[numpy.float32]

Custom C++ implementation of operator ReduceSum with float when the reduced matrix has two dimensions and the reduced axis is the first one. x is the reduced matrix. nthread specifies the number of threads used to distribute the computation. A negative value means the OMP default value.
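Reducing the first axis of a two-dimensional matrix produces one sum per column. The following pure-Python sketch shows the operation itself; `reducesum_first_axis` is a hypothetical name, it is single-threaded, and it makes no claim about how the C++ functions distribute the work.

```python
def reducesum_first_axis(x):
    """Hypothetical sketch: sums a 2D list of lists over its first axis."""
    if not x:
        return []
    totals = [0.0] * len(x[0])
    # rows are visited one after another, each adds to every column total
    for row in x:
        for j, value in enumerate(row):
            totals[j] += value
    return totals


column_sums = reducesum_first_axis([[1, 2, 3], [4, 5, 6]])
```

Accumulating row by row reads a row-major matrix in memory order, which is one reason this case is often treated separately from a reduction over the last axis.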
