cpyquickhelper.numbers

This module uses pybind11 to create a Python class from a C++ class. It is heavily inspired by the example python_example.

C++ classes

The first example exposes two C++ classes.

cpyquickhelper.numbers.weighted_number.WeightedDouble (self, value, weight = 1.0)

Implements a weighted double used to speed up computation with aggregation. It contains two attributes:

  • value: unweighted value

  • weight: weight associated with the value; it should be positive, but this is not enforced

cpyquickhelper.numbers.weighted_number.WeightedFloat (self, value, weight = 1.0)

Implements a weighted float used to speed up computation with aggregation. It contains two attributes:

  • value: unweighted value

  • weight: weight associated with the value; it should be positive, but this is not enforced
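
The idea of carrying a value together with its weight can be sketched in pure Python. This is only an illustration: the class WeightedValue and the function weighted_mean are hypothetical names, and the aggregation rule shown (a weighted mean) is an assumption, not the behaviour of the C++ classes.

```python
from dataclasses import dataclass


@dataclass
class WeightedValue:
    """Hypothetical pure-Python analogue of a weighted number."""
    value: float          # unweighted value
    weight: float = 1.0   # should be positive, but this is not checked


def weighted_mean(items):
    """Assumed aggregation: returns sum(weight * value) / sum(weight)."""
    total = 0.0
    total_weight = 0.0
    for item in items:
        total += item.value * item.weight
        total_weight += item.weight
    return total / total_weight


items = [WeightedValue(1.0), WeightedValue(3.0, weight=3.0)]
mean = weighted_mean(items)  # (1 * 1 + 3 * 3) / (1 + 3)
```

Keeping the weight next to the value lets an aggregation run in a single pass without a second array of weights.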

Benchmark dot product

The second example exposes a function that benchmarks the execution time of several C++ functions. The difficulty is that the measurement cannot happen in Python because the C++ execution time is not significant compared to the time spent in Python. Results are stored in a C++ class exposed in Python.

cpyquickhelper.numbers.cbenchmark.ExecutionStat (self)

Holds results to compare execution time of functions.

The next function gives more information about the current processor, assuming the package was compiled on the machine where it is installed. The result depends on compiler-defined constants.

cpyquickhelper.numbers.cbenchmark.get_simd_available_option ()

get_simd_available_option() -> str

Returns the available compilation options for SIMD. It can simply be called with the following example…

The functions being tested can be found in cbenchmark.cpp and repeat_fct.h. It all began with the blog post "Why is it faster to process a sorted array than an unsorted array?" The benchmark plays with a function whose third line is implemented in different ways.

int nb = 0;
for (auto it = values.begin(); it != values.end(); ++it) {
    if (*it >= th) nb++; // this line changes
    if (*it >= th) nb++; // and is repeated 10 times inside the loop.
    // ... 10 times
}
return nb;

And it is replaced by the following scenarios:

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_A (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_A(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on if (values[i] >= th) ++nb;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_B (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_B(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on if (*it >= th) ++nb;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_C (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_C(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on if (*it >= th) nb++;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_D (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_D(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += *it >= th ? 1 : 0;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_E (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_E(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on if (*it >= th) nb += 1;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_F (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_F(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += (*it - th) >= 0 ? 1 : 0;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_G (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_G(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += (*it - th) < 0 ? 1 : 0;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_H (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_H(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += *it < th ? 1 : 0;

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_I

measure_scenario_I(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += 1 ^ ((unsigned int)(*it) >> (sizeof(int) * CHAR_BIT - 1));
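
The bit manipulation used in this scenario can be illustrated in Python on signed integers. This is a minimal sketch assuming a 32-bit int (sizeof(int) * CHAR_BIT == 32). The helper names are hypothetical, and the sketch does not reproduce how the C++ code converts a float to unsigned int.

```python
INT_BITS = 32  # assumption: int is 32 bits wide


def is_non_negative(x: int) -> int:
    """Returns 1 if the 32-bit signed integer x is >= 0, otherwise 0."""
    as_unsigned = x & 0xFFFFFFFF              # emulates the cast to unsigned int
    sign_bit = as_unsigned >> (INT_BITS - 1)  # 1 for negative numbers, else 0
    return 1 ^ sign_bit                       # flips the sign bit


def count_non_negative(values):
    """Counts non-negative integers without any branch in the loop body."""
    nb = 0
    for v in values:
        nb += is_non_negative(v)
    return nb


nb = count_non_negative([-5, 0, 7, -1, 2])
```

The point of the trick is that the loop body contains no conditional jump, so the result does not depend on branch prediction.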

cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_J (values, th, repeat = 100, number = 10, verbose = False)

measure_scenario_J(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Loop on nb += values[i] >= th ? 1 : 0;

Scenario I is taken from Checking whether a number is positive or negative using bitwise operators.

The other functions implement different dot products between two vectors:

cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product (arg0, arg1)

vector_dot_product(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float

Computes a dot product in C++ with vectors of floats.

The second function computes the same dot product, but as long as the remaining size is more than 16, it calls a function which computes 16 products in one sequence.

cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16 (arg0, arg1)

vector_dot_product16(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float

Computes a dot product in C++ with vectors of floats. Goes 16 by 16.
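
The "16 by 16" strategy can be sketched in pure Python: full blocks of 16 products are handled by a dedicated helper, and the remaining elements are processed one at a time. The names dot_block and dot_product16_sketch are hypothetical, and the sketch only shows the control flow, not the speed of the C++ version.

```python
BLOCK = 16


def dot_block(x, y, start):
    """Sums the 16 products x[i] * y[i] for i in [start, start + 16)."""
    s = 0.0
    for i in range(start, start + BLOCK):
        s += x[i] * y[i]
    return s


def dot_product16_sketch(x, y):
    """Dot product computed block by block, then element by element."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    i = 0
    n = len(x)
    while n - i >= BLOCK:      # a full block of 16 elements remains
        total += dot_block(x, y, i)
        i += BLOCK
    while i < n:               # tail shorter than one block
        total += x[i] * y[i]
        i += 1
    return total


x = [float(i) for i in range(40)]
y = [2.0] * 40
result = dot_product16_sketch(x, y)
```

In C++ the block helper is typically written without a loop so that the compiler can keep the 16 products in registers; the Python loop above only mirrors the logic.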

The following function uses SSE instructions. See the documentation on the Intel website.

cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16_sse (arg0, arg1)

vector_dot_product16_sse(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float

Computes a dot product in C++ with vectors of floats. Goes 16 by 16. Use SSE instructions.

The next one uses 512-bit AVX instructions.

cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16_avx512 (arg0, arg1)

vector_dot_product16_avx512(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float

Computes a dot product in C++ with vectors of floats. Goes 16 by 16. Use SSE instructions because __AVX512F__ is not defined.

The last function is used to measure the time spent in the Python binding; it has the same signature as the dot product but does nothing.

cpyquickhelper.numbers.cbenchmark_dot.empty_vector_dot_product (arg0, arg1)

empty_vector_dot_product(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float

Empty measure to have an idea about the processing due to python binding.

One final version was added to compare how fast a parallelized version could be:

cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product_openmp (p1, p2, nthreads = -1)

vector_dot_product_openmp(p1: numpy.ndarray[numpy.float32], p2: numpy.ndarray[numpy.float32], nthreads: int = -1) -> float

Computes a dot product in C++ with vectors of floats and parallelizes with OPENMP.
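
The parallel reduction behind such a function can be sketched in Python: the vectors are split into chunks, each worker computes a partial dot product, and the partial results are summed. This is only an illustration of the pattern. The names are hypothetical, the interpretation of nthreads = -1 as "use every available core" is an assumption, and Python threads do not bring the speed-up OpenMP threads do.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partial_dot(x, y, start, stop):
    """Dot product restricted to the index range [start, stop)."""
    return sum(x[i] * y[i] for i in range(start, stop))


def dot_product_parallel(x, y, nthreads=-1):
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    if nthreads <= 0:
        # Assumption: a non-positive value means "use every available core".
        nthreads = os.cpu_count() or 1
    nthreads = min(nthreads, n)
    chunk = (n + nthreads - 1) // nthreads          # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = pool.map(lambda b: partial_dot(x, y, b[0], b[1]), bounds)
        return sum(partials)                        # final reduction


x = [1.0] * 1000
y = [float(i) for i in range(1000)]
result = dot_product_parallel(x, y, nthreads=4)
```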

Speed measure

The next functions make it easier to measure processing time once the module has been compiled.

cpyquickhelper.numbers.check_speed (dims = [100000], repeat = 10, number = 50, fLOG = <built-in function print>)

Prints out some information about speed computation of this laptop. See Measures branching in C++ from python to compare.

cpyquickhelper.numbers.measure_time (stmt, context = None, repeat = 10, number = 50, div_by_number = False, max_time = None)

Measures a statement and returns the results as a dictionary.
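
A function of this kind can be approximated with the standard timeit module. The sketch below is not the package's implementation: the function name and the dictionary keys (average, deviation, min_exec, max_exec) are assumptions chosen for illustration, and the max_time parameter is left out.

```python
import math
import timeit


def measure_time_sketch(stmt, context=None, repeat=10, number=50,
                        div_by_number=False):
    """Times stmt and returns summary statistics as a dictionary."""
    # timeit.repeat returns one duration per repetition,
    # each covering `number` executions of stmt.
    durations = timeit.repeat(stmt, globals=context or {},
                              repeat=repeat, number=number)
    if div_by_number:
        durations = [d / number for d in durations]
    average = sum(durations) / len(durations)
    variance = sum((d - average) ** 2 for d in durations) / len(durations)
    return {
        "average": average,
        "deviation": math.sqrt(variance),
        "min_exec": min(durations),
        "max_exec": max(durations),
        "repeat": repeat,
        "number": number,
    }


stats = measure_time_sketch("sum(values)",
                            context={"values": list(range(1000))},
                            repeat=5, number=20, div_by_number=True)
```

The context dictionary plays the role of the namespace in which the statement is evaluated, so the measured code can refer to objects prepared beforehand.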

Benchmark sum accumulator

The following benchmark measures the difference between computing the sum of a float vector with a double or a float accumulator. The following two functions implement the sum in C++.

cpyquickhelper.numbers.cbenchmark_sum_type.vector_float_sum (arg0)

vector_float_sum(arg0: numpy.ndarray[numpy.float32]) -> float

Computes a sum in C++ with vectors of floats and a float accumulator.

cpyquickhelper.numbers.cbenchmark_sum_type.vector_double_sum (arg0)

vector_double_sum(arg0: numpy.ndarray[numpy.float32]) -> float

Computes a sum in C++ with vectors of floats and a double accumulator.
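
The effect of the accumulator type on the result can be reproduced in pure Python by rounding to single precision after every addition. This is a sketch of the numerical behaviour only; the helper names are hypothetical and the struct module is used to emulate a float32 value.

```python
import struct


def to_float32(x: float) -> float:
    """Rounds a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float_sum(values):
    """Sums float32 values with a float32 accumulator."""
    acc = 0.0
    for v in values:
        acc = to_float32(acc + to_float32(v))  # round after each addition
    return acc


def double_sum(values):
    """Sums float32 values with a double accumulator."""
    acc = 0.0
    for v in values:
        acc += to_float32(v)
    return acc


values = [16777216.0, 1.0, 1.0]   # 16777216 == 2 ** 24
single = float_sum(values)        # 16777216.0, both additions are lost
double = double_sum(values)       # 16777218.0, the exact sum
```

Above 2 ** 24 the spacing between consecutive float32 values is 2, so adding 1 to a float accumulator leaves it unchanged, whereas the double accumulator keeps every unit.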

The next two functions run the benchmark in C++; the measurements do not include the Python binding.

cpyquickhelper.numbers.cbenchmark_sum_type.measure_scenario_Float (values, repeat = 100, number = 10, verbose = False)

measure_scenario_Float(values: List[float], repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Sum all elements with a float accumulator.

cpyquickhelper.numbers.cbenchmark_sum_type.measure_scenario_Double (values, repeat = 100, number = 10, verbose = False)

measure_scenario_Double(values: List[float], repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat

Measure C++ implementation. Sum all elements with a double accumulator.

Lapack, Blas

cpyquickhelper.numbers.direct_blas_lapack.cblas_ddot

Computes a dot product with cblas_ddot.

cpyquickhelper.numbers.direct_blas_lapack.cblas_sdot

Computes a dot product with cblas_sdot.

cpyquickhelper.numbers.direct_blas_lapack.dgelss

Finds X in the problem AX=B by minimizing \|AX - B\|^2. Uses function dgels.
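
To make the minimization concrete, here is a small pure-Python least-squares fit of a line, where A has rows [1, t_i] and X = (c0, c1). It solves the normal equations A'A X = A'B in closed form, which is a simplification chosen for this sketch and not the algorithm used by the LAPACK routine. All names are hypothetical.

```python
def fit_line(t, b):
    """Minimizes sum((c0 + c1 * t_i - b_i) ** 2) over (c0, c1)."""
    n = len(t)
    st = sum(t)
    stt = sum(ti * ti for ti in t)
    sb = sum(b)
    stb = sum(ti * bi for ti, bi in zip(t, b))
    det = n * stt - st * st            # determinant of A'A
    if det == 0:
        raise ValueError("A'A is singular, the solution is not unique")
    c0 = (stt * sb - st * stb) / det
    c1 = (n * stb - st * sb) / det
    return c0, c1


def squared_residual(t, b, c0, c1):
    """Value of the criterion ||AX - B||^2 for X = (c0, c1)."""
    return sum((c0 + c1 * ti - bi) ** 2 for ti, bi in zip(t, b))


t = [0.0, 1.0, 2.0, 3.0]
b = [1.0, 3.0, 5.0, 7.0]      # exactly b = 1 + 2 * t
c0, c1 = fit_line(t, b)
residual = squared_residual(t, b, c0, c1)
```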

C++ implementation

cpyquickhelper.numbers.slowcode.dgemm (arg0, arg1, arg2, arg3, arg4, arg5, arg6)

dgemm(arg0: bool, arg1: bool, arg2: float, arg3: numpy.ndarray[numpy.float64], arg4: numpy.ndarray[numpy.float64], arg5: float, arg6: numpy.ndarray[numpy.float64]) -> None

C++ implementation of gemm function for double floats. Computes one of the following expressions C = a A B + b C, C = a A' B + b C, C = a A B' + b C, C = a A' B' + b C. The function assumes C is allocated.

cpyquickhelper.numbers.slowcode.sgemm (arg0, arg1, arg2, arg3, arg4, arg5, arg6)

sgemm(arg0: bool, arg1: bool, arg2: float, arg3: numpy.ndarray[numpy.float32], arg4: numpy.ndarray[numpy.float32], arg5: float, arg6: numpy.ndarray[numpy.float32]) -> None

C++ implementation of gemm function for single floats. Computes one of the following expressions C = a A B + b C, C = a A' B + b C, C = a A B' + b C, C = a A' B' + b C. The function assumes C is allocated.
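
The four expressions computed by these functions can be sketched in pure Python on lists of lists. The argument names below (trans_a, trans_b, alpha, a, b, beta, c) are assumptions mapped onto arg0 to arg6 by position. Like the functions above, the sketch returns nothing and updates c in place, which is why c must be allocated beforehand.

```python
def transpose(m):
    """Returns the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def gemm_sketch(trans_a, trans_b, alpha, a, b, beta, c):
    """Updates c in place with alpha * op(a) * op(b) + beta * c."""
    left = transpose(a) if trans_a else a
    right = transpose(b) if trans_b else b
    n_rows, n_inner, n_cols = len(left), len(right), len(right[0])
    if len(left[0]) != n_inner:
        raise ValueError("inner dimensions do not match")
    if len(c) != n_rows or len(c[0]) != n_cols:
        raise ValueError("c must already have shape (rows, cols)")
    for i in range(n_rows):
        for j in range(n_cols):
            s = 0.0
            for k in range(n_inner):
                s += left[i][k] * right[k][j]
            c[i][j] = alpha * s + beta * c[i][j]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
gemm_sketch(False, False, 2.0, a, b, 3.0, c)   # c = 2 A B + 3 C
```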