cpyquickhelper.numbers
This module uses pybind11 to create a Python class from a C++ class. It is heavily inspired by the example python_example.
C++ classes
The first example exposes two C++ classes.
cpyquickhelper.numbers.weighted_number.WeightedDouble
(self, value, weight = 1.0)
Implements a weighted double used to speed up computation with aggregation. It contains two attributes:
value: unweighted value
weight: weight associated with the value; it should be positive, but that is not enforced
cpyquickhelper.numbers.weighted_number.WeightedFloat
(self, value, weight = 1.0)
Implements a weighted float used to speed up computation with aggregation. It contains two attributes:
value: unweighted value
weight: weight associated with the value; it should be positive, but that is not enforced
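To illustrate how a (value, weight) pair can support aggregation, here is a minimal C++ sketch that accumulates weighted numbers into a weighted mean. The `Weighted` template and the `weighted_mean` function are hypothetical and do not reproduce the operators that WeightedDouble and WeightedFloat actually define; like the documented classes, the sketch does not check that weights are positive.

```cpp
#include <vector>

// Hypothetical weighted number: a value and the weight attached to it.
template <typename T>
struct Weighted {
    T value;
    T weight;
    explicit Weighted(T v, T w = 1) : value(v), weight(w) {}
};

// Aggregates weighted numbers into sum(w_i * x_i) / sum(w_i).
// Weights are not checked; callers must ensure they do not sum to zero.
template <typename T>
T weighted_mean(const std::vector<Weighted<T>>& items) {
    T num = 0, den = 0;
    for (const auto& it : items) {
        num += it.value * it.weight;
        den += it.weight;
    }
    return num / den;
}
```

With the items (1.0, weight 1.0) and (4.0, weight 2.0), the weighted mean is (1 + 8) / 3 = 3.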
Benchmark dot product
The second example exposes functions that benchmark the execution time of a couple of C++ functions. The difficulty is that the measurement cannot happen in Python, because the C++ execution time is negligible compared to the time spent in Python. Results are stored in a C++ class exposed in Python.
cpyquickhelper.numbers.cbenchmark.ExecutionStat
(self)
Holds results to compare execution time of functions.
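Why the timing loop has to live in C++ can be sketched as follows: the function under test is called `number` times between two clock reads, and this is repeated `repeat` times, so no Python call sits inside the measured interval. The `Stat` struct, its field names and the `measure` function are hypothetical, loosely modelled on the `repeat` and `number` parameters of the functions below; they are not the actual attributes of ExecutionStat.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical result container, not the real ExecutionStat.
struct Stat {
    double average = 0, deviation = 0, min_exec = 0, max_exec = 0;
    int repeat = 0, number = 0;
};

// Times `repeat` batches of `number` calls, entirely in C++.
// Assumes repeat > 0 and number > 0.
Stat measure(const std::function<void()>& fct, int repeat, int number) {
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(repeat));
    for (int r = 0; r < repeat; ++r) {
        auto begin = std::chrono::steady_clock::now();
        for (int n = 0; n < number; ++n)
            fct();
        auto end = std::chrono::steady_clock::now();
        // average duration of one call in this batch, in seconds
        times.push_back(std::chrono::duration<double>(end - begin).count() / number);
    }
    Stat st;
    st.repeat = repeat;
    st.number = number;
    double sum = 0, sum2 = 0;
    for (double t : times) {
        sum += t;
        sum2 += t * t;
    }
    st.average = sum / repeat;
    st.deviation = std::sqrt(std::max(0.0, sum2 / repeat - st.average * st.average));
    st.min_exec = *std::min_element(times.begin(), times.end());
    st.max_exec = *std::max_element(times.begin(), times.end());
    return st;
}
```

Dividing by `number` gives the average duration of a single call, which matters when one call is too short for the clock resolution.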
The next function gives more information about the current processor, assuming the package was compiled on the machine where it is installed. The result depends on the constants defined by the compiler.
cpyquickhelper.numbers.cbenchmark.get_simd_available_option
()
get_simd_available_option() -> str
Returns the available compilation options for SIMD. It can simply be called with the following example…
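The dependency on compiler-defined constants can be sketched with preprocessor checks. The macro names below are those defined by GCC and Clang for x86 targets (MSVC, for instance, does not define `__SSE__`); the function `simd_options` is hypothetical and its output format is not necessarily that of get_simd_available_option.

```cpp
#include <string>

// Lists the SIMD macros the compiler defined for this build.
// The answer is fixed at compile time, not detected at run time.
std::string simd_options() {
    std::string res;
#if defined(__SSE__)
    res += "__SSE__ ";
#endif
#if defined(__SSE2__)
    res += "__SSE2__ ";
#endif
#if defined(__AVX__)
    res += "__AVX__ ";
#endif
#if defined(__AVX2__)
    res += "__AVX2__ ";
#endif
#if defined(__AVX512F__)
    res += "__AVX512F__ ";
#endif
    return res.empty() ? std::string("none") : res;
}
```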
The functions to be tested can be found in cbenchmark.cpp and repeat_fct.h. It all began with the blog post Why is it faster to process a sorted array than an unsorted array? The benchmark plays with a function whose loop body is implemented in different ways.
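#include <vector>

template <typename T>
int count_ge(const std::vector<T>& values, T th) {  // illustrative name
    int nb = 0;
    for (auto it = values.begin(); it != values.end(); ++it) {
        if (*it >= th) nb++;  // this line changes from one scenario to the next
        // in the benchmark, the line above is repeated 10 times inside the loop
    }
    return nb;
}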
This line is replaced in the following scenarios:
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_A
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_A(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
if (values[i] >= th) ++nb;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_B
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_B(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
if (*it >= th) ++nb;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_C
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_C(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
if (*it >= th) nb++;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_D
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_D(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += *it >= th ? 1 : 0;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_E
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_E(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
if (*it >= th) nb += 1;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_F
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_F(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += (*it - th) >= 0 ? 1 : 0;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_G
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_G(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += (*it - th) < 0 ? 1 : 0;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_H
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_H(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += *it < th ? 1 : 0;
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_J
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_J(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += values[i] >= th ? 1 : 0;
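The scenarios compute the same count in different ways. A minimal sketch of three of them (in the styles of B, D and H) makes it easy to check that they agree; note that scenarios G and H count values strictly below `th`, that is, the complement of the other scenarios. The element type `float` and the function names are assumptions. Whether a formulation compiles to a branch or to a branchless select depends on the compiler and its flags, and that difference in generated code is what the timings can reveal.

```cpp
#include <vector>

// Scenario B style: an explicit branch inside the loop.
int count_branch(const std::vector<float>& values, float th) {
    int nb = 0;
    for (auto it = values.begin(); it != values.end(); ++it)
        if (*it >= th) ++nb;
    return nb;
}

// Scenario D style: a conditional expression, which compilers may
// turn into a branchless select.
int count_ternary(const std::vector<float>& values, float th) {
    int nb = 0;
    for (auto it = values.begin(); it != values.end(); ++it)
        nb += *it >= th ? 1 : 0;
    return nb;
}

// Scenario H style: counts values strictly below th (the complement).
int count_below(const std::vector<float>& values, float th) {
    int nb = 0;
    for (auto it = values.begin(); it != values.end(); ++it)
        nb += *it < th ? 1 : 0;
    return nb;
}
```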
The last implementation is taken from Checking whether a number is positive or negative using bitwise operators.
cpyquickhelper.numbers.cbenchmark_dot.measure_scenario_I
(values, th, repeat = 100, number = 10, verbose = False)
measure_scenario_I(values: List[float], th: float, repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Loop on
nb += 1 ^ ((unsigned int)(*it) >> (sizeof(int) * CHAR_BIT - 1));
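The bitwise trick reads the sign bit instead of comparing. Here is a minimal sketch on integers, where the trick is well defined; the docstring above applies the cast to `*it` directly, whereas this sketch applies it to `v - th` so that it counts values at or above a threshold like the other scenarios. Both `is_non_negative` and `count_ge_bitwise` are hypothetical names, and the sketch assumes a two's-complement `int` (guaranteed since C++20).

```cpp
#include <climits>
#include <vector>

// Returns 1 if x >= 0 and 0 if x < 0 without a comparison: the conversion
// to unsigned keeps the two's-complement bit pattern, and shifting right by
// (bits in int - 1) leaves only the sign bit.
int is_non_negative(int x) {
    unsigned int sign = static_cast<unsigned int>(x) >> (sizeof(int) * CHAR_BIT - 1);
    return static_cast<int>(1u ^ sign);
}

// Hypothetical threshold count built on the trick; assumes v - th never overflows.
int count_ge_bitwise(const std::vector<int>& values, int th) {
    int nb = 0;
    for (int v : values)
        nb += is_non_negative(v - th);
    return nb;
}
```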
The other functions implement different dot products between two vectors:
cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product
(arg0, arg1)
vector_dot_product(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float
Computes a dot product in C++ with vectors of floats.
The second function computes the same dot product, but as long as more than 16 elements remain, it calls a function which computes 16 products in one sequence.
cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16
(arg0, arg1)
vector_dot_product16(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float
Computes a dot product in C++ with vectors of floats. Goes 16 by 16.
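A minimal sketch of the 16-by-16 strategy, assuming raw `float` pointers and a length `n` (the library functions take NumPy arrays): a helper computes 16 products in one sequence, and a scalar loop handles the remaining elements. The names `dot16` and `dot_product16` are hypothetical, and this sketch processes a block as soon as 16 elements remain.

```cpp
#include <cstddef>

// 16 products in one sequence; the fixed trip count lets the
// compiler fully unroll the loop.
inline float dot16(const float* p1, const float* p2) {
    float sum = 0;
    for (int k = 0; k < 16; ++k)
        sum += p1[k] * p2[k];
    return sum;
}

// Processes the vectors 16 by 16, then finishes the tail one element at a time.
float dot_product16(const float* p1, const float* p2, std::size_t n) {
    float sum = 0;
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
        sum += dot16(p1 + i, p2 + i);
    for (; i < n; ++i)
        sum += p1[i] * p2[i];
    return sum;
}
```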
The following one uses SSE instructions. See the documentation on the Intel website.
cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16_sse
(arg0, arg1)
vector_dot_product16_sse(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float
Computes a dot product in C++ with vectors of floats. Goes 16 by 16. Use SSE instructions.
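A simplified SSE sketch, assuming `<xmmintrin.h>` and the GCC/Clang `__SSE__` macro: it processes 4 floats per instruction rather than 16 per step, and keeps a scalar loop for the tail and as a fallback when SSE is not available. `dot_product_sse` is a hypothetical name, not the library's implementation.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

float dot_product_sse(const float* p1, const float* p2, std::size_t n) {
    std::size_t i = 0;
    float sum = 0;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();  // four partial sums, one per lane
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(p1 + i);  // unaligned loads: no alignment assumption
        __m128 b = _mm_loadu_ps(p2 + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i)  // tail, or the whole vector without SSE
        sum += p1[i] * p2[i];
    return sum;
}
```

Because the four lanes are summed in a different order than in a scalar loop, the result can differ from the scalar version in the last bits for general inputs.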
The next one uses AVX instructions with 512-bit registers (AVX-512).
cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product16_avx512
(arg0, arg1)
vector_dot_product16_avx512(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float
Computes a dot product in C++ with vectors of floats. Goes 16 by 16. Use SSE instructions because __AVX512F__ is not defined.
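The docstring above is generated at compile time, which explains why it reports SSE: the choice is made with a preprocessor test on `__AVX512F__`. A minimal sketch of that pattern follows; `dot_product_avx512` is hypothetical, and unlike the library it falls back to a scalar loop rather than to SSE.

```cpp
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

float dot_product_avx512(const float* p1, const float* p2, std::size_t n) {
    std::size_t i = 0;
    float sum = 0;
#if defined(__AVX512F__)
    // 16 floats per 512-bit register, fused multiply-add into the accumulator
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16)
        acc = _mm512_fmadd_ps(_mm512_loadu_ps(p1 + i), _mm512_loadu_ps(p2 + i), acc);
    sum = _mm512_reduce_add_ps(acc);
#endif
    for (; i < n; ++i)  // tail, or the whole vector without AVX-512
        sum += p1[i] * p2[i];
    return sum;
}
```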
The last function measures the time spent in the Python binding: it has the same signature as the dot product but does nothing.
cpyquickhelper.numbers.cbenchmark_dot.empty_vector_dot_product
(arg0, arg1)
empty_vector_dot_product(arg0: numpy.ndarray[numpy.float32], arg1: numpy.ndarray[numpy.float32]) -> float
Empty measure to have an idea about the processing due to python binding.
One final version was added to compare how fast a parallelized version could be:
cpyquickhelper.numbers.cbenchmark_dot.vector_dot_product_openmp
(p1, p2, nthreads = -1)
vector_dot_product_openmp(p1: numpy.ndarray[numpy.float32], p2: numpy.ndarray[numpy.float32], nthreads: int = -1) -> float
Computes a dot product in C++ with vectors of floats and parallelizes with OPENMP.
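A minimal OpenMP sketch with a `reduction` clause. `dot_product_openmp` here is hypothetical and ignores the `nthreads` parameter of the documented function (a `num_threads` clause is one way to honour such a parameter). Compiled without OpenMP support, the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>

float dot_product_openmp(const float* p1, const float* p2, std::ptrdiff_t n) {
    float sum = 0;
    // each thread accumulates a private partial sum, combined at the end
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        sum += p1[i] * p2[i];
    return sum;
}
```

Since the partial sums are combined in an order that depends on the thread count, the last bits of the result can change with the number of threads for general inputs.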
Speed measure
The next functions make it easier to measure processing time once the module is compiled.
cpyquickhelper.numbers.check_speed
(dims = [100000], repeat = 10, number = 50, fLOG = <built-in function print>)
Prints out some information about speed computation of this laptop. See Measures branching in C++ from python to compare.
cpyquickhelper.numbers.measure_time
(stmt, context = None, repeat = 10, number = 50, div_by_number = False, max_time = None)
Measures a statement and returns the results as a dictionary.
Benchmark sum accumulator
The following benchmark measures the difference between summing a float vector with a double accumulator and with a float accumulator. The two following functions implement the sum in C++.
cpyquickhelper.numbers.cbenchmark_sum_type.vector_float_sum
(arg0)
vector_float_sum(arg0: numpy.ndarray[numpy.float32]) -> float
Computes a sum in C++ with vectors of floats and a float accumulator.
cpyquickhelper.numbers.cbenchmark_sum_type.vector_double_sum
(arg0)
vector_double_sum(arg0: numpy.ndarray[numpy.float32]) -> float
Computes a sum in C++ with vectors of floats and a double accumulator.
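The difference between the two accumulators shows up as soon as a partial sum grows large: a `float` has a 24-bit significand, so once the running sum reaches 2^24 (16777216), adding 1.0f no longer changes it. A minimal sketch with hypothetical function names:

```cpp
#include <vector>

// Float accumulator: every partial sum is rounded to float.
float sum_float_acc(const std::vector<float>& values) {
    float acc = 0;
    for (float v : values)
        acc += v;
    return acc;
}

// Double accumulator: same float inputs, partial sums kept in double.
double sum_double_acc(const std::vector<float>& values) {
    double acc = 0;
    for (float v : values)
        acc += v;
    return acc;
}
```

With the inputs `{16777216.0f, 1.0f, 1.0f, 1.0f, 1.0f}`, `sum_float_acc` returns 16777216 while `sum_double_acc` returns 16777220, assuming IEEE 754 arithmetic without extended intermediate precision.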
The next two functions run the benchmark in C++, so the measurements do not include the Python binding.
cpyquickhelper.numbers.cbenchmark_sum_type.measure_scenario_Float
(values, repeat = 100, number = 10, verbose = False)
measure_scenario_Float(values: List[float], repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Sum all elements with a float accumulator.
cpyquickhelper.numbers.cbenchmark_sum_type.measure_scenario_Double
(values, repeat = 100, number = 10, verbose = False)
measure_scenario_Double(values: List[float], repeat: int = 100, number: int = 10, verbose: bool = False) -> ExecutionStat
Measure C++ implementation. Sum all elements with a double accumulator.
Lapack, Blas
cpyquickhelper.numbers.direct_blas_lapack.cblas_ddot
Computes a dot product with cblas_ddot.
cpyquickhelper.numbers.direct_blas_lapack.cblas_sdot
Computes a dot product with cblas_sdot.
cpyquickhelper.numbers.direct_blas_lapack.dgelss
Finds $X$ in the problem $AX = B$ by minimizing $\lVert AX - B \rVert^2$. Uses function dgels.
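For reference, the least-squares problem behind this function can be characterized as follows: when $A$ has full column rank, the minimizer satisfies the normal equations. This is the textbook characterization, not a description of how the wrapper computes $X$; LAPACK's drivers avoid forming $A^\top A$ and rely on a QR factorization (dgels) or a singular value decomposition (dgelss).

```latex
\min_X \lVert AX - B \rVert^2
\;\Longrightarrow\;
A^\top A \, X = A^\top B
\;\Longrightarrow\;
X = (A^\top A)^{-1} A^\top B
\quad \text{when } A \text{ has full column rank}
```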
C++ implementation
cpyquickhelper.numbers.slowcode.dgemm
(arg0, arg1, arg2, arg3, arg4, arg5, arg6)
dgemm(arg0: bool, arg1: bool, arg2: float, arg3: numpy.ndarray[numpy.float64], arg4: numpy.ndarray[numpy.float64], arg5: float, arg6: numpy.ndarray[numpy.float64]) -> None
C++ implementation of gemm function for double floats. Computes one of the following expressions: $C = aAB + bC$, $C = aA^\top B + bC$, $C = aAB^\top + bC$, $C = aA^\top B^\top + bC$. The function assumes C is allocated.
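A naive reference for the four expressions can help validate an optimized gemm. This sketch assumes row-major storage and explicit dimensions `M`, `N`, `K`; mapping `arg0`/`arg1` to the transpose flags, `arg2`/`arg5` to the scalars a and b, and `arg3`/`arg4`/`arg6` to A, B and C is inferred from the argument types, not confirmed by the documentation. `gemm_ref` is a hypothetical name; the sgemm variant would be the same with `float`.

```cpp
#include <cstddef>
#include <vector>

// C (M x N) = a * op(A) * op(B) + b * C, row-major storage assumed.
// op(A) is M x K (A is stored K x M when transA), op(B) is K x N
// (B is stored N x K when transB). C must already be allocated.
void gemm_ref(bool transA, bool transB, std::size_t M, std::size_t N, std::size_t K,
              double a, const std::vector<double>& A, const std::vector<double>& B,
              double b, std::vector<double>& C) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            double s = 0;
            for (std::size_t k = 0; k < K; ++k) {
                double x = transA ? A[k * M + i] : A[i * K + k];
                double y = transB ? B[j * K + k] : B[k * N + j];
                s += x * y;
            }
            C[i * N + j] = a * s + b * C[i * N + j];
        }
}
```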
cpyquickhelper.numbers.slowcode.sgemm
(arg0, arg1, arg2, arg3, arg4, arg5, arg6)
sgemm(arg0: bool, arg1: bool, arg2: float, arg3: numpy.ndarray[numpy.float32], arg4: numpy.ndarray[numpy.float32], arg5: float, arg6: numpy.ndarray[numpy.float32]) -> None
C++ implementation of gemm function for single floats. Computes one of the following expressions: $C = aAB + bC$, $C = aA^\top B + bC$, $C = aAB^\top + bC$, $C = aA^\top B^\top + bC$. The function assumes C is allocated.