Compares dot implementations (numpy, cython, C++, SSE)

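numpy has a very fast implementation of the dot product. It is difficult to beat and very easy to be slower than it. This example compares numpy.dot with several implementations written with cython, for double precision vectors (the ddot_* functions, float64) and single precision vectors (the sdot_* functions, float32). The tested functions are the following: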

import numpy
import matplotlib.pyplot as plt
from pandas import DataFrame, concat
from td3a_cpp.tutorial.dot_cython import (
    dot_product, ddot_cython_array,
    ddot_cython_array_optim, ddot_array,
    ddot_array_16, ddot_array_16_sse
)
from td3a_cpp.tutorial.dot_cython import (
    sdot_cython_array,
    sdot_cython_array_optim, sdot_array,
    sdot_array_16, sdot_array_16_sse
)
from td3a_cpp.tools import measure_time_dim


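def get_vectors(fct, n, h=100, dtype=numpy.float64):
    """Builds one timing context per vector size in range(10, n, h).

    Each context holds two random vectors va and vb of that size,
    the function to time under the name dot, and the size as x_name.
    """
    ctxs = [dict(va=numpy.random.randn(size).astype(dtype),
                 vb=numpy.random.randn(size).astype(dtype),
                 dot=fct,
                 x_name=size)
            for size in range(10, n, h)]
    return ctxs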
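measure_time_dim comes from td3a_cpp.tools and its code is not shown here. The following is only a minimal stand-in, assuming it times the statement once per context with that context as the namespace, and returns one row of statistics per vector size. The name measure_time_sketch and its repeat/number defaults are hypothetical; it reproduces a subset of the columns printed below (average, deviation, min_exec, max_exec, x_name), not context_size.

```python
import timeit

import numpy


def measure_time_sketch(stmt, contexts, repeat=10, number=50):
    """Hypothetical stand-in for td3a_cpp.tools.measure_time_dim.

    For each context (a dict with va, vb, dot, x_name), run `stmt`
    `repeat` times, each run executing it `number` times, and yield
    one dict of per-call statistics.
    """
    for ctx in contexts:
        # timeit evaluates stmt with the context dict as its globals,
        # so 'dot(va, vb)' resolves dot, va and vb from the context.
        timer = timeit.Timer(stmt, globals=ctx)
        runs = numpy.array(timer.repeat(repeat=repeat, number=number)) / number
        yield dict(average=runs.mean(), deviation=runs.std(),
                   min_exec=runs.min(), max_exec=runs.max(),
                   repeat=repeat, number=number, x_name=ctx['x_name'])


# Usage on a few contexts built the same way get_vectors builds them
ctxs = [dict(va=numpy.random.randn(n), vb=numpy.random.randn(n),
             dot=numpy.dot, x_name=n) for n in range(10, 100, 30)]
rows = list(measure_time_sketch('dot(va, vb)', ctxs, repeat=3, number=5))
print(len(rows), rows[0]['x_name'])  # 3 10
```

Dividing by number turns the total time of a run into the time of one call, which is what the average column reports.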

numpy dot

ctxs = get_vectors(numpy.dot, 10000)
df = DataFrame(list(measure_time_dim('dot(va, vb)', ctxs, verbose=1)))
df['fct'] = 'numpy.dot'
print(df.tail(n=3))
dfs = [df]

Out:

  0%|          | 0/100 [00:00<?, ?it/s]
 20%|##        | 20/100 [00:00<00:00, 194.46it/s]
 40%|####      | 40/100 [00:00<00:00, 173.16it/s]
 58%|#####8    | 58/100 [00:00<00:00, 156.79it/s]
 74%|#######4  | 74/100 [00:00<00:00, 144.03it/s]
 89%|########9 | 89/100 [00:00<00:00, 131.35it/s]
100%|##########| 100/100 [00:00<00:00, 138.26it/s]
     average     deviation  min_exec  ...  context_size  x_name        fct
97  0.000018  2.133004e-07  0.000018  ...           232    9710  numpy.dot
98  0.000018  4.432010e-07  0.000018  ...           232    9810  numpy.dot
99  0.000018  2.501539e-07  0.000018  ...           232    9910  numpy.dot

[3 rows x 9 columns]
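Timings only matter if every implementation returns the same value as numpy.dot. A cheap cross-check, sketched here with a hypothetical pure Python loop naive_dot (it is not the td3a_cpp dot_product), compares results with a tolerance because a different summation order changes the rounding.

```python
import numpy


def naive_dot(va, vb):
    """Reference dot product written as an explicit Python loop."""
    if va.shape != vb.shape:
        raise ValueError("vectors must have the same shape")
    total = 0.0
    for a, b in zip(va, vb):
        total += a * b
    return total


va = numpy.random.randn(1000)
vb = numpy.random.randn(1000)
# Summation order differs from numpy.dot, so compare with a tolerance
print(numpy.allclose(naive_dot(va, vb), numpy.dot(va, vb)))  # True
```

Any of the functions imported above can replace naive_dot in the same check.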

Several cython dot implementations

for fct in [dot_product, ddot_cython_array,
            ddot_cython_array_optim, ddot_array,
            ddot_array_16, ddot_array_16_sse]:
    ctxs = get_vectors(fct, 10000 if fct.__name__ != 'dot_product' else 1000)

    df = DataFrame(list(measure_time_dim('dot(va, vb)', ctxs, verbose=1)))
    df['fct'] = fct.__name__
    dfs.append(df)
    print(df.tail(n=3))

Out:

  0%|          | 0/10 [00:00<?, ?it/s]
 40%|####      | 4/10 [00:00<00:00, 21.82it/s]
 70%|#######   | 7/10 [00:00<00:00, 10.21it/s]
 90%|######### | 9/10 [00:01<00:00,  7.22it/s]
100%|##########| 10/10 [00:01<00:00,  6.14it/s]
100%|##########| 10/10 [00:01<00:00,  7.51it/s]
    average     deviation  min_exec  ...  context_size  x_name          fct
7  0.000410  4.009409e-07  0.000409  ...           232     710  dot_product
8  0.000470  5.318161e-07  0.000469  ...           232     810  dot_product
9  0.000528  5.104736e-07  0.000527  ...           232     910  dot_product

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 32%|###2      | 32/100 [00:00<00:00, 312.39it/s]
 64%|######4   | 64/100 [00:00<00:00, 206.19it/s]
 87%|########7 | 87/100 [00:00<00:00, 161.73it/s]
100%|##########| 100/100 [00:00<00:00, 162.16it/s]
     average     deviation  min_exec  ...  context_size  x_name                fct
97  0.000019  1.981773e-07  0.000019  ...           232    9710  ddot_cython_array
98  0.000019  1.974915e-07  0.000019  ...           232    9810  ddot_cython_array
99  0.000019  3.031134e-07  0.000019  ...           232    9910  ddot_cython_array

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 32%|###2      | 32/100 [00:00<00:00, 312.84it/s]
 64%|######4   | 64/100 [00:00<00:00, 206.85it/s]
 87%|########7 | 87/100 [00:00<00:00, 162.16it/s]
100%|##########| 100/100 [00:00<00:00, 162.29it/s]
     average     deviation  ...  x_name                      fct
97  0.000020  8.941966e-07  ...    9710  ddot_cython_array_optim
98  0.000019  7.700002e-07  ...    9810  ddot_cython_array_optim
99  0.000019  4.628326e-07  ...    9910  ddot_cython_array_optim

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 28%|##8       | 28/100 [00:00<00:00, 271.38it/s]
 56%|#####6    | 56/100 [00:00<00:00, 195.31it/s]
 77%|#######7  | 77/100 [00:00<00:00, 158.58it/s]
 94%|#########3| 94/100 [00:00<00:00, 134.45it/s]
100%|##########| 100/100 [00:00<00:00, 146.73it/s]
     average     deviation  min_exec  ...  context_size  x_name         fct
97  0.000021  2.962283e-07  0.000021  ...           232    9710  ddot_array
98  0.000020  2.593415e-07  0.000020  ...           232    9810  ddot_array
99  0.000020  3.568344e-07  0.000020  ...           232    9910  ddot_array

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 30%|###       | 30/100 [00:00<00:00, 287.36it/s]
 59%|#####8    | 59/100 [00:00<00:00, 204.24it/s]
 81%|########1 | 81/100 [00:00<00:00, 163.31it/s]
 99%|#########9| 99/100 [00:00<00:00, 138.47it/s]
100%|##########| 100/100 [00:00<00:00, 156.53it/s]
     average     deviation  min_exec  ...  context_size  x_name            fct
97  0.000019  2.651068e-07  0.000019  ...           232    9710  ddot_array_16
98  0.000019  3.326291e-07  0.000019  ...           232    9810  ddot_array_16
99  0.000019  2.316814e-07  0.000019  ...           232    9910  ddot_array_16

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 32%|###2      | 32/100 [00:00<00:00, 309.48it/s]
 63%|######3   | 63/100 [00:00<00:00, 226.91it/s]
 87%|########7 | 87/100 [00:00<00:00, 184.80it/s]
100%|##########| 100/100 [00:00<00:00, 184.47it/s]
     average     deviation  min_exec  ...  context_size  x_name                fct
97  0.000017  4.119813e-07  0.000016  ...           232    9710  ddot_array_16_sse
98  0.000016  8.379247e-07  0.000015  ...           232    9810  ddot_array_16_sse
99  0.000015  2.417309e-07  0.000015  ...           232    9910  ddot_array_16_sse

[3 rows x 9 columns]
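The suffix _16 in ddot_array_16 suggests, although the source does not say so, that the main loop processes 16 elements per iteration (loop unrolling), which reduces loop overhead and lets the compiler schedule independent multiplications. The hypothetical dot_unrolled_16 below only illustrates the structure of such a loop, a block loop followed by a scalar tail; written in Python it is of course not fast.

```python
import numpy


def dot_unrolled_16(va, vb):
    """Dot product whose main loop consumes 16 items per iteration.

    Hypothetical illustration of loop unrolling: blocks of 16 items
    first, then a scalar loop for the len(va) % 16 remaining items.
    """
    n = va.shape[0]
    n16 = n - n % 16  # largest multiple of 16 not greater than n
    total = 0.0
    for i in range(0, n16, 16):
        # A C version would spell out the 16 products explicitly;
        # here a generator expression stands in for them.
        total += sum(va[i + k] * vb[i + k] for k in range(16))
    for i in range(n16, n):  # tail: fewer than 16 items left
        total += va[i] * vb[i]
    return total


va = numpy.arange(37, dtype=numpy.float64)
vb = numpy.ones(37)
print(dot_unrolled_16(va, vb))  # 666.0
```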

Let’s display the results

cc = concat(dfs)
cc['N'] = cc['x_name']

fig, ax = plt.subplots(2, 2, figsize=(10, 10))
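# Top left: sizes up to 1100, the only range where dot_product was measured
cc[cc.N <= 1100].pivot(index='N', columns='fct', values='average').plot(
    logy=True, logx=True, ax=ax[0, 0])
# Top right and bottom right: all sizes without dot_product (ax[1, 0] stays empty)
cc[cc.fct != 'dot_product'].pivot(index='N', columns='fct', values='average').plot(
    logy=True, ax=ax[0, 1])
cc[cc.fct != 'dot_product'].pivot(index='N', columns='fct', values='average').plot(
    logy=True, logx=True, ax=ax[1, 1])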
ax[0, 0].set_title("Comparison of cython ddot implementations")
ax[0, 1].set_title("Comparison of cython ddot implementations"
                   "\nwithout dot_product")

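numpy.dot is hard to beat, but the cython versions come close for large vectors (19 to 20 µs against 18 µs at N=9910), and in this run ddot_array_16_sse is even slightly faster (15 µs).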
[Figure: panels "Comparison of cython ddot implementations" and "Comparison of cython ddot implementations without dot_product"]

Out:

Text(0.5, 1.0, 'Comparison of cython ddot implementations\nwithout dot_product')
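Absolute times on a log scale make small gaps hard to read. Dividing each implementation by numpy.dot gives a relative speed instead. The sketch below uses the averages printed above for N=9910; they are rounded to six decimals in the output, so the ratios are approximate. The same pivot and div lines should apply unchanged to the full cc table.

```python
from pandas import DataFrame

# Average times in seconds printed above for N=9910 (rounded as displayed)
rows = [("numpy.dot", 0.000018), ("ddot_cython_array", 0.000019),
        ("ddot_cython_array_optim", 0.000019), ("ddot_array", 0.000020),
        ("ddot_array_16", 0.000019), ("ddot_array_16_sse", 0.000015)]
df = DataFrame([dict(N=9910, fct=name, average=t) for name, t in rows])

piv = df.pivot(index="N", columns="fct", values="average")
# ratio > 1 means slower than numpy.dot, ratio < 1 means faster
ratio = piv.div(piv["numpy.dot"], axis=0)
print(ratio.round(2).T)
```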

Same for floats

Let’s run the same benchmark with single precision floats (float32) and the sdot_* functions.
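A float32 vector takes half the memory of a float64 vector, and a 128-bit SSE register holds four float32 values instead of two float64 values. Whether this fully explains the lower timings below is an assumption, not something this example measures. The sketch only shows the memory difference and the precision traded for it.

```python
import numpy

n = 10000
va64 = numpy.random.randn(n)
vb64 = numpy.random.randn(n)
va32 = va64.astype(numpy.float32)
vb32 = vb64.astype(numpy.float32)

# float32 halves the number of bytes read per vector...
print(va64.nbytes, va32.nbytes)  # 80000 40000

# ...at the cost of precision: the results agree only to float32 accuracy
d64 = numpy.dot(va64, vb64)
d32 = numpy.dot(va32, vb32)
print(d32.dtype, abs(float(d32) - d64))
```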

dfs = []
for fct in [numpy.dot, sdot_cython_array,
            sdot_cython_array_optim, sdot_array,
            sdot_array_16, sdot_array_16_sse]:
    # dot_product is not in this list, so every function uses sizes up to 10000
    ctxs = get_vectors(fct, 10000, dtype=numpy.float32)

    df = DataFrame(list(measure_time_dim('dot(va, vb)', ctxs, verbose=1)))
    df['fct'] = fct.__name__
    dfs.append(df)
    print(df.tail(n=3))


cc = concat(dfs)
cc['N'] = cc['x_name']

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
cc.pivot(index='N', columns='fct', values='average').plot(
    logy=True, ax=ax[0])
cc.pivot(index='N', columns='fct', values='average').plot(
    logy=True, logx=True, ax=ax[1])
ax[0].set_title("Comparison of cython sdot implementations")
ax[1].set_title("Comparison of cython sdot implementations")

plt.show()
[Figure: two panels, both titled "Comparison of cython sdot implementations" (left: log y-axis, right: log-log)]

Out:

  0%|          | 0/100 [00:00<?, ?it/s]
 21%|##1       | 21/100 [00:00<00:00, 203.11it/s]
 42%|####2     | 42/100 [00:00<00:00, 188.81it/s]
 61%|######1   | 61/100 [00:00<00:00, 177.44it/s]
 79%|#######9  | 79/100 [00:00<00:00, 168.10it/s]
 96%|#########6| 96/100 [00:00<00:00, 159.25it/s]
100%|##########| 100/100 [00:00<00:00, 166.76it/s]
     average     deviation  min_exec  ...  context_size  x_name  fct
97  0.000013  3.265726e-07  0.000013  ...           232    9710  dot
98  0.000013  1.567959e-07  0.000013  ...           232    9810  dot
99  0.000013  1.453408e-07  0.000013  ...           232    9910  dot

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 34%|###4      | 34/100 [00:00<00:00, 336.37it/s]
 68%|######8   | 68/100 [00:00<00:00, 228.34it/s]
 93%|#########3| 93/100 [00:00<00:00, 181.50it/s]
100%|##########| 100/100 [00:00<00:00, 189.82it/s]
     average     deviation  min_exec  ...  context_size  x_name                fct
97  0.000016  3.284573e-07  0.000016  ...           232    9710  sdot_cython_array
98  0.000016  2.717492e-07  0.000016  ...           232    9810  sdot_cython_array
99  0.000016  2.066933e-07  0.000016  ...           232    9910  sdot_cython_array

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 34%|###4      | 34/100 [00:00<00:00, 333.60it/s]
 68%|######8   | 68/100 [00:00<00:00, 223.60it/s]
 93%|#########3| 93/100 [00:00<00:00, 179.80it/s]
100%|##########| 100/100 [00:00<00:00, 188.30it/s]
     average     deviation  ...  x_name                      fct
97  0.000015  3.130543e-07  ...    9710  sdot_cython_array_optim
98  0.000016  2.703366e-07  ...    9810  sdot_cython_array_optim
99  0.000015  8.364185e-08  ...    9910  sdot_cython_array_optim

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 30%|###       | 30/100 [00:00<00:00, 294.17it/s]
 60%|######    | 60/100 [00:00<00:00, 214.66it/s]
 83%|########2 | 83/100 [00:00<00:00, 175.46it/s]
100%|##########| 100/100 [00:00<00:00, 171.62it/s]
     average     deviation  min_exec  ...  context_size  x_name         fct
97  0.000016  3.507845e-07  0.000016  ...           232    9710  sdot_array
98  0.000017  2.097420e-07  0.000017  ...           232    9810  sdot_array
99  0.000017  2.233313e-07  0.000017  ...           232    9910  sdot_array

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 30%|###       | 30/100 [00:00<00:00, 292.42it/s]
 60%|######    | 60/100 [00:00<00:00, 207.40it/s]
 83%|########2 | 83/100 [00:00<00:00, 167.00it/s]
100%|##########| 100/100 [00:00<00:00, 162.50it/s]
     average     deviation  min_exec  ...  context_size  x_name            fct
97  0.000018  2.341873e-07  0.000018  ...           232    9710  sdot_array_16
98  0.000018  2.150023e-07  0.000018  ...           232    9810  sdot_array_16
99  0.000019  2.726906e-07  0.000018  ...           232    9910  sdot_array_16

[3 rows x 9 columns]

  0%|          | 0/100 [00:00<?, ?it/s]
 34%|###4      | 34/100 [00:00<00:00, 335.44it/s]
 68%|######8   | 68/100 [00:00<00:00, 257.88it/s]
 95%|#########5| 95/100 [00:00<00:00, 214.73it/s]
100%|##########| 100/100 [00:00<00:00, 224.78it/s]
     average     deviation  min_exec  ...  context_size  x_name                fct
97  0.000012  1.838225e-07  0.000012  ...           232    9710  sdot_array_16_sse
98  0.000012  2.732146e-07  0.000012  ...           232    9810  sdot_array_16_sse
99  0.000012  2.281501e-07  0.000012  ...           232    9910  sdot_array_16_sse

[3 rows x 9 columns]
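sdot_array_16_sse is the fastest float32 function in this run (12 µs against 13 µs for numpy.dot at N=9910). Its name suggests SSE instructions, but how td3a_cpp implements it is not shown here. The hypothetical dot_4_lanes below emulates the usual SIMD pattern with numpy: four lanes (one 128-bit register of float32) each keep a partial sum, the lanes are added together at the end, and a scalar loop handles the tail.

```python
import numpy


def dot_4_lanes(va, vb):
    """Emulates a 4-lane SIMD dot product on float32 vectors.

    Each lane accumulates its own partial sum; the lanes are summed
    at the end (horizontal add), then the remaining items are added.
    """
    n = va.shape[0]
    n4 = n - n % 4
    lanes = numpy.zeros(4, dtype=va.dtype)
    for i in range(0, n4, 4):
        # one vector multiply and one vector add per iteration
        lanes += va[i:i + 4] * vb[i:i + 4]
    total = lanes.sum()  # horizontal add of the four lanes
    for i in range(n4, n):  # scalar tail
        total += va[i] * vb[i]
    return total


va = numpy.arange(10, dtype=numpy.float32)
vb = numpy.ones(10, dtype=numpy.float32)
print(dot_4_lanes(va, vb))  # 45.0
```

Because the lanes add the products in a different order than a plain loop, results can differ from numpy.dot in the last bits, which is another reason to compare with a tolerance.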

Total running time of the script: ( 0 minutes 15.137 seconds)

Gallery generated by Sphinx-Gallery