module fctmr.fast_parallel_fctmr

Short summary

module sparkouille.fctmr.fast_parallel_fctmr

Simple parallelization of a mapper and a reducer based on numba. Python does not make it easy to parallelize functions: the GIL blocks most attempts because it forces every allocation and every creation of a Python object through a single lock. The language provides threads, but in practice they do not run Python code in parallel. This file is only an attempt to use numba to parallelize a mapper, but the number of round trips between Python and compiled C makes it difficult to write something generic.


Functions

function – truncated documentation

  • create_array_numba – Creates an array of size nb knowing its signature.

  • fast_parallel_mapper – Parallelizes a mapper based on numba and more specifically Automatic parallelization with @jit. …

Documentation


sparkouille.fctmr.fast_parallel_fctmr.create_array_numba(nb, sig)[source]

Creates an array of size nb knowing its signature.

Parameters
  • nb – size of the array (integer)

  • sig – signature of the element type, e.g. 'f8'

Returns

container
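As an illustration only, the allocation described above can be sketched with NumPy. The helper name make_array is hypothetical, and the sketch assumes that sig is a NumPy dtype string such as 'f8'; the actual function may build its container differently.

```python
import numpy


def make_array(nb, sig):
    """Hypothetical stand-in: allocate an uninitialized array of nb items."""
    # Assumption: sig is a NumPy dtype string such as 'f8' (float64).
    return numpy.empty(nb, dtype=numpy.dtype(sig))


arr = make_array(4, "f8")
```

The array is left uninitialized on purpose: a mapper would overwrite every element anyway.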


sparkouille.fctmr.fast_parallel_fctmr.fast_parallel_mapper(fct, gen, chunk_size=100000, parallel=True, nogil=False, nopython=True, sigin=None, sigout=None)[source]

Parallelizes a mapper based on numba, and more specifically on Automatic parallelization with @jit. That page explains what numba optimizes when it parallelizes a map.
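To make the idea concrete, here is a minimal sketch of a map written so that numba can parallelize it. It is not the implementation of fast_parallel_mapper: the kernel name square_all is hypothetical, and the sketch falls back to plain Python when numba is not installed.

```python
import numpy

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs (sequentially) without numba.
    prange = range

    def njit(*args, **kwargs):
        return lambda f: f


@njit(parallel=True)
def square_all(x, out):
    # Iterations are independent, so numba may distribute them over threads.
    for i in prange(x.shape[0]):
        out[i] = x[i] * x[i]


x = numpy.arange(5, dtype=numpy.float64)
out = numpy.empty_like(x)
square_all(x, out)
```

The loop works on arrays whose size and type are known in advance, which is why a generic Python generator cannot be handed to such a kernel directly.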

Parameters
  • fct – function

  • gen – generator

  • chunk_size – size of the chunks the input is split into (see the note on chunk size below)

  • parallel – see numba's parallel option

  • nopython – see numba's nopython option

  • nogil – see numba's nogil option

  • sigin – signature of input type

  • sigout – signature of output type

Returns

generator

The parallelization can only happen if the array is known, so the function splits the input into chunks of size chunk_size. This attempt is not very efficient because the mapper is generic. Python is not a good language for this. See the unit test test_parallel_fctmr.py.
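The chunking step can be sketched as follows. This is a simplified illustration, not the library's code: chunked_mapper, its kernel argument and its dtype parameter are hypothetical names, and a NumPy expression stands in for the compiled kernel.

```python
from itertools import islice

import numpy


def chunked_mapper(kernel, gen, chunk_size=100000, dtype="f8"):
    """Hypothetical sketch: apply kernel to fixed-size array chunks of gen."""
    it = iter(gen)
    while True:
        # Materialize at most chunk_size items so the kernel sees a known array.
        chunk = numpy.fromiter(islice(it, chunk_size), dtype=dtype)
        if chunk.shape[0] == 0:
            break
        # One kernel call per chunk, then results are streamed back one by one.
        for value in kernel(chunk):
            yield value


squares = list(chunked_mapper(lambda a: a * a, range(10), chunk_size=4))
```

Each chunk costs one conversion from Python objects to an array and one conversion back, which is the round-trip overhead mentioned above. A larger chunk_size amortizes that cost at the price of more memory.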
