__init__ |
module sparkouille Around Spark. source on GitHub |
__init__ |
module sparkouille.datasets Shortcuts to datasets source on GitHub |
__init__ |
module sparkouille.fctmr Shortcuts to fctmr source on GitHub |
eurostat |
module sparkouille.datasets.eurostat Datasets from Eurostat. source on GitHub |
fast_parallel_fctmr |
module sparkouille.fctmr.fast_parallel_fctmr Simple parallelization of a mapper and a reducer based on numba. Python does not make it easy to parallelize functions: the GIL blocks most attempts by forcing every allocation and every creation of a Python object through a single channel. The language implements threads, but in practice they do not run Python code in parallel. This file is just an attempt to use numba to parallelize a mapper, but the number of round trips between Python and compiled code makes it difficult to write something generic. source on GitHub |
pyparallel_fctmr |
module sparkouille.fctmr.pyparallel_fctmr joblib uses a module that is not documented in the official Python documentation; see Python’s undocumented ThreadPool. source on GitHub |
simplefctmr |
module sparkouille.fctmr.simplefctmr Simple mapper and reducer implemented in Python source on GitHub |
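
The mapper and reducer modules listed above can be illustrated with a minimal sketch. It is not the sparkouille API: the names `simple_map`, `simple_reduce` and `threaded_map` are hypothetical and only show the general idea of a pure Python mapper and reducer, plus a threaded variant assumed to rely on `multiprocessing.pool.ThreadPool` from the standard library.

```python
from functools import reduce
from multiprocessing.pool import ThreadPool


def simple_map(fct, iterable):
    """Lazily apply fct to every element (hypothetical helper)."""
    for item in iterable:
        yield fct(item)


def simple_reduce(fct, iterable, initial):
    """Fold all elements into a single value (hypothetical helper)."""
    return reduce(fct, iterable, initial)


def threaded_map(fct, iterable, threads=2):
    """Same results as simple_map, computed by a pool of threads.

    The results come back as a list, in the order of the input.
    """
    with ThreadPool(threads) as pool:
        return pool.map(fct, iterable)


squares = list(simple_map(lambda x: x * x, range(5)))
total = simple_reduce(lambda a, b: a + b, squares, 0)
threaded = threaded_map(lambda x: x * x, range(5))
```

Because of the GIL, the threaded variant mostly helps when the mapped function waits on I/O or releases the GIL; for pure Python computations it is unlikely to be faster than the sequential version.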