truncated documentation


module sparkouille: Around Spark.


module sparkouille.datasets: Shortcuts to datasets.


module sparkouille.fctmr: Shortcuts to fctmr.


module sparkouille.datasets.eurostat: Datasets from Eurostat.


module sparkouille.fctmr.fast_parallel_fctmr: Simple parallelization of a mapper and a reducer based on numba. Python does not make it easy to run functions in parallel: the GIL (global interpreter lock) defeats most attempts by forcing every memory allocation and every Python object creation through a single lock. The language supports threads, but in practice they do not execute Python code in parallel. This file is only an attempt to use numba to parallelize a mapper, and the number of round trips between Python and compiled code makes it difficult to write something generic.
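A minimal sketch of the kind of loop numba can parallelize, assuming numpy is installed. numba itself is optional here: if it is missing, a fallback decorator leaves the function as plain Python and the loop runs serially. The name parallel_square_mapper is hypothetical and is not part of sparkouille.

```python
import numpy as np

try:
    # With parallel=True, numba compiles the function and spreads the
    # prange iterations over several threads outside the GIL.
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without numba (serial execution).
    prange = range

    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap


@njit(parallel=True)
def parallel_square_mapper(values):
    # Hypothetical mapper: squares each element of a 1-D numeric array.
    out = np.empty_like(values)
    for i in prange(values.shape[0]):
        out[i] = values[i] * values[i]
    return out


print(parallel_square_mapper(np.arange(5, dtype=np.float64)))
```

This only works because the mapped operation is a fixed numeric expression on a typed array. Calling an arbitrary Python function inside the loop would require going back to the interpreter for each element, which is the round-trip problem described above.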


module sparkouille.fctmr.pyparallel_fctmr: joblib relies on Python's undocumented ThreadPool (multiprocessing.pool.ThreadPool), a class the official Python documentation does not cover.
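A minimal sketch of a thread-based mapper built on ThreadPool, assuming only the standard library. The helper threaded_mapper is hypothetical and does not reproduce the module's actual API. ThreadPool offers the same map interface as multiprocessing.Pool but runs tasks in threads, so functions such as lambdas need not be picklable.

```python
from multiprocessing.pool import ThreadPool


def threaded_mapper(fct, data, processes=4):
    # Hypothetical helper: applies fct to every element using a pool of threads.
    # Results come back in input order, like the built-in map.
    with ThreadPool(processes) as pool:
        return pool.map(fct, data)


print(threaded_mapper(lambda x: x * 2, range(5)))
```

Because the workers are threads, pure-Python CPU-bound work still competes for the GIL; the pool helps mostly with I/O-bound tasks or with code that releases the GIL.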


module sparkouille.fctmr.simplefctmr: Simple mapper and reducer implemented in Python.
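A minimal sketch of what a pure-Python mapper and reducer can look like, using generators and itertools.groupby. The names sketch_mapper and sketch_reducer are hypothetical and the signatures are assumptions, not the module's actual functions. The reducer sorts by key first because groupby only groups adjacent elements.

```python
from itertools import groupby


def sketch_mapper(fct, gen):
    # Lazily apply fct to every element, like the map step of map/reduce.
    for item in gen:
        yield fct(item)


def sketch_reducer(fctkey, gen):
    # Sort so that elements sharing a key are adjacent, then group them.
    # Python's sort is stable, so input order is kept within each group.
    for key, group in groupby(sorted(gen, key=fctkey), key=fctkey):
        yield key, list(group)


mapped = sketch_mapper(str.lower, ["A", "bb", "C"])
print(list(sketch_reducer(len, mapped)))
```

The sort makes the reducer hold all the data in memory, which is acceptable for a simple reference implementation but not for large inputs.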