XD blog

blog page

dask


2016-02-21 scikit-learn, dask and map reduce and examples on Python about machine learning

This is blog post about a couple of topics. The first one is about parallelizing a scikit-learn pipeline with dask. Pipelines and Reuse with dask. A little bit more and dask Introducing Dask Distributed. The module distributed distribute work on a local machine with syntax very close to map/reduce syntax. An example taken from the documentation:

def square(x): return x ** 2
def neg(x): return -x

from distributed import Executor
executor = Executor('127.0.0.1:8786')

A = executor.map(square, range(10))
B = executor.map(neg, A)
total = executor.submit(sum, B)
total.result()

I was being asked where to find examples, scripts about machine learning. Most of them happen to be written within a notebook such as the one posted on Kaggle: Python script posted on kaggle or examples from this blog: Yhat blog. Other examples can be found at ENSAE / Bibliography / Python for a Data Scientist. If you are looking for more examples of code, pick one kaggle competition and look for it on a search engine after adding the word "github", you may be able to find interesting projects. Just try kaggle github to begin with.


Xavier Dupré