This is blog post about a couple of topics. The first one is about parallelizing a scikit-learn pipeline with dask. Pipelines and Reuse with dask. A little bit more and dask Introducing Dask Distributed. The module distributed distribute work on a local machine with syntax very close to map/reduce syntax. An example taken from the documentation:
def square(x): return x ** 2 def neg(x): return -x from distributed import Executor executor = Executor('127.0.0.1:8786') A = executor.map(square, range(10)) B = executor.map(neg, A) total = executor.submit(sum, B) total.result()
I was being asked where to find examples, scripts about machine learning. Most of them happen to be written within a notebook such as the one posted on Kaggle: Python script posted on kaggle or examples from this blog: Yhat blog. Other examples can be found at ENSAE / Bibliography / Python for a Data Scientist. If you are looking for more examples of code, pick one kaggle competition and look for it on a search engine after adding the word "github", you may be able to find interesting projects. Just try kaggle github to begin with.
<-- --> |