XD blog

python


2016-02-21 scikit-learn, dask, map/reduce, and Python examples for machine learning

This blog post covers a couple of topics. The first one is parallelizing a scikit-learn pipeline with dask: Pipelines and Reuse with dask. A little bit more about dask: Introducing Dask Distributed. The module distributed distributes work on a local machine with a syntax very close to map/reduce. An example taken from the documentation:

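def square(x): return x ** 2
def neg(x): return -x

from distributed import Executor
# connect to a scheduler already running at this address
executor = Executor('127.0.0.1:8786')

# map returns futures; passing them to another map chains the computations
A = executor.map(square, range(10))
B = executor.map(neg, A)
# reduce step: sum the results on the workers, then fetch the value
total = executor.submit(sum, B)
total.result()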
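To try the same map/reduce shape without a running scheduler, here is a minimal sketch based on the standard library module concurrent.futures. This is a substitute chosen for illustration, not the distributed API: its map returns results instead of futures, so each stage is materialized into a list before it feeds the next one.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x ** 2


def neg(x):
    return -x


# A local thread pool stands in for the scheduler (assumption of this sketch).
with ThreadPoolExecutor(max_workers=4) as executor:
    # map step: results come back in the same order as the inputs
    A = list(executor.map(square, range(10)))
    B = list(executor.map(neg, A))
    # reduce step: submit returns a future, as in the example above
    total = executor.submit(sum, B)
    result = total.result()

print(result)  # -285
```

The structure is the same (two maps followed by one submit), which is what makes switching between a local pool and a cluster fairly cheap.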

I was asked where to find examples and scripts about machine learning. Most of them happen to be written in a notebook, such as the ones posted on Kaggle (Python script posted on kaggle) or the examples from this blog: Yhat blog. Other examples can be found at ENSAE / Bibliography / Python for a Data Scientist. If you are looking for more code examples, pick a Kaggle competition and look for it on a search engine after adding the word "github"; you may find interesting projects. Just try kaggle github to begin with.

2015-12-22 Deep Learning and other readings

I came across the article Evaluation of Deep Learning Toolkits, which studies a short list of libraries for deep learning (Caffe, CNTK, TensorFlow, Theano, Torch) from various angles: modeling capability, interfaces, model deployment, performance, architecture, ecosystem, cross-platform. It gives a nice overview and helps in choosing the library which fits your needs. Once your deep model has been trained, how do you use it? This question should be the first one to be answered.

As machine learning and big data become more and more popular, people look for ways to simplify the implementation of complex processing chains. Python is quite popular, so here is one suggestion in that language for deep learning: Blocks and Fuel: Frameworks for deep learning (Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio). It introduces Fuel, which models data processing pipelines.

Finally, a nice tutorial on machine learning with Python: PyData Seattle 2015 Scikit-learn Tutorial. The author's blog is nice too: Pythonic Perambulations. See Out-of-Core Dataframes in Python: Dask and OpenStreetMap. Some modules are hidden in his blog posts, such as gatspy, which plots time series in many ways, supersmoother, which smooths time series, or line_profiler, used in Optimizing Python in the Real World: NumPy, Numba, and the NUFFT. Two other readings to conclude, both from the same source: Why Python is Slow: Looking Under the Hood and Frequentism and Bayesianism: A Practical Introduction.

01/06/2015 Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning

2015-12-11 Automated machine learning

What if, rather than trying to fit the best model to your dataset, you tried to learn a model which does it for you... There is a conference for that, AutoML workshop @ ICML'15, and a module, auto-sklearn.

And, as always, awesome-machine-learning.

2015-12-01 A few blog posts: Rodeo, TensorFlow, Tableau, Autoreload, RLPy

Rodeo makes it easier to write reports with equations, code and graphs. The report is converted to markdown and PDF: Rodeo 1.1 - Markdown, Autoupdates, Feedback.

When a module is updated, the changes are not automatically taken into account in a notebook. The module must be reloaded. There is an extension which does that for you: Autoreload des modules sous iPython (autoreloading modules in IPython).
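In IPython, the extension is enabled with the magics %load_ext autoreload followed by %autoreload 2. Outside IPython, the underlying problem and its manual fix can be sketched with importlib.reload. The module name demo_autoreload_mod and its content are hypothetical, created in a temporary folder only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Skip bytecode caching so the demo never picks up a stale compiled file.
sys.dont_write_bytecode = True


def write_module(folder, source):
    """Write the hypothetical module demo_autoreload_mod.py into folder."""
    path = os.path.join(folder, "demo_autoreload_mod.py")
    with open(path, "w") as f:
        f.write(source)
    return path


folder = tempfile.mkdtemp()
write_module(folder, "def answer():\n    return 1\n")
sys.path.insert(0, folder)
importlib.invalidate_caches()

import demo_autoreload_mod

before = demo_autoreload_mod.answer()

# The file changes on disk, but the module already imported does not.
write_module(folder, "def answer():\n    return 'updated'\n")
stale = demo_autoreload_mod.answer()

# Manual reload: this is the step the autoreload extension automates.
importlib.reload(demo_autoreload_mod)
after = demo_autoreload_mod.answer()

print(before, stale, after)
```

The value read between the edit and the reload is still the old one, which is the behaviour described for notebooks.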

Tableau is an application, free in some cases, which makes it easy to build dashboards to quickly visualize data with animated charts.

A module for reinforcement learning: RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research. Also worth reading: Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization.

An article about TensorFlow: What you wanted to know about TensorFlow.

2015-11-17 Building xgboost on Python 3.5

Go to Build xgboost on Python 3.5.

2015-11-12 Python 3.5, Scipy and scikit-learn on Windows

With Python 3.5 on Windows, the module scipy.sparse requires Visual Studio 2015 Community Edition; otherwise it displays an error message: Issue with Scipy on Windows.

2015-10-27 Travis, Appveyor, PyPI

I was surprised to see that a module I develop to produce my teaching materials (pyquickhelper) was downloaded 14k times last month on PyPI. It seems a lot... But Travis downloads it every time I commit something on GitHub, and I did 50 commits last week. I would say that the number of times I manually download this module is not significant compared to the number of times it gets downloaded automatically. It seems difficult to make sense of the counts given by PyPI.

2015-09-26 Python 3.5 and virtual environments on Windows

I began to test my modules against Python 3.5. As I use virtual environments, I discovered the following issue: virtualenv fails with Python 3.5 on Windows. Surprisingly, it works on another machine, probably due to the different set of software installed on it. I guess I'll wait a little bit before digging into it or trying to fix it on my own.

2015-09-04 IoT in Python

I did not have time to give it a try, but homeassistant looks promising according to their website: Home Assistant.

2015-08-24 Open the notebook with a different browser

I was looking for an easy way to launch the notebook server with a browser other than the default one. I created a batch file (for Windows, but easily adaptable to Linux):

set PATH=%PATH%;C:\Python34_x64;C:\Python34_x64\Scripts
set PYTHONPATH=<extra_path>;%PYTHONPATH%
set BROWSER="C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
jupyter-notebook --notebook-dir=<your_folder_for_notebooks> --port=XXXX

The notebook opens in Chrome at the following URL: http://localhost:XXXX.
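The same launch can be sketched in Python so that one script serves both Windows and Linux. It relies on the BROWSER environment variable, which Python's webbrowser module reads, and on the two jupyter-notebook options used in the batch file. The function names build_launch and launch are hypothetical, and the folder, port and browser path are placeholders to replace with your own values.

```python
import os
import subprocess


def build_launch(browser, notebook_dir, port, extra_path=None, environ=None):
    """Return (command, environment) mirroring the batch file."""
    env = dict(os.environ if environ is None else environ)
    # BROWSER selects the browser opened by the notebook server.
    env["BROWSER"] = browser
    if extra_path:
        previous = env.get("PYTHONPATH")
        if previous:
            env["PYTHONPATH"] = extra_path + os.pathsep + previous
        else:
            env["PYTHONPATH"] = extra_path
    cmd = [
        "jupyter-notebook",
        "--notebook-dir=" + notebook_dir,
        "--port=" + str(port),
    ]
    return cmd, env


def launch(browser, notebook_dir, port, extra_path=None):
    """Start the notebook server; jupyter-notebook must be on the PATH."""
    cmd, env = build_launch(browser, notebook_dir, port, extra_path)
    return subprocess.Popen(cmd, env=env)
```

Calling launch with the path to the Chrome executable, the notebook folder and a port of your choice starts the server the same way the batch file does.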

2015-08-23 Building xgboost on Windows for Python

I'm unlucky. The day I decided to deal with xgboost on Windows, a couple of hours later, I saw a commit which does that. xgboost is now on PyPI even if the version for Python 3 is not ready yet (Missing parentheses in call to 'print').
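The error quoted in parentheses is what Python 3 reports when it meets a Python 2 print statement. A minimal sketch that reproduces it without xgboost follows; the helper name python2_print_error is hypothetical.

```python
def python2_print_error(source):
    """Compile source with the running Python 3.

    Returns the SyntaxError message, or None if the source compiles.
    """
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


old_style = "print 'hello'"   # Python 2 statement
new_style = "print('hello')"  # Python 3 function call

message = python2_print_error(old_style)
print(message)
print(python2_print_error(new_style))
```

Only the first snippet fails, and it fails at compile time, which is why such a module cannot even be imported under Python 3.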



2015-08-14 IPython becomes Jupyter

It was announced on their blog: the module IPython was split into many modules. Here is the list I needed to install while upgrading to Jupyter 4.0.

ipykernel
ipython
ipython_genutils
ipywidgets
jupyter
jupyter_client
jupyter_core
jupyter-console
nbformat
nbconvert
notebook
path.py
pickleshare
qtconsole
simplegeneric
traitlets

With Anaconda, you get all these updates by typing: conda install jupyter.
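To see which modules from the list above are still missing after an upgrade, here is a small sketch based on importlib.util.find_spec. The function name missing_packages is hypothetical, and the mapping from package names to import names (ipython, jupyter-console, path.py) is an assumption to double check.

```python
import importlib.util

# Package names from the list above.
PACKAGES = [
    "ipykernel", "ipython", "ipython_genutils", "ipywidgets", "jupyter",
    "jupyter_client", "jupyter_core", "jupyter-console", "nbformat",
    "nbconvert", "notebook", "path.py", "pickleshare", "qtconsole",
    "simplegeneric", "traitlets",
]

# Assumed import names when they differ from the package names.
IMPORT_NAMES = {
    "ipython": "IPython",
    "jupyter-console": "jupyter_console",
    "path.py": "path",
}


def missing_packages(packages, import_names=IMPORT_NAMES):
    """Return the packages whose import name cannot be found."""
    missing = []
    for name in packages:
        module = import_names.get(name, name)
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(module) is None:
            missing.append(name)
    return missing


print(missing_packages(PACKAGES))
```

An empty list means every module of the list can be located by the current interpreter.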

2015-07-28 PyData 2015 in Seattle

I attended my first PyData conference in Seattle and I must say I learned a lot. I discovered much more than I ever could have by searching the Internet for a library to fill a precise need. It was really worth taking a plane to attend. Most of all, I found people very passionate, constantly looking for improvement. So passionate that I would definitely recommend Python over R as a first choice for a machine learning language. R seems to grow only by the number of available packages. But Python is catching up, and its environment is also being extended by various initiatives to improve plotting or the handling of very big datasets.

I would not be surprised if a language named Rython pops up one day.



2015-07-20 Drawing trees with Python

I did not find any good module to draw trees with Python. I tried networkx and graphviz. The first one does not handle big trees very well. The second one is not optimized for trees and needs to be called through the command line. ete seems a much better option and it is available on Python 3: github/ete3. See also the gallery, the notebook integration, an online visualizer, the documentation (for Python 2).
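To illustrate the graphviz route mentioned above, here is a minimal sketch which writes a tree in the DOT language using only plain Python. The nested-dictionary representation and the function name tree_to_dot are assumptions made for the example, and labels containing double quotes are not escaped.

```python
def tree_to_dot(tree, name="tree"):
    """Convert a nested dict {label: {child_label: {...}}} into DOT text."""
    lines = ["digraph %s {" % name, "    node [shape=box];"]
    counter = [0]

    def visit(label, children, parent_id):
        # Each node gets a unique id so that labels may repeat.
        node_id = "n%d" % counter[0]
        counter[0] += 1
        lines.append('    %s [label="%s"];' % (node_id, label))
        if parent_id is not None:
            lines.append("    %s -> %s;" % (parent_id, node_id))
        for child_label, grandchildren in children.items():
            visit(child_label, grandchildren, node_id)

    for label, children in tree.items():
        visit(label, children, None)
    lines.append("}")
    return "\n".join(lines)


example = {"root": {"left": {"leaf1": {}, "leaf2": {}}, "right": {}}}
dot_source = tree_to_dot(example)
print(dot_source)
```

Once the text is saved to a file such as tree.dot, the command line step is dot -Tpng tree.dot -o tree.png.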

2015-07-18 Use a virtual environment to develop robots

A decade ago, France stopped nuclear weapons testing and replaced it with simulations. I guess it now happens in a virtual environment. Researchers are starting to do the same with robots: Minecraft Shows Robots How to Stop Dithering, Using Minecraft to unboggle the robot mind. We do not need to actually build a robot to create a kind of artificial intelligence. It can just be developed in a virtual environment such as Minecraft and interact with it through a Python module such as picraft. We could try to build a robot which learns how to build a house which does not collapse.



Xavier Dupré