XD blog

c++

2018-03-31 Putting a machine learning model into production

I recently wrote an article on the subject, ONNX: training and predicting on different machines, in which I mentioned two approaches for putting a machine learning model into production, essentially through a web application. They are in no order of preference, and some are more mature than others. The first, tensorflow-serving, offers a turnkey solution but relies on tensorflow. The second, ONNX with onnxmltools and winmltools, converts a model into a common format. The model can then be used wherever a prediction library (a runtime) exists (list of available runtimes). This common format cannot yet be used from C++ or JavaScript, although both options are conceivable.

Tensorflow, again, offers a way to run models directly in JavaScript with tensorflow.js (see https://js.tensorflow.org/tutorials/import-keras.html). ONNX is considering an extension for C/C++ (ONNX in C/C++). For now, this is possible for a deep learning model by using ONNX to convert a trained model for the caffe2 library: ONNX: deploying a trained model in a C++ project. This is the direction chosen by the sklearn-porter module, which provides code in Java, C++, Go, PHP, Ruby and JavaScript for some of the models implemented by scikit-learn. It covers far more models than what I had started to do with mlprodict. Other options exist, such as Seldon, which is half open source and half oriented toward a service offering.
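To make the idea of exporting a trained model as source code in another language more concrete, here is a toy sketch. It is not sklearn-porter's API or output format: the function name `linear_model_to_cpp`, its parameters and the generated signature are all hypothetical, and it only handles a linear model with finite coefficients.

```python
def linear_model_to_cpp(coefficients, intercept, name="predict"):
    """Return C++ source computing intercept + sum(coefficients[i] * x[i]).

    Hypothetical helper for illustration only; assumes finite float values.
    """
    # One "c * x[i]" term per learned coefficient.
    terms = " + ".join(
        f"{float(c)!r} * x[{i}]" for i, c in enumerate(coefficients)
    )
    body = repr(float(intercept))
    if terms:
        body = f"{body} + {terms}"
    # Doubled braces are literal braces in an f-string.
    return f"double {name}(const double* x) {{\n    return {body};\n}}\n"


# Made-up coefficients standing in for a trained model.
print(linear_model_to_cpp([0.5, -1.25], 2.0))
```

The generated text can be pasted into a C++ project and compiled without any Python dependency, which is the appeal of this approach. Real models (trees, ensembles, neural networks) need much more elaborate generators.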

2014-10-17 Python Just In Time Compilation (JIT)

I discovered a new package for just-in-time compilation in Python: HOPE. The following paper gives a promising benchmark compared to other alternatives: HOPE: A Python Just-In-Time compiler for astrophysical computations. It was not tested on Windows.

2013-04-28 Migration of old code

I wanted to reuse some old code I wrote a while ago to train neural networks, but I first had to migrate it to a new version of Python and a new version of the compiler. I assumed the migration would not be too painful, but it cost me a day. I don't know if there is a good way to maintain a personal toolbox that is only used from time to time. One option is a single framework with a single way to add new features, but that is not very convenient given the multiplication of languages and the maintenance of unused old code. Another is a framework that allows any kind of language and glues projects together with a scripting language (Python?); the issue with this one is that you end up writing the same basic functionalities (parsing text...) multiple times. So far, my memory and the unit tests I wrote have been the best help in migrating my old code. It turned out I was able to produce a running version of the old stuff here.

When I started that, my main issue, as a teacher, was to check whether every piece of code I wrote for my students was still running. I also wanted to check which ones would work on Python 3 (they were written for Python 2.6). I decided to come up with a solution allowing me to unit test them. I started by plugging in the biggest scripts and the newest ones. But what about the code embedded in a LaTeX file... I would surely do it differently now.
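A first, cheap filter for "which scripts would work on Python 3" is to check whether each one at least parses under the running interpreter. The sketch below is only an illustration, not the solution I used: the helper name `python3_syntax_error` and the two sample snippets are made up, and a syntax check says nothing about runtime differences (integer division, string handling, renamed modules), which still require real unit tests.

```python
import ast


def python3_syntax_error(source):
    """Return None if source parses under the running Python, else a message."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# Two hypothetical teaching snippets; the first uses the Python 2 print statement.
snippets = {
    "old_script.py": "print 'hello'\n",
    "new_script.py": "print('hello')\n",
}

for name, source in snippets.items():
    error = python3_syntax_error(source)
    print(name, "OK" if error is None else error)
```

In practice the sources would be read from the script files, and the same loop could be wrapped in a unit test that fails as soon as one script stops parsing.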

However, while doing that, I discovered that non-ASCII (Latin) characters are now difficult to handle when writing a C++ extension for Python. They need to be translated into Unicode strings. I preferred to remove every French accent from my old code (in the documentation specified in boost::python::def) instead of finding a way to keep them. That will be for the next project.
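Removing accents by hand is tedious, so here is a minimal sketch of how it could be automated with the standard library. It is an illustration, not the script I actually used, and the function name `strip_accents` is made up. It relies on Unicode decomposition: each accented letter is split into a base letter followed by combining marks, and the marks are then dropped.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accents removed (for example 'é' becomes 'e')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category 'Mn') and keep everything else.
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


print(strip_accents("réseau de neurones entraîné"))
# prints: reseau de neurones entraine
```

Characters that have no decomposition, such as the ligature œ, are left unchanged and would still need a manual replacement.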

2013-02-18 References, books, articles, modules

A few books that were recently recommended to me on one subject or another, and a few Python modules I recently came across.

Python modules

networkx: a module for drawing graphs with Graphviz; it looks simpler than others and does not require Graphviz to be installed beforehand. The yapgvb module no longer seems to be maintained.
Pillow: the PIL module is no longer maintained; Pillow is said to be its replacement.

Books

Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, by Dan Gusfield: I was looking for a book that could give me ideas about aligning two graphs, like the idea I developed here: Graph matching and alignment.
Random Graphs (Cambridge Studies in Advanced Mathematics), by Béla Bollobás: students were asking me questions about generating random graphs that satisfy certain properties, and this book contains some answers.
Compilers: Principles, Techniques, and Tools (known as the Dragon book), by Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman: the bible for anyone who wants to write a compiler.
Joel on Software: And on Diverse and Occasionally Related Matters That Will Prove of Interest to Software Developers, Designers, and Managers, and to Those Who, Whether by Good Fortune or Ill Luck, Work with Them in Some Capacity, by Joel Spolsky: a collection of articles about programming, C++ in particular.
Exercices d'informatique: recommended by a colleague.
Types and Programming Languages, by Benjamin Pierce (site): a book on lambda calculus.
Learn You a Haskell for Great Good!, by Miran Lipovača: a book on the Haskell language.
Purely Functional Data Structures, by Chris Okasaki: a book on data structures and how to manipulate them in a functional language (lazy evaluation) (PDF version).
Machine Learning in Action, by Peter Harrington: contains a chapter on using Hadoop/Pig with Amazon's AWS service.

Xavier Dupré