2013-07-29 Starting a coding project

When I was young, I wrote code, ran it, fixed bugs and moved on to the next thing to code. After I joined my first company, I realized many people did the same. There were very different coding styles, spaces anywhere, indentation following no single rule, and always some resistance to following one set of rules. I then worked in a company with various cultures and styles. People usually write code in the style that is clearest and easiest to understand for them. I'm usually skeptical when somebody explains that everybody should do this or that because it makes more sense. As a teacher, I saw too many different ways of reasoning, each expressed in its own style. And if you do things differently, it is because it is simpler that way for you. For example, I usually choose short variable names for a loop (n, i, t, u) because I read it as a mathematical sum (\sum_{i=1}^n ...). I add a space before a semicolon because that is what you do in French. One rule I follow when I modify something written by somebody else: I try to keep the same style. I remember some code reviews which came back with many comments because my spaces were not right. I think there should be a tool for that.

Anyway, when you work on a big project, I'm not sure about the importance of style, but I know some tools and practices are unavoidable:

If there are specific guidelines to follow, I usually implement them in unit tests: you cannot submit a modification if one fails. I prefer an explicit error message to a document explaining how I should code. And every repetitive task should be automated.
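To illustrate the idea of a guideline implemented as a unit test, here is a minimal sketch in Python. The three rules (no tabulation, no trailing space, a maximum line length), the function find_style_errors and the list source_files are assumptions chosen for the example, not rules taken from a real project.

```python
import unittest


def find_style_errors(lines, max_length=100):
    """Return one explicit message per broken guideline (hypothetical rules)."""
    errors = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" in text:
            errors.append("line %d: tabulation found, use spaces" % number)
        if text != text.rstrip():
            errors.append("line %d: trailing spaces" % number)
        if len(text) > max_length:
            errors.append("line %d: longer than %d characters" % (number, max_length))
    return errors


class TestCodeStyle(unittest.TestCase):
    # Hypothetical list of the files covered by the guidelines.
    source_files = []

    def test_guidelines(self):
        for name in self.source_files:
            with open(name, "r", encoding="utf8") as f:
                errors = find_style_errors(f)
            # The failure message tells exactly what to fix and where.
            self.assertEqual(errors, [], "%s\n%s" % (name, "\n".join(errors)))
```

The point of the design is that the failure message replaces the document: whoever breaks a rule gets the file, the line and the rule.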

2013-07-28 My RSS reader

When Google Reader died, I was reluctant to move to something different. Not because other solutions are worse or anything like that, but because I needed to create a new account, a new password, and possibly to pay if the number of blogs I wanted to follow was above a given threshold. With Google, I did not have to do anything like that. That said, I would argue that giving everything to a single company which can monitor every single move you make on the net is not a good idea.

But if I push this reasoning to its extreme, why not have a tool on my laptop which allows me to read blog posts? That way, I would download the blog content myself and keep any statistics about my own usage to myself. And if the design is not good enough, I just have to change it. Well, the only argument against that is the time I will need to build that tool (and to maintain it).

Well, to be honest, I also did it because I wanted to learn about some Python and JavaScript programming patterns which I talked about in previous blog posts. The tool looks like this:


2013-07-27 Logging click events on your server

Many websites log events, for example where users click. They want to optimize the site for better use. You would assume that every time a user requests a page, your server has to provide the content of that page, so it already knows about the click. However, a cache mechanism could prevent you from getting that information, a user could click on a link leading outside your website, or the same page could be reached from several other pages. You need more precise information. How do you log a click event then?

To do that, we first need to do something when a user clicks on a url: we need to catch this event and to call another function. We use the following syntax:

<a href="url" onmousedown="sendlog('url')">anchor</a>
The function sendlog will be executed when the user clicks on this particular url. The string between the quotes is the information to log. The function sendlog is defined in another file, defsendlog.js in this case. The following line must be added to the HTML page (header section):
<script type="text/javascript" src="/defsendlog.js"></script>
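The server side is not described here. As a minimal sketch only, assuming that sendlog sends a GET request to a hypothetical address /log?link=..., a small Python server could store the logged string as follows. The path /log, the parameter name link, the file clicks.log and all the function names are assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def extract_logged_value(path):
    """Return the value of the hypothetical 'link' parameter of /log, or None."""
    parsed = urlparse(path)
    if parsed.path != "/log":
        return None
    values = parse_qs(parsed.query).get("link")
    return values[0] if values else None


class ClickLogHandler(BaseHTTPRequestHandler):
    """Answers GET /log?link=... by appending the value to a log file."""

    def do_GET(self):
        value = extract_logged_value(self.path)
        if value is not None:
            with open("clicks.log", "a", encoding="utf8") as f:
                f.write(value + "\n")
        # 204: nothing to send back, the browser only needs an acknowledgment.
        self.send_response(204)
        self.end_headers()


def run(port=8080):
    """Start the logging server (blocks until interrupted)."""
    HTTPServer(("localhost", port), ClickLogHandler).serve_forever()
```

Calling run() starts the server. A request to /log?link=http%3A%2F%2Fexample.com%2Fpage would then append http://example.com/page to clicks.log.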


2013-07-26 Keep the scrolling position after refreshing

I added the scrolling property to a div section:

	text-align: left;
Unfortunately, after a refresh or a click somewhere which changes only half of the page, every list returned to its first position. It was bothering me. I finally found a way to keep the lists the way they are after a refresh. It requires cookies: we store the position of each section in cookies.
function createCookie(name, value, days) {
    var expires = "";
    if (days) {
        // the cookie expires after the given number of days
        var date = new Date();
        date.setTime(date.getTime() + days * 24 * 60 * 60 * 1000);
        expires = "; expires=" + date.toUTCString();
    }
    document.cookie = name + "=" + value + expires + "; path=/";
}

function readCookie(name) {
    var nameEQ = name + "=";
    var ca = document.cookie.split(';');
    for (var i = 0; i < ca.length; i++) {
        var c = ca[i];
        while (c.charAt(0) == ' ') c = c.substring(1, c.length);
        if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length, c.length);
    }
    return null;
}

function savePosition(document, divId) {
    var intY = document.getElementById(divId).scrollTop;
    if (intY >= 0) {
        // the position is stored as "y<position>_"
        createCookie("divid" + divId, "y" + intY + "_", 1) ;
    }
}

function getPosition(document, divId) {
    var cook  = readCookie("divid" + divId) ;
    if (cook == null) return 0 ;
    var start = cook.indexOf("y") ;
    if (start == -1) return 0 ;
    var end   = cook.indexOf("_", start) ;
    var sub = cook.substring( start+1, end) ;
    // scrollTop expects a number, 0 if the cookie cannot be parsed
    return parseInt(sub, 10) || 0 ;
}
So to save the scrolling position of a div section, you just need to call the function savePosition each time it is updated:
<div class="divblogs" id="divblogs" onscroll="savePosition(document,'divblogs')">
The last detail left is a function called when the page is reloading to restore each section's position:
<body onload="setPositions(document,['divblogs', 'divpostsshort', 'divpostsext'])">
The function setPositions is defined as follows:
function setPositions(document, listDiv) {
    for (var i = 0 ; i < listDiv.length ; ++i) {
        var divObject = document.getElementById(listDiv[i]);
        divObject.scrollTop = getPosition(document,listDiv[i]) ;
    }
}
One last detail: each div section has two attributes (class and id). They are important to enable the scrolling and to make the sections easily accessible.

2013-07-16 Things I never remember when I need them

I read two blog posts about stuff I never remember when I need it. I often manipulate text files and I know the Linux tools do quite a great job at that. But I never remember the syntax. This blog post seems to be a good pointer: Useful Unix commands for data science.

The second is about mutex and lock. A lock is used to synchronize threads within a single application. A mutex is used to synchronize processes (and, as a consequence, threads as well). And if you want to use a mutex all the time because it is convenient, you should read this blog post first: Lock vs. Mutex.
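The vocabulary differs from one language to another, so the following is only an illustration of the first case (threads within a single application), written in Python with threading.Lock. The function count_with_lock is a name made up for this sketch. For synchronization between processes started by the same program, Python offers multiprocessing.Lock, which is not shown here.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10000):
    """Increment a shared counter from several threads, protected by a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def work():
        for _ in range(n_increments):
            # Only one thread at a time can execute the protected section.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With the lock, the result is always n_threads * n_increments. Without it, the read-modify-write of the counter is not guaranteed to be atomic and some increments may be lost.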

2013-07-10 Making people want to share their personal data

Today I was reading the article MATRIX – Ce que nos données Gmail révèlent de notre vie sociale, which describes Immersion, a tool for analyzing emails. The name of the application is rather evocative and the journalist rather enthusiastic. On reflection, unless your network is disproportionately large, the information displayed by this application should not come as a surprise. One can reasonably assume that everyone is likely to know the different networks they communicate with and the hours at which emails are sent or received. The novelty lies in the visual aspect.

I admit the application is rather fun and that it makes you want to look at your own emails (even though I personally did not do it). The designer could also have chosen an application that the user downloads and installs on their own computer, leaving it to the user to retrieve their own data. He preferred to wrap everything up as a service which gladly collects your data in order to show it to you. Of course, he commits not to share it with anyone, and even to delete it. One can assume that MIT, which hosts the whole thing, is an organization that can be trusted.

Choosing an installable application would probably have hurt ease of use somewhat. Still, this shows how a fun application can encourage quite a lot of people to share their data. This service, already covered by several newspapers, has probably been tried by several journalists, who will have used their gmail address. In this way, within a few days, it would have been possible to draw a map of the connections between the journalists of the tech press.

One can imagine the application being extended to Facebook data. It is likely that similar networks would be found on gmail and Facebook. And in little time and a few algorithms later, a Facebook profile could be matched with a gmail address.

2013-07-07 Build a Python 64 bit extension on Windows 8

I was using MinGW on Windows to build a Python extension including C++ code. The 32bit mode was working fine and I was using the following command line:

python setup.py build --compiler=mingw32
I thought it would be easy to run it with the Python 64 bit version and that no change would be required. I was a little bit overconfident.
Traceback (most recent call last):
  File "setup.py", line 11, in <module>
    author_email  = '...',
  File "c:\python33_x64\lib\distutils\core.py", line 148, in setup
  File "c:\python33_x64\lib\distutils\dist.py", line 929, in run_commands
  File "c:\python33_x64\lib\distutils\dist.py", line 948, in run_command
  File "c:\python33_x64\lib\distutils\command\build_ext.py", line 323, in run
  File "c:\python33_x64\lib\distutils\ccompiler.py", line 1034, in new_compiler
    return klass(None, dry_run, force)
  File "c:\python33_x64\lib\distutils\cygwinccompiler.py", line 125, in __init__
    if self.ld_version >= "2.10.90":
TypeError: unorderable types: NoneType() >= str()
I decided to switch to Visual Studio Express 2012 to build my extension, but I ran into an error related to the file vcvarsall.bat because Python was not able to find the file vsvars64.bat (for one very good reason: it does not exist). I read some blogs where people suggest reinstalling Visual Studio Express, but I did not remember the setup asking me anything about options. When I checked the Visual Studio folder, I found the file vcvarsx86_amd64.bat. Then, after some research (stubbornness is mandatory for those parts) and some tweaks, I discovered two mistakes in the package distutils. They need to be fixed in the file msvc9compiler.py. After the two modifications, it was working fine with the following command line:
python setup.py build --compiler=msvc --plat-name=win-amd64
And I understood why it was failing, without any mysterious new installation. I looked into the 64 bit version of MinGW but it looked like a longer path than the one I chose. Who knows? You will find some other details here. I wrote a function which imports a module written in one single C++ file. If the module does not exist, it compiles it in place.
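The function itself is not reproduced on this page. A minimal sketch of the same idea could look like the following, assuming the folder setup_dir contains a setup.py which declares the extension and that this folder is on the Python path. The name import_or_build is hypothetical.

```python
import importlib
import subprocess
import sys


def import_or_build(module_name, setup_dir="."):
    """Import module_name; if it is missing, build it in place and retry."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Assumption: setup_dir contains a setup.py declaring the extension.
        subprocess.check_call(
            [sys.executable, "setup.py", "build_ext", "--inplace"],
            cwd=setup_dir)
        # Let the import system notice the freshly created file.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

If the module is already available, nothing is compiled: import_or_build("json") simply returns the standard json module.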

As a conclusion, I would say it was difficult to find the proper instructions. Maybe the number of documents related to that issue has increased, or the search engines I used were not able to give me the answer. All I know is I do not want to go through that again even if I know there will be a next time when I update Python or when I change my laptop. I hope next time I face that problem, search engines will show me my own page.

2013-07-01 Unit tests, what a relief?

To be honest, I hesitated. French, English, I was pissed off by my own coding (I speak French in that case) but saved by a unit test...

Function signatures are a bit tricky in Python because overloading does not exist. There is only one function per name and you have to branch on the parameters' types. So... this is an example of a case where unit tests were useful to me.

def function_nobody(input) :
    if isinstance (input, list) :
        for line in input :
            pass  # ....
    elif isinstance (input, str) :
        with open(input, "r", encoding="utf8") as f :
            for line in f :
                pass  # ...
So basically, the previous function accepts a file name or a list. But I wanted it to be more generic and to accept iterators:
import collections.abc
def function_nobody(input) :
    if isinstance (input, list) or isinstance (input, collections.abc.Iterable) :
        for line in input :
            pass  # ....
    elif isinstance (input, str) :
        with open(input, "r", encoding="utf8") as f :
            for line in f :
                pass  # ...
I added isinstance (input, collections.abc.Iterable). But a string falls under that condition, which made the second case unreachable. Because I wrote unit tests, I was able to catch my mistake. I just swapped the two tests:
import collections.abc
def function_nobody(input) :
    if isinstance (input, str) :
        with open(input, "r", encoding="utf8") as f :
            for line in f :
                pass  # ...
    elif isinstance (input, list) or isinstance (input, collections.abc.Iterable) :
        for line in input :
            pass  # ....
And it worked. I confess I did not lose too much time, and I would not have lost any without unit tests because I would not have known about my failure. However, I don't want to guess how much craziness I could have gone through if one of my scripts had failed because of that one, two or three months later. How could I have guessed it was because of that...
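The unit test which caught the mistake is not shown in this post. A sketch of what such a test could look like follows. The function read_lines is a simplified, hypothetical variant of function_nobody which returns the lines so that the result can be checked.

```python
import collections.abc
import os
import tempfile
import unittest


def read_lines(source):
    """Return the lines of a file (source is a file name) or of any iterable."""
    # str must be tested first: a string is iterable too.
    if isinstance(source, str):
        with open(source, "r", encoding="utf8") as f:
            return [line.rstrip("\n") for line in f]
    elif isinstance(source, collections.abc.Iterable):
        return [line.rstrip("\n") for line in source]
    raise TypeError("unexpected type %r" % type(source))


class TestReadLines(unittest.TestCase):
    def test_list_and_iterator(self):
        self.assertEqual(read_lines(["a\n", "b"]), ["a", "b"])
        self.assertEqual(read_lines(iter(["a\n", "b"])), ["a", "b"])

    def test_filename(self):
        with tempfile.TemporaryDirectory() as folder:
            name = os.path.join(folder, "data.txt")
            with open(name, "w", encoding="utf8") as f:
                f.write("a\nb\n")
            # With the two tests in the wrong order, the function would
            # iterate over the characters of the file name instead.
            self.assertEqual(read_lines(name), ["a", "b"])
```

The second test is the one that fails when the iterable case is checked before the string case.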

Please do unit testing... Ok, it is late, my style might be overdramatic. But please remember this when it is your turn to experience one of these overdramatic scenarii. (I used the Latin plural of scenario, I don't know if it works in English.)

2013-06-23 Internet and programming

I no longer remember what programming was like before the internet. Today, I spend a lot of time understanding how one specific thing works. I run a hundred queries hoping that another madman like me once faced the same problem and, on top of that, thought of describing his solution. Discussion forums are a rather useful source. Ten years ago, the internet was not as rich: we searched less and coded more. Today, we spend the first part of a project looking for building blocks we could reuse. Once we have found them, we move on to the second phase, which boils down to coding like crazy. This is roughly what the following graph summarizes.

In the end, everything may go as planned or, as often happens, you may stumble on a new problem. Here we go again: a new series of queries to understand that small detail we had set aside for later without anticipating that it could prove formidable... And strangely, armed with new keywords, you end up finding the library that could have saved you quite a lot of time.

That explains the third part. At this stage, three options are open to you. The first is to drop everything, telling yourself that someday someone will have the courage to overcome these trials but that, life being short, you are going to leave it all there and go have fun. It ends with a good bender to forget the time wasted on useless geek stuff (the bender often goes hand in hand with disparagement). The second is to dive into the documentation of the new tool and then start again almost from scratch, telling yourself that at the end of all these trials you will finally have before your eyes what you had in mind at the start, unless a new obstacle appears, a new tool, hell... In that case, the first temptation resurfaces and usually wins. The third option is that of ancient times: ignorance. You stubbornly continue down the path already started and tell yourself that a small difficulty is not going to call everything into question. When there was no internet, I often tended to choose this last option. I coded a lot of useless things, but the key was not to find that out too early. And it was not entirely wasted: I learned quite a lot too. But ignorance was really the important point.

It was not bad without the internet. I struggled against a machine to find a solution. Today, I spend more time deciphering terse explanations found on blogs, written by a developer who claims to have a contraption that works. An hour later, convinced that it works, I realize that my thing and his do not run on the same version of Python. No wonder we lose nights on this crap.

I wonder what all those who worked on the Vasa thought when they saw their work sink right after it was launched.

2013-06-13 A template to create a Python module including Sphinx documentation and a setup

My students often struggle to debug their programs when two or three students need to synchronize their versions. A good way to avoid wasting too much time is to use a tool to keep track of the modifications such as github. It then becomes easy to synchronize multiple versions.

However, students still need to debug the program after a synchronization. A good practice is to write unit tests. Every time you write a function, complex or easy, a unit test should be written to ensure its behaviour does not change after many modifications. But it means adding a file, spending some time to do it right, and frequently running all the unit tests. This is usually too painful when the project only lasts a couple of months. Plus, you usually commit yourself to doing it only after you have gone through the nightmare of debugging once.

Last but not least, my students usually do not add documentation to their code. Most of the time, they do not need it because the project is too short to lose track of the modifications and too small not to know it completely. Maybe another reason is that they cannot see a compiled version of the documentation. The best way is to use Sphinx, but using it means spending a couple of hours at least (a lot more if you do it for the first time). Documentation can also be used to navigate through the program.

For those reasons, I made a kind of template for a Python module. It includes an easy mechanism to add a unit test and to run it. It generates the documentation with no change and it also generates a setup (gz, exe) with no change either. You can get it here: Pieces of codes, libraries (section Code). After you have downloaded it, a page gives the short list of instructions to tweak the template in order to make it yours: README.

2013-04-04 Coding spirit - which developer are you?

When you work for a big company, you have the opportunity to observe many coding styles and spirits. I will not talk about coding style. It appears mine is not perfect since I never learnt programming in a school but only by myself. I put spaces before semicolons or brackets because it fits the way I write in French. But coding spirit is something else. Some developers go fast, develop an idea as fast as they can, show it works and...

That's the big question: what happens next? You can continue based on the existing code, which is very tempting to avoid losing the momentum of your ideas. Usually, your development speed decreases as you face issues, bugs, design limitations. Most of the time, the code is not unit tested enough, it has very little documentation, and memory fades away. Code review was not used much and the design did not benefit from many insights.

Some people might say that you should stop, think again about the whole thing and make it right this time. A good design never fails... I would say a design always reaches its limitations at some point, just because research moved forward or needs changed. My suggestion is to revisit the code from time to time and not to hesitate to hold development just to redesign some parts. A kind of regular code cleaning. It also addresses the goal of not completely losing sight of old code, since you revisit it.

However, the first approach is not bad when you do research. You are only interested in proving an idea works, not in maintaining the code which helped to prove it. And it makes perfect sense... until somebody asks you to continue and develop the new features you promised. You then continue to develop like crazy and show the beginning of the new features. You are successful. My recommendation at this stage of the development: leave the project as it is and be as successful somewhere else. The new team in charge will take some time to fully understand what you did, and will probably decide to make some deep changes which will significantly slow down the software development (at least compared to the previous team).

There are so many ways to develop. But sometimes, I think it is like playing chess: some developers think further ahead. You do not understand why they are playing this move... but it works ten moves later. The only issue is that a chess game does not last more than a day whereas a development game can last many years.

2013-02-09 Program to convert LaTeX into a GIF picture in an HTML document

A couple of days ago, I wrote a blog post (Insérer des formules en code Latex dans un blog) on how to insert LaTeX formulas in a blog post. Unfortunately, this way does not seem to work all the time. The browser tries to convert the formulas using another site each time somebody reads the post. And sometimes, the LaTeX formula shows up instead of the picture. So, I decided to write a Python program to call http://latex.codecogs.com/latexit.js before publishing the blog post. I only push GIF images and leave the LaTeX code as comments. You will find this code in latex_svg_gif.py. Basically, it looks for LaTeX formulas, extracts them, calls the site mentioned above, stores the images, puts the former LaTeX code in a comment section and adds a link to the created image.

For example, the following section:
<div lang="latex_help">
N \frac{c^k-1}{c-1} \sim N \frac{\frac{S}{N}-1}{c-1} \sim \frac{S-N}{c-1}
</div>
is replaced by:
<!--
<div lang="latex_help">
N \frac{c^k-1}{c-1} \sim N \frac{\frac{S}{N}-1}{c-1} \sim \frac{S-N}{c-1}
</div>
-->
<p class="latexcenter">
<img src="giflatex/blog_2013_2013-02-07.html__Nfracck1c1simNfracfracSN1c1simfracSNc1.gif"
alt=" N \frac{c^k-1}{c-1} \sim N \frac{\frac{S}{N}-1}{c-1} \sim \frac{S-N}{c-1} " />
</p>
The picture name is the concatenation of all symbols in [a-zA-Z0-9].
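The naming rule can be sketched in a few lines of Python. This is not the code of latex_svg_gif.py, only an illustration of the rule stated above. The function picture_name and its parameters prefix and extension are assumptions.

```python
import re


def picture_name(formula, prefix="", extension=".gif"):
    """Keep only the characters in [a-zA-Z0-9] to build a file name."""
    return prefix + re.sub(r"[^a-zA-Z0-9]", "", formula) + extension


formula = r" N \frac{c^k-1}{c-1} \sim N \frac{\frac{S}{N}-1}{c-1} \sim \frac{S-N}{c-1} "
print(picture_name(formula, prefix="giflatex/blog_2013_2013-02-07.html__"))
```

Applied to the formula of the example, it gives back the name of the picture used in the img tag. Two formulas which only differ by their symbols (+ and - for example) would get the same name, so the rule assumes this does not happen within one page.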

However, I recently found another framework which seems to work better than this one: MathJax. I have not tried it but it is available with WordPress, for example.

2013-02-07 How to extend a stack?

A common way to create a resizable but contiguous array is to use a stack mechanism. We start by allocating a buffer and, when we need more space to store new elements, we often multiply the size of this buffer by 2. We copy the existing elements into the new space, add the new one and free the old buffer. If we follow that strategy for a long time, we end up allocating memory blocks of size N, 2N, 4N, 8N, ... The major drawback of this approach is that we cannot reuse the space we used for the first allocations. The reason is simple:

 N + 2N + 4N + ... + 2^k N = N (2^{k+1} - 1 ) < 2^{k+1} N

As a consequence, the new buffer is always larger than the sum of all previously allocated blocks. And because we need to keep the last one alive to copy the existing elements to the new block, this way of growing a stack can never reuse the same memory space. This side effect could increase memory fragmentation.

What happens if we multiply the size of a block by a coefficient c smaller than 2?

 N + cN + c^2 N + ... + c^k N = N \frac{c^{k+1} - 1 }{c-1}

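The next block, of size c^{k+1} N, has to fit in the space freed so far while the last block, of size c^k N, is still alive. So we need the sign of the following difference: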

 N \frac{c^{k+1} - 1 }{c-1} - N (c^{k+1} + c^k)

We can remove N from the equation:

 \frac{c^{k+1} - 1 }{c-1} - (c^{k+1} + c^k)   = \left(\frac{c}{c-1}-(c+1)\right) c^k - \frac{1}{c-1} \\ =  c^k \frac{c - c^2 +1}{c-1}  - \frac{1}{c-1}

This expression is positive if and only if:

 c^k (c - c^2 +1)  - 1 > 0

So first, we must have c>1. Second, we must also have c-c^2+1>0, which means:

 c < \frac{\sqrt{5}+1}{2}

If that condition is fulfilled, then there will be a k for which the above expression becomes positive.
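The result can be checked numerically with a small simulation. This is only a sketch under an idealized assumption: all the freed blocks are contiguous and can be merged into one free area. Sizes are expressed in units of N and the function first_reusable_step is a name chosen for this example.

```python
def first_reusable_step(c, max_k=100):
    """Return the first k for which the block of size c^(k+1) fits in the
    space freed by the blocks 1, c, ..., c^(k-1), or None if no k is found."""
    freed = 0.0      # total size of the blocks already freed
    current = 1.0    # size of the block still alive, c^k
    for k in range(max_k + 1):
        needed = current * c    # size of the next block, c^(k+1)
        if freed >= needed:
            return k
        # The current block is freed once its content has been copied.
        freed += current
        current = needed
    return None


print(first_reusable_step(1.5))   # 4
print(first_reusable_step(2.0))   # None
```

With c = 1.5, the freed space (1 + 1.5 + 2.25 + 3.375 = 8.125) becomes large enough for the next block (1.5^5 = 7.59375) at k = 4, which agrees with the condition c^k (c - c^2 + 1) > 1, that is 1.5^k > 4. With c = 2, or any c above (\sqrt{5}+1)/2, the function never finds such a k.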


Xavier Dupré