XD blog

blog page

tableformula


2013-03-04 Easy Pivot

I realize that wherever I go, I always do the same operation with figures. Before using any machine learning, the most basic task consists in aggregate figures using a pivot. Excel is already providing a visual tool. No matter what I do, I always find some limitations or I feel stuck because I do not know this function or this other one. Translating a matrix in Excel is still a pain. Plus, depending on the machine, I'm using, I need to switch from English function to French function which is really annoying. 10 years ago, I used to work with Lotus Improve. Basically, it was a cube in n dimension and you could write formulas quite easily. The sheet displayed on the screen was a pivot by itself and we could move a dimension from one axis to the others or choose a value and bring another dimension for the view. The concept was interesting. I never met anything similar except a programming language.

Any scripting language now allows to implement its own cube quite easily. The initial format is a big table containing the extended cube. Each row follows the same format: n coordinates, value or list of values. When I need to create a view of this cube, I need to aggregate using a kind of SQL logic. It is fine to write a GROUP BY query when the number of variable is small but when it is high, I would appreciate to have a tool which would do that for me. The convention I chose is to have meaningful name. Like many convention, the name of a column should indicate the numeric type and the aggregation type you want (sum, avg, len, key, none, max, min).

key_name	sum_a	len_b	avg_c
A	1.0	2.0	3.0
A	1.1	2.1	3.1
B	3.0	4.0	5.0
To aggregate on name, I just write this:
gr = table.groupby_implicit ( lambda v : v ["key_name"] )
I do not have to specify the rest. Usually, I use sum most of the time. I'll probably add others aggregator later such as argmax. I added a module computing some basics statistics, the code is still here: Maybe one day, somebody will find something easy.

2013-02-23 Dessin avec MatPlotLib en Python

J'ai créé une classe qui permet de faire des opérations assez simples des données. J'ai ajouté quelques fonctions pour créer des graphiques rapidement :

Aujourd'hui, je me sers de cette classe avant de passer les données sous Excel. Les données viennent d'expériences, d'estimations de modèles. Refaire sans arrêt les mêmes manipulations sous Excel devenait trop rébarbatif.


Xavier Dupré