I realize that wherever I go, I always do the same operation with figures. Before using any machine learning, the most basic task consists in aggregate figures using a pivot. Excel is already providing a visual tool. No matter what I do, I always find some limitations or I feel stuck because I do not know this function or this other one. Translating a matrix in Excel is still a pain. Plus, depending on the machine, I'm using, I need to switch from English function to French function which is really annoying. 10 years ago, I used to work with Lotus Improve. Basically, it was a cube in n dimension and you could write formulas quite easily. The sheet displayed on the screen was a pivot by itself and we could move a dimension from one axis to the others or choose a value and bring another dimension for the view. The concept was interesting. I never met anything similar except a programming language.
Any scripting language now allows to implement its own cube quite easily. The initial format is a big table containing the extended cube. Each row follows the same format: n coordinates, value or list of values. When I need to create a view of this cube, I need to aggregate using a kind of SQL logic. It is fine to write a GROUP BY query when the number of variable is small but when it is high, I would appreciate to have a tool which would do that for me. The convention I chose is to have meaningful name. Like many convention, the name of a column should indicate the numeric type and the aggregation type you want (sum, avg, len, key, none, max, min).
key_name sum_a len_b avg_c A 1.0 2.0 3.0 A 1.1 2.1 3.1 B 3.0 4.0 5.0To aggregate on name, I just write this:
gr = table.groupby_implicit ( lambda v : v ["key_name"] )I do not have to specify the rest. Usually, I use sum most of the time. I'll probably add others aggregator later such as argmax. I added a module computing some basics statistics, the code is still here: Maybe one day, somebody will find something easy.
<-- --> |