XD blog

blog page

features, machine learning, model, python

2015-03-07 Work on the features or the model

Sometimes, a machine learned model does not get it. It does not find any way to properly classify the data. Sometimes, you know it could work better with another model but it cannot be trained on such an amount of data. So what...

Another direction consists in looking for non linear combinations of existing features which could explain better the border between two classes. Let's consider this known difficult example:

It cannot be linearly separated but it can with others kinds of models (k-NN, SVC). However, by adding simple multiplications between existing features, the problem becomes linear:

The point is: if you know that a complex features would really help your model, it is worth spending time implementing it rather that trying to approximating it by using a more complex model. (corresponding notebook).

<-- -->

Xavier Dupré