2016-08-16 Reading, modules about machine learning...
How do you train a model on an imbalanced dataset (not enough
observations for a class), build recommendations, or compute
confidence intervals on the predictions of a random forest?
Those are some of the questions the following extensions of
scikit-learn try to answer.
- imbalanced-learn
imbalanced-learn is a Python package offering a number of re-sampling techniques
commonly used on datasets showing strong between-class imbalance.
It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.
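
To make the idea of re-sampling concrete, here is a minimal sketch of random over-sampling written with the standard library only. It is not the imbalanced-learn API: the function name `random_oversample` and the toy data are my own assumptions, and the package offers more elaborate techniques than this one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest one.
    (Hypothetical helper, not part of imbalanced-learn.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of the original data that belong to this class.
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy dataset: five observations for class 0, only one for class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [2.0]]
y = [0, 0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After re-sampling, both classes have the same number of rows, so a classifier trained on `X_res`, `y_res` no longer sees class 1 as a rare event. The price is that the minority rows are exact duplicates, which is why more refined techniques exist.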
- polylearn
Factorization machines and polynomial networks are machine learning models
that can capture feature interaction (co-occurrence) through polynomial terms.
Because feature interactions can be very sparse,
it's common to use low rank, factorized representations; this way,
we can learn weights even for feature co-occurrences that haven't
been observed at training time.
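
The paragraph above can be illustrated with the prediction function of a second-order factorization machine. This is a plain-Python sketch under my own assumptions (the names `fm_predict_naive`, `fm_predict_fast`, `pair_weight` and the toy numbers are hypothetical), not the polylearn API. The weight of the pair (i, j) is the dot product of two low-rank vectors V[i] and V[j], so it is defined even when features i and j never appear together.

```python
def pair_weight(V, i, j):
    """Interaction weight of features i and j: <V[i], V[j]>."""
    return sum(a * b for a, b in zip(V[i], V[j]))


def fm_predict_naive(x, w0, w, V):
    """y = w0 + sum_i w[i] x[i] + sum_{i<j} <V[i], V[j]> x[i] x[j]."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += pair_weight(V, i, j) * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same value in O(n * k) instead of O(n^2 * k), using
    sum_{i<j} <V[i], V[j]> x[i] x[j]
      = 0.5 * sum_f [(sum_i V[i][f] x[i])^2 - sum_i (V[i][f] x[i])^2]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s2)
    return y


# Toy parameters: 3 features, rank k = 2 (values chosen by hand).
w0 = 0.5
w = [0.1, -0.2, 0.3]
V = [[1.0, 2.0], [0.5, -1.0], [1.0, 0.5]]

x = [1.0, 0.0, 1.0]  # features 0 and 2 are active, feature 1 is not
print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))

# Features 0 and 1 do not co-occur in x, yet their weight is defined.
print(pair_weight(V, 0, 1))
```

A full polynomial model would need one free weight per pair of features, and most of them would never receive any training signal. Here the number of parameters grows with n * k rather than n^2.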
- forest-confidence-interval
forest-confidence-interval is a Python module for calculating variance and adding
confidence intervals to scikit-learn random forest regression or classification
objects. The core functions calculate an in-bag and error bars for random
forest objects.
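
To see how in-bag counts can lead to error bars, here is a rough standard-library sketch. It assumes the infinitesimal jackknife variance estimate for bagged predictors, without any bias correction, and it replaces the trees with a trivial stand-in learner (the mean of the bootstrap sample). The names `inbag_counts` and `ij_variance` are hypothetical and this is not the module's API.

```python
import random


def inbag_counts(n_samples, n_trees, seed=0):
    """Row b tells how many times each training sample was drawn
    in the bootstrap sample used to fit tree b."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trees):
        row = [0] * n_samples
        for _ in range(n_samples):
            row[rng.randrange(n_samples)] += 1
        counts.append(row)
    return counts


def ij_variance(inbag, tree_preds):
    """Uncorrected infinitesimal jackknife estimate for one point:
    sum over samples i of Cov_b(N[b][i], t[b])^2, where the
    covariance is taken across the trees b."""
    n_trees = len(tree_preds)
    n_samples = len(inbag[0])
    t_mean = sum(tree_preds) / n_trees
    variance = 0.0
    for i in range(n_samples):
        n_mean = sum(row[i] for row in inbag) / n_trees
        cov = sum((inbag[b][i] - n_mean) * (tree_preds[b] - t_mean)
                  for b in range(n_trees)) / n_trees
        variance += cov ** 2
    return variance


# Toy training targets (simulated, not real data).
rng = random.Random(1)
y_train = [rng.gauss(0.0, 1.0) for _ in range(50)]

inbag = inbag_counts(len(y_train), n_trees=2000)
# Stand-in for a tree: the mean of its bootstrap sample.
tree_preds = [sum(c * v for c, v in zip(row, y_train)) / len(y_train)
              for row in inbag]

forest_pred = sum(tree_preds) / len(tree_preds)
variance = ij_variance(inbag, tree_preds)
print(forest_pred, "+/-", 1.96 * variance ** 0.5)
```

With a real forest, `tree_preds` would hold the predictions of each fitted tree for one test point, and the in-bag counts would have to be recovered from the bootstrap samples the forest actually used.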
Some papers. I will probably not have time to read more than one or two
while preparing my teaching, but I should, to get more ideas
for student projects.