Reading, modules about machine learning...

biblio

2016-08-16 Reading, modules about machine learning...

How to train a model on an imbalanced dataset (not enough observations for a class), how to make recommendations, or how to compute confidence intervals on the predictions of a random forest? These are some of the questions the following extensions of scikit-learn try to answer.

imbalanced-learn: imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.
polylearn: Factorization machines and polynomial networks are machine learning models that can capture feature interactions (co-occurrences) through polynomial terms. Because feature interactions can be very sparse, it is common to use low-rank, factorized representations; this way, we can learn weights even for feature co-occurrences that were not observed at training time.
forest-confidence-interval: forest-confidence-interval is a Python module for calculating variance and adding confidence intervals to scikit-learn random forest regression or classification objects. The core functions calculate an in-bag matrix and error bars for random forest objects.
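
To make the factorized representation mentioned for polylearn more concrete, here is a minimal sketch of how a second-order factorization machine computes a prediction. It is not polylearn's API: the function names, the parameters (w0, w, V) and the toy values are assumptions chosen for illustration, and the code uses plain Python only. Each feature i gets a small vector V[i], and the weight of the pair (i, j) is the dot product of V[i] and V[j], so a pair never seen together during training still gets a weight.

```python
# Minimal sketch of a second-order factorization machine prediction.
# Hypothetical names and toy values; this is NOT polylearn's API.


def fm_predict_naive(x, w0, w, V):
    """Explicit version: loop over every pair of features (i < j)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            # The pair weight is not stored, it is rebuilt from V[i] and V[j].
            w_ij = sum(a * b for a, b in zip(V[i], V[j]))
            y += w_ij * x[i] * x[j]
    return y


def fm_predict(x, w0, w, V):
    """Same model, using the usual identity
    sum_{i<j} <V[i], V[j]> x_i x_j
      = 0.5 * sum_f ((sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2),
    which costs O(n * k) instead of O(n^2 * k).
    """
    n = len(x)
    k = len(V[0])  # rank of the factorization
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Toy parameters: 3 features, rank 2 (values made up for the example).
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]

# Features 0 and 2 are active together; their interaction weight is <V[0], V[2]>.
x = [1.0, 0.0, 1.0]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions implement the same model; the second one is the form usually preferred when the number of features is large, since it avoids the double loop over pairs.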

Some papers. I will probably not have time to read more than one or two while preparing my teaching, but I should, to get more ideas for student projects.

Xavier Dupré

XD blog
