XD blog

machine learning, python

2013-09-15 Python extensions to do machine learning

I started to compare the features of some Python extensions for machine learning (the list is not exhaustive):

The first one, scikit-learn, covers many features and its documentation is quite clear. When a model is missing there, you can look into PyBrain for reinforcement learning, Gensim for Dirichlet models (Latent Dirichlet Allocation, Hierarchical Dirichlet Process), and NLTK for text processing (tokenization, for example). For those who do not want to code, Orange is a good option. The module Theano does gradient-based optimization and can run on a GPU.
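To give an idea of what working with scikit-learn looks like, here is a minimal sketch that trains a random forest on the iris dataset bundled with the library. It assumes a recent scikit-learn release (the `sklearn.model_selection` module did not exist under that name in 2013), and the choice of model, split ratio, and seeds is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small dataset shipped with scikit-learn: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Keep 30% of the rows aside to measure the accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every scikit-learn estimator follows the same fit / predict / score pattern.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("accuracy on the test set: %.2f" % accuracy)
```

Because all estimators share the same interface, replacing `RandomForestClassifier` with another model usually means changing one line.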

A couple of forums that work as a kind of FAQ for machine learning:

It would be difficult to do machine learning without visualization tools; matplotlib and ggplot are a good way to start. We also manipulate tables, with numpy and pandas. For an interactive command line, ipython and bpython are two common options.
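As an illustration of table manipulation, the following sketch builds a small table with numpy and summarizes it with pandas. The data is random and the column names (`x`, `y`, `label`) are made up for the example.

```python
import numpy as np
import pandas as pd

# 100 random points in two dimensions (fixed seed for reproducibility).
rng = np.random.RandomState(0)
values = rng.normal(size=(100, 2))

# A DataFrame is a table with named columns built on top of numpy arrays.
df = pd.DataFrame(values, columns=["x", "y"])

# Add a categorical column derived from the numerical ones.
df["label"] = np.where(df["x"] + df["y"] > 0, "pos", "neg")

# Average of each numerical column per label.
summary = df.groupby("label")[["x", "y"]].mean()
print(summary)
```

If matplotlib is installed, `df.plot.scatter(x="x", y="y")` should draw the same table as a scatter plot.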

If you are looking for data, try the UC Irvine Machine Learning Repository. If you work with Windows, many of the modules presented here can be downloaded from Unofficial Windows Binaries for Python Extension Packages. That page also gives a clear view of which package is available for which Python version.

The following table summarizes which features can be found where (it may contain some errors):
ARMA (Time Series): yes
Canonical Correlation Analysis: yes, yes
Cross Validation: yes, yes
Decision Trees: yes, yes, yes, yes
Deep Belief Networks: yes
Dictionary Learning: yes
Dynamic Time Warping: (yes), yes
Elastic Net: yes, yes, yes, yes
Evolution Strategies (ES): yes
Fast ICA: yes, yes
Fast/Partial PCA: yes
Feature Selection: yes, yes, yes, yes
Gaussian Mixture Model: yes, yes
Gaussian Naive Bayes: yes, yes
Genetic Algorithm: yes
Golub Classifier: yes
GPU computation: yes
Gradient Based Optimization: yes, yes
Gradient Boosted Tree: yes
Gradient Boosting Regression: yes, yes
Grid Search: yes
Hidden Markov Model with Gaussian Mixture Emissions (HMM GMM): yes, yes
Hierarchical Clustering (Ward…): yes, yes, yes, yes
Hierarchical Dirichlet Process (HDP): yes
Isotonic Regression: yes
Kernel Density: yes, yes
Kernel Fisher Discriminant: yes
Kernel PCA: yes, yes, yes
Kernel Regression: yes
Kernel Ridge Regression: yes
Label Spreading: yes, yes
Longest Common Subsequence (LCS)
Large Linear Classification: yes
Latent Dirichlet Allocation (LDA): yes
Least Angle Regression (LARS): yes, yes, yes
Linear Discriminant Analysis (LDA): yes, yes, yes, yes
Linear Regression: yes, yes, yes, yes, yes, yes, yes
Logistic Regression: yes, yes, yes, yes, yes
Naive Bayesian Learner: yes, yes
Natural Language Processing (NLP): yes
Neural Network (NN): yes, yes, yes, yes
Non-Negative Matrix Factorization by Projected Gradient (NMF): yes, yes
Partial Least Squares (PLS): yes, yes
Partial Least Squares (SVD): yes
Particle Swarm Optimization (PSO): yes
Passive Aggressive Classification: yes
Passive Aggressive Regression: yes
Principal Component Analysis (PCA): yes, yes, yes, yes, yes
Probabilistic Principal Component Analysis (pPCA): yes, yes, yes
Quadratic Discriminant Analysis (QDA): yes, yes
Random Forests: yes, yes, yes, yes
Recurrent Neural Network: yes
Regression Tree: yes, yes, yes
Reinforcement Learning: yes
Ridge Regression: yes, yes, yes, yes
ROC / Precision / Recall: yes, yes
Self Organizing Map (SOM - Kohonen): yes, yes, yes
Singular Value Decomposition (SVD): yes, yes
Sparse PCA: yes, yes
Spectral BiClustering: yes
Spectral Clustering: yes
Spectral Coclustering: yes
Spectral Regression Discriminant Analysis: yes
Support Vector Classification (SVC): yes, yes, yes
Support Vector Machine (SVM): yes, yes, yes, yes, yes, yes, yes
Support Vector Regression (SVR): yes, yes
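
Several rows of the table are meant to be combined. As a hedged sketch, the code below uses scikit-learn to run a grid search with cross validation over a support vector classifier. It assumes a recent scikit-learn release, and the parameter grid is an arbitrary choice made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters: 3 values of C times 2 kernels = 6 combinations.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# Each combination is scored with 5-fold cross validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy: %.3f" % search.best_score_)
```

After fitting, `search.best_estimator_` holds the model retrained on the whole dataset with the best parameters.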

If the model you need is not in the table, you can use rpy2 to communicate with R, where you will most likely find a related package.

2014/09/03: you can also read Python Tools for Machine Learning.

Xavier Dupré