blog page - 1/1#
scikit-learn 0.23#
2021-01-03
The unit tests are run against
scikit-learn 0.23 and 0.24.
Some unit tests fail with version 0.23.
They were disabled rather than investigated because the cause
does not appear with the latest version.
The failure affects every class inheriting from SkBase
when a model using it is trained.
The issue happens in joblib.
scikit-learn internal API#
2020-09-02
The signature of method impurity_improvement will change in version 0.24. It is usually easy to support two versions of scikit-learn, even for a method overloaded in a class, except that this method is implemented in Cython: it must be overloaded with exactly the same signature. The way it was handled is implemented in PR 88.
…
Nogil, numpy, cython#
2019-03-25
I had to implement a custom criterion to optimize a decision tree and I wanted to leverage scikit-learn instead of rewriting my own. Version 0.21 of scikit-learn introduced some changes in the API which make it possible to overload an existing criterion and replace some of its logic: _criterion.pyx. The purpose was to show that a fast implementation requires some tricks (see Custom DecisionTreeRegressor adapted to a linear regression, and piecewise_tree_regression_criterion.pyx, piecewise_tree_regression_criterion_fast.pyx for the code). Other than that, every function to overload is marked as nogil. A function or method marked as nogil runs without holding the GIL (see also PEP-0311), which means no Python object can be created in it. In fact, no Python code can be called inside a Cython method protected with nogil. The consequence is that no numpy function can be called there.
…
Faster Polynomial Features#
2019-02-15
The current implementation of
PolynomialFeatures
in scikit-learn computes each new feature
independently, which increases the amount of
data exchanged between numpy and Python.
The idea of the implementation in
ExtendedFeatures
is to reduce this amount with broadcast multiplications.
The second optimization comes from transposing the matrix:
dense matrices are stored by rows in memory so
it is faster to multiply two rows than two columns.
See Faster Polynomial Features.
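The two ideas above can be illustrated with a small numpy sketch. It is not the code of ExtendedFeatures: the function names `poly2_naive` and `poly2_broadcast` are hypothetical and the example is limited to degree 2. The first function builds one product per new column, the second one multiplies one row of the transposed matrix with all the remaining rows in a single broadcast call.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly2_naive(X):
    """Degree-2 features, one product per new column (reference version)."""
    n_samples, n_features = X.shape
    cols = [np.ones(n_samples), *X.T]
    for i, j in combinations_with_replacement(range(n_features), 2):
        cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)


def poly2_broadcast(X):
    """Degree-2 features, one broadcast product per input feature.

    Works on the transposed matrix so that every feature is a
    contiguous row in memory.
    """
    Xt = np.ascontiguousarray(X.T)  # shape (n_features, n_samples)
    n_features, n_samples = Xt.shape
    blocks = [np.ones((1, n_samples)), Xt]
    for i in range(n_features):
        # Row i times rows i..end in a single call: (1, n) * (k, n).
        blocks.append(Xt[i:i + 1] * Xt[i:])
    return np.vstack(blocks).T


X = np.arange(12, dtype=float).reshape(4, 3)
# Both versions produce the same columns in the same order.
assert np.allclose(poly2_naive(X), poly2_broadcast(X))
```

The broadcast version makes one numpy call per input feature instead of one per output feature, which is where the saving is expected to come from.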
Piecewise Linear Regression#
2019-02-10
I decided to turn one of the notebooks I wrote about
Piecewise Linear Regression
into something usable which follows
the scikit-learn API:
PiecewiseRegression
and another notebook, Piecewise linear regression with scikit-learn predictors.
Predictable t-SNE#
2019-02-01
t-SNE is quite an interesting tool to
visualize data on a map but it has one drawback:
results are not reproducible. It is much more powerful
than a PCA but the results are difficult to
interpret. Based on some experiments, if t-SNE
manages to separate classes, there is a good chance that
a classifier can achieve good performance. Anyhow, I implemented
a regressor which approximates the t-SNE outputs
so that they can be used as features for a further classifier.
I created a notebook, Predictable t-SNE, and a new transform,
PredictableTSNE.
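The idea of approximating t-SNE with a regressor can be sketched in a few lines. This is only an illustration, not the implementation of PredictableTSNE: the synthetic data, the choice of KNeighborsRegressor and the parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
# Synthetic data (assumption): two well separated blobs in 5 dimensions.
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(6, 1, (40, 5))])
X_train, X_new = X[::2], X[1::2]

# TSNE only offers fit_transform: it cannot project unseen points.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
embedding = tsne.fit_transform(X_train)

# A multi-output regressor learns the mapping X -> embedding.
regressor = KNeighborsRegressor(n_neighbors=3).fit(X_train, embedding)

# The regressor can now produce t-SNE-like features for new points.
features_new = regressor.predict(X_new)
```

`features_new` has one row per new point and two columns, so it can feed a further classifier the same way the original embedding would.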
Pipeline visualization#
2019-02-01
scikit-learn introduced a nice feature to process mixed-type columns in a single pipeline which follows the scikit-learn API: ColumnTransformer, FeatureUnion and Pipeline. The ideas are not new but they are finally taking shape in scikit-learn.
…
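As a reminder of what such a pipeline looks like, here is a minimal sketch combining ColumnTransformer and Pipeline on mixed-type columns. The toy data, the step names and the choice of LogisticRegression are assumptions made for the example; it does not show the visualization itself.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column 0 is numerical, column 1 is categorical.
X = np.array(
    [[1.0, "a"], [2.0, "b"], [3.0, "a"], [4.0, "b"], [5.0, "a"], [6.0, "b"]],
    dtype=object,
)
y = np.array([0, 0, 0, 1, 1, 1])

# Each group of columns gets its own preprocessing.
preprocessing = ColumnTransformer(
    [
        ("num", StandardScaler(), [0]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Preprocessing and model are trained as a single estimator.
model = Pipeline([("prep", preprocessing), ("clf", LogisticRegression())])
model.fit(X, y)

features = model.named_steps["prep"].transform(X)
predictions = model.predict(X)
```

The preprocessing produces one scaled column and two one-hot columns, and the whole object can be trained, saved or cross-validated like any other scikit-learn estimator.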
Quantile regression with scikit-learn.#
2018-05-07
scikit-learn does not have any quantile regression.
statsmodels does have one,
QuantReg,
but I wanted to try something I did for my teachings,
Régression Quantile,
based on iteratively reweighted least squares.
I thought it was a good case study to turn a simple algorithm into
a learner scikit-learn can reuse in a pipeline.
The notebook Quantile Regression demonstrates it
and it is implemented in
QuantileLinearRegression.
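The principle of iteratively reweighted least squares applied to quantile regression can be sketched with numpy alone. This is a minimal version written for illustration, not the code of QuantileLinearRegression: the function name `quantile_irls`, the number of iterations and the clipping constant `eps` are assumptions.

```python
import numpy as np


def quantile_irls(X, y, quantile=0.5, n_iter=50, eps=1e-6):
    """Linear quantile regression by iteratively reweighted least squares.

    Returns the coefficients, the intercept being the first one.
    """
    A = np.column_stack([np.ones(len(X)), X])
    # Start from the ordinary least squares solution.
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        residual = y - A @ beta
        # The pinball loss is rewritten as a weighted squared loss:
        # loss(r) = w * r**2 with w = quantile / |r| if r > 0,
        # and w = (1 - quantile) / |r| otherwise.
        side = np.where(residual > 0, quantile, 1.0 - quantile)
        weights = side / np.maximum(np.abs(residual), eps)
        # Weighted least squares solved by scaling rows with sqrt(w).
        sw = np.sqrt(weights)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta


rng = np.random.RandomState(0)
x = rng.uniform(0, 5, 500)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 500)

beta_median = quantile_irls(x.reshape(-1, 1), y, quantile=0.5)
beta_q90 = quantile_irls(x.reshape(-1, 1), y, quantile=0.9)
# Share of points below the 0.9 quantile line, expected to be close to 0.9.
share_below = np.mean(y <= beta_q90[0] + beta_q90[1] * x)
```

Each iteration is an ordinary weighted least squares problem, which is what makes the algorithm simple to wrap into a scikit-learn estimator with `fit` and `predict`.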
Function to get insights on machine learned models#
2017-11-18
Machine learned models are black boxes. The module tries to implement some functions to get insights on machine learned models.