# 2019-03-25 Nogil, numpy, cython¶

I had to implement a custom criterion to optimize
a decision tree and I wanted to leverage scikit-learn
instead of rewriting my own. Version 0.21 of scikit-learn
introduced some changed in the API which make possible
to overload an existing criterion and replace some of the logic
by another one: _criterion.pyx.
The purpose was to show that a fast implementation requires
some tricks (see None) and
piecewise_tree_regression_criterion.pyx,
piecewise_tree_regression_criterion_fast.pyx
for the code. Other than that, every function to overlaod is marked as
nogil. Every function or method marked as *nogil* cannot
go through the GIL (see also PEP-0311),
which no python object can be created in that method.
In fact, no python can be called inside a Cython
method protected with *nogil*. The issue with that is that
any numpy method cannot be called.

My goal was to replace the implemention of the decision tree
criterion by something optimizing a linear regression, basically
something close to function numpy.linalg.lstsq but that’s
inside numpy so unavailable in a *nogil* method.
I needed to use the inner API from BLAS or LAPACK
and available in cython through
cython_blas
(matrix operations)
cython_lapack
(complex matrix operations).
It is fast but not really well documented…
I needed to use function dgelss
(same from scipy scipy.linalg.lapack.dgelss).
which documentation is available through Lapack documentation.
Its signature can be found at cython_lapack_signatures.txt
and is the following:

```
cdef void dgelss(int *m, int *n, int *nrhs, double *a, int *lda, double *b, int *ldb,
double *s, double *rcond, int *rank,
double *work, int *lwork, int *info) nogil
```

I tried and it failed many times before getting it correctly.
Most of the time, python just crashes without telling me
what the issue is. I had to use many `printf`

in the cython
code to get it right (remember no python call, so no *print* function).
These function do not do any allocation, every needed buffer
must be allocated first. The documentation gives some recommendations
about the optimal buffer size. The function usually modifies the inputs,
they must be copied first if the user wants to reuse them later.
I finally implemented `dgelss`

or
on github: direct_blas_lapack.pyx.