module `mlmodel.categories_to_integers`#

Short summary#

module mlinsights.mlmodel.categories_to_integers

Implements a transformation which can be put in a pipeline to transform categories into integers.

Classes#

class	truncated documentation
`CategoriesToIntegers`	Does something similar to what DictVectorizer …

Properties#

property	truncated documentation
`_repr_html_`	HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …

Methods#

method	truncated documentation
`__init__`
`__str__`	Usual string representation.
`_build_schema`	Concatenates all the categories given the information stored in _categories.
`fit`	Makes the list of all categories in input X. X must be a dataframe.
`fit_transform`	Fits and transforms categories into numerical features based on the list of categories found by method fit. …
`transform`	Transforms categories into numerical features based on the list of categories found by method fit. X must …

Documentation#

Implements a transformation which can be put in a pipeline to transform categories into integers.


class mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers(columns=None, remove=None, skip_errors=False, single=False)#

Bases: BaseEstimator, TransformerMixin

Does something similar to what DictVectorizer does, but as a transformer. The method fit retains all categories; the method transform converts categories into integers. Categories are sorted by columns. If the method transform meets a category which was not seen by the method fit, it can either raise an exception or ignore the category and replace it by zero.

DictVectorizer or CategoriesToIntegers

Example which transforms text into integers:

<<<

import pandas
from mlinsights.mlmodel import CategoriesToIntegers
df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)

>>>

       cat=a  cat=b
    0    1.0    NaN
    1    NaN    1.0
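The fit/transform behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not the mlinsights implementation: the helpers `fit_categories` and `to_indicators` are hypothetical names, rows are plain dictionaries rather than a DataFrame, and `None` stands in for the `NaN` shown in the output above.

```python
# Illustrative sketch only: not the code used by CategoriesToIntegers.

def fit_categories(rows):
    """Collect the sorted list of distinct values seen in each column."""
    categories = {}
    for row in rows:
        for col, value in row.items():
            categories.setdefault(col, set()).add(value)
    # Columns and values are both sorted to obtain a stable layout.
    return {col: sorted(values) for col, values in sorted(categories.items())}


def to_indicators(rows, categories):
    """Turn every row into a dict of 'column=value' indicator features."""
    out = []
    for row in rows:
        features = {}
        for col, values in categories.items():
            for value in values:
                name = f"{col}={value}"
                # 1.0 marks the category present in the row, None marks absence.
                features[name] = 1.0 if row.get(col) == value else None
        out.append(features)
    return out


rows = [{"cat": "a"}, {"cat": "b"}]
categories = fit_categories(rows)
encoded = to_indicators(rows, categories)
print(categories)
print(encoded)
```

The generated names follow the `column=value` pattern visible in the output above (`cat=a`, `cat=b`).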


Parameters:

columns – selection of columns to process
remove – modalities to remove
skip_errors – skip a category which was not seen by fit instead of raising an exception (no 1 is set for it)
single – use a single column per category instead of one column per value
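As a rough illustration of what a single column per category could look like, the sketch below replaces each value by an integer code. It assumes the code is the position of the value in the sorted list of values seen while fitting; that numbering, and the helper `encode_single`, are assumptions made for the example and are not taken from the library.

```python
# Hypothetical helper: the numbering scheme is an assumption, not the
# documented behaviour of CategoriesToIntegers(single=True).

def encode_single(rows, column):
    """Map every value of `column` to one integer code."""
    values = sorted({row[column] for row in rows})
    codes = {value: index for index, value in enumerate(values)}
    return codes, [codes[row[column]] for row in rows]


rows = [{"cat": "b"}, {"cat": "a"}, {"cat": "b"}]
codes, encoded = encode_single(rows, "cat")
print(codes)
print(encoded)
```

Compared with one indicator column per value, this keeps the number of columns unchanged.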

The logging function displays a message when a large dense matrix is created where a sparse matrix would be appropriate. A sparse matrix should be allocated instead.


__init__(columns=None, remove=None, skip_errors=False, single=False)#

Parameters:

columns – selection of columns to process
remove – modalities to remove
skip_errors – skip a category which was not seen by fit instead of raising an exception (no 1 is set for it)
single – use a single column per category instead of one column per value

The logging function displays a message when a large dense matrix is created where a sparse matrix would be appropriate. A sparse matrix should be allocated instead.


__str__()#

Usual string representation.


_build_schema()#

Concatenates all the categories given the information stored in _categories.

Returns: list of columns, beginning of each
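One way to picture this return value is the sketch below. It assumes the categories are stored as a mapping from column name to list of values and that the result is the flat list of generated columns together with the position where each original column begins; the function `build_schema` and both data layouts are hypothetical.

```python
# Illustrative sketch: the structure of _categories and of the returned
# values is assumed for the example.

def build_schema(categories):
    """Concatenate all categories and record where each column starts."""
    schema = []      # names of all generated columns, in order
    positions = {}   # index in `schema` of the first entry of each column
    for col in sorted(categories):
        positions[col] = len(schema)
        for value in sorted(categories[col]):
            schema.append(f"{col}={value}")
    return schema, positions


schema, positions = build_schema(
    {"cat": ["a", "b"], "color": ["blue", "red", "green"]}
)
print(schema)
print(positions)
```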


fit(X, y=None, **fit_params)#

Builds the list of all categories found in the input X. X must be a DataFrame.

Parameters:

X – iterable, training data
y – iterable, default=None, training targets

Returns:

self


fit_transform(X, y=None, **fit_params)#

Fits and transforms categories into numerical features based on the list of categories found by method fit. X must be a DataFrame. The function does not preserve the order of the columns.

Parameters:

X – iterable, training data
y – iterable, default=None, training targets

Returns:

DataFrame, X with categories converted into numerical features.


transform(X, y=None, **fit_params)#

Transforms categories into numerical features based on the list of categories found by method fit. X must be a DataFrame. The function does not preserve the order of the columns.

Parameters:

X – iterable, data to transform
y – iterable, default=None, targets

Returns:

DataFrame, X with categories converted into numerical features.
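The two possible reactions to a category that fit never saw (raise an exception, or skip it) can be sketched as follows. The helper `encode_value` is hypothetical, and the exception type `ValueError` is an assumption chosen for the example; the library may raise a different one.

```python
# Illustrative sketch of the skip_errors idea, not the library code.

def encode_value(column, value, known, skip_errors=False):
    """Return the indicator column name for `value`, or None when skipped."""
    if value in known:
        return f"{column}={value}"
    if skip_errors:
        # Unseen category is ignored: no indicator is set for it.
        return None
    raise ValueError(f"unknown category {value!r} in column {column!r}")


known = ["a", "b"]
print(encode_value("cat", "a", known))
print(encode_value("cat", "z", known, skip_errors=True))
try:
    encode_value("cat", "z", known)
except ValueError as exc:
    print("error:", exc)
```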

