module mlmodel.categories_to_integers

Inheritance diagram of mlinsights.mlmodel.categories_to_integers

Short summary

module mlinsights.mlmodel.categories_to_integers

Implements a transformation that can be placed in a pipeline to convert categories into integers.

source on GitHub

Classes

class – truncated documentation

  • CategoriesToIntegers – Does something similar to what DictVectorizer

Properties

property – truncated documentation

  • _repr_html_ – HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …

Methods

method – truncated documentation

  • __init__

  • __str__ – usual

  • _build_schema – Concatenates all the categories given the information stored in _categories.

  • fit – Builds the list of all categories found in input X. X must be a dataframe.

  • fit_transform – Fits and transforms categories into numerical features based on the list of categories found by method fit. …

  • transform – Transforms categories into numerical features based on the list of categories found by method fit. X must …

Documentation

Implements a transformation that can be placed in a pipeline to convert categories into integers.

class mlinsights.mlmodel.categories_to_integers.CategoriesToIntegers(columns=None, remove=None, skip_errors=False, single=False)

Bases: BaseEstimator, TransformerMixin

Does something similar to what DictVectorizer does, but as a transformer. The method fit retains all categories, and the method transform converts categories into integers. Categories are sorted by columns. If transform encounters a category that was not seen by fit, it can either raise an exception or ignore it and replace it by zero.

DictVectorizer or CategoriesToIntegers

Example which transforms text into integers:

<<<

import pandas
from mlinsights.mlmodel import CategoriesToIntegers
df = pandas.DataFrame([{"cat": "a"}, {"cat": "b"}])
trans = CategoriesToIntegers()
trans.fit(df)
newdf = trans.transform(df)
print(newdf)

>>>

       cat=a  cat=b
    0    1.0    NaN
    1    NaN    1.0
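
The mechanism behind this output can be sketched in plain Python, without pandas. This is a minimal illustration under stated assumptions, not the mlinsights implementation: the helper names fit_categories and transform_rows are hypothetical, and a missing key stands in for a NaN cell of the output shown above.

```python
# Illustrative sketch only: a plain-Python stand-in, not the mlinsights code.


def fit_categories(rows):
    """Collect the sorted distinct values seen in each column."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            seen.setdefault(column, set()).add(value)
    return {column: sorted(values) for column, values in seen.items()}


def transform_rows(rows, categories):
    """Set one 'column=value' indicator per cell; other indicators stay absent."""
    out = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if value in categories.get(column, []):
                new_row[f"{column}={value}"] = 1.0
        out.append(new_row)
    return out


rows = [{"cat": "a"}, {"cat": "b"}]
categories = fit_categories(rows)
encoded = transform_rows(rows, categories)
print(categories)
print(encoded)
```

Each row carries a single indicator, which matches the layout of the dataframe printed above, where every other cell is NaN.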

Parameters:
  • columns – selection of columns to process

  • remove – modalities to remove

  • skip_errors – skip a category that was not seen during fit instead of raising an exception (no 1 is set for it)

  • single – use a single column per category instead of creating one column for each value

The logging function displays a message when a big dense matrix is created where a sparse one is expected. A sparse matrix should be allocated instead.
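
How skip_errors might change the handling of an unseen category can be sketched as follows. The helper encode_value is hypothetical, and raising ValueError is an assumption made for this sketch; the description only states that an exception can be raised.

```python
# Illustrative sketch only; not the mlinsights implementation.


def encode_value(column, value, categories, skip_errors=False):
    """Return the indicator name for a value, or None when it is skipped."""
    if value in categories.get(column, ()):
        return f"{column}={value}"
    if skip_errors:
        # Unseen category: no indicator is set for it.
        return None
    # Assumption: the exception type is chosen for this sketch.
    raise ValueError(f"unseen category {value!r} in column {column!r}")


categories = {"cat": ["a", "b"]}
known = encode_value("cat", "a", categories)
skipped = encode_value("cat", "z", categories, skip_errors=True)

try:
    encode_value("cat", "z", categories)
    raised = False
except ValueError:
    raised = True

print(known, skipped, raised)
```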

__init__(columns=None, remove=None, skip_errors=False, single=False)
Parameters:
  • columns – selection of columns to process

  • remove – modalities to remove

  • skip_errors – skip a category that was not seen during fit instead of raising an exception (no 1 is set for it)

  • single – use a single column per category instead of creating one column for each value

The logging function displays a message when a big dense matrix is created where a sparse one is expected. A sparse matrix should be allocated instead.
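
The difference between the two layouts controlled by single can be illustrated with a small sketch. Both helpers are hypothetical, and the integer code used here (the 0-based position in the sorted category list) is an assumption; the library may number categories differently.

```python
# Illustrative sketch only; not the mlinsights implementation.


def encode_expanded(column, value):
    """One output column per (column, value) pair, as when single is False."""
    return {f"{column}={value}": 1.0}


def encode_single(column, value, categories):
    """One output column per input column, holding an integer code.

    Assumption: the code is the position in the sorted category list.
    """
    return {column: categories[column].index(value)}


categories = {"cat": ["a", "b", "c"]}
expanded = encode_expanded("cat", "b")
single = encode_single("cat", "b", categories)
print(expanded)
print(single)
```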

__str__()

Usual string representation.

_build_schema()

Concatenates all the categories given the information stored in _categories.

Returns:

list of output columns, and the position at which each column's categories begin
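
The idea of concatenating per-column categories into one flat schema can be sketched as below. The function build_schema, its input layout, and its return shape are assumptions made for illustration; the private method may store and return this information differently.

```python
# Illustrative sketch only; not the mlinsights implementation.


def build_schema(categories):
    """Flatten per-column categories into output names and start offsets."""
    schema = []
    starts = {}
    for column in sorted(categories):
        # Record where this column's block of output columns begins.
        starts[column] = len(schema)
        schema.extend(f"{column}={value}" for value in sorted(categories[column]))
    return schema, starts


schema, starts = build_schema({"color": ["red", "blue"], "size": ["S", "M", "L"]})
print(schema)
print(starts)
```

Knowing where each block begins lets a transform step write an indicator at the start offset plus the index of the category, without searching the full schema.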

fit(X, y=None, **fit_params)

Builds the list of all categories found in input X. X must be a dataframe.

Parameters:
  • X – iterable, training data

  • y – iterable, default=None, training targets

Returns:

self

fit_transform(X, y=None, **fit_params)

Fits, then transforms categories into numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.

Parameters:
  • X – iterable, training data

  • y – iterable, default=None, training targets

Returns:

DataFrame, X with categories converted into numerical features.

transform(X, y=None, **fit_params)

Transforms categories into numerical features based on the list of categories found by method fit. X must be a dataframe. The function does not preserve the order of the columns.

Parameters:
  • X – iterable, training data

  • y – iterable, default=None, training targets

Returns:

DataFrame, X with categories converted into numerical features.
