Word Tokenizer Transform

This documentation is generated from the sources available at dotnet/machinelearning and is released under the MIT License.

Type: datatransform
Aliases: WordTokenizeTransform, DelimitedTokenizeTransform, WordToken, DelimitedTokenize, Token
Namespace: Microsoft.ML.Runtime.Data
Assembly: Microsoft.ML.Transforms.dll
Microsoft Documentation: Word Tokenizer Transform

Description

This transform takes text as input and outputs a vector of text containing the words (tokens) of the original text. The default separator is a space, but any other character, or a set of several separator characters, can be specified instead.
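
The behavior described above can be illustrated with a short Python sketch. This is not the ML.NET implementation or its API: the function name `tokenize` and its `separators` parameter are hypothetical, and dropping empty tokens between adjacent separators is an assumption rather than something this page states.

```python
def tokenize(text, separators=(" ",)):
    """Split text into tokens on any of the given single-character separators.

    Hypothetical helper for illustration only; not part of ML.NET.
    Assumption: empty tokens (from adjacent separators) are dropped.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in separators:
            # A separator ends the current token, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    # Flush the final token when the text does not end with a separator.
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(tokenize("the quick brown fox"))
    print(tokenize("a,b;c", separators=(",", ";")))
```

With the default separator, the first call splits the sentence into four words; the second call shows a set of two separator characters applied to the same input.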

Parameters

Name | Short name | Default | Description
column | col | | New column definition(s)
termSeparators | sep | space | Comma-separated set of term separators. Common values: ‘space’, ‘comma’, ‘semicolon’, or any other single character.
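
As a rough sketch of how a termSeparators value such as space,comma,semicolon might be turned into a set of separator characters, consider the Python below. The helper `parse_term_separators` and the `NAMED_SEPARATORS` table are hypothetical and cover only the three names listed in the table; the real transform may accept other names and may handle whitespace or invalid entries differently.

```python
# Names taken from the parameter description; any other names are not assumed.
NAMED_SEPARATORS = {"space": " ", "comma": ",", "semicolon": ";"}


def parse_term_separators(spec="space"):
    """Turn a comma-separated separator spec into a list of characters.

    Hypothetical helper for illustration only; not part of ML.NET.
    Each entry is either a known name or a single literal character.
    """
    separators = []
    for item in spec.split(","):
        stripped = item.strip()
        name = stripped.lower()
        if name in NAMED_SEPARATORS:
            ch = NAMED_SEPARATORS[name]
        elif len(stripped) == 1:
            ch = stripped
        else:
            # Assumption: anything else is rejected rather than guessed at.
            raise ValueError(f"unrecognized separator: {item!r}")
        if ch not in separators:
            separators.append(ch)
    return separators


if __name__ == "__main__":
    print(parse_term_separators())
    print(parse_term_separators("space,comma,semicolon"))
```

Named entries exist because a literal comma cannot appear inside a comma-separated list, which is presumably why the description spells out ‘comma’ rather than using the character itself.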