Ngram Hash Extractor Transform

This documentation is generated from the sources available at dotnet/machinelearning and released under the MIT License.

Type: ngramextractorfactory
Aliases: NgramHashExtractorTransform, NgramHash, NgramHashExtractor
Namespace: Microsoft.ML.Transforms.Text
Assembly: Microsoft.ML.Transforms.dll
Microsoft Documentation: Ngram Hash Extractor Transform

Description

A transform that turns a collection of tokenized text (vector of ReadOnlyMemory) into numerical feature vectors using the hashing trick.
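
The hashing trick mentioned above can be illustrated with a minimal Python sketch. It is not the transform's implementation: the function name `hash_tokens` is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash function the library actually uses. Each token is hashed, the hash is reduced to `hash_bits` bits, and the resulting slot index is incremented, so the feature vector has a fixed size of 2**hash_bits regardless of vocabulary.

```python
import zlib


def hash_tokens(tokens, hash_bits=16, seed=314489979):
    """Count tokens into 2**hash_bits slots; returns a sparse {slot: count} dict.

    Assumption: crc32 is an illustrative stand-in hash, not the library's own.
    """
    if not 1 <= hash_bits <= 30:
        raise ValueError("hash_bits must be between 1 and 30, inclusive")
    num_slots = 1 << hash_bits
    counts = {}
    for token in tokens:
        # The seed is used as crc32's starting value; the mask keeps hash_bits bits.
        slot = zlib.crc32(token.encode("utf-8"), seed) & (num_slots - 1)
        counts[slot] = counts.get(slot, 0) + 1
    return counts


# Usage: two distinct tokens, one repeated; slots fall in [0, 2**4).
features = hash_tokens(["the", "cat", "the"], hash_bits=4)
print(features)
```

Distinct tokens can collide in the same slot, which is the price of a fixed-size vector; a larger `hash_bits` makes collisions less likely at the cost of a wider feature space.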

Parameters

| Name | Short name | Default | Description |
|---|---|---|---|
| allLengths | all | True | Whether to include all ngram lengths up to ngramLength or only ngramLength |
| hashBits | bits | 16 | Number of bits to hash into. Must be between 1 and 30, inclusive. |
| invertHash | ih | 0 | Limit the number of keys used to generate the slot name to this many. 0 means no invert hashing, -1 means no limit. |
| ngramLength | ngram | 1 | Ngram length |
| ordered | ord | True | Whether the position of each source column should be included in the hash (when there are multiple source columns). |
| seed | | 314489979 | Hashing seed |
| skipLength | skips | 0 | Maximum number of tokens to skip when constructing an ngram |
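
To make the interaction of ngramLength, allLengths, and skipLength concrete, the following Python sketch enumerates the ngrams that would be fed to the hash. It is an illustration under stated assumptions, not the library's code: the names `extract_ngrams` and `_extend` are hypothetical, and the sketch assumes that skipLength is a budget shared across the whole ngram (the total number of skipped tokens may not exceed it), which the parameter description does not spell out.

```python
def extract_ngrams(tokens, ngram_length=1, all_lengths=True, skip_length=0):
    """Return ngrams as tuples, in order of their starting token."""
    # all_lengths=True yields lengths 1..ngram_length; otherwise only ngram_length.
    lengths = range(1, ngram_length + 1) if all_lengths else [ngram_length]
    result = []
    for start in range(len(tokens)):
        for n in lengths:
            _extend(tokens, [start], n, skip_length, result)
    return result


def _extend(tokens, positions, n, skips_left, out):
    """Grow an ngram one token at a time, spending the remaining skip budget."""
    if len(positions) == n:
        out.append(tuple(tokens[p] for p in positions))
        return
    for skip in range(skips_left + 1):
        nxt = positions[-1] + 1 + skip
        if nxt >= len(tokens):
            break
        # Assumption: skipped tokens are deducted from one shared budget.
        _extend(tokens, positions + [nxt], n, skips_left - skip, out)


tokens = ["a", "b", "c"]
print(extract_ngrams(tokens, ngram_length=2, all_lengths=False))
# → [('a', 'b'), ('b', 'c')]
print(extract_ngrams(tokens, ngram_length=2, all_lengths=False, skip_length=1))
# → [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

With all_lengths left at its default of True, the first call would also return the unigrams ('a',), ('b',), and ('c',) alongside the bigrams.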