module onnxrt.ops_cpu.op_tokenizer#

Inheritance diagram of mlprodict.onnxrt.ops_cpu.op_tokenizer

Short summary#

module mlprodict.onnxrt.ops_cpu.op_tokenizer

Runtime operator.

source on GitHub

Classes#

Tokenizer: See Tokenizer.
TokenizerSchema: Defines a schema for operators added in this package such as TreeEnsembleClassifierDouble.

Properties#

args_default: Returns the list of arguments as well as the list of parameters with the default values (close to the signature). …
args_default_modified: Returns the list of modified parameters.
args_mandatory: Returns the list of mandatory arguments.
args_optional: Returns the list of optional arguments.
atts_value: Returns all parameters in a dictionary.

Methods#

__init__
__init__
_find_custom_operator_schema
_infer_shapes
_infer_types
_run
_run_char_tokenization: Tokenizes by characters.
_run_regex_tokenization: Tokenizes using a regular expression.
_run_sep_tokenization: Tokenizes using separators. The function should use a trie to find text.
_run_tokenization: Tokenizes at character level.

Documentation#

Runtime operator.

source on GitHub

class mlprodict.onnxrt.ops_cpu.op_tokenizer.Tokenizer(onnx_node, desc=None, **options)#

Bases: OpRunUnary

See Tokenizer.

source on GitHub
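
The snippet below is a minimal, hypothetical sketch of how such a node could be exercised through the Python runtime: it builds a one-node graph with onnx.helper and runs it with OnnxInference. The attribute names (mark, pad_value, mincharnum, separators), the com.microsoft domain and the opset versions follow the onnxruntime contrib Tokenizer and are assumptions here, not taken from this page.

    import numpy
    from onnx import helper, TensorProto
    from mlprodict.onnxrt import OnnxInference

    # One-node graph: a Tokenizer splitting on explicit separators.
    # Attribute names and the 'com.microsoft' domain are assumptions (contrib op).
    node = helper.make_node(
        'Tokenizer', ['text'], ['tokens'], domain='com.microsoft',
        mark=0,                      # no begin/end markers
        pad_value='#',               # padding token for ragged rows
        mincharnum=1,                # keep tokens with at least one character
        separators=[' ', ',', ';'])

    graph = helper.make_graph(
        [node], 'tokenizer_example',
        [helper.make_tensor_value_info('text', TensorProto.STRING, [None])],
        [helper.make_tensor_value_info('tokens', TensorProto.STRING, [None, None])])
    model = helper.make_model(graph, opset_imports=[
        helper.make_opsetid('', 15), helper.make_opsetid('com.microsoft', 1)])

    oinf = OnnxInference(model)  # default runtime is the pure Python one
    print(oinf.run({'text': numpy.array(['a, b; c', 'd e'])}))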

__init__(onnx_node, desc=None, **options)#
_find_custom_operator_schema(op_name)#
_infer_shapes(x)#

Returns the same shape by default.

source on GitHub

_infer_types(x)#

Returns the same type by default.

source on GitHub

_run(text, attributes=None, verbose=0, fLOG=None)#

Should be overwritten.

source on GitHub

_run_char_tokenization(text, stops)#

Tokenizes by characters.

source on GitHub
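
As an illustration only (not the library's code), character-level tokenization can be pictured as emitting one token per character and dropping the stop words:

    def char_tokenize(text, stops):
        # every character becomes a token unless it appears in `stops`
        return [c for c in text if c not in stops]

    print(char_tokenize("abcab", {"b"}))  # ['a', 'c', 'a']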

_run_regex_tokenization(text, stops, exp)#

Tokenizes using a regular expression.

source on GitHub
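
A rough, hypothetical illustration of regular-expression tokenization: exp keeps the matching substrings and stops filters them out.

    import re

    def regex_tokenize(text, stops, exp):
        # keep substrings matching `exp`, then drop stop words
        return [t for t in re.findall(exp, text) if t not in stops]

    print(regex_tokenize("ab, cd; ef", {"ef"}, r"[a-zA-Z0-9_]+"))  # ['ab', 'cd']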

_run_sep_tokenization(text, stops, separators)#

Tokenizes using separators. The function should use a trie to find text.

source on GitHub
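
An illustrative sketch only: the naive version below splits on each separator in turn, where an efficient implementation would match separators with a trie as noted above.

    def sep_tokenize(text, stops, separators):
        # split on every separator in turn; a trie would do this in one pass
        tokens = [text]
        for sep in separators:
            tokens = [piece for tok in tokens for piece in tok.split(sep)]
        return [t for t in tokens if t and t not in stops]

    print(sep_tokenize("a,b;c", set(), [",", ";"]))  # ['a', 'b', 'c']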

_run_tokenization(text, stops, split)#

Tokenizes at character level.

source on GitHub
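
Judging from the (text, stops, split) signature, this looks like the routine shared by the different tokenization modes. The hypothetical sketch below shows such a routine applied to a 1-D array of strings, padding rows to equal length; the '#' padding value is an assumption.

    import numpy

    def tokenize(text, stops, split):
        # apply `split` to every string, drop stop words, pad rows to equal width
        rows = [[t for t in split(s) if t not in stops] for s in text]
        width = max(len(r) for r in rows)
        return numpy.array([r + ['#'] * (width - len(r)) for r in rows])

    print(tokenize(numpy.array(['a b', 'c d e']), set(), str.split))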

class mlprodict.onnxrt.ops_cpu.op_tokenizer.TokenizerSchema#

Bases: OperatorSchema

Defines a schema for operators added in this package such as TreeEnsembleClassifierDouble.

source on GitHub

__init__()#