module `nlp.completion_simple`

Short summary

module mlstatpy.nlp.completion_simple

About completion, simple algorithm

Classes

class	truncated documentation
`CompletionElement`	Definition of an element in a completion system, it contains the following members:
`CompletionSystem`	define a completion system

Static Methods

staticmethod	truncated documentation
`empty_prefix`	return an instance filled with an empty prefix

Methods

method	truncated documentation
`__getitem__`	Returns `elements[i]`.
`__init__`	constructor
`__init__`	fill the completion system
`__iter__`	Iterates over elements.
`__len__`	Number of elements.
`__repr__`	Returns the usual string representation.
`compare_with_trie`	Compares the results with the other implementation.
`compute_metrics`	Computes the metric for the completion itself.
`enumerate_test_metric`	Evaluates the completion set on a set of queries, the function returns a list of `CompletionElement` …
`find`	Finds an item in the list (not very efficient).
`init_metrics`	initialize the metrics
`items`	Iterates on `(e.value, e)`.
`sort_values`	sort the elements by value
`sort_weight`	Sorts the elements by weight.
`str_all_completions`	builds a string with all completions for all prefixes along the paths, this is only available if parameter …
`str_mks`	return a string with metric information
`str_mks0`	return a string with metric information
`test_metric`	Evaluates the completion set on a set of queries, the function returns a dictionary with the aggregated metrics …
`to_dict`	Returns a dictionary.
`tuples`	Iterates on `(e.weight, e.value)`.
`update_metrics`	update the metrics
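
The two sort methods listed above differ only in their primary key. The sketch below illustrates this on plain `(value, weight)` tuples. The helper names are hypothetical. Breaking ties with the other key and then the original index is an assumption, not the library's documented behavior.

```python
from typing import List, Tuple

# Hypothetical stand-in for completion elements: (value, weight) pairs.
Item = Tuple[str, float]


def sort_by_value(items: List[Item]) -> List[Item]:
    # Primary key: the completion string; ties broken by weight, then original index (assumption).
    order = sorted(range(len(items)), key=lambda i: (items[i][0], items[i][1], i))
    return [items[i] for i in order]


def sort_by_weight(items: List[Item]) -> List[Item]:
    # Primary key: the weight (lower first); ties broken by value, then original index (assumption).
    order = sorted(range(len(items)), key=lambda i: (items[i][1], items[i][0], i))
    return [items[i] for i in order]


items = [("machine", 1.0), ("learning", 3.0), ("mac", 2.0)]
print(sort_by_value(items))   # [('learning', 3.0), ('mac', 2.0), ('machine', 1.0)]
print(sort_by_weight(items))  # [('machine', 1.0), ('mac', 2.0), ('learning', 3.0)]
```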

Documentation

About completion, simple algorithm

class mlstatpy.nlp.completion_simple.CompletionElement(value: str, weight=1.0, disp=None)

Bases: object

Definition of an element in a completion system, it contains the following members:

value: the completion
weight: a weight or a position; a completion with a lower weight is assumed to be shown at a lower position (earlier in the list)
disp: display string (no impact on the algorithm)
mks0: value of the minimum keystroke metric
mks0_: length of the prefix needed to obtain mks0
mks1: value of the dynamic minimum keystroke metric
mks1_: length of the prefix needed to obtain mks1
mks2: value of the modified dynamic minimum keystroke metric
mks2_: length of the prefix needed to obtain mks2
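
The member list can be mirrored by a small class that uses `__slots__`, as `CompletionElement` does. This is a hypothetical sketch: `Element` is not part of the module. Initializing every metric to `len(value)`, which is the cost of typing the whole completion, is an assumption.

```python
class Element:
    """Hypothetical stand-in holding the documented members of CompletionElement."""

    # __slots__ forbids attributes that are not listed here and avoids a per-instance __dict__.
    __slots__ = ("value", "weight", "disp",
                 "mks0", "mks0_", "mks1", "mks1_", "mks2", "mks2_")

    def __init__(self, value: str, weight: float = 1.0, disp=None):
        self.value = value
        self.weight = weight
        self.disp = disp
        # Assumed default: before any computation, the user types the whole string.
        n = len(value)
        self.mks0 = self.mks1 = self.mks2 = n
        self.mks0_ = self.mks1_ = self.mks2_ = n


e = Element("machine", weight=2.0)
print(e.mks0, e.mks0_)  # 7 7
```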

__init__(value: str, weight=1.0, disp=None)

constructor

Parameters:

value – the completion (a string)
weight – ordering (the lower the weight, the earlier the element is shown)
disp – original string, use this to identify the node

__repr__()

Returns the usual string representation.

__slots__ = ('value', 'weight', 'disp', 'mks0', 'mks0_', 'mks1', 'mks1_', 'mks2', 'mks2_', 'prefix', '_info')

_info

static empty_prefix()

return an instance filled with an empty prefix

init_metrics(position: int, completions: List[CompletionElement] | None = None)

initialize the metrics

Parameters:

position – position in the completion system when the prefix is empty, starting from 0
completions – displayed completions; if not None, the method stores them in member _completions

Returns:

a boolean indicating whether there was an update

str_all_completions(maxn=10, use_precompute=True) → str

builds a string with all completions for all prefixes along the paths. This is only available if parameter completions was used when calling method update_metrics.

Parameters:

maxn – maximum number of completions to show
use_precompute – use intermediate results built by precompute_stat

Returns:

str

str_mks() → str

return a string with metric information

str_mks0() → str

return a string with metric information

update_metrics(prefix: str, position: int, improved: dict, delta: float, completions: List[CompletionElement] | None = None, iteration=-1)

update the metrics

Parameters:

prefix – the typed prefix
position – position in the completion system when the prefix has length k, starting from 0
improved – if one metric is lower than the completion length, the completion can be used to improve other queries
delta – delta in the dynamic modified mks
completions – displayed completions; if not None, the method stores them in member _completions
iteration – for debugging purposes, indicates when this improvement was detected

Returns:

a boolean indicating whether there was an update
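
The sketch below illustrates the update step for the static metric `mks0` only. It assumes that typing `prefix` and then selecting the completion at 0-based `position` costs `len(prefix) + position + 1` keystrokes. `Metric` and `update_mks0` are hypothetical names. The real method also maintains the dynamic metrics and uses `delta`; this sketch ignores both.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    value: str   # the completion
    mks0: int    # best keystroke count found so far
    mks0_: int   # prefix length that achieves mks0


def update_mks0(m: Metric, prefix: str, position: int) -> bool:
    """Return True if showing m.value at `position` after typing `prefix` improves mks0."""
    if not m.value.startswith(prefix):
        raise ValueError("a displayed completion must start with the typed prefix")
    cost = len(prefix) + position + 1  # assumed cost model
    if cost < m.mks0:
        m.mks0, m.mks0_ = cost, len(prefix)
        return True
    return False


m = Metric("machine", mks0=7, mks0_=7)
print(update_mks0(m, "ma", 1), m.mks0, m.mks0_)  # True 4 2
print(update_mks0(m, "mac", 2))                  # False
```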

class mlstatpy.nlp.completion_simple.CompletionSystem(elements: List[CompletionElement])

Bases: object

defines a completion system

__getitem__(i)

Returns elements[i].

__init__(elements: List[CompletionElement])

fill the completion system

__iter__() → Iterator[CompletionElement]

Iterates over elements.

__len__() → int

Number of elements.
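
The container methods above (`__getitem__`, `__iter__`, `__len__`), together with `items` and `tuples` from the summary table, follow Python's sequence protocol. The sketch below is minimal. The class names `Elem` and `SimpleSystem` are hypothetical, and so is the internal list `_elements`.

```python
from typing import Iterator, List, Tuple


class Elem:
    # Hypothetical minimal element: only the fields the container needs.
    def __init__(self, value: str, weight: float = 1.0):
        self.value = value
        self.weight = weight


class SimpleSystem:
    """Hypothetical sketch of the container protocol documented for CompletionSystem."""

    def __init__(self, elements: List[Elem]):
        self._elements = list(elements)

    def __getitem__(self, i: int) -> Elem:
        return self._elements[i]

    def __iter__(self) -> Iterator[Elem]:
        return iter(self._elements)

    def __len__(self) -> int:
        return len(self._elements)

    def items(self) -> Iterator[Tuple[str, Elem]]:
        # Mirrors the documented `items`: yields (e.value, e).
        for e in self._elements:
            yield e.value, e

    def tuples(self) -> Iterator[Tuple[float, str]]:
        # Mirrors the documented `tuples`: yields (e.weight, e.value).
        for e in self._elements:
            yield e.weight, e.value


system = SimpleSystem([Elem("mac", 2.0), Elem("machine", 1.0)])
print(len(system), system[1].value)  # 2 machine
print(list(system.tuples()))         # [(2.0, 'mac'), (1.0, 'machine')]
```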

compare_with_trie(delta=0.8, fLOG=<function noLOG>)

Compares the results with the trie-based implementation.

Parameters:

delta – parameter delta in the dynamic modified mks
fLOG – logging function

Returns:

None or the differences
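
The return convention (None when both implementations agree, otherwise the differences) can be sketched with two dictionaries that map each completion to a metric value. `compare_metrics` is a hypothetical helper. The format of the differences returned by `compare_with_trie` is not documented here.

```python
from typing import Dict, List, Optional, Tuple

Diff = Tuple[str, Optional[int], Optional[int]]


def compare_metrics(a: Dict[str, int], b: Dict[str, int]) -> Optional[List[Diff]]:
    """Return None if a and b agree, else the sorted list of (completion, a_value, b_value)."""
    keys = sorted(set(a) | set(b))
    diffs = [(k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)]
    return diffs or None


print(compare_metrics({"mac": 2}, {"mac": 2}))                              # None
print(compare_metrics({"mac": 2, "machine": 3}, {"mac": 2, "machine": 4}))  # [('machine', 3, 4)]
```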

compute_metrics(ffilter=None, delta=0.8, details=False, fLOG=<function noLOG>) → int

Computes the metrics for the completions themselves.

Parameters:

ffilter – filter function
delta – parameter delta in the dynamic modified mks
details – log more details about displayed completions
fLOG – logging function

Returns:

number of iterations

The function ends by sorting the set of completions in alphabetical order.
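
The sketch below makes the static metric concrete by computing `mks0` and `mks0_` by brute force over all prefixes. It is not the library's algorithm, which also computes the dynamic metrics, accepts `ffilter` and `delta`, and iterates. The sketch rests on two assumptions:

* for a given prefix, the system shows every completion starting with it, ordered by weight (ties broken by value);
* picking the entry at 0-based position j after typing k characters costs k + j + 1 keystrokes.

`static_mks` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def static_mks(completions: List[Tuple[str, float]]) -> Dict[str, Tuple[int, int]]:
    """Return {value: (mks0, mks0_)} for every (value, weight) completion."""
    # Display order assumed: lower weight first, ties broken alphabetically.
    ranked = [v for v, _ in sorted(completions, key=lambda vw: (vw[1], vw[0]))]
    result = {}
    for value, _ in completions:
        best, best_k = len(value), len(value)  # typing the whole string
        for k in range(len(value)):            # k typed characters, empty prefix included
            shown = [v for v in ranked if v.startswith(value[:k])]
            cost = k + shown.index(value) + 1  # then select `value` in the list
            if cost < best:
                best, best_k = cost, k
        result[value] = (best, best_k)
    # As compute_metrics does, finish with an alphabetical ordering.
    return dict(sorted(result.items()))


print(static_mks([("machine", 1.0), ("mac", 2.0), ("learning", 3.0)]))
# {'learning': (2, 1), 'mac': (2, 0), 'machine': (1, 0)}
```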

enumerate_test_metric(qset: Iterator[Tuple[str, float]]) → Iterator[Tuple[CompletionElement, CompletionElement]]

Evaluates the completion set on a set of queries. The function yields CompletionElement instances holding the three metrics $M$, $M'$, $M''$ for these particular queries.

Parameters:

qset – list of tuple(str, float) = (query, weight)

Returns:

an iterator over tuples of CompletionElement: the first one is the query, the second one is None or the matching completion

The method compute_metrics() needs to be called first.
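
The sketch below pairs each query with None or its matching completion, then computes a weighted aggregate in the spirit of `test_metric`. The names are hypothetical and only one metric is used. Charging a query that is not in the completion set its full length is an assumption.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def enumerate_matches(
    qset: List[Tuple[str, float]], mks: Dict[str, int]
) -> Iterator[Tuple[str, float, Optional[int]]]:
    """Yield (query, weight, metric); metric is None when the query is not a completion."""
    for query, weight in qset:
        yield query, weight, mks.get(query)


def weighted_average(qset: List[Tuple[str, float]], mks: Dict[str, int]) -> float:
    """Weighted mean cost per query; qset must be a list because it is traversed twice."""
    total_weight = sum(w for _, w in qset)
    # Assumption: a query missing from the completion set is typed in full.
    total_cost = sum(w * (m if m is not None else len(q))
                     for q, w, m in enumerate_matches(qset, mks))
    return total_cost / total_weight


mks = {"machine": 1, "mac": 2}
qset = [("machine", 3.0), ("deep", 1.0)]
print(list(enumerate_matches(qset, mks)))  # [('machine', 3.0, 1), ('deep', 1.0, None)]
print(weighted_average(qset, mks))         # 1.75
```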

find(value: str, is_sorted=False) → CompletionElement