module nlp.completion#

Inheritance diagram of mlstatpy.nlp.completion

Short summary#

module mlstatpy.nlp.completion

About completion

source on GitHub

Classes#

class

truncated documentation

CompletionTrieNode

Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient …

Properties#

property

truncated documentation

root

Returns the initial node with no parent.

Static Methods#

staticmethod

truncated documentation

build

Builds a trie.

Methods#

method

truncated documentation

__init__

__iter__

Iterates on all nodes (sorted).

__str__

usual

_add

Adds a child.

all_completions

Retrieves all completions for a node, the method does not need precompute_stat() to be run first.

all_mks_completions

Retrieves all completions for a node, the method assumes precompute_stat() was run.

find

Returns the node which holds all completions starting with a given prefix.

items

Iterates on children, iterates on weight, key, child.

items_list

All children nodes inluding itself in a list.

iter_leaves

Iterators on leaves sorted per weight, yield weight, value.

leaves

Iterators on leaves.

min_dynamic_keystroke

Returns the dynamic minimum keystrokes for a word.

min_dynamic_keystroke2

Returns the modified dynamic minimum keystrokes for a word.

min_keystroke

Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a …

min_keystroke0

Returns the minimum keystrokes for a word.

precompute_stat

Computes and stores list of completions for each node, computes mks.

str_all_completions

Builds a string with all completions for all prefixes along the paths.

unsorted_iter

Iterates on all nodes.

update_stat_dynamic

Must be called after precompute_stat() and computes dynamic mks (see Dynamic Minimum Keystroke). …

Documentation#

About completion

source on GitHub

class mlstatpy.nlp.completion.CompletionTrieNode(value, leave, weight=1.0, disp=None)#

Bases : object

Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient about memmory consumption, it does not hold above 200.000 words. It should be done another way (cython, C++).

source on GitHub

Paramètres:
  • value – value (a character)

  • leave – boolean (is it a completion)

  • weight – ordering (the lower, the first)

  • disp – original string, use this to identify the node

source on GitHub

class _Stat#

Bases : object

Stores statistics and intermediate data about the compuation the metrics.

It contains the following members:

  • mks0*: value of minimum keystroke

  • mks0_*: length of the prefix to obtain mks0

  • mks_iter: current iteration during the computation of mks

  • mks1: value of dynamic minimum keystroke

  • mks1_: length of the prefix to obtain mks

  • mks1i_: iteration when it was obtained

  • mks2: value of modified dynamic minimum keystroke

  • mks2_: length of the prefix to obtain mks2

  • mks2i: iteration when it converged

source on GitHub

init_dynamic_minimum_keystroke(lw)#

Initializes mks and mks2 from from mks0.

Paramètres:

lw – length of the prefix

source on GitHub

merge_completions(prefix: int, nodes: [CompletionTrieNode])#

Merges list of completions and cut the list, we assume given lists are sorted.

source on GitHub

str_mks() str#

Returns a string with metric information.

source on GitHub

str_mks0() str#

Returns a string with metric information.

source on GitHub

update_dynamic_minimum_keystroke(lw, delta)#

Updates dynamic minimum keystroke for the completions.

Paramètres:
Renvoie:

number of updates

source on GitHub

update_minimum_keystroke(lw)#

Updates minimum keystroke for the completions.

Paramètres:

lw – prefix length

source on GitHub

__init__(value, leave, weight=1.0, disp=None)#
Paramètres:
  • value – value (a character)

  • leave – boolean (is it a completion)

  • weight – ordering (the lower, the first)

  • disp – original string, use this to identify the node

source on GitHub

__iter__()#

Iterates on all nodes (sorted).

source on GitHub

__slots__ = ('value', 'children', 'weight', 'leave', 'stat', 'parent', 'disp')#
__str__()#

usual

source on GitHub

_add(key, child)#

Adds a child.

Paramètres:
  • key – one letter of the word

  • child – child

Renvoie:

self

source on GitHub

all_completions() List[Tuple[CompletionTrieNone, List[str]]]#

Retrieves all completions for a node, the method does not need precompute_stat to be run first.

source on GitHub

all_mks_completions() List[Tuple[CompletionTrieNone, List[CompletionTrieNone]]]#

Retrieves all completions for a node, the method assumes precompute_stat was run.

source on GitHub

static build(words) CompletionTrieNode#

Builds a trie.

Paramètres:

words – list of (word) or (weight, word) or (weight, word, display string)

Renvoie:

root of the trie (CompletionTrieNode)

source on GitHub

find(prefix: str) CompletionTrieNode#

Returns the node which holds all completions starting with a given prefix.

Paramètres:

prefix – prefix

Renvoie:

node or None for no result

source on GitHub

items() Iterator[Tuple[float, str, CompletionTrieNode]]#

Iterates on children, iterates on weight, key, child.

source on GitHub

items_list() List[CompletionTrieNode]#

All children nodes inluding itself in a list.

Renvoie:

list[

source on GitHub

iter_leaves(max_weight=None) Iterator[Tuple[float, str]]#

Iterators on leaves sorted per weight, yield weight, value.

Paramètres:

max_weight – keep all value under this threshold or None for all

source on GitHub

leaves() Iterator[CompletionTrieNode]#

Iterators on leaves.

source on GitHub

min_dynamic_keystroke(word: str) Tuple[int, int]#

Returns the dynamic minimum keystrokes for a word.

Paramètres:

word – word

Renvoie:

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M'(q, S) &=& \min_{0 \infegal k \infegal l(q)} \acc{ M'(q[1..k], S) + K(q, k, S) | q[1..k] \in S }
\end{eqnarray*}

source on GitHub

min_dynamic_keystroke2(word: str) Tuple[int, int]#

Returns the modified dynamic minimum keystrokes for a word.

Paramètres:

word – word

Renvoie:

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Modified Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M"(q, S) &=& \min \left\{ \begin{array}{l}
                \min_{1 \infegal k \infegal l(q)} \acc{ M"(q[1..k-1], S) + 1 + K(q, k, S) | q[1..k] \in S } \\
                \min_{0 \infegal k \infegal l(q)} \acc{ M"(q[1..k], S) + \delta + K(q, k, S) | q[1..k] \in S }
                \end{array} \right .
\end{eqnarray*}

source on GitHub

min_keystroke(word: str) Tuple[int, int]#

Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a couple of values to computes. You shoud use min_keystroke0 to compute all of them.

Paramètres:

word – word

Renvoie:

number, length of best prefix

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

min_keystroke0(word: str) Tuple[int, int]#

Returns the minimum keystrokes for a word.

Paramètres:

word – word

Renvoie:

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic.

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

precompute_stat()#

Computes and stores list of completions for each node, computes mks.

Paramètres:

clean – clean stat

source on GitHub

property root#

Returns the initial node with no parent.

source on GitHub

str_all_completions(maxn=10, use_precompute=True) str#

Builds a string with all completions for all prefixes along the paths.

Paramètres:
  • maxn – maximum number of completions to show

  • use_precompute – use intermediate results built by precompute_stat

Renvoie:

str

source on GitHub

unsorted_iter()#

Iterates on all nodes.

source on GitHub

update_stat_dynamic(delta=0.8)#

Must be called after precompute_stat and computes dynamic mks (see Dynamic Minimum Keystroke).

Paramètres:

delta – parameter \delta in defintion Modified Dynamic KeyStroke

Renvoie:

number of iterations to converge

source on GitHub