module nlp.completion

Inheritance diagram of mlstatpy.nlp.completion

Short summary

module mlstatpy.nlp.completion

About completion

source on GitHub

Classes

class truncated documentation
CompletionTrieNode Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient …

Properties

property truncated documentation
root return the initial node with no parent

Static Methods

staticmethod truncated documentation
build builds a trie

Methods

method truncated documentation
__init__  
__iter__ iterates on all nodes (sorted)
__str__ usual
_add add a child
all_completions retrieve all completions for a node, the method does not need precompute_stat() to be run first
all_mks_completions retrieve all completions for a node, the method assumes precompute_stat() was run
find returns the node which holds all completions starting with a given prefix
items iterates on children, iterates on weight, key, child
items_list all children nodes inluding itself in a list
iter_leaves iterators on leaves sorted per weight, yield weight, value
leaves iterators on leaves
min_dynamic_keystroke returns the dynamic minimum keystrokes for a word,
min_dynamic_keystroke2 returns the modified dynamic minimum keystrokes for a word,
min_keystroke Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a …
min_keystroke0 returns the minimum keystrokes for a word
precompute_stat computes and stores list of completions for each node, computes mks
str_all_completions builds a string with all completions for all prefixes along the paths
unsorted_iter iterates on all nodes
update_stat_dynamic must be called after precompute_stat() and computes dynamic mks (see Dynamic Minimum Keystroke) …

Documentation

About completion

source on GitHub

class mlstatpy.nlp.completion.CompletionTrieNode(value, leave, weight=1.0, disp=None)[source]

Bases : object

Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient about memmory consumption, it does not hold above 200.000 words. It should be done another way (cython, C++).

source on GitHub

Paramètres:
  • value – value (a character)
  • leave – boolean (is it a completion)
  • weight – ordering (the lower, the first)
  • disp – original string, use this to identify the node

source on GitHub

class _Stat[source]

Bases : object

Stores statistics and intermediate data about the compuation the metrics.

It contains the following members:

  • mks0*: value of minimum keystroke
  • mks0_*: length of the prefix to obtain mks0
  • mks_iter: current iteration during the computation of mks
  • mks1: value of dynamic minimum keystroke
  • mks1_: length of the prefix to obtain mks
  • mks1i_: iteration when it was obtained
  • mks2: value of modified dynamic minimum keystroke
  • mks2_: length of the prefix to obtain mks2
  • mks2i: iteration when it converged

source on GitHub

init_dynamic_minimum_keystroke(lw)[source]

initializes mks and mks2 from from mks0

Paramètres:lw – length of the prefix

source on GitHub

merge_completions(prefix: int, nodes: [CompletionTrieNode])[source]

merges list of completions and cut the list, we assume given lists are sorted

source on GitHub

str_mks() → str[source]

return a string with metric information

source on GitHub

str_mks0() → str[source]

return a string with metric information

source on GitHub

update_dynamic_minimum_keystroke(lw, delta)[source]

update dynamic minimum keystroke for the completions

Paramètres:
Renvoie:

number of updates

source on GitHub

update_minimum_keystroke(lw)[source]

update minimum keystroke for the completions

Paramètres:lw – prefix length

source on GitHub

__init__(value, leave, weight=1.0, disp=None)[source]
Paramètres:
  • value – value (a character)
  • leave – boolean (is it a completion)
  • weight – ordering (the lower, the first)
  • disp – original string, use this to identify the node

source on GitHub

__iter__()[source]

iterates on all nodes (sorted)

source on GitHub

__slots__ = ('value', 'children', 'weight', 'leave', 'stat', 'parent', 'disp')
__str__()[source]

usual

source on GitHub

_add(key, child)[source]

add a child

Paramètres:
  • key – one letter of the word
  • child – child
Renvoie:

self

source on GitHub

all_completions() → List[Tuple[_ForwardRef('CompletionTrieNone'), List[str]]][source]

retrieve all completions for a node, the method does not need precompute_stat to be run first

source on GitHub

all_mks_completions() → List[Tuple[_ForwardRef('CompletionTrieNone'), List[_ForwardRef('CompletionTrieNone')]]][source]

retrieve all completions for a node, the method assumes precompute_stat was run

source on GitHub

static build(words) → mlstatpy.nlp.completion.CompletionTrieNode[source]

builds a trie

Paramètres:words – list of (word) or (weight, word) or (weight, word, display string)
Renvoie:root of the trie (CompletionTrieNode)

source on GitHub

find(prefix: str) → mlstatpy.nlp.completion.CompletionTrieNode[source]

returns the node which holds all completions starting with a given prefix

Paramètres:prefix – prefix
Renvoie:node or None for no result

source on GitHub

items() → Iterator[Tuple[[float, str], _ForwardRef('CompletionTrieNode')]][source]

iterates on children, iterates on weight, key, child

source on GitHub

items_list() → List[_ForwardRef('CompletionTrieNode')][source]

all children nodes inluding itself in a list

Renvoie:list[

source on GitHub

iter_leaves(max_weight=None) → Iterator[Tuple[float, str]][source]

iterators on leaves sorted per weight, yield weight, value

Paramètres:max_weight – keep all value under this threshold or None for all

source on GitHub

leaves() → Iterator[_ForwardRef('CompletionTrieNode')][source]

iterators on leaves

source on GitHub

min_dynamic_keystroke(word: str) → Tuple[int, int][source]

returns the dynamic minimum keystrokes for a word,

Paramètres:word – word
Renvoie:number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M'(q, S) &=& \min_{0 \infegal k \infegal l(q)} \acc{ M'(q[1..k], S) + K(q, k, S) | q[1..k] \in S }
\end{eqnarray*}

source on GitHub

min_dynamic_keystroke2(word: str) → Tuple[int, int][source]

returns the modified dynamic minimum keystrokes for a word,

Paramètres:word – word
Renvoie:number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Modified Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M"(q, S) &=& \min \left\{ \begin{array}{l}
                \min_{1 \infegal k \infegal l(q)} \acc{ M"(q[1..k-1], S) + 1 + K(q, k, S) | q[1..k] \in S } \\
                \min_{0 \infegal k \infegal l(q)} \acc{ M"(q[1..k], S) + \delta + K(q, k, S) | q[1..k] \in S }
                \end{array} \right .
\end{eqnarray*}

source on GitHub

min_keystroke(word: str) → Tuple[int, int][source]

Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a couple of values to computes. You shoud use min_keystroke0 to compute all of them.

Paramètres:word – word
Renvoie:number, length of best prefix

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

min_keystroke0(word: str) → Tuple[int, int][source]

returns the minimum keystrokes for a word

Paramètres:word – word
Renvoie:number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic.

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

precompute_stat()[source]

computes and stores list of completions for each node, computes mks

Paramètres:clean – clean stat

source on GitHub

root

return the initial node with no parent

source on GitHub

str_all_completions(maxn=10, use_precompute=True) → str[source]

builds a string with all completions for all prefixes along the paths

Paramètres:
  • maxn – maximum number of completions to show
  • use_precompute – use intermediate results built by precompute_stat
Renvoie:

str

source on GitHub

unsorted_iter()[source]

iterates on all nodes

source on GitHub

update_stat_dynamic(delta=0.8)[source]

must be called after precompute_stat and computes dynamic mks (see Dynamic Minimum Keystroke)

Paramètres:delta – parameter \delta in defintion Modified Dynamic KeyStroke
Renvoie:number of iterations to converge

source on GitHub