module nlp.completion

Inheritance diagram of mlstatpy.nlp.completion

Short summary

module mlstatpy.nlp.completion

About completion

source on GitHub

Classes

class

truncated documentation

CompletionTrieNode

Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient …

Properties

property

truncated documentation

root

Returns the initial node with no parent.

Static Methods

staticmethod

truncated documentation

build

Builds a trie.

Methods

method

truncated documentation

__init__

__iter__

Iterates on all nodes (sorted).

__str__

usual

_add

Adds a child.

all_completions

Retrieves all completions for a node, the method does not need precompute_stat() to be run first.

all_mks_completions

Retrieves all completions for a node, the method assumes precompute_stat() was run.

find

Returns the node which holds all completions starting with a given prefix.

items

Iterates on children, iterates on weight, key, child.

items_list

All children nodes inluding itself in a list.

iter_leaves

Iterators on leaves sorted per weight, yield weight, value.

leaves

Iterators on leaves.

min_dynamic_keystroke

Returns the dynamic minimum keystrokes for a word.

min_dynamic_keystroke2

Returns the modified dynamic minimum keystrokes for a word.

min_keystroke

Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a …

min_keystroke0

Returns the minimum keystrokes for a word.

precompute_stat

Computes and stores list of completions for each node, computes mks.

str_all_completions

Builds a string with all completions for all prefixes along the paths.

unsorted_iter

Iterates on all nodes.

update_stat_dynamic

Must be called after precompute_stat() and computes dynamic mks (see Dynamic Minimum Keystroke). …

Documentation

About completion

source on GitHub

class mlstatpy.nlp.completion.CompletionTrieNode(value, leave, weight=1.0, disp=None)[source]

Bases : object

Node definition in a trie used to do completion, see Complétion. This implementation is not very efficient about memmory consumption, it does not hold above 200.000 words. It should be done another way (cython, C++).

source on GitHub

Paramètres
  • value – value (a character)

  • leave – boolean (is it a completion)

  • weight – ordering (the lower, the first)

  • disp – original string, use this to identify the node

source on GitHub

class _Stat[source]

Bases : object

Stores statistics and intermediate data about the compuation the metrics.

It contains the following members:

  • mks0*: value of minimum keystroke

  • mks0_*: length of the prefix to obtain mks0

  • mks_iter: current iteration during the computation of mks

  • mks1: value of dynamic minimum keystroke

  • mks1_: length of the prefix to obtain mks

  • mks1i_: iteration when it was obtained

  • mks2: value of modified dynamic minimum keystroke

  • mks2_: length of the prefix to obtain mks2

  • mks2i: iteration when it converged

source on GitHub

init_dynamic_minimum_keystroke(lw)[source]

Initializes mks and mks2 from from mks0.

Paramètres

lw – length of the prefix

source on GitHub

merge_completions(prefix: int, nodes: [CompletionTrieNode])[source]

Merges list of completions and cut the list, we assume given lists are sorted.

source on GitHub

str_mks() → str[source]

Returns a string with metric information.

source on GitHub

str_mks0() → str[source]

Returns a string with metric information.

source on GitHub

update_dynamic_minimum_keystroke(lw, delta)[source]

Updates dynamic minimum keystroke for the completions.

Paramètres
Renvoie

number of updates

source on GitHub

update_minimum_keystroke(lw)[source]

Updates minimum keystroke for the completions.

Paramètres

lw – prefix length

source on GitHub

__init__(value, leave, weight=1.0, disp=None)[source]
Paramètres
  • value – value (a character)

  • leave – boolean (is it a completion)

  • weight – ordering (the lower, the first)

  • disp – original string, use this to identify the node

source on GitHub

__iter__()[source]

Iterates on all nodes (sorted).

source on GitHub

__slots__ = ('value', 'children', 'weight', 'leave', 'stat', 'parent', 'disp')
__str__()[source]

usual

source on GitHub

_add(key, child)[source]

Adds a child.

Paramètres
  • key – one letter of the word

  • child – child

Renvoie

self

source on GitHub

all_completions() → List[Tuple[CompletionTrieNone, List[str]]][source]

Retrieves all completions for a node, the method does not need precompute_stat to be run first.

source on GitHub

all_mks_completions() → List[Tuple[CompletionTrieNone, List[CompletionTrieNone]]][source]

Retrieves all completions for a node, the method assumes precompute_stat was run.

source on GitHub

static build(words) → mlstatpy.nlp.completion.CompletionTrieNode[source]

Builds a trie.

Paramètres

words – list of (word) or (weight, word) or (weight, word, display string)

Renvoie

root of the trie (CompletionTrieNode)

source on GitHub

find(prefix: str) → mlstatpy.nlp.completion.CompletionTrieNode[source]

Returns the node which holds all completions starting with a given prefix.

Paramètres

prefix – prefix

Renvoie

node or None for no result

source on GitHub

items() → Iterator[Tuple[float, str, mlstatpy.nlp.completion.CompletionTrieNode]][source]

Iterates on children, iterates on weight, key, child.

source on GitHub

items_list() → List[mlstatpy.nlp.completion.CompletionTrieNode][source]

All children nodes inluding itself in a list.

Renvoie

list[

source on GitHub

iter_leaves(max_weight=None) → Iterator[Tuple[float, str]][source]

Iterators on leaves sorted per weight, yield weight, value.

Paramètres

max_weight – keep all value under this threshold or None for all

source on GitHub

leaves() → Iterator[mlstatpy.nlp.completion.CompletionTrieNode][source]

Iterators on leaves.

source on GitHub

min_dynamic_keystroke(word: str) → Tuple[int, int][source]

Returns the dynamic minimum keystrokes for a word.

Paramètres

word – word

Renvoie

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M'(q, S) &=& \min_{0 \infegal k \infegal l(q)} \acc{ M'(q[1..k], S) + K(q, k, S) | q[1..k] \in S }
\end{eqnarray*}

source on GitHub

min_dynamic_keystroke2(word: str) → Tuple[int, int][source]

Returns the modified dynamic minimum keystrokes for a word.

Paramètres

word – word

Renvoie

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic. See Modified Dynamic Minimum Keystroke.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M"(q, S) &=& \min \left\{ \begin{array}{l}
                \min_{1 \infegal k \infegal l(q)} \acc{ M"(q[1..k-1], S) + 1 + K(q, k, S) | q[1..k] \in S } \\
                \min_{0 \infegal k \infegal l(q)} \acc{ M"(q[1..k], S) + \delta + K(q, k, S) | q[1..k] \in S }
                \end{array} \right .
\end{eqnarray*}

source on GitHub

min_keystroke(word: str) → Tuple[int, int][source]

Returns the minimum keystrokes for a word without optimisation, this function should be used if you only have a couple of values to computes. You shoud use min_keystroke0 to compute all of them.

Paramètres

word – word

Renvoie

number, length of best prefix

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

min_keystroke0(word: str) → Tuple[int, int][source]

Returns the minimum keystrokes for a word.

Paramètres

word – word

Renvoie

number, length of best prefix, iteration it stops moving

This function must be called after precompute_stat and update_stat_dynamic.

See Problème d’optimisation.

\begin{eqnarray*}
K(q, k, S) &=& \min\acc{ i | s_i \succ q[1..k], s_i \in S } \\
M(q, S) &=& \min_{0 \infegal k \infegal l(q)}  k + K(q, k, S)
\end{eqnarray*}

source on GitHub

precompute_stat()[source]

Computes and stores list of completions for each node, computes mks.

Paramètres

clean – clean stat

source on GitHub

root

Returns the initial node with no parent.

source on GitHub

str_all_completions(maxn=10, use_precompute=True) → str[source]

Builds a string with all completions for all prefixes along the paths.

Paramètres
  • maxn – maximum number of completions to show

  • use_precompute – use intermediate results built by precompute_stat

Renvoie

str

source on GitHub

unsorted_iter()[source]

Iterates on all nodes.

source on GitHub

update_stat_dynamic(delta=0.8)[source]

Must be called after precompute_stat and computes dynamic mks (see Dynamic Minimum Keystroke).

Paramètres

delta – parameter \delta in defintion Modified Dynamic KeyStroke

Renvoie

number of iterations to converge

source on GitHub