Natural language processing

Completion

mlstatpy.nlp.CompletionElement (self, value, weight = 1.0, disp = None)

Definition of an element in a completion system. It contains the following members:

value: the completion

weight: a weight or a position; a completion with a lower weight is assumed to be shown at a lower position (earlier in the list)

disp: display string (no impact on the algorithm)

mks0: value of the minimum keystroke

mks0_: length of the prefix needed to obtain mks0

mks1: value of the dynamic minimum keystroke

mks1_: length of the prefix needed to obtain mks1

mks2: value of the modified dynamic minimum keystroke

mks2_: length of the prefix needed to obtain mks2
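To make the pair mks0 / mks0_ concrete, here is a minimal sketch of a minimum keystroke computation. It is not mlstatpy's implementation; it assumes that the completions are already sorted by weight (best first), that after typing a prefix of length k the first max_shown completions starting with that prefix are displayed, that choosing the i-th displayed completion (1-based) costs i extra keystrokes, and that typing the whole query costs its length. The names min_keystroke and max_shown are hypothetical.

```python
from typing import Sequence, Tuple


def min_keystroke(query: str, completions: Sequence[str], max_shown: int = 10) -> Tuple[int, int]:
    """Hypothetical helper returning (mks, prefix_length), in the spirit of mks0 / mks0_.

    Assumptions (not taken from mlstatpy): ``completions`` is sorted by weight,
    only the first ``max_shown`` matches are displayed, selecting the i-th one
    costs i keystrokes, and typing the full query costs len(query).
    """
    best, best_k = len(query), len(query)  # typing everything is always possible
    for k in range(len(query)):  # k = 0 means nothing has been typed yet
        prefix = query[:k]
        shown = [c for c in completions if c.startswith(prefix)][:max_shown]
        if query in shown:
            cost = k + shown.index(query) + 1  # k typed characters + i selection keystrokes
            if cost < best:  # strict: on ties, keep the shorter prefix
                best, best_k = cost, k
    return best, best_k
```

For example, with the displayed list limited to two entries, min_keystroke("machine learning", ["math", "machine", "machine learning"], max_shown=2) returns (5, 3): after typing "mac", the query is the second suggestion.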

empty_prefix ()

Returns an instance filled with an empty prefix.

init_metrics (self, position, completions = None)

Initializes the metrics.

str_all_completions (self, maxn = 10, use_precompute = True)

Builds a string with all completions for all prefixes along the path. This is only available if parameter completions was given when calling method update_metrics.
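The idea of "all completions for all prefixes along the path" can be sketched as follows. This is an illustration, not mlstatpy's code: it recomputes the suggestions for each prefix instead of reusing the completions stored by update_metrics, it represents elements as hypothetical (weight, value) pairs where a lower weight is shown first, and the output format of str_prefix_completions is an assumption.

```python
from typing import List, Sequence, Tuple


def all_prefix_completions(
    query: str, elements: Sequence[Tuple[float, str]], maxn: int = 10
) -> List[Tuple[str, List[str]]]:
    """For every prefix of ``query`` (including the empty one), list the top ``maxn`` completions."""
    ranked = [value for _, value in sorted(elements)]  # lower weight first
    result = []
    for k in range(len(query) + 1):
        prefix = query[:k]
        shown = [v for v in ranked if v.startswith(prefix)][:maxn]
        result.append((prefix, shown))
    return result


def str_prefix_completions(
    query: str, elements: Sequence[Tuple[float, str]], maxn: int = 10
) -> str:
    """Hypothetical text rendering: one line per prefix."""
    return "\n".join(
        f"{prefix!r}: {', '.join(shown)}"
        for prefix, shown in all_prefix_completions(query, elements, maxn)
    )
```

With elements [(2.0, "car"), (1.0, "cat"), (3.0, "dog")] and maxn=2, the path of "cat" shows ["cat", "car"] for the prefixes "", "c" and "ca", then ["cat"] for "cat".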

str_mks (self)

Returns a string with metric information.

str_mks0 (self)

Returns a string with metric information.

update_metrics (self, prefix, position, improved, delta, completions = None, iteration = -1)

Updates the metrics.

mlstatpy.nlp.CompletionSystem (self, elements)

Defines a completion system.

compare_with_trie (self, delta = 0.8, fLOG = noLOG)

Compares the results with the trie-based implementation.
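A comparison of this kind checks that two data structures return the same suggestions for every prefix. The sketch below builds a simple trie, where each node keeps the (weight, value) pairs of all words below it, and compares it with a plain list scan. It only illustrates the principle; the classes and functions are hypothetical and are not mlstatpy's trie or its metric comparison.

```python
from typing import Dict, List, Sequence, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.values: List[Tuple[float, str]] = []  # (weight, value) of every word below this node


def build_trie(elements: Sequence[Tuple[float, str]]) -> TrieNode:
    root = TrieNode()
    for weight, value in elements:
        node = root
        node.values.append((weight, value))
        for ch in value:
            node = node.children.setdefault(ch, TrieNode())
            node.values.append((weight, value))
    return root


def trie_complete(root: TrieNode, prefix: str, maxn: int = 10) -> List[str]:
    node = root
    for ch in prefix:  # walk down the trie, one character at a time
        if ch not in node.children:
            return []
        node = node.children[ch]
    return [v for _, v in sorted(node.values)][:maxn]  # lower weight first


def list_complete(elements: Sequence[Tuple[float, str]], prefix: str, maxn: int = 10) -> List[str]:
    return [v for _, v in sorted(elements) if v.startswith(prefix)][:maxn]


def compare(elements: Sequence[Tuple[float, str]], maxn: int = 10) -> List[str]:
    """Return the prefixes on which both implementations disagree (empty if consistent)."""
    root = build_trie(elements)
    prefixes = {v[:k] for _, v in elements for k in range(len(v) + 1)}
    return sorted(
        p for p in prefixes
        if trie_complete(root, p, maxn) != list_complete(elements, p, maxn)
    )
```

Storing the full value list in every node trades memory for simplicity; a real implementation would more likely keep only the top completions per node.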

compute_metrics (self, ffilter = None, delta = 0.8, details = False, fLOG = noLOG)

Computes the metrics for the completions themselves.

enumerate_test_metric (self, qset)

Evaluates the completion set on a set of queries. The function returns a list of CompletionElement instances holding the three metrics $M$, $M'$, $M''$ for these particular queries.

find (self, value, is_sorted = False)

Finds an item in the list (not very efficient).

items (self)

Iterates on (e.value, e).

sort_values (self)

Sorts the elements by value.

sort_weight (self)

Sorts the elements by weight.
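The difference between sorting by value and sorting by weight can be illustrated with a small stand-in for CompletionElement. This sketch assumes that sort_weight orders elements by increasing weight, as its name indicates, and breaks ties by value so the order is deterministic; the Element class and both functions are hypothetical, not mlstatpy's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:  # hypothetical stand-in for CompletionElement
    value: str
    weight: float = 1.0


def sort_values(elements: List[Element]) -> List[Element]:
    # alphabetical order of the completions
    return sorted(elements, key=lambda e: e.value)


def sort_weight(elements: List[Element]) -> List[Element]:
    # lower weight first (shown earlier); ties broken by value
    return sorted(elements, key=lambda e: (e.weight, e.value))
```

For [Element("b", 1.0), Element("a", 2.0), Element("c", 1.0)], sort_values gives the order a, b, c while sort_weight gives b, c, a.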

test_metric (self, qset)

Evaluates the completion set on a set of queries. The function returns a dictionary with the aggregated metrics and some statistics about them.

to_dict (self)

Returns a dictionary.

tuples (self)

Iterates on (e.weight, e.value).

Normalization

mlstatpy.data.wikipedia.normalize_wiki_text (text)

Normalizes a text such as a Wikipedia title.

mlstatpy.nlp.remove_diacritics (input_str)

Removes diacritics (accents) from input_str.
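A common way to remove diacritics in Python is to decompose the string with Unicode NFKD normalization and drop the combining marks. The sketch below shows that technique; whether remove_diacritics uses exactly this approach is an assumption, and strip_accents is a hypothetical name.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFKD splits "é" into "e" followed by U+0301 (combining acute accent)
    decomposed = unicodedata.normalize("NFKD", text)
    # keep the base characters, drop the combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters that are not base letters plus a diacritic, such as "ß" or "œ", have no such decomposition and are left unchanged. NFKD also applies compatibility decompositions (the ligature "ﬁ" becomes "fi"), which may or may not be wanted.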