Natural language processing

Completion

mlstatpy.nlp.CompletionElement (self, value, weight = 1.0, disp = None)

Definition of an element in a completion system. It contains the following members:

  • value: the completion
  • weight: a weight or a position; a completion with a lower weight is assumed to be shown at a lower position
  • disp: display string (no impact on the algorithm)
  • mks0: value of minimum keystroke
  • mks0_: length of the prefix to obtain mks0
  • mks1: value of dynamic minimum keystroke
  • mks1_: length of the prefix to obtain mks1
  • mks2: value of modified dynamic minimum keystroke
  • mks2_: length of the prefix to obtain mks2
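
The relationship between mks0 and mks0_ can be illustrated with a small, self-contained sketch. It does not use mlstatpy and is not the library's implementation. It assumes that completions are shown in order of increasing weight, that positions start at 1, and that typing the whole query is always possible. The name minimum_keystrokes is hypothetical.

```python
def minimum_keystrokes(query, completions):
    """Return (cost, prefix_length) for a query.

    `completions` is assumed to be ordered by display rank (best first).
    The cost of stopping after k typed characters is assumed to be
    k + the 1-based position of `query` among completions starting
    with query[:k]. Typing the full query costs len(query).
    """
    best_cost = len(query)      # fallback: type everything
    best_prefix = len(query)
    for k in range(len(query)):
        prefix = query[:k]
        shown = [c for c in completions if c.startswith(prefix)]
        if query in shown:
            cost = k + shown.index(query) + 1
            if cost < best_cost:
                best_cost = cost
                best_prefix = k
    return best_cost, best_prefix


# Hypothetical completion list, already sorted by weight.
ranked = ["actu", "actualité", "actualités", "apple"]
cost, prefix_length = minimum_keystrokes("apple", ranked)
```

In this sketch the first returned value plays the role of mks0 and the second the role of mks0_. The dynamic variants (mks1, mks2) are not covered here.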

empty_prefix ()

return an instance filled with an empty prefix

init_metrics (self, position, completions = None)

initialize the metrics

str_all_completions (self, maxn = 10, use_precompute = True)

builds a string with all completions for all prefixes along the paths. This is only available if parameter completions was used when calling method update_metrics.

str_mks (self)

return a string with metric information

str_mks0 (self)

return a string with metric information

update_metrics (self, prefix, position, improved, delta, completions = None, iteration = -1)

update the metrics

mlstatpy.nlp.CompletionSystem (self, elements)

define a completion system

compare_with_trie (self, delta = 0.8, fLOG = noLOG)

compare the results with the other implementation

compute_metrics (self, ffilter = None, delta = 0.8, details = False, fLOG = noLOG)

Compute the metric for the completion itself.

enumerate_test_metric (self, qset)

Evaluate the completion set on a set of queries. The function returns a list of CompletionElement with the three metrics M, M', M'' for these particular queries.

find (self, value, is_sorted = False)

find an item in the list (not very efficient)
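
The reference does not say what is_sorted changes. As a sketch only, under the assumption that it signals that the values are already in sorted order, a lookup could switch from a linear scan to a binary search. The function find_value is hypothetical and works on a plain list of strings rather than on CompletionElement objects.

```python
from bisect import bisect_left


def find_value(values, target, is_sorted=False):
    """Return the index of `target` in `values`, or None if absent.

    When `is_sorted` is True the list is assumed to be in ascending
    order, so a binary search (O(log n)) replaces the linear scan (O(n)).
    """
    if is_sorted:
        i = bisect_left(values, target)
        if i < len(values) and values[i] == target:
            return i
        return None
    for i, v in enumerate(values):
        if v == target:
            return i
    return None
```

Passing is_sorted=True on an unsorted list gives unreliable results, so the caller would be expected to sort first, for instance with a method such as sort_values.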

items (self)

iterate on (e.value, e)

sort_values (self)

sort the elements by value

sort_weight (self)

sort the elements by weight

test_metric (self, qset)

evaluate the completion set on a set of queries. The function returns a dictionary with the aggregated metrics and some statistics about them.

to_dict (self)

return a dictionary

tuples (self)

iterate on (e.weight, e.value)

Normalization

mlstatpy.data.wikipedia.normalize_wiki_text (text)

Normalizes a text such as a wikipedia title.

mlstatpy.nlp.remove_diacritics (input_str)

remove diacritics
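
A common way to remove diacritics with the standard library is to decompose each character and drop the combining marks. The sketch below shows that technique. It is not taken from mlstatpy, and the name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_diacritics("Complétion")
```

NFKD also applies compatibility decompositions (ligatures, for example), which is usually acceptable when the goal is to match user queries against completions.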