Natural language processing
Completion
mlstatpy.nlp.CompletionElement
(self, value, weight = 1.0, disp = None)
Defines an element in a completion system. It contains the following members:
value: the completion
weight: a weight or a position; a completion with a lower weight is assumed to be shown at a lower position
disp: display string (no impact on the algorithm)
mks0: value of minimum keystroke
mks0_: length of the prefix to obtain mks0
mks1: value of dynamic minimum keystroke
mks1_: length of the prefix to obtain mks1
mks2: value of modified dynamic minimum keystroke
mks2_: length of the prefix to obtain mks2
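To make the mks0 / mks0_ pair more concrete, here is a minimal, self-contained sketch of a minimum-keystroke computation. It is not the library's implementation: the function name `minimum_keystroke_sketch` is hypothetical, and it assumes that picking the suggestion shown at position p costs p keystrokes (positions start at 1) and that typing the whole query is always possible.

```python
def minimum_keystroke_sketch(query, elements):
    """Return (cost, prefix_length) for `query`.

    `elements` is a list of (weight, value) pairs; for a given prefix,
    matching completions are assumed to be shown by increasing weight.
    Assumption: choosing the suggestion at 1-based position p costs p keystrokes.
    """
    # Baseline: type the whole query, no suggestion used.
    best_cost, best_k = len(query), len(query)
    for k in range(len(query)):
        prefix = query[:k]
        # Completions displayed after typing k characters, lowest weight first.
        shown = [v for _, v in sorted(e for e in elements if e[1].startswith(prefix))]
        if query in shown:
            cost = k + shown.index(query) + 1
            if cost < best_cost:
                best_cost, best_k = cost, k
    return best_cost, best_k


elements = [(1.0, "apple"), (2.0, "apply"), (3.0, "banana")]
print(minimum_keystroke_sketch("banana", elements))  # (2, 1)
```

In this toy set, typing "b" and picking the first suggestion costs 2 keystrokes, so the first returned value plays the role of mks0 and the second the role of mks0_.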
empty_prefix
()
Returns an instance filled with an empty prefix.
init_metrics
(self, position, completions = None)
Initializes the metrics.
str_all_completions
(self, maxn = 10, use_precompute = True)
Builds a string with all completions for all prefixes along the paths. This is only available if parameter completions was used when calling method update_metrics.
str_mks
(self)
Returns a string with metric information.
str_mks0
(self)
Returns a string with metric information.
update_metrics
(self, prefix, position, improved, delta, completions = None, iteration = -1)
Updates the metrics.
mlstatpy.nlp.CompletionSystem
(self, elements)
Defines a completion system.
compare_with_trie
(self, delta = 0.8, fLOG = noLOG)
Compares the results with the other implementation.
compute_metrics
(self, ffilter = None, delta = 0.8, details = False, fLOG = noLOG)
Computes the metric for the completion itself.
enumerate_test_metric
(self, qset)
Evaluates the completion set on a set of queries. The function returns a list of CompletionElement with the three metrics for these particular queries.
find
(self, value, is_sorted = False)
Finds an item in the list. Not very efficient.
items
(self)
Iterates on (e.value, e).
sort_values
(self)
Sorts the elements by value.
sort_weight
(self)
Sorts the elements by weight.
test_metric
(self, qset)
Evaluates the completion set on a set of queries. The function returns a dictionary with the aggregated metrics and some statistics about them.
to_dict
(self)
Returns a dictionary.
tuples
(self)
Iterates on (e.weight, e.value).
Normalization
mlstatpy.data.wikipedia.normalize_wiki_text
(text)
Normalizes a text such as a wikipedia title.
mlstatpy.nlp.remove_diacritics
(input_str)
Removes diacritics from a string.
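A common way to remove diacritics with the standard library is sketched below. It is only an illustration of the idea, not necessarily what mlstatpy.nlp.remove_diacritics does internally; the name `strip_diacritics_sketch` is hypothetical.

```python
import unicodedata


def strip_diacritics_sketch(text):
    """Drop combining marks after Unicode decomposition (illustrative only)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics_sketch("Complétion"))  # Completion
```

Note that NFKD also rewrites compatibility characters (ligatures, for instance), which may or may not be wanted depending on the text being normalized.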