.. _examplere2rst:

================
Example with re2
================

*wrapclib* wraps the library *re2* using the wrapper *pyre2*.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

.. code:: ipython3

    from wrapclib import re2

Example with HTML
-----------------

.. code:: ipython3

    import re
    # <h1> is an assumed tag; any markup that starts with "<" and ends
    # with ">" is captured in full by this greedy pattern.
    s = "<h1>mot</h1>"
    print(re.compile("(<.*>)").match(s).groups())

.. parsed-literal::

    ('<h1>mot</h1>',)

.. code:: ipython3

    s = "<h1>mot</h1>"
    print(re2.compile("(<.*>)").match(s).groups())

.. parsed-literal::

    ('<h1>mot</h1>',)

Group, Span
-----------

.. code:: ipython3

    s = """date 0 : 14/9/2000
    date 1 : 20/04/1971
    date 2 : 14/09/1913
    date 3 : 2/3/1978
    date 4 : 1/7/1986
    date 5 : 7/3/47
    date 6 : 15/10/1914
    date 7 : 08/03/1941
    date 8 : 8/1/1980
    date 9 : 30/6/1976"""

    # Raw string so that \d reaches the regex engine unchanged.
    expression = re2.compile(
        r"([0-3]?[0-9]/[0-1]?[0-9]/([0-2][0-9])?[0-9][0-9])[^\d]")

    expression.search(s).group(1, 2)

.. parsed-literal::

    ('14/9/2000', '20')

.. code:: ipython3

    c = expression.search(s).span(1)
    s[c[0]:c[1]]

.. parsed-literal::

    '14/9/2000'

Names
-----

.. code:: ipython3

    date = "05/22/2010"
    exp = "(?P<jj>[0-9]{1,2})/(?P<mm>[0-9]{1,2})/(?P<aa>((19)|(20))[0-9]{2})"
    com = re2.compile(exp)
    print(com.search(date).groupdict())

.. parsed-literal::

    {'aa': '2010', 'jj': '05', 'mm': '22'}

findall
-------

*findall* is not natively implemented in *re2*, so it was added to
the wrapper.

.. code:: ipython3

    s = """date 0 : 14/9/2000
    date 1 : 20/04/1971
    date 2 : 14/09/1913
    date 3 : 2/3/1978
    date 4 : 1/7/1986
    date 5 : 7/3/47
    date 6 : 15/10/1914
    date 7 : 08/03/1941
    date 8 : 8/1/1980
    date 9 : 30/6/1976"""

    expression = re2.compile(
        r"([0-3]?[0-9]/[0-1]?[0-9]/([0-2][0-9])?[0-9][0-9])[^\d]")

    re2.findall(expression, s)

.. parsed-literal::

    [('14/9/2000', '20'),
     ('20/04/1971', '19'),
     ('14/09/1913', '19'),
     ('2/3/1978', '19'),
     ('1/7/1986', '19'),
     ('7/3/47', None),
     ('15/10/1914', '19'),
     ('08/03/1941', '19'),
     ('8/1/1980', '19')]

The last date, ``30/6/1976``, is missing from the result because the
pattern requires a non-digit character after the year and the string
ends there.

benchmark
---------

.. code:: ipython3

    s = """date 0 : 14/9/2000
    date 1 : 20/04/1971
    date 2 : 14/09/1913
    date 3 : 2/3/1978
    date 4 : 1/7/1986
    date 5 : 7/3/47
    date 6 : 15/10/1914
    date 7 : 08/03/1941
    date 8 : 8/1/1980
    date 9 : 30/6/1976"""

    expression = re.compile(
        r"([0-3]?[0-9]/[0-1]?[0-9]/([0-2][0-9])?[0-9][0-9])[^\d]")

    %timeit expression.findall(s)

.. parsed-literal::

    10.5 µs ± 296 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

.. code:: ipython3

    %timeit re2.findall(expression, s)

.. parsed-literal::

    18.4 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That is expected, since the *findall* method is implemented in Python
rather than in C.
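
To illustrate why a *findall* written in Python carries extra overhead,
here is a minimal sketch of how such a function could be built on top of
repeated ``search`` calls. It uses the standard *re* module so that it
runs without *wrapclib*. The helper name ``findall_with_search`` is
hypothetical, and this is not claimed to be the code *wrapclib* actually
uses. Groups that do not participate in a match come back as ``None``,
as in the *findall* output shown earlier.

.. code:: python

    import re


    def findall_with_search(pattern, text):
        """Collect non-overlapping matches by calling ``search`` in a loop."""
        results = []
        pos = 0
        while pos <= len(text):
            match = pattern.search(text, pos)
            if match is None:
                break
            groups = match.groups()
            if len(groups) == 0:
                # No capturing group: keep the whole match.
                results.append(match.group(0))
            elif len(groups) == 1:
                results.append(groups[0])
            else:
                results.append(groups)
            # Resume after the match; step one character on an empty match
            # to avoid looping forever.
            pos = match.end() if match.end() > match.start() else match.end() + 1
        return results


    s = """date 0 : 14/9/2000
    date 1 : 20/04/1971
    date 2 : 14/09/1913
    date 3 : 2/3/1978
    date 4 : 1/7/1986
    date 5 : 7/3/47
    date 6 : 15/10/1914
    date 7 : 08/03/1941
    date 8 : 8/1/1980
    date 9 : 30/6/1976"""

    expression = re.compile(
        r"([0-3]?[0-9]/[0-1]?[0-9]/([0-2][0-9])?[0-9][0-9])[^\d]")

    found = findall_with_search(expression, s)
    print(found)

Every iteration of the loop crosses from Python into the compiled engine
and back, and builds Python objects for each match, which is one
plausible source of the slowdown measured above.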