changeset 38474:96f65bdf0bf4
stringutil: add a new function to do minimal regex escaping
Per https://bugs.python.org/issue29995, re.escape() used to
over-escape regular expression strings, but in Python 3.7 that's been
fixed, which also improved the performance of re.escape(). Since it's
both an output change for us *and* a performance win, let's just
effectively backport the new behavior to hg on all Python versions.
Differential Revision: https://phab.mercurial-scm.org/D3841
author | Augie Fackler <augie@google.com> |
---|---|
date | Tue, 26 Jun 2018 10:33:52 -0400 |
parents | 622f79e3a1cb |
children | 67dc32d4e790 |
files | mercurial/utils/stringutil.py |
diffstat | 1 files changed, 19 insertions(+), 0 deletions(-) |
--- a/mercurial/utils/stringutil.py	Tue Jun 26 16:14:02 2018 +0530
+++ b/mercurial/utils/stringutil.py	Tue Jun 26 10:33:52 2018 -0400
@@ -23,6 +23,25 @@
     pycompat,
 )
 
+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}
+
+def reescape(pat):
+    """Drop-in replacement for re.escape."""
+    # NOTE: it is intentional that this works on unicodes and not
+    # bytes, as it's only possible to do the escaping with
+    # unicode.translate, not bytes.translate. Sigh.
+    wantuni = True
+    if isinstance(pat, bytes):
+        wantuni = False
+        pat = pat.decode('latin1')
+    pat = pat.translate(_regexescapemap)
+    if wantuni:
+        return pat
+    return pat.encode('latin1')
+
 def pprint(o, bprefix=False):
     """Pretty print an object."""
     if isinstance(o, bytes):
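For readers who want to try the idea outside of hg, here is a minimal standalone sketch of the same technique: escape only the characters in the change's special-character set, using a `str.translate` table, and round-trip bytes through latin1. It assumes Python 3 and drops the `pycompat` dependency; the names `_RESPECIAL`, `_ESCAPE_MAP`, and `reescape_sketch` are hypothetical and not part of Mercurial.

```python
import re

# Same special-character set as _respecial in the change, written as str.
_RESPECIAL = '()[]{}?*+-|^$\\.# \t\n\r\v\f'

# Translation table: code point -> backslash-escaped character.
_ESCAPE_MAP = {ord(c): '\\' + c for c in _RESPECIAL}


def reescape_sketch(pat):
    """Escape only regex-special characters in pat (str or bytes)."""
    if isinstance(pat, bytes):
        # bytes.translate cannot map one byte to two, so go through str.
        # latin1 maps every byte to exactly one code point and back.
        return pat.decode('latin1').translate(_ESCAPE_MAP).encode('latin1')
    return pat.translate(_ESCAPE_MAP)


if __name__ == '__main__':
    sample = 'path/to/file_name.py (copy)'
    escaped = reescape_sketch(sample)
    print(escaped)
    # The escaped pattern should match the original text literally.
    print(re.fullmatch(escaped, sample) is not None)
```

Characters outside the set, such as `/` and `_`, pass through unchanged, which is the output difference the commit message refers to.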