comparison mercurial/encoding.py @ 23596:885bd7c5c7e3 stable
encoding: add hfsignoreclean to clean out HFS-ignored characters
According to Apple Technote 1150 (unavailable from Apple as far as I
can tell, but archived in several places online), HFS+ ignores sixteen
specific Unicode code points when doing path normalization. We need to
handle those cases, so this function lets us efficiently strip the
offending characters from a UTF-8 encoded string (which seems to be the
only way it matters on OS X).
author | Augie Fackler <raf@durin42.com> |
---|---|
date | Tue, 16 Dec 2014 13:06:41 -0500 |
parents | bcff9ecdaae0 |
children | ac08de78de7f |
comparison
23595:035434b407be | 23596:885bd7c5c7e3 |
---|---|
5 # This software may be used and distributed according to the terms of the | 5 # This software may be used and distributed according to the terms of the |
6 # GNU General Public License version 2 or any later version. | 6 # GNU General Public License version 2 or any later version. |
7 | 7 |
8 import error | 8 import error |
9 import unicodedata, locale, os | 9 import unicodedata, locale, os |
10 | |
11 # These unicode characters are ignored by HFS+ (Apple Technote 1150, | |
12 # "Unicode Subtleties"), so we need to ignore them in some places for | |
13 # sanity. | |
14 _ignore = [unichr(int(x, 16)).encode("utf-8") for x in | |
15 "200c 200d 200e 200f 202a 202b 202c 202d 202e " | |
16 "206a 206b 206c 206d 206e 206f feff".split()] | |
17 # verify the next function will work | |
18 assert set([i[0] for i in _ignore]) == set(["\xe2", "\xef"]) | |
19 | |
20 def hfsignoreclean(s): | |
21 """Remove codepoints ignored by HFS+ from s. | |
22 | |
23 >>> hfsignoreclean(u'.h\u200cg'.encode('utf-8')) | |
24 '.hg' | |
25 >>> hfsignoreclean(u'.h\ufeffg'.encode('utf-8')) | |
26 '.hg' | |
27 """ | |
28 if "\xe2" in s or "\xef" in s: | |
29 for c in _ignore: | |
30 s = s.replace(c, '') | |
31 return s | |
10 | 32 |
11 def _getpreferredencoding(): | 33 def _getpreferredencoding(): |
12 ''' | 34 ''' |
13 On darwin, getpreferredencoding ignores the locale environment and | 35 On darwin, getpreferredencoding ignores the locale environment and |
14 always returns mac-roman. http://bugs.python.org/issue6202 fixes this | 36 always returns mac-roman. http://bugs.python.org/issue6202 fixes this |