comparison mercurial/encoding.py @ 23596:885bd7c5c7e3 stable

encoding: add hfsignoreclean to clean out HFS-ignored characters

According to Apple Technote 1150 (unavailable from Apple as far as I can tell, but archived in several places online), HFS+ ignores sixteen specific Unicode code points when doing path normalization. We need to handle those cases, so this function lets us efficiently strip the offending characters from a UTF-8 encoded string (which seems to be the only case that matters on OS X).
author Augie Fackler <raf@durin42.com>
date Tue, 16 Dec 2014 13:06:41 -0500
parents bcff9ecdaae0
children ac08de78de7f
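The efficiency of this approach rests on a property of UTF-8. Code points from U+0800 to U+FFFF encode to three bytes, and the first byte carries the top four bits of the code point. As a result, everything from U+2000 to U+2FFF starts with 0xE2 and everything from U+F000 to U+FFFF starts with 0xEF. The short Python 3 check below is a sketch, not part of Mercurial, and the names `IGNORED_HEX` and `utf8_forms` are hypothetical. It confirms that the sixteen code points listed in this changeset all begin with one of those two bytes, which lets the function return early on strings that contain neither.

```python
# Sketch (Python 3): confirm that every code point in this changeset's
# HFS+ ignore list UTF-8-encodes to a 3-byte sequence whose first byte
# is 0xE2 or 0xEF. Names here are hypothetical, not Mercurial's.
IGNORED_HEX = (
    "200c 200d 200e 200f 202a 202b 202c 202d 202e "
    "206a 206b 206c 206d 206e 206f feff"
).split()


def utf8_forms(hex_codes):
    """Map each hex code point string to its UTF-8 encoding (bytes)."""
    return {h: chr(int(h, 16)).encode("utf-8") for h in hex_codes}


forms = utf8_forms(IGNORED_HEX)
# In Python 3, indexing a bytes object yields an int, not a 1-char string.
lead_bytes = {b[0] for b in forms.values()}

print(len(forms))                           # 16
print(sorted(hex(b) for b in lead_bytes))   # ['0xe2', '0xef']
print(forms["200c"], forms["feff"])         # b'\xe2\x80\x8c' b'\xef\xbb\xbf'
```

The Python 2 code in the diff compares one-character `str` values such as `"\xe2"` instead, because indexing a Python 2 `str` returns a string rather than an integer.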
```diff
--- mercurial/encoding.py	23595:035434b407be
+++ mercurial/encoding.py	23596:885bd7c5c7e3
@@ -5,10 +5,32 @@
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
 import error
 import unicodedata, locale, os
+
+# These unicode characters are ignored by HFS+ (Apple Technote 1150,
+# "Unicode Subtleties"), so we need to ignore them in some places for
+# sanity.
+_ignore = [unichr(int(x, 16)).encode("utf-8") for x in
+           "200c 200d 200e 200f 202a 202b 202c 202d 202e "
+           "206a 206b 206c 206d 206e 206f feff".split()]
+# verify the next function will work
+assert set([i[0] for i in _ignore]) == set(["\xe2", "\xef"])
+
+def hfsignoreclean(s):
+    """Remove codepoints ignored by HFS+ from s.
+
+    >>> hfsignoreclean(u'.h\u200cg'.encode('utf-8'))
+    '.hg'
+    >>> hfsignoreclean(u'.h\ufeffg'.encode('utf-8'))
+    '.hg'
+    """
+    if "\xe2" in s or "\xef" in s:
+        for c in _ignore:
+            s = s.replace(c, '')
+    return s
 
 def _getpreferredencoding():
     '''
     On darwin, getpreferredencoding ignores the locale environment and
     always returns mac-roman. http://bugs.python.org/issue6202 fixes this
```
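The changeset is Python 2 code: it uses `unichr` and treats byte strings as `str`. For readers on Python 3, a minimal sketch of the same technique follows. It assumes the input is a UTF-8 encoded `bytes` path, and the names `_IGNORED` and `hfs_ignore_clean` are hypothetical rather than part of Mercurial's API.

```python
# Hypothetical Python 3 port of hfsignoreclean; operates on UTF-8 bytes.
_IGNORED = [
    chr(int(h, 16)).encode("utf-8")
    for h in (
        "200c 200d 200e 200f 202a 202b 202c 202d 202e "
        "206a 206b 206c 206d 206e 206f feff"
    ).split()
]

# Same sanity check as the original: every ignored sequence starts with
# 0xE2 or 0xEF, so a string containing neither byte cannot contain one.
assert {seq[0] for seq in _IGNORED} == {0xE2, 0xEF}


def hfs_ignore_clean(s: bytes) -> bytes:
    """Remove the UTF-8 encodings of HFS+-ignored code points from s."""
    # Fast path: skip the 16 replace() calls when neither lead byte occurs.
    if b"\xe2" in s or b"\xef" in s:
        for seq in _IGNORED:
            s = s.replace(seq, b"")
    return s


print(hfs_ignore_clean(".h\u200cg".encode("utf-8")))  # b'.hg'
print(hfs_ignore_clean(".h\ufeffg".encode("utf-8")))  # b'.hg'
```

In valid UTF-8, a lead byte such as 0xE2 or 0xEF never appears as a continuation byte. A complete three-byte sequence can therefore only match at a character boundary, so `bytes.replace` never cuts into a neighbouring character. The early check keeps the common ASCII-only case down to at most two byte scans.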