darwin: omit ignorable codepoints when normcase()ing a file path
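HFS+ ignores certain invisible codepoints when comparing file names, so
names such as u'a\u200c' and 'A' refer to the same file on OS X. Dropping
those codepoints in normcase() keeps such names from being treated as
distinct paths that collide on disk.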
--- a/mercurial/posix.py Tue Dec 16 13:06:41 2014 -0500
+++ b/mercurial/posix.py Tue Dec 16 13:07:10 2014 -0500
@@ -208,6 +208,7 @@
- escape-encode invalid characters
- decompose to NFD
- lowercase
+ - omit ignored characters [200c-200f, 202a-202e, 206a-206f, feff]
>>> normcase('UPPER')
'upper'
@@ -265,7 +266,9 @@
u = s.decode('utf-8')
# Decompose then lowercase (HFS+ technote specifies lower)
- return unicodedata.normalize('NFD', u).lower().encode('utf-8')
+ enc = unicodedata.normalize('NFD', u).lower().encode('utf-8')
+ # drop HFS+ ignored characters
+ return encoding.hfsignoreclean(enc)
if sys.platform == 'cygwin':
# workaround for cygwin, in which mount point part of path is
--- a/tests/test-casefolding.t Tue Dec 16 13:06:41 2014 -0500
+++ b/tests/test-casefolding.t Tue Dec 16 13:07:10 2014 -0500
@@ -200,12 +200,11 @@
We assume anyone running the tests on a case-insensitive volume on OS
X will be using HFS+. If that's not true, this test will fail.
-Bug: some codepoints are to be ignored on HFS+:
-
$ rm A
>>> open(u'a\u200c'.encode('utf-8'), 'w').write('unicode is fun')
$ hg status
M A
- ? a\xe2\x80\x8c (esc)
+
#endif
+
$ cd ..