encoding: avoid localstr when a string can be encoded losslessly (issue2763)
author Matt Mackall <mpm@selenic.com>
Fri, 15 Apr 2011 23:45:41 -0500
branch stable
changeset 13940 b7b26e54e37a
parent 13937 5f126c01ebfa
child 13941 924f40b977ee
child 13952 1416b9118540
localstr's hash method exists to prevent bogus matching on lossy local encodings. For instance, we don't want 'caf?' to match 'café' in an ASCII locale.

But when 'café' can be losslessly encoded in the local charset, we can simply use a normal string and avoid the hashing trick.

This avoids using localstr's hash method, which would prevent a match between
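The two ideas in this message, the hashing trick and the lossless round-trip test, can be illustrated with a small Python 3 sketch. It assumes a simplified model: `LocalStr` is a hypothetical stand-in for Mercurial's `localstr` (bytes in the local encoding that carry their UTF-8 original and hash on it), and `encodes_losslessly` is a hypothetical helper expressing the check; neither is Mercurial's actual code.

```python
class LocalStr(bytes):
    """Hypothetical stand-in for localstr: bytes in the local encoding
    that also remember the original UTF-8 bytes."""

    def __new__(cls, utf8: bytes, local: bytes):
        obj = super().__new__(cls, local)
        obj.utf8 = utf8
        return obj

    def __hash__(self) -> int:
        # Hash on the UTF-8 form, so a lossy rendering such as b"caf?"
        # does not hash like a genuine b"caf?".
        return hash(self.utf8)


def encodes_losslessly(u: str, encoding: str) -> bool:
    """True if u survives encode(..., "replace") followed by decode unchanged."""
    r = u.encode(encoding, "replace")
    return r.decode(encoding) == u


print(encodes_losslessly("café", "latin-1"))  # True: a plain string is safe
print(encodes_losslessly("café", "ascii"))    # False: degrades to b"caf?"

lossy = LocalStr("café".encode("utf-8"), "café".encode("ascii", "replace"))
print(lossy == b"caf?")             # True: the bytes are identical
print(b"caf?" in {lossy: "entry"})  # False (barring a hash collision)
```

Equality still holds between the lossy value and a real b"caf?", but the differing hash keeps a dictionary lookup from treating them as the same key.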
mercurial/encoding.py
tests/test-encoding.t
--- a/mercurial/encoding.py	Fri Apr 15 16:35:32 2011 +0300
+++ b/mercurial/encoding.py	Fri Apr 15 23:45:41 2011 -0500
@@ -95,11 +95,15 @@
     for e in ('UTF-8', fallbackencoding):
         try:
             u = s.decode(e) # attempt strict decoding
-            if e == 'UTF-8':
-                return localstr(s, u.encode(encoding, "replace"))
+            r = u.encode(encoding, "replace")
+            if u == r.decode(encoding):
+                # r is a safe, non-lossy encoding of s
+                return r
+            elif e == 'UTF-8':
+                return localstr(s, r)
             else:
-                return localstr(u.encode('UTF-8'),
-                                u.encode(encoding, "replace"))
+                return localstr(u.encode('UTF-8'), r)
+
         except LookupError, k:
             raise error.Abort("%s, please check your locale settings" % k)
         except UnicodeDecodeError:
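The decision order introduced by this hunk can be mirrored in Python 3 for experimentation. This is a sketch under stated assumptions, not Mercurial's code: the function name `to_local_sketch`, its parameters, and the `iso-8859-1` default for the fallback encoding are hypothetical, and `LocalStr` is a minimal stand-in for `localstr` defined here so the block runs on its own. The hunk does not show what happens after both decodes fail, so the final line is an assumed last-ditch conversion.

```python
class LocalStr(bytes):
    """Minimal hypothetical stand-in for localstr."""

    def __new__(cls, utf8: bytes, local: bytes):
        obj = super().__new__(cls, local)
        obj.utf8 = utf8
        return obj

    def __hash__(self) -> int:
        return hash(self.utf8)


def to_local_sketch(s: bytes, local_encoding: str,
                    fallback_encoding: str = "iso-8859-1") -> bytes:
    """Convert internal (normally UTF-8) bytes to the local encoding."""
    for e in ("utf-8", fallback_encoding):
        try:
            # Strict decoding, as in the patch. An unknown encoding name
            # raises LookupError, which the patch reports via error.Abort.
            u = s.decode(e)
        except UnicodeDecodeError:
            continue
        r = u.encode(local_encoding, "replace")
        if u == r.decode(local_encoding):
            # Lossless: plain bytes hash like any other local string.
            return r
        if e == "utf-8":
            # Lossy, but s is already UTF-8: keep it for the round trip.
            return LocalStr(s, r)
        # Lossy and s came from the fallback encoding: keep a UTF-8 copy.
        return LocalStr(u.encode("utf-8"), r)
    # Assumption: last-ditch conversion (not shown in the hunk).
    return s.decode("utf-8", "replace").encode(local_encoding, "replace")


print(type(to_local_sketch("café".encode("utf-8"), "latin-1")).__name__)  # bytes
print(type(to_local_sketch("café".encode("utf-8"), "ascii")).__name__)    # LocalStr
```

With latin-1 as the local encoding, 'café' comes back as plain bytes; with ASCII it is still wrapped, because b'caf?' cannot be turned back into 'café'.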
--- a/tests/test-encoding.t	Fri Apr 15 16:35:32 2011 +0300
+++ b/tests/test-encoding.t	Fri Apr 15 23:45:41 2011 -0500
@@ -241,3 +241,9 @@
   [255]
   $ cp latin-1-tag .hg/branch
   $ HGENCODING=latin-1 hg ci -m 'auto-promote legacy name'
+
+Test roundtrip encoding of lookup tables when not using UTF-8 (issue2763)
+
+  $ HGENCODING=latin-1 hg up `cat latin-1-tag`
+  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
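The failure this test guards against can be illustrated with a dictionary standing in for a tag or branch lookup table. In this hypothetical sketch, the table key comes from the old code path (always a `LocalStr`), while the name typed on the command line arrives as plain latin-1 bytes; the `LocalStr` stand-in and the `"node-1"` value are illustrative assumptions, not Mercurial internals.

```python
class LocalStr(bytes):
    """Minimal hypothetical stand-in for localstr."""

    def __new__(cls, utf8: bytes, local: bytes):
        obj = super().__new__(cls, local)
        obj.utf8 = utf8
        return obj

    def __hash__(self) -> int:
        return hash(self.utf8)


utf8_name = "café".encode("utf-8")     # how the name is stored internally
local_name = "café".encode("latin-1")  # what a latin-1 user passes: b"caf\xe9"

# Old behaviour: every converted key is a LocalStr, hashed on its UTF-8 form.
old_table = {LocalStr(utf8_name, local_name): "node-1"}
# New behaviour: a lossless conversion yields plain bytes.
new_table = {local_name: "node-1"}

print(local_name in old_table)  # False: equal bytes, different hashes
print(local_name in new_table)  # True
```

This mirrors the `hg up` test: the name read from `latin-1-tag` reaches Mercurial as plain latin-1 bytes and must find the matching table entry.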