Thu, 05 Nov 2015 17:30:10 -0600 encoding: re-escape U+DCxx characters in toutf8b input (issue4927)
Matt Mackall <mpm@selenic.com> [Thu, 05 Nov 2015 17:30:10 -0600] rev 26879
encoding: re-escape U+DCxx characters in toutf8b input (issue4927) This is the final missing piece in fully round-tripping random byte strings through UTF-8b. While this issue means that UTF-8 <-> UTF-8b isn't fully bijective, we don't expect to ever see U+DCxx codepoints in "real" UTF-8 data, so it should remain bijective in practice.
Thu, 05 Nov 2015 17:21:43 -0600 encoding: use getutf8char in toutf8b
Matt Mackall <mpm@selenic.com> [Thu, 05 Nov 2015 17:21:43 -0600] rev 26878
encoding: use getutf8char in toutf8b This correctly avoids the ambiguity of U+FFFD already present in the input and similar confusion by working a character at a time.
Thu, 05 Nov 2015 17:11:50 -0600 encoding: handle non-BMP characters in fromutf8b
Matt Mackall <mpm@selenic.com> [Thu, 05 Nov 2015 17:11:50 -0600] rev 26877
encoding: handle non-BMP characters in fromutf8b
Thu, 05 Nov 2015 17:09:00 -0600 posix: use getutf8char to handle OS X filename percent-escaping
Matt Mackall <mpm@selenic.com> [Thu, 05 Nov 2015 17:09:00 -0600] rev 26876
posix: use getutf8char to handle OS X filename percent-escaping This replaces an open-coded utf-8 parser that was ignoring subtle issues like overlong encodings.
(0) -10000 -3000 -1000 -300 -100 -30 -10 -4 +4 +10 +30 +100 +300 +1000 +3000 +10000 tip