diff hgext/children.py @ 27699:c8d3392f76e1
encoding: handle UTF-16 internal limit with fromutf8b (issue5031)
Default builds of Python have a Unicode type that isn't actually full
Unicode but UTF-16, which encodes each non-BMP codepoint as a pair of BMP
codepoints (a surrogate pair). Since our UTF-8b escaping hack uses a
range that overlaps with the UTF-16 surrogates, this gets extra
complicated. In addition, unichr() may not work for codepoints greater
than U+FFFF.
This changes the code to reuse getutf8char to walk the byte string, so we
only rely on Python for unpacking our U+DCxx characters.
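
To illustrate the approach described above, here is a minimal sketch of walking a byte string one UTF-8 sequence at a time and unescaping only the U+DCxx sequences, so that non-BMP characters are never passed through Python's Unicode type. It is written for Python 3 `bytes` and is not the code from this changeset: the helper names `utf8_char_len` and `unescape_utf8b` are hypothetical, and the lead-byte check is a simplification that assumes the input is otherwise well-formed UTF-8.

```python
def utf8_char_len(lead):
    """Return the length of a UTF-8 sequence, judged by its lead byte only.

    Hypothetical helper for this sketch; it does not validate
    continuation bytes.
    """
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError("invalid UTF-8 lead byte: %#x" % lead)


def unescape_utf8b(s):
    """Turn UTF-8-encoded U+DCxx escapes in `s` back into raw bytes 0xXX."""
    # Fast path: every U+DCxx escape starts with the byte 0xED.
    if b"\xed" not in s:
        return s
    out = bytearray()
    pos = 0
    while pos < len(s):
        n = utf8_char_len(s[pos])
        c = s[pos:pos + n]
        pos += n
        # U+DC00..U+DCFF encode as ED B0 80 .. ED B3 BF.
        if len(c) == 3 and b"\xed\xb0\x80" <= c <= b"\xed\xb3\xbf":
            # The low 8 bits of the code point are the original byte;
            # extract them from the two continuation bytes directly.
            out.append(((c[1] & 0x03) << 6) | (c[2] & 0x3F))
        else:
            # Any other sequence, including 4-byte non-BMP characters,
            # is copied through untouched.
            out += c
    return bytes(out)
```

Because the 4-byte sequences are copied as opaque bytes, this sketch behaves the same regardless of how the interpreter stores Unicode strings internally.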
| author | Matt Mackall <mpm@selenic.com> |
|---|---|
| date | Thu, 07 Jan 2016 14:57:57 -0600 |
| parents | 80c5b2666a96 |
| children | 3501bd89dad2 |