comparison hgeditor @ 27699:c8d3392f76e1

encoding: handle UTF-16 internal limit with fromutf8b (issue5031)

Default (narrow) builds of Python have a Unicode type that isn't actually full Unicode but UTF-16, which encodes each non-BMP codepoint as a pair of surrogate code units (U+D800..U+DFFF). Since our UTF-8b escaping hack maps undecodable bytes into U+DC80..U+DCFF, a range that overlaps the UTF-16 low surrogates, this gets extra complicated. In addition, unichr() may fail for codepoints above U+FFFF on such builds. This changes the code to reuse getutf8char to walk the byte string, so we only rely on Python for unpacking our U+DCxx characters.
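The byte-walking approach can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not Mercurial's code: the names getutf8char and fromutf8b mirror the message, but their bodies (including the hypothetical `_UTF8LEN` table) are assumptions, and the hypothetical `toutf8b` helper simply borrows Python 3's built-in `surrogateescape` handler, which uses the same U+DC80..U+DCFF range. Python 3's str is no longer UTF-16 internally, so the sketch only shows the technique of stepping over whole UTF-8 sequences and decoding nothing but the three-byte escapes.

```python
# Hypothetical lookup: high nibble of a lead byte -> UTF-8 sequence length.
# 0 marks ASCII; -1 marks a stray continuation byte (0x80..0xBF).
_UTF8LEN = [0] * 8 + [-1] * 4 + [2, 2, 3, 4]


def getutf8char(s: bytes, pos: int) -> bytes:
    """Return the complete UTF-8 sequence starting at s[pos] (sketch).

    Raises UnicodeDecodeError if s[pos] does not start a valid sequence.
    Encoded surrogates are accepted ("surrogatepass") because the
    UTF-8b escapes live in the surrogate range.
    """
    n = _UTF8LEN[s[pos] >> 4]
    if n == 0:
        return s[pos:pos + 1]
    if n < 0:
        raise UnicodeDecodeError("utf-8", s, pos, pos + 1,
                                 "unexpected continuation byte")
    c = s[pos:pos + n]
    c.decode("utf-8", "surrogatepass")  # validate only; result discarded
    return c


def toutf8b(raw: bytes) -> bytes:
    """Hypothetical escaper: each undecodable byte 0xXX becomes U+DCXX."""
    return raw.decode("utf-8", "surrogateescape").encode("utf-8", "surrogatepass")


def fromutf8b(s: bytes) -> bytes:
    """Undo UTF-8b escaping by walking whole UTF-8 sequences (sketch)."""
    if b"\xed" not in s:  # fast path: no U+Dxxx sequences at all
        return s
    out = bytearray()
    pos = 0
    while pos < len(s):
        c = getutf8char(s, pos)
        pos += len(c)
        # U+DC80..U+DCFF encode as ED B2 80 .. ED B3 BF; only these are
        # decoded, so 4-byte non-BMP sequences are copied through untouched.
        if b"\xed\xb2\x80" <= c <= b"\xed\xb3\xbf":
            out.append(ord(c.decode("utf-8", "surrogatepass")) & 0xFF)
        else:
            out += c
    return bytes(out)
```

For example, `b"caf\xe9 \xf0\x9f\x98\x80"` (a Latin-1 byte followed by the non-BMP U+1F600) escapes to `b"caf\xed\xb3\xa9 \xf0\x9f\x98\x80"`, and `fromutf8b` restores the original bytes. Because the emoji is treated as one opaque 4-byte chunk, it never goes through unichr() or a surrogate pair.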
author Matt Mackall <mpm@selenic.com>
date Thu, 07 Jan 2016 14:57:57 -0600
parents 1aee2ab0f902
children
comparison
27698:dad6404ccddb 27699:c8d3392f76e1