comparison mercurial/encoding.py @ 37946:57b0c7221dba

encoding: fix toutf8b() to resurrect lossy characters even if "\xed" in it

If 's' is a localstr, 's._utf8' must be returned to get the original UTF-8 sequence back. Because of this, it was totally wrong to test if '"\xed" not in s', which should be either '"\xed" not in s._utf8' or just omitted.

This patch moves the localstr handling to top as the validity of 's._utf8' should be pre-checked by encoding.tolocal().
author Yuya Nishihara <yuya@tcha.org>
date Sun, 22 Apr 2018 11:38:53 +0900
parents d4c760c997cd
children 3ea3c96ada54
--- mercurial/encoding.py @ 37945:bfe8ef6e370e
+++ mercurial/encoding.py @ 37946:57b0c7221dba
@@ -502,15 +502,17 @@
     arbitrary bytes into an internal Unicode format that can be
     re-encoded back into the original. Here we are exposing the
     internal surrogate encoding as a UTF-8 string.)
     '''
 
-    if not isinstance(s, localstr) and isasciistr(s):
+    if isinstance(s, localstr):
+        # assume that the original UTF-8 sequence would never contain
+        # invalid characters in U+DCxx range
+        return s._utf8
+    elif isasciistr(s):
         return s
     if "\xed" not in s:
-        if isinstance(s, localstr):
-            return s._utf8
         try:
             s.decode('utf-8', _utf8strict)
             return s
         except UnicodeDecodeError:
             pass
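
The reordering can be illustrated with a small standalone sketch. This is not Mercurial's code: LocalStr, is_ascii, old_head and new_head are hypothetical stand-ins written for Python 3 bytes, 'strict' is assumed in place of _utf8strict, and the part of toutf8b() that follows the hunk is not shown in this comparison, so it is represented here by returning None.

```python
class LocalStr(bytes):
    """Hypothetical stand-in for localstr: local-encoded bytes that
    remember the original UTF-8 sequence in ._utf8."""

    def __new__(cls, utf8, local):
        obj = bytes.__new__(cls, local)
        obj._utf8 = utf8
        return obj


def is_ascii(s):
    # Simplified stand-in for isasciistr(); iterating bytes yields ints.
    return all(b < 0x80 for b in s)


def old_head(s):
    """Ordering of the checks before the patch (sketch)."""
    if not isinstance(s, LocalStr) and is_ascii(s):
        return s
    if b"\xed" not in s:
        if isinstance(s, LocalStr):
            return s._utf8
        try:
            s.decode("utf-8", "strict")  # assumption: 'strict' for _utf8strict
            return s
        except UnicodeDecodeError:
            pass
    return None  # placeholder for the rest of toutf8b(), not shown above


def new_head(s):
    """Ordering of the checks after the patch (sketch)."""
    if isinstance(s, LocalStr):
        return s._utf8
    elif is_ascii(s):
        return s
    if b"\xed" not in s:
        try:
            s.decode("utf-8", "strict")  # assumption: 'strict' for _utf8strict
            return s
        except UnicodeDecodeError:
            pass
    return None  # placeholder for the rest of toutf8b(), not shown above


# U+00ED fits in Latin-1 (as byte 0xED); U+65E5 does not and becomes "?".
text = "\u00ed\u65e5"
utf8 = text.encode("utf-8")
lossy = LocalStr(utf8, text.encode("latin-1", "replace"))

print(old_head(lossy))
print(new_head(lossy))
```

In this sketch the local form of the string contains the byte 0xED, so the old ordering never reaches the 's._utf8' branch and falls through to the placeholder, while the new ordering returns the stored UTF-8 bytes regardless of what the local form contains.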