view tests/test-encoding.out @ 12252:4481f8a93c7a stable

convert/darcs: handle non-ASCII metadata in darcs changelog (issue2354) Given a commit author or message with non-ASCII characters in a darcs repo, convert would raise a UnicodeEncodeError when adding changesets to the hg changelog. This happened because etree returns back unicode objects for any text it can't encode into ASCII. convert was passing these objects to changelog.add(), which would then attempt encoding.fromlocal() on them. This patch ensures converter_source.recode() is called on each piece of commit data returned by etree. (Also note that darcs is currently encoding agnostic and will print out whatever is in a patch's metadata byte-for-byte, even in the XML changelog.)
author Brodie Rao <brodie@bitheap.org>
date Fri, 10 Sep 2010 09:30:50 -0500
parents d320e70442a5
children 4c94b6d0fb1c
line wrap: on
line source

adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
(run 'hg update' to get a working copy)
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
% should fail with encoding error
M a
? latin-1
? latin-1-tag
? utf-8
transaction abort!
rollback completed
abort: decoding near ' encoded: é': 'ascii' codec can't decode byte 0xe9 in position 20: ordinal not in range(128)!
% these should work
marked working directory as branch é
% hg log (ascii)
changeset:   5:db5520b4645f
branch:      ?
tag:         tip
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin1 branch

changeset:   4:9cff3c980b58
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     Added tag ? for changeset 770b9b11621d

changeset:   3:770b9b11621d
tag:         ?
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     utf-8 e' encoded: ?

changeset:   2:0572af48b948
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e' encoded: ?

changeset:   1:0e5b7e3f9c4a
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     koi8-r: ????? = u'\u0440\u0442\u0443\u0442\u044c'

changeset:   0:1e78a93102a3
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e': ? = u'\xe9'

% hg log (latin-1)
changeset:   5:db5520b4645f
branch:      é
tag:         tip
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin1 branch

changeset:   4:9cff3c980b58
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     Added tag é for changeset 770b9b11621d

changeset:   3:770b9b11621d
tag:         é
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     utf-8 e' encoded: é

changeset:   2:0572af48b948
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e' encoded: é

changeset:   1:0e5b7e3f9c4a
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     koi8-r: ÒÔÕÔØ = u'\u0440\u0442\u0443\u0442\u044c'

changeset:   0:1e78a93102a3
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e': é = u'\xe9'

% hg log (utf-8)
changeset:   5:db5520b4645f
branch:      é
tag:         tip
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin1 branch

changeset:   4:9cff3c980b58
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     Added tag é for changeset 770b9b11621d

changeset:   3:770b9b11621d
tag:         é
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     utf-8 e' encoded: é

changeset:   2:0572af48b948
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e' encoded: é

changeset:   1:0e5b7e3f9c4a
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     koi8-r: ÒÔÕÔØ = u'\u0440\u0442\u0443\u0442\u044c'

changeset:   0:1e78a93102a3
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e': é = u'\xe9'

% hg tags (ascii)
tip                                5:db5520b4645f
?                                  3:770b9b11621d
% hg tags (latin-1)
tip                                5:db5520b4645f
é                                 3:770b9b11621d
% hg tags (utf-8)
tip                                5:db5520b4645f
é                                 3:770b9b11621d
% hg branches (ascii)
?                              5:db5520b4645f
default                        4:9cff3c980b58 (inactive)
% hg branches (latin-1)
é                             5:db5520b4645f
default                        4:9cff3c980b58 (inactive)
% hg branches (utf-8)
é                             5:db5520b4645f
default                        4:9cff3c980b58 (inactive)
% hg log (utf-8)
changeset:   5:db5520b4645f
branch:      é
tag:         tip
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin1 branch

changeset:   4:9cff3c980b58
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     Added tag é for changeset 770b9b11621d

changeset:   3:770b9b11621d
tag:         é
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     utf-8 e' encoded: é

changeset:   2:0572af48b948
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e' encoded: é

changeset:   1:0e5b7e3f9c4a
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     koi8-r: ртуть = u'\u0440\u0442\u0443\u0442\u044c'

changeset:   0:1e78a93102a3
user:        test
date:        Mon Jan 12 13:46:40 1970 +0000
summary:     latin-1 e': И = u'\xe9'

% hg log (dolphin)
abort: unknown encoding: dolphin, please check your locale settings
abort: decoding near 'é': 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)!
abort: branch name not in UTF-8!