tests/test-encoding.t
author Brodie Rao <brodie@sf.io>
Mon, 16 Sep 2013 01:08:29 -0700
changeset 20192 38fad5e76ee8
parent 17361 5e276d1d504a
child 22429 7a7eed5176a4
permissions -rw-r--r--
branches: simplify with repo.branchmap().iterbranches() Running hg branches on the PyPy repo (with 996) over a busy NFS server, before this change: $ time hg --profile branches > /dev/null CallCount Recursive Total(s) Inline(s) module:lineno(function) 1069 0 1.2955 1.2955 <open> 1063 0 0.5576 0.5576 <method 'close' of 'file' objects> 4122 0 0.1993 0.0449 mercurial.repoview:161(changelog) 8240 0 0.0771 0.0299 mercurial.changelog:133(tip) 4122 0 0.0422 0.0204 mercurial.localrepo:26(__get__) 8240 0 0.0252 0.0131 <len> 4122 0 0.0060 0.0037 mercurial.repoview:112(filterrevs) 8240 0 0.0028 0.0028 <hash> 3029 0 0.2139 0.0390 mercurial.context:202(__init__) 3029 0 0.1402 0.0339 mercurial.repoview:161(changelog) 3029 0 0.0240 0.0087 mercurial.changelog:183(rev) 9087 0 0.0067 0.0067 <isinstance> 1096 0 0.0025 0.0025 <binascii.unhexlify> 4125 0 0.0015 0.0015 <len> 4229 0 0.0344 0.0344 mercurial.revlog:296(rev) 1061 0 0.0343 0.0343 <method 'seek' of 'file' objects> 1063 0 0.0339 0.0339 <method 'read' of 'file' objects> 40476 16488 0.0479 0.0311 <len> 16488 0 0.0216 0.0168 mercurial.revlog:262(__len__) 8240 0 0.0771 0.0299 mercurial.changelog:133(tip) 8240 0 0.0281 0.0203 mercurial.changelog:190(node) 8240 0 0.0191 0.0095 <len> 1342 0 0.0278 0.0278 <zlib.decompress> 1074 0 2.2143 0.0266 mercurial.changelog:270(read) 1074 0 2.1328 0.0230 mercurial.revlog:907(revision) 1073 0 0.0208 0.0108 mercurial.changelog:28(decodeextra) 2148 0 0.0072 0.0072 <method 'split' of 'str' objects> 2148 0 0.0211 0.0038 mercurial.encoding:61(tolocal) 1074 0 0.0028 0.0028 <method 'index' of 'str' objects> 1061 0 1.9811 0.0237 mercurial.revlog:817(_loadchunk) real 0m2.742s user 0m0.811s sys 0m0.188s After this change: $ time hg --profile branches > /dev/null CallCount Recursive Total(s) Inline(s) module:lineno(function) 2092 0 0.1444 0.0292 mercurial.context:202(__init__) 2092 0 0.0908 0.0216 mercurial.repoview:161(changelog) 2092 0 0.0164 0.0057 mercurial.changelog:183(rev) 6276 0 0.0045 0.0045 <isinstance> 1096 0 0.0024 0.0024 <binascii.unhexlify> 3188 0 0.0013 0.0013 <len> 2218 0 0.0230 0.0230 mercurial.revlog:296(rev) 2111 0 0.1028 0.0218 mercurial.repoview:161(changelog) 4218 0 0.0387 0.0146 mercurial.changelog:133(tip) 2111 0 0.0238 0.0104 mercurial.localrepo:26(__get__) 4218 0 0.0122 0.0062 <len> 2111 0 0.0038 0.0021 mercurial.repoview:112(filterrevs) 4218 0 0.0014 0.0014 <hash> 20240 8444 0.0233 0.0149 <len> 8444 0 0.0110 0.0084 mercurial.revlog:262(__len__) 4218 0 0.0387 0.0146 mercurial.changelog:133(tip) 4218 0 0.0144 0.0103 mercurial.changelog:190(node) 4218 0 0.0097 0.0048 <len> 2398 1 0.0271 0.0115 mercurial.localrepo:26(__get__) 2398 1 0.0146 0.0046 mercurial.scmutil:939(__get__) 2124 0 0.0009 0.0009 mercurial.localrepo:330(unfiltered) 274 0 0.0002 0.0002 mercurial.repoview:192(unfiltered) 4 0 0.1409 0.0112 mercurial.branchmap:19(read) 1096 0 0.1113 0.0028 mercurial.localrepo:407(__contains__) 1098 0 0.0020 0.0020 <method 'split' of 'str' objects> 1097 0 0.0019 0.0019 <binascii.unhexlify> 1096 0 0.0093 0.0018 mercurial.encoding:61(tolocal) 1096 0 0.0010 0.0010 <method 'append' of 'list' objects> 4349 0 0.0150 0.0105 mercurial.changelog:190(node) 4349 0 0.0045 0.0045 mercurial.revlog:317(node) real 0m0.362s user 0m0.329s sys 0m0.024s

Test character encoding

  $ hg init t
  $ cd t

we need a repo with some legacy latin-1 changesets

  $ hg unbundle "$TESTDIR/bundles/legacy-encoding.hg"
  adding changesets
  adding manifests
  adding file changes
  added 2 changesets with 2 changes to 1 files
  (run 'hg update' to get a working copy)
  $ hg co
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ python << EOF
  > f = file('latin-1', 'w'); f.write("latin-1 e' encoded: \xe9"); f.close()
  > f = file('utf-8', 'w'); f.write("utf-8 e' encoded: \xc3\xa9"); f.close()
  > f = file('latin-1-tag', 'w'); f.write("\xe9"); f.close()
  > EOF

should fail with encoding error

  $ echo "plain old ascii" > a
  $ hg st
  M a
  ? latin-1
  ? latin-1-tag
  ? utf-8
  $ HGENCODING=ascii hg ci -l latin-1
  transaction abort!
  rollback completed
  abort: decoding near ' encoded: \xe9': 'ascii' codec can't decode byte 0xe9 in position 20: ordinal not in range(128)! (esc)
  [255]

these should work

  $ echo "latin-1" > a
  $ HGENCODING=latin-1 hg ci -l latin-1
  $ echo "utf-8" > a
  $ HGENCODING=utf-8 hg ci -l utf-8
  $ HGENCODING=latin-1 hg tag `cat latin-1-tag`
  $ HGENCODING=latin-1 hg branch `cat latin-1-tag`
  marked working directory as branch \xe9 (esc)
  (branches are permanent and global, did you want a bookmark?)
  $ HGENCODING=latin-1 hg ci -m 'latin1 branch'
  $ hg -q rollback
  $ HGENCODING=latin-1 hg branch
  \xe9 (esc)
  $ HGENCODING=latin-1 hg ci -m 'latin1 branch'
  $ rm .hg/branch

hg log (ascii)

  $ hg --encoding ascii log
  changeset:   5:a52c0692f24a
  branch:      ?
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin1 branch
  
  changeset:   4:94db611b4196
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     Added tag ? for changeset ca661e7520de
  
  changeset:   3:ca661e7520de
  tag:         ?
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     utf-8 e' encoded: ?
  
  changeset:   2:650c6f3d55dd
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin-1 e' encoded: ?
  
  changeset:   1:0e5b7e3f9c4a
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     koi8-r: ????? = u'\u0440\u0442\u0443\u0442\u044c'
  
  changeset:   0:1e78a93102a3
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     latin-1 e': ? = u'\xe9'
  

hg log (latin-1)

  $ hg --encoding latin-1 log
  changeset:   5:a52c0692f24a
  branch:      \xe9 (esc)
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin1 branch
  
  changeset:   4:94db611b4196
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     Added tag \xe9 for changeset ca661e7520de (esc)
  
  changeset:   3:ca661e7520de
  tag:         \xe9 (esc)
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     utf-8 e' encoded: \xe9 (esc)
  
  changeset:   2:650c6f3d55dd
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin-1 e' encoded: \xe9 (esc)
  
  changeset:   1:0e5b7e3f9c4a
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     koi8-r: \xd2\xd4\xd5\xd4\xd8 = u'\\u0440\\u0442\\u0443\\u0442\\u044c' (esc)
  
  changeset:   0:1e78a93102a3
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     latin-1 e': \xe9 = u'\\xe9' (esc)
  

hg log (utf-8)

  $ hg --encoding utf-8 log
  changeset:   5:a52c0692f24a
  branch:      \xc3\xa9 (esc)
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin1 branch
  
  changeset:   4:94db611b4196
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     Added tag \xc3\xa9 for changeset ca661e7520de (esc)
  
  changeset:   3:ca661e7520de
  tag:         \xc3\xa9 (esc)
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     utf-8 e' encoded: \xc3\xa9 (esc)
  
  changeset:   2:650c6f3d55dd
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin-1 e' encoded: \xc3\xa9 (esc)
  
  changeset:   1:0e5b7e3f9c4a
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     koi8-r: \xc3\x92\xc3\x94\xc3\x95\xc3\x94\xc3\x98 = u'\\u0440\\u0442\\u0443\\u0442\\u044c' (esc)
  
  changeset:   0:1e78a93102a3
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     latin-1 e': \xc3\xa9 = u'\\xe9' (esc)
  

hg tags (ascii)

  $ HGENCODING=ascii hg tags
  tip                                5:a52c0692f24a
  ?                                  3:ca661e7520de

hg tags (latin-1)

  $ HGENCODING=latin-1 hg tags
  tip                                5:a52c0692f24a
  \xe9                                  3:ca661e7520de (esc)

hg tags (utf-8)

  $ HGENCODING=utf-8 hg tags
  tip                                5:a52c0692f24a
  \xc3\xa9                                  3:ca661e7520de (esc)

hg branches (ascii)

  $ HGENCODING=ascii hg branches
  ?                              5:a52c0692f24a
  default                        4:94db611b4196 (inactive)

hg branches (latin-1)

  $ HGENCODING=latin-1 hg branches
  \xe9                              5:a52c0692f24a (esc)
  default                        4:94db611b4196 (inactive)

hg branches (utf-8)

  $ HGENCODING=utf-8 hg branches
  \xc3\xa9                              5:a52c0692f24a (esc)
  default                        4:94db611b4196 (inactive)
  $ echo '[ui]' >> .hg/hgrc
  $ echo 'fallbackencoding = koi8-r' >> .hg/hgrc

hg log (utf-8)

  $ HGENCODING=utf-8 hg log
  changeset:   5:a52c0692f24a
  branch:      \xc3\xa9 (esc)
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin1 branch
  
  changeset:   4:94db611b4196
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     Added tag \xc3\xa9 for changeset ca661e7520de (esc)
  
  changeset:   3:ca661e7520de
  tag:         \xc3\xa9 (esc)
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     utf-8 e' encoded: \xc3\xa9 (esc)
  
  changeset:   2:650c6f3d55dd
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     latin-1 e' encoded: \xc3\xa9 (esc)
  
  changeset:   1:0e5b7e3f9c4a
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     koi8-r: \xd1\x80\xd1\x82\xd1\x83\xd1\x82\xd1\x8c = u'\\u0440\\u0442\\u0443\\u0442\\u044c' (esc)
  
  changeset:   0:1e78a93102a3
  user:        test
  date:        Mon Jan 12 13:46:40 1970 +0000
  summary:     latin-1 e': \xd0\x98 = u'\\xe9' (esc)
  

hg log (dolphin)

  $ HGENCODING=dolphin hg log
  abort: unknown encoding: dolphin
  (please check your locale settings)
  [255]
  $ HGENCODING=ascii hg branch `cat latin-1-tag`
  abort: decoding near '\xe9': 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)! (esc)
  [255]
  $ cp latin-1-tag .hg/branch
  $ HGENCODING=latin-1 hg ci -m 'auto-promote legacy name'

Test roundtrip encoding of lookup tables when not using UTF-8 (issue2763)

  $ HGENCODING=latin-1 hg up `cat latin-1-tag`
  0 files updated, 0 files merged, 1 files removed, 0 files unresolved

  $ cd ..