view tests/test-convert-svn-encoding.t @ 50400:95acba2c29f6
encoding: avoid quadratic time complexity when json-encoding non-UTF8 strings
Apparently the code appends to a bytes object with "+=", which copies the
accumulated value and is therefore linear-time per append, so the whole
encoding is quadratic-time. This patch makes us use a bytearray object
instead, which has an amortized constant-time append operation.
The encoding is still not particularly fast, but at least a 10MB file
takes tens of seconds to encode, not many hours.
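The difference between the two approaches can be sketched as follows. This is only an illustration, not Mercurial's actual code: `escape_byte`, `encode_with_bytes`, and `encode_with_bytearray` are hypothetical names, and the per-byte escaping is a stand-in for the real JSON-encoding logic.

```python
def escape_byte(b: int) -> bytes:
    # Hypothetical per-byte escaping, standing in for the real JSON escaping.
    # Printable ASCII other than '"' and '\' passes through unchanged.
    if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
        return bytes([b])
    return b"\\u%04x" % b


def encode_with_bytes(data: bytes) -> bytes:
    # bytes is immutable, so each "+=" builds a new object and copies
    # everything accumulated so far: quadratic time overall.
    out = b""
    for b in data:
        out += escape_byte(b)
    return out


def encode_with_bytearray(data: bytes) -> bytes:
    # bytearray is mutable and grows in place, so each "+=" is an
    # amortized constant-time append: linear time overall.
    out = bytearray()
    for b in data:
        out += escape_byte(b)
    return bytes(out)
```

Both functions return the same result; only the cost of building it differs, which is why the change matters for multi-megabyte inputs.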
author | Arseniy Alekseyev <aalekseyev@janestreet.com> |
---|---|
date | Mon, 06 Mar 2023 11:27:57 +0000 |
parents | 1d075b857c90 |
children |
#require svn svn-bindings

  $ cat >> $HGRCPATH <<EOF
  > [extensions]
  > convert =
  > EOF

  $ svnadmin create svn-repo
  $ svnadmin load -q svn-repo < "$TESTDIR/svn/encoding.svndump"

Convert while testing all possible outputs

  $ hg --debug convert svn-repo A-hg --config progress.debug=1
  initializing destination A-hg repository
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  run hg sink pre-conversion action
  scanning source...
  found trunk at 'trunk'
  found tags at 'tags'
  found branches at 'branches'
  found branch branch\xc3\xa9 at 5 (esc)
  found branch branch\xc3\xa9e at 6 (esc)
  scanning: 1/4 revisions (25.00%)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  fetching revision log for "/trunk" from 4 to 0
  parsing revision 4 (2 changes)
  parsing revision 3 (4 changes)
  parsing revision 2 (3 changes)
  parsing revision 1 (3 changes)
  no copyfrom path, don't know what to do.
  '/branches' is not under '/trunk', ignoring
  '/tags' is not under '/trunk', ignoring
  scanning: 2/4 revisions (50.00%)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9 (glob)
  fetching revision log for "/branches/branch\xc3\xa9" from 5 to 0 (esc)
  parsing revision 5 (1 changes)
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9 (glob)
  found parent of branch /branches/branch\xc3\xa9 at 4: /trunk (esc)
  scanning: 3/4 revisions (75.00%)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9e (glob)
  fetching revision log for "/branches/branch\xc3\xa9e" from 6 to 0 (esc)
  parsing revision 6 (1 changes)
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9e (glob)
  found parent of branch /branches/branch\xc3\xa9e at 5: /branches/branch\xc3\xa9 (esc)
  scanning: 4/4 revisions (100.00%)
  scanning: 5/4 revisions (125.00%)
  scanning: 6/4 revisions (150.00%)
  sorting...
  converting...
  5 init projA
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/trunk@1
  converting: 0/6 revisions (0.00%)
  reusing manifest from p1 (no file change)
  committing changelog
  updating the branch cache
  4 hello
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/trunk@2
  converting: 1/6 revisions (16.67%)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  scanning paths: /trunk/\xc3\xa0 0/3 paths (0.00%) (esc)
  scanning paths: /trunk/\xc3\xa0/e\xcc\x81 1/3 paths (33.33%) (esc)
  scanning paths: /trunk/\xc3\xa9 2/3 paths (66.67%) (esc)
  committing files:
  \xc3\xa0/e\xcc\x81 (esc)
  getting files: \xc3\xa0/e\xcc\x81 1/2 files (50.00%) (esc)
  \xc3\xa9 (esc)
  getting files: \xc3\xa9 2/2 files (100.00%) (esc)
  committing manifest
  committing changelog
  updating the branch cache
  3 copy files
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/trunk@3
  converting: 2/6 revisions (33.33%)
  scanning paths: /trunk/\xc3\xa0 0/4 paths (0.00%) (esc)
  gone from -1
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  scanning paths: /trunk/\xc3\xa8 1/4 paths (25.00%) (esc)
  copied to \xc3\xa8 from \xc3\xa9@2 (esc)
  scanning paths: /trunk/\xc3\xa9 2/4 paths (50.00%) (esc)
  gone from -1
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  scanning paths: /trunk/\xc3\xb9 3/4 paths (75.00%) (esc)
  mark /trunk/\xc3\xb9 came from \xc3\xa0:2 (esc)
  getting files: \xc3\xa0/e\xcc\x81 1/4 files (25.00%) (esc)
  getting files: \xc3\xa9 2/4 files (50.00%) (esc)
  committing files:
  \xc3\xa8 (esc)
  getting files: \xc3\xa8 3/4 files (75.00%) (esc)
  \xc3\xa8: copy \xc3\xa9:6b67ccefd5ce6de77e7ead4f5292843a0255329f (esc)
  \xc3\xb9/e\xcc\x81 (esc)
  getting files: \xc3\xb9/e\xcc\x81 4/4 files (100.00%) (esc)
  \xc3\xb9/e\xcc\x81: copy \xc3\xa0/e\xcc\x81:a9092a3d84a37b9993b5c73576f6de29b7ea50f6 (esc)
  committing manifest
  committing changelog
  updating the branch cache
  2 remove files
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/trunk@4
  converting: 3/6 revisions (50.00%)
  scanning paths: /trunk/\xc3\xa8 0/2 paths (0.00%) (esc)
  gone from -1
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  scanning paths: /trunk/\xc3\xb9 1/2 paths (50.00%) (esc)
  gone from -1
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/trunk (glob)
  getting files: \xc3\xa8 1/2 files (50.00%) (esc)
  getting files: \xc3\xb9/e\xcc\x81 2/2 files (100.00%) (esc)
  committing files:
  committing manifest
  committing changelog
  updating the branch cache
  1 branch to branch?
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/branches/branch?@5
  converting: 4/6 revisions (66.67%)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9 (glob)
  scanning paths: /branches/branch\xc3\xa9 0/1 paths (0.00%) (esc)
  reusing manifest from p1 (no file change)
  committing changelog
  updating the branch cache
  0 branch to branch?e
  source: svn:afeb9c47-92ff-4c0c-9f72-e1f6eb8ac9af/branches/branch?e@6
  converting: 5/6 revisions (83.33%)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9e (glob)
  scanning paths: /branches/branch\xc3\xa9e 0/1 paths (0.00%) (esc)
  reusing manifest from p1 (no file change)
  committing changelog
  updating the branch cache
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9e (glob)
  reparent to file:/*/$TESTTMP/svn-repo (glob)
  reparent to file:/*/$TESTTMP/svn-repo/branches/branch%C3%A9e (glob)
  updating tags
  committing files:
  .hgtags
  committing manifest
  committing changelog
  updating the branch cache
  run hg sink post-conversion action
  $ cd A-hg
  $ hg up
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

Check tags are in UTF-8

  $ cat .hgtags
  e94e4422020e715add80525e8f0f46c9968689f1 branch\xc3\xa9e (esc)
  f7e66f98380ed1e53a797c5c7a7a2616a7ab377d branch\xc3\xa9 (esc)

  $ cd ..

Subversion sources don't support non-ASCII characters in HTTP(S) URLs.

  $ XFF=$("$PYTHON" -c 'from mercurial.utils.procutil import stdout; stdout.write(b"\xff")')
  $ hg convert --source-type=svn http://localhost:$HGPORT/$XFF test
  initializing destination test repository
  Subversion sources don't support non-ASCII characters in HTTP(S) URLs. Please percent-encode them.
  http://localhost:$HGPORT/\xff does not look like a Subversion repository (esc)
  abort: http://localhost:$HGPORT/\xff: missing or unsupported repository (esc)
  [255]

In Subversion, paths are Unicode (encoded as UTF-8). Therefore paths that can't
be converted between UTF-8 and the locale encoding (which is always ASCII in
tests) don't work.

  $ cp -R svn-repo $XFF
  $ hg convert $XFF test
  initializing destination test repository
  Subversion requires that paths can be converted to Unicode using the current locale encoding (ascii)
  \xff does not look like a CVS checkout (glob) (esc)
  $TESTTMP/\xff does not look like a Git repository (esc)
  \xff does not look like a Subversion repository (glob) (esc)
  \xff is not a local Mercurial repository (glob) (esc)
  \xff does not look like a darcs repository (glob) (esc)
  \xff does not look like a monotone repository (glob) (esc)
  \xff does not look like a GNU Arch repository (glob) (esc)
  \xff does not look like a Bazaar repository (glob) (esc)
  cannot find required "p4" tool
  abort: \xff: missing or unsupported repository (glob) (esc)
  [255]
  $ hg convert file://$TESTTMP/$XFF test
  initializing destination test repository
  Subversion requires that file URLs can be converted to Unicode using the current locale encoding (ascii)
  file:/*/$TESTTMP/\xff does not look like a CVS checkout (glob) (esc)
  $TESTTMP/file:$TESTTMP/\xff does not look like a Git repository (esc)
  file:/*/$TESTTMP/\xff does not look like a Subversion repository (glob) (esc)
  file:/*/$TESTTMP/\xff is not a local Mercurial repository (glob) (esc)
  file:/*/$TESTTMP/\xff does not look like a darcs repository (glob) (esc)
  file:/*/$TESTTMP/\xff does not look like a monotone repository (glob) (esc)
  file:/*/$TESTTMP/\xff does not look like a GNU Arch repository (glob) (esc)
  file:/*/$TESTTMP/\xff does not look like a Bazaar repository (glob) (esc)
  file:/*/$TESTTMP/\xff does not look like a P4 repository (glob) (esc)
  abort: file:/*/$TESTTMP/\xff: missing or unsupported repository (glob) (esc)
  [255]

Subversion decodes percent-encoded bytes on the converted, UTF-8-encoded string.
Therefore, if the percent-encoded bytes aren't valid UTF-8, Subversion would
choke on them when converting them to the locale encoding.

  $ hg convert file://$TESTTMP/%FF test
  initializing destination test repository
  Subversion does not support non-UTF-8 percent-encoded bytes in file URLs
  file:/*/$TESTTMP/%FF does not look like a CVS checkout (glob)
  $TESTTMP/file:$TESTTMP/%FF does not look like a Git repository
  file:/*/$TESTTMP/%FF does not look like a Subversion repository (glob)
  file:/*/$TESTTMP/%FF is not a local Mercurial repository (glob)
  file:/*/$TESTTMP/%FF does not look like a darcs repository (glob)
  file:/*/$TESTTMP/%FF does not look like a monotone repository (glob)
  file:/*/$TESTTMP/%FF does not look like a GNU Arch repository (glob)
  file:/*/$TESTTMP/%FF does not look like a Bazaar repository (glob)
  file:/*/$TESTTMP/%FF does not look like a P4 repository (glob)
  abort: file:/*/$TESTTMP/%FF: missing or unsupported repository (glob)
  [255]
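
The last case in the test turns on whether the percent-encoded bytes of a file URL form valid UTF-8. A minimal sketch of such a check is shown here, assuming Python's standard `urllib.parse.unquote_to_bytes`; the helper name `percent_bytes_are_utf8` is hypothetical and this is not the check the convert extension itself performs.

```python
from urllib.parse import unquote_to_bytes


def percent_bytes_are_utf8(url: str) -> bool:
    # Hypothetical helper: decode the %XX escapes to raw bytes, then
    # test whether the resulting byte string is valid UTF-8.
    raw = unquote_to_bytes(url)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# %C3%A9 is the UTF-8 encoding of "é", as in the branch URLs of the test.
print(percent_bytes_are_utf8("file:///tmp/svn-repo/branches/branch%C3%A9"))
# %FF is never valid in UTF-8, matching the rejected URL in the test.
print(percent_bytes_are_utf8("file:///tmp/%FF"))
```

The example paths under `/tmp` are placeholders chosen for illustration only.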