tests/test-subrepo-recursion.t @ 30442:41a8106789ca
util: implement zstd compression engine
Now that zstd is vendored and being built (in some configurations), we
can implement a compression engine for zstd!
The zstd engine is a little different from existing engines. Because
it may not always be present, we have to defer loading the module in
case importing it fails. We facilitate this via a cached property that
holds a reference to the module or None. The "available" method is
implemented to reflect reality: it reports whether that import succeeded.
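The deferred import described above can be sketched in plain Python. This is a hypothetical illustration, not Mercurial's code. `LazyEngine` is an invented name. `functools.cached_property` (Python 3.8+) stands in for whatever caching decorator Mercurial actually uses. The module name is passed in so the sketch runs without zstd installed.

```python
import functools
import importlib


class LazyEngine:
    """Hypothetical engine whose backing module is imported on first use."""

    def __init__(self, module_name):
        # Assumption: the module name is a parameter so the sketch can run
        # without zstd; Mercurial imports its own vendored zstd bindings.
        self._module_name = module_name

    @functools.cached_property
    def _module(self):
        # Runs once per instance; the module (or None) is then cached.
        try:
            return importlib.import_module(self._module_name)
        except ImportError:
            return None

    def available(self):
        # True only if the import actually succeeded.
        return self._module is not None


print(LazyEngine("zlib").available())                # True: stdlib module
print(LazyEngine("no_such_module_xyz").available())  # False: import fails
```

A failed import costs nothing until something asks for the module, and it is only attempted once per engine instance.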
The zstd engine declares its ability to handle bundles using the
"zstd" human name and the "ZS" internal name. The latter was chosen
because internal names are 2 characters (only by convention, I think)
and "ZS" seems reasonable.
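Here is a minimal sketch of looking up an engine by either of its bundle names, assuming a simple dictionary-backed registry. `EngineRegistry` and its methods are hypothetical and are not Mercurial's API. The two-character internal name is treated as a convention and is not enforced.

```python
class EngineRegistry:
    """Hypothetical registry mapping both bundle names to an engine."""

    def __init__(self):
        self._by_name = {}      # human name, e.g. "zstd"
        self._by_internal = {}  # internal name, e.g. "ZS"

    def register(self, engine, name, internal):
        # Two-character internal names are a convention; not enforced here.
        if name in self._by_name or internal in self._by_internal:
            raise ValueError("bundle name already registered: %s/%s"
                             % (name, internal))
        self._by_name[name] = engine
        self._by_internal[internal] = engine

    def for_name(self, name):
        return self._by_name[name]

    def for_internal(self, internal):
        return self._by_internal[internal]


registry = EngineRegistry()
registry.register("zstd engine placeholder", "zstd", "ZS")
print(registry.for_name("zstd") is registry.for_internal("ZS"))  # True
```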
The engine, like others, supports specifying the compression level.
However, no consumers of this API pass in that argument yet. I have
plans to change that, so stay tuned.
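An optional compression level usually means "use the library default unless told otherwise". The sketch below shows that shape for a streaming compressor. It uses the standard library's `zlib` as a stand-in for zstd, since the zstd Python bindings are not assumed to be installed. `compress_stream` is a hypothetical helper.

```python
import zlib


def compress_stream(chunks, level=None):
    """Yield compressed chunks for an iterable of byte strings.

    zlib stands in for zstd; level=None means "use the library default".
    """
    if level is None:
        level = zlib.Z_DEFAULT_COMPRESSION  # -1, zlib's own default
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        data = compressor.compress(chunk)
        if data:  # the compressor may buffer input and return b""
            yield data
    yield compressor.flush()


payload = [b"x1\n" * 1000, b"y1\n" * 1000]
fast = b"".join(compress_stream(payload, level=1))
small = b"".join(compress_stream(payload, level=9))
# Both levels decompress back to the same original data.
assert zlib.decompress(fast) == zlib.decompress(small) == b"".join(payload)
```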
Since all we need to do to support bundle generation with a new
compression engine is implement and register the compression engine,
bundle generation with zstd "just works!" Tests demonstrating this
have been added.
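Registration alone can be enough if the bundle writer only reaches engines through a lookup table. The sketch keys that table by the `<engine>` part of an `<engine>-v2` spec. This is a hypothetical sketch:

* `COMPRESSORS`, `parse_spec`, and `write_bundle` are invented names.
* The stdlib `zlib` and `bz2` modules stand in for real engines.
* Only the compressed payload is produced, not a real bundle format.

```python
import bz2
import zlib

# Hypothetical table: engine name in an "<engine>-v2" spec -> compressor
# factory. zlib and bz2 stand in for real engines; None means uncompressed.
COMPRESSORS = {
    "none": None,
    "gzip": lambda: zlib.compressobj(6),
    "bzip2": lambda: bz2.BZ2Compressor(9),
}


def parse_spec(spec):
    """Split a spec such as "gzip-v2" into ("gzip", "v2")."""
    engine, sep, version = spec.rpartition("-")
    if not sep or engine not in COMPRESSORS:
        raise ValueError("unknown bundle spec: %r" % spec)
    return engine, version


def write_bundle(chunks, spec):
    """Return chunks compressed with the engine named in spec.

    Only the payload compression is sketched; no bundle header is written.
    """
    engine, _version = parse_spec(spec)
    factory = COMPRESSORS[engine]
    if factory is None:
        return b"".join(chunks)
    compressor = factory()
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())
    return b"".join(out)
```

In this design, supporting another engine means adding one entry to `COMPRESSORS`, and `write_bundle` does not change.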
How does performance of zstd for bundle generation compare? On the
mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the
following on my i7-6700K on Linux:
engine      CPU time    bundle size  vs orig size  throughput
none           97.0s  4,054,405,584        100.0%   41.8 MB/s
bzip2 (l=9)   393.6s    975,343,098         24.0%   10.3 MB/s
gzip (l=6)    184.0s  1,140,533,074         28.1%   22.0 MB/s
zstd (l=1)    108.2s  1,119,434,718         27.6%   37.5 MB/s
zstd (l=2)    111.3s  1,078,328,002         26.6%   36.4 MB/s
zstd (l=3)    113.7s  1,011,823,727         25.0%   35.7 MB/s
zstd (l=4)    116.0s  1,008,965,888         24.9%   35.0 MB/s
zstd (l=5)    121.0s    977,203,148         24.1%   33.5 MB/s
zstd (l=6)    131.7s    927,360,198         22.9%   30.8 MB/s
zstd (l=7)    139.0s    912,808,505         22.5%   29.2 MB/s
zstd (l=12)   198.1s    854,527,714         21.1%   20.5 MB/s
zstd (l=18)   681.6s    789,750,690         19.5%    5.9 MB/s
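The two derived columns appear to follow simple formulas:

* Throughput is the uncompressed bundle size divided by CPU time, in decimal megabytes per second.
* The percentage column is the compressed size divided by the uncompressed size.

The sketch below recomputes a few rows under that assumption. For the rows shown, it reproduces the listed values.

```python
ORIGINAL_SIZE = 4_054_405_584  # bytes in the uncompressed ("none") bundle

# (engine, CPU seconds, bundle bytes), copied from the table above.
ROWS = [
    ("none", 97.0, 4_054_405_584),
    ("gzip (l=6)", 184.0, 1_140_533_074),
    ("zstd (l=3)", 113.7, 1_011_823_727),
    ("zstd (l=18)", 681.6, 789_750_690),
]


def throughput_mb_s(cpu_seconds):
    # Uncompressed bytes per CPU second, in decimal megabytes (10**6 bytes).
    return ORIGINAL_SIZE / cpu_seconds / 1e6


def ratio_percent(bundle_bytes):
    # Compressed size as a percentage of the uncompressed size.
    return bundle_bytes / ORIGINAL_SIZE * 100


for name, cpu, size in ROWS:
    print("%-12s %5.1f%% %5.1f MB/s"
          % (name, ratio_percent(size), throughput_mb_s(cpu)))
```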
On compression, zstd for bundle generation delivers:
* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly
faster than gzip
* ability to aggressively tune compression level to achieve
significantly smaller bundles
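The first two bullets can be checked mechanically against the table. The sketch below copies the row data from the table; the variable names are illustrative. It lists two sets of zstd levels:

* levels that beat gzip on both CPU time and size;
* levels that produce smaller bundles than bzip2 while using less CPU time than gzip.

```python
# zstd rows from the table above: (level, CPU seconds, bundle bytes).
ZSTD = [
    (1, 108.2, 1_119_434_718), (2, 111.3, 1_078_328_002),
    (3, 113.7, 1_011_823_727), (4, 116.0, 1_008_965_888),
    (5, 121.0, 977_203_148), (6, 131.7, 927_360_198),
    (7, 139.0, 912_808_505), (12, 198.1, 854_527_714),
    (18, 681.6, 789_750_690),
]
GZIP_CPU, GZIP_SIZE = 184.0, 1_140_533_074  # gzip (l=6)
BZIP2_SIZE = 975_343_098                    # bzip2 (l=9)

# Levels that are both faster and smaller than gzip.
beats_gzip = [lvl for lvl, cpu, size in ZSTD
              if cpu < GZIP_CPU and size < GZIP_SIZE]

# Levels smaller than bzip2 that still use less CPU time than gzip.
beats_bzip2_ratio = [lvl for lvl, cpu, size in ZSTD
                     if size < BZIP2_SIZE and cpu < GZIP_CPU]

print(beats_gzip)         # [1, 2, 3, 4, 5, 6, 7]
print(beats_bzip2_ratio)  # [6, 7]
```

On these numbers, levels 1 through 7 beat gzip on both axes. Levels 6 and 7 also beat bzip2's bundle size while staying faster than gzip.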
That last point is important. With clone bundles, a server can
pre-generate a bundle file, upload it to a static file server, and
redirect clients to transparently download it during clone. The server
could choose to produce a zstd bundle with the highest compression
settings possible. This would take a very long time (an order of
magnitude longer than a typical zstd bundle generation), but the result would
be hundreds of megabytes smaller! For the clone volume we do at
Mozilla, this could translate to petabytes of bandwidth savings
per year and faster clones (due to smaller transfer size).
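The "hundreds of megabytes" figure can be checked against the table. The sketch below compares the level-18 zstd bundle with the gzip bundle and the level-3 zstd bundle. Scaling those per-clone savings to petabytes depends on Mozilla's clone volume, which is not stated here. So the sketch only computes how many clones one (decimal) petabyte of savings would take.

```python
# Bundle sizes in bytes, from the table above.
GZIP_L6 = 1_140_533_074
ZSTD_L3 = 1_011_823_727
ZSTD_L18 = 789_750_690

saved_vs_gzip = GZIP_L6 - ZSTD_L18   # bytes saved per clone
saved_vs_zstd3 = ZSTD_L3 - ZSTD_L18

PETABYTE = 10**15  # decimal petabyte
clones_per_petabyte = PETABYTE / saved_vs_gzip

print(saved_vs_gzip)               # 350782384
print(saved_vs_zstd3)              # 222073037
print(round(clones_per_petabyte))  # 2850770
```

That is about 351 MB saved per clone versus gzip and about 222 MB versus zstd level 3. Measured against gzip, one petabyte of savings takes roughly 2.85 million clones.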
I don't have detailed numbers to report on decompression. However,
zstd decompression is fast: >1 GB/s output throughput on this machine,
even through the Python bindings. And it can do that regardless of the
compression level of the input. By the time you have enough data to
worry about the overhead of decompression, you have plenty of other
things to worry about performance-wise.
zstd wins all around. I can't wait to implement support for it
on the wire protocol and in revlogs.
author: Gregory Szorc <gregory.szorc@gmail.com>
date: Fri, 11 Nov 2016 01:10:07 -0800
parents: cd34bf29987e
children: b3d2e8cce78c
Create test repository:

  $ hg init repo
  $ cd repo
  $ echo x1 > x.txt

  $ hg init foo
  $ cd foo
  $ echo y1 > y.txt

  $ hg init bar
  $ cd bar
  $ echo z1 > z.txt
  $ cd ..
  $ echo 'bar = bar' > .hgsub

  $ cd ..
  $ echo 'foo = foo' > .hgsub

Add files --- .hgsub files must go first to trigger subrepos:

  $ hg add -S .hgsub
  $ hg add -S foo/.hgsub
  $ hg add -S foo/bar
  adding foo/bar/z.txt (glob)
  $ hg add -S
  adding x.txt
  adding foo/y.txt (glob)

Test recursive status without committing anything:

  $ hg status -S
  A .hgsub
  A foo/.hgsub
  A foo/bar/z.txt
  A foo/y.txt
  A x.txt

Test recursive diff without committing anything:

  $ hg diff --nodates -S foo
  diff -r 000000000000 foo/.hgsub
  --- /dev/null
  +++ b/foo/.hgsub
  @@ -0,0 +1,1 @@
  +bar = bar
  diff -r 000000000000 foo/y.txt
  --- /dev/null
  +++ b/foo/y.txt
  @@ -0,0 +1,1 @@
  +y1
  diff -r 000000000000 foo/bar/z.txt
  --- /dev/null
  +++ b/foo/bar/z.txt
  @@ -0,0 +1,1 @@
  +z1

Commits:

  $ hg commit -m fails
  abort: uncommitted changes in subrepository 'foo'
  (use --subrepos for recursive commit)
  [255]

The --subrepos flag overrides the config setting:

  $ hg commit -m 0-0-0 --config ui.commitsubrepos=No --subrepos
  committing subrepository foo
  committing subrepository foo/bar (glob)

  $ cd foo
  $ echo y2 >> y.txt
  $ hg commit -m 0-1-0

  $ cd bar
  $ echo z2 >> z.txt
  $ hg commit -m 0-1-1

  $ cd ..
  $ hg commit -m 0-2-1

  $ cd ..
  $ hg commit -m 1-2-1

Change working directory:

  $ echo y3 >> foo/y.txt
  $ echo z3 >> foo/bar/z.txt
  $ hg status -S
  M foo/bar/z.txt
  M foo/y.txt
  $ hg diff --nodates -S
  diff -r d254738c5f5e foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,2 +1,3 @@
   y1
   y2
  +y3
  diff -r 9647f22de499 foo/bar/z.txt
  --- a/foo/bar/z.txt
  +++ b/foo/bar/z.txt
  @@ -1,2 +1,3 @@
   z1
   z2
  +z3

Status call crossing repository boundaries:

  $ hg status -S foo/bar/z.txt
  M foo/bar/z.txt
  $ hg status -S -I 'foo/?.txt'
  M foo/y.txt
  $ hg status -S -I '**/?.txt'
  M foo/bar/z.txt
  M foo/y.txt
  $ hg diff --nodates -S -I '**/?.txt'
  diff -r d254738c5f5e foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,2 +1,3 @@
   y1
   y2
  +y3
  diff -r 9647f22de499 foo/bar/z.txt
  --- a/foo/bar/z.txt
  +++ b/foo/bar/z.txt
  @@ -1,2 +1,3 @@
   z1
   z2
  +z3

Status from within a subdirectory:

  $ mkdir dir
  $ cd dir
  $ echo a1 > a.txt
  $ hg status -S
  M foo/bar/z.txt
  M foo/y.txt
  ? dir/a.txt
  $ hg diff --nodates -S
  diff -r d254738c5f5e foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,2 +1,3 @@
   y1
   y2
  +y3
  diff -r 9647f22de499 foo/bar/z.txt
  --- a/foo/bar/z.txt
  +++ b/foo/bar/z.txt
  @@ -1,2 +1,3 @@
   z1
   z2
  +z3

Status with relative path:

  $ hg status -S ..
  M ../foo/bar/z.txt
  M ../foo/y.txt
  ? a.txt

XXX: filtering lfilesrepo.status() in 3.3-rc causes these files to be listed as
added instead of modified.
  $ hg status -S .. --config extensions.largefiles=
  M ../foo/bar/z.txt
  M ../foo/y.txt
  ? a.txt

  $ hg diff --nodates -S ..
  diff -r d254738c5f5e foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,2 +1,3 @@
   y1
   y2
  +y3
  diff -r 9647f22de499 foo/bar/z.txt
  --- a/foo/bar/z.txt
  +++ b/foo/bar/z.txt
  @@ -1,2 +1,3 @@
   z1
   z2
  +z3
  $ cd ..

Cleanup and final commit:

  $ rm -r dir
  $ hg commit --subrepos -m 2-3-2
  committing subrepository foo
  committing subrepository foo/bar (glob)

Test explicit path commands within subrepos: add/forget
  $ echo z1 > foo/bar/z2.txt
  $ hg status -S
  ? foo/bar/z2.txt
  $ hg add foo/bar/z2.txt
  $ hg status -S
  A foo/bar/z2.txt
  $ hg forget foo/bar/z2.txt
  $ hg status -S
  ? foo/bar/z2.txt
  $ hg forget foo/bar/z2.txt
  not removing foo/bar/z2.txt: file is already untracked (glob)
  [1]
  $ hg status -S
  ? foo/bar/z2.txt
  $ rm foo/bar/z2.txt

Log with the relationships between repo and its subrepo:

  $ hg log --template '{rev}:{node|short} {desc}\n'
  2:1326fa26d0c0 2-3-2
  1:4b3c9ff4f66b 1-2-1
  0:23376cbba0d8 0-0-0

  $ hg -R foo log --template '{rev}:{node|short} {desc}\n'
  3:65903cebad86 2-3-2
  2:d254738c5f5e 0-2-1
  1:8629ce7dcc39 0-1-0
  0:af048e97ade2 0-0-0
  $ hg -R foo/bar log --template '{rev}:{node|short} {desc}\n'
  2:31ecbdafd357 2-3-2
  1:9647f22de499 0-1-1
  0:4904098473f9 0-0-0

Status between revisions:

  $ hg status -S
  $ hg status -S --rev 0:1
  M .hgsubstate
  M foo/.hgsubstate
  M foo/bar/z.txt
  M foo/y.txt
  $ hg diff --nodates -S -I '**/?.txt' --rev 0:1
  diff -r af048e97ade2 -r d254738c5f5e foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,1 +1,2 @@
   y1
  +y2
  diff -r 4904098473f9 -r 9647f22de499 foo/bar/z.txt
  --- a/foo/bar/z.txt
  +++ b/foo/bar/z.txt
  @@ -1,1 +1,2 @@
   z1
  +z2

Enable progress extension for archive tests:

  $ cp $HGRCPATH $HGRCPATH.no-progress
  $ cat >> $HGRCPATH <<EOF
  > [progress]
  > disable=False
  > assume-tty = 1
  > delay = 0
  > # set changedelay really large so we don't see nested topics
  > changedelay = 30000
  > format = topic bar number
  > refresh = 0
  > width = 60
  > EOF

Test archiving to a directory tree (the doubled lines in the output
only show up in the test output, not in real usage):

  $ hg archive --subrepos ../archive
  \r (no-eol) (esc)
  archiving [ ] 0/3\r (no-eol) (esc)
  archiving [=============> ] 1/3\r (no-eol) (esc)
  archiving [===========================> ] 2/3\r (no-eol) (esc)
  archiving [==========================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo) [ ] 0/3\r (no-eol) (esc)
  archiving (foo) [===========> ] 1/3\r (no-eol) (esc)
  archiving (foo) [=======================> ] 2/3\r (no-eol) (esc)
  archiving (foo) [====================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo/bar) [ ] 0/1\r (no-eol) (glob) (esc)
  archiving (foo/bar) [================================>] 1/1\r (no-eol) (glob) (esc)
  \r (no-eol) (esc)
  $ find ../archive | sort
  ../archive
  ../archive/.hg_archival.txt
  ../archive/.hgsub
  ../archive/.hgsubstate
  ../archive/foo
  ../archive/foo/.hgsub
  ../archive/foo/.hgsubstate
  ../archive/foo/bar
  ../archive/foo/bar/z.txt
  ../archive/foo/y.txt
  ../archive/x.txt

Test archiving to zip file (unzip output is unstable):

  $ hg archive --subrepos --prefix '.' ../archive.zip
  \r (no-eol) (esc)
  archiving [ ] 0/3\r (no-eol) (esc)
  archiving [=============> ] 1/3\r (no-eol) (esc)
  archiving [===========================> ] 2/3\r (no-eol) (esc)
  archiving [==========================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo) [ ] 0/3\r (no-eol) (esc)
  archiving (foo) [===========> ] 1/3\r (no-eol) (esc)
  archiving (foo) [=======================> ] 2/3\r (no-eol) (esc)
  archiving (foo) [====================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo/bar) [ ] 0/1\r (no-eol) (glob) (esc)
  archiving (foo/bar) [================================>] 1/1\r (no-eol) (glob) (esc)
  \r (no-eol) (esc)

(unzip date formatting is unstable, we do not care about it and glob it out)

  $ unzip -l ../archive.zip
  Archive: ../archive.zip
  Length [ ]* Date [ ]* Time [ ]* Name (re)
  [\- ]* (re)
  172 [0-9:\- ]* .hg_archival.txt (re)
  10 [0-9:\- ]* .hgsub (re)
  45 [0-9:\- ]* .hgsubstate (re)
  3 [0-9:\- ]* x.txt (re)
  10 [0-9:\- ]* foo/.hgsub (re)
  45 [0-9:\- ]* foo/.hgsubstate (re)
  9 [0-9:\- ]* foo/y.txt (re)
  9 [0-9:\- ]* foo/bar/z.txt (re)
  [\- ]* (re)
  303 [ ]* 8 files (re)

Test archiving a revision that references a subrepo that is not yet
cloned:

#if hardlink
  $ hg clone -U . ../empty
  \r (no-eol) (esc)
  linking [ <=> ] 1\r (no-eol) (esc)
  linking [ <=> ] 2\r (no-eol) (esc)
  linking [ <=> ] 3\r (no-eol) (esc)
  linking [ <=> ] 4\r (no-eol) (esc)
  linking [ <=> ] 5\r (no-eol) (esc)
  linking [ <=> ] 6\r (no-eol) (esc)
  linking [ <=> ] 7\r (no-eol) (esc)
  linking [ <=> ] 8\r (no-eol) (esc)
  \r (no-eol) (esc)
#else
  $ hg clone -U . ../empty
  \r (no-eol) (esc)
  linking [ <=> ] 1 (no-eol)
#endif

  $ cd ../empty
#if hardlink
  $ hg archive --subrepos -r tip --prefix './' ../archive.tar.gz
  \r (no-eol) (esc)
  archiving [ ] 0/3\r (no-eol) (esc)
  archiving [=============> ] 1/3\r (no-eol) (esc)
  archiving [===========================> ] 2/3\r (no-eol) (esc)
  archiving [==========================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  linking [ <=> ] 1\r (no-eol) (esc)
  linking [ <=> ] 2\r (no-eol) (esc)
  linking [ <=> ] 3\r (no-eol) (esc)
  linking [ <=> ] 4\r (no-eol) (esc)
  linking [ <=> ] 5\r (no-eol) (esc)
  linking [ <=> ] 6\r (no-eol) (esc)
  linking [ <=> ] 7\r (no-eol) (esc)
  linking [ <=> ] 8\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo) [ ] 0/3\r (no-eol) (esc)
  archiving (foo) [===========> ] 1/3\r (no-eol) (esc)
  archiving (foo) [=======================> ] 2/3\r (no-eol) (esc)
  archiving (foo) [====================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  linking [ <=> ] 1\r (no-eol) (esc)
  linking [ <=> ] 2\r (no-eol) (esc)
  linking [ <=> ] 3\r (no-eol) (esc)
  linking [ <=> ] 4\r (no-eol) (esc)
  linking [ <=> ] 5\r (no-eol) (esc)
  linking [ <=> ] 6\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  archiving (foo/bar) [ ] 0/1\r (no-eol) (glob) (esc)
  archiving (foo/bar) [================================>] 1/1\r (no-eol) (glob) (esc)
  \r (no-eol) (esc)
  cloning subrepo foo from $TESTTMP/repo/foo
  cloning subrepo foo/bar from $TESTTMP/repo/foo/bar (glob)
#else

Note there's a slight output glitch on non-hardlink systems: the last
"linking" progress topic never gets closed, leading to slight output
corruption on that platform.

  $ hg archive --subrepos -r tip --prefix './' ../archive.tar.gz
  \r (no-eol) (esc)
  archiving [ ] 0/3\r (no-eol) (esc)
  archiving [=============> ] 1/3\r (no-eol) (esc)
  archiving [===========================> ] 2/3\r (no-eol) (esc)
  archiving [==========================================>] 3/3\r (no-eol) (esc)
  \r (no-eol) (esc)
  \r (no-eol) (esc)
  linking [ <=> ] 1\r (no-eol) (esc)
  cloning subrepo foo/bar from $TESTTMP/repo/foo/bar (glob)
#endif

Archive + subrepos uses '/' for all component separators

  $ tar -tzf ../archive.tar.gz | sort
  .hg_archival.txt
  .hgsub
  .hgsubstate
  foo/.hgsub
  foo/.hgsubstate
  foo/bar/z.txt
  foo/y.txt
  x.txt

The newly cloned subrepos contain no working copy:

  $ hg -R foo summary
  parent: -1:000000000000 (no revision checked out)
  branch: default
  commit: (clean)
  update: 4 new changesets (update)

Disable progress extension and cleanup:

  $ mv $HGRCPATH.no-progress $HGRCPATH

Test archiving when there is a directory in the way for a subrepo
created by archive:

  $ hg clone -U . ../almost-empty
  $ cd ../almost-empty
  $ mkdir foo
  $ echo f > foo/f
  $ hg archive --subrepos -r tip archive
  cloning subrepo foo from $TESTTMP/empty/foo
  abort: destination '$TESTTMP/almost-empty/foo' is not empty (in subrepo foo) (glob)
  [255]

Clone and test outgoing:

  $ cd ..
  $ hg clone repo repo2
  updating to branch default
  cloning subrepo foo from $TESTTMP/repo/foo
  cloning subrepo foo/bar from $TESTTMP/repo/foo/bar (glob)
  3 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ cd repo2
  $ hg outgoing -S
  comparing with $TESTTMP/repo (glob)
  searching for changes
  no changes found
  comparing with $TESTTMP/repo/foo
  searching for changes
  no changes found
  comparing with $TESTTMP/repo/foo/bar
  searching for changes
  no changes found
  [1]

Make nested change:

  $ echo y4 >> foo/y.txt
  $ hg diff --nodates -S
  diff -r 65903cebad86 foo/y.txt
  --- a/foo/y.txt
  +++ b/foo/y.txt
  @@ -1,3 +1,4 @@
   y1
   y2
   y3
  +y4
  $ hg commit --subrepos -m 3-4-2
  committing subrepository foo
  $ hg outgoing -S
  comparing with $TESTTMP/repo (glob)
  searching for changes
  changeset:   3:2655b8ecc4ee
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     3-4-2
  
  comparing with $TESTTMP/repo/foo
  searching for changes
  changeset:   4:e96193d6cb36
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     3-4-2
  
  comparing with $TESTTMP/repo/foo/bar
  searching for changes
  no changes found

Switch to original repo and setup default path:

  $ cd ../repo
  $ echo '[paths]' >> .hg/hgrc
  $ echo 'default = ../repo2' >> .hg/hgrc

Test incoming:

  $ hg incoming -S
  comparing with $TESTTMP/repo2 (glob)
  searching for changes
  changeset:   3:2655b8ecc4ee
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     3-4-2
  
  comparing with $TESTTMP/repo2/foo
  searching for changes
  changeset:   4:e96193d6cb36
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     3-4-2
  
  comparing with $TESTTMP/repo2/foo/bar
  searching for changes
  no changes found

  $ hg incoming -S --bundle incoming.hg
  abort: cannot combine --bundle and --subrepos
  [255]

Test missing subrepo:

  $ rm -r foo
  $ hg status -S
  warning: error "unknown revision '65903cebad86f1a84bd4f1134f62fa7dcb7a1c98'" in subrepository "foo"

Issue2619: IndexError: list index out of range on hg add with subrepos
The subrepo must sort after the explicit filename.

  $ cd ..
  $ hg init test
  $ cd test
  $ hg init x
  $ echo abc > abc.txt
  $ hg ci -Am "abc"
  adding abc.txt
  $ echo "x = x" >> .hgsub
  $ hg add .hgsub
  $ touch a x/a
  $ hg add a x/a

  $ hg ci -Sm "added x"
  committing subrepository x
  $ echo abc > x/a
  $ hg revert --rev '.^' "set:subrepo('glob:x*')"
  abort: subrepository 'x' does not exist in 25ac2c9b3180!
  [255]

  $ cd ..
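The test builds its nesting from `.hgsub` files: `foo = foo` at the top level and `bar = bar` inside `foo`. The archive listing shows each `.hgsubstate` as 45 bytes. That is consistent with one line holding a 40-character hex node, a space, a three-character path, and a newline.

The sketch below parses both files and walks the nesting recursively, visiting foo before foo/bar as the test's `-S` output does. It is a simplified, hypothetical illustration:

* It handles only plain `path = source` lines; a `[subpaths]` section or other config features are not supported.
* `read_hgsub` is an invented callback.

```python
def parse_hgsub(text):
    """Parse simple "path = source" lines into {path: source}.

    Sketch only: blank lines and lines starting with '#' are skipped;
    sections and other config-file features are not handled.
    """
    subs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, sep, source = line.partition("=")
        if not sep:
            raise ValueError("malformed .hgsub line: %r" % line)
        subs[path.strip()] = source.strip()
    return subs


def parse_hgsubstate(text):
    """Parse "<40-hex node> <path>" lines into {path: node}."""
    state = {}
    for line in text.splitlines():
        if line:
            node, path = line.split(" ", 1)
            state[path] = node
    return state


def nested_subrepos(read_hgsub, prefix=""):
    """Return every subrepo path below prefix, depth first.

    read_hgsub(path) is a hypothetical callback returning the .hgsub text
    of the repository at path, or None if that repository has none.
    """
    text = read_hgsub(prefix)
    if text is None:
        return []
    found = []
    for path in sorted(parse_hgsub(text)):
        full = prefix + "/" + path if prefix else path
        found.append(full)
        found.extend(nested_subrepos(read_hgsub, full))
    return found


# The layout created at the top of the test.
HGSUB_FILES = {"": "foo = foo\n", "foo": "bar = bar\n"}
print(nested_subrepos(HGSUB_FILES.get))  # ['foo', 'foo/bar']
```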