Mercurial > hg
view tests/test-clonebundles.t @ 30442:41a8106789ca
util: implement zstd compression engine
Now that zstd is vendored and being built (in some configurations), we
can implement a compression engine for zstd!
The zstd engine is a little different from existing engines. Because
it may not always be present, we have to defer load the module in case
importing it fails. We facilitate this via a cached property that holds
a reference to the module or None. The "available" method is
implemented to reflect reality.
The zstd engine declares its ability to handle bundles using the
"zstd" human name and the "ZS" internal name. The latter was chosen
because internal names are 2 characters (by only convention I think)
and "ZS" seems reasonable.
The engine, like others, supports specifying the compression level.
However, there are no consumers of this API that yet pass in that
argument. I have plans to change that, so stay tuned.
Since all we need to do to support bundle generation with a new
compression engine is implement and register the compression engine,
bundle generation with zstd "just works!" Tests demonstrating this
have been added.
How does performance of zstd for bundle generation compare? On the
mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the
following on my i7-6700K on Linux:
engine CPU time bundle size vs orig size throughput
none 97.0s 4,054,405,584 100.0% 41.8 MB/s
bzip2 (l=9) 393.6s 975,343,098 24.0% 10.3 MB/s
gzip (l=6) 184.0s 1,140,533,074 28.1% 22.0 MB/s
zstd (l=1) 108.2s 1,119,434,718 27.6% 37.5 MB/s
zstd (l=2) 111.3s 1,078,328,002 26.6% 36.4 MB/s
zstd (l=3) 113.7s 1,011,823,727 25.0% 35.7 MB/s
zstd (l=4) 116.0s 1,008,965,888 24.9% 35.0 MB/s
zstd (l=5) 121.0s 977,203,148 24.1% 33.5 MB/s
zstd (l=6) 131.7s 927,360,198 22.9% 30.8 MB/s
zstd (l=7) 139.0s 912,808,505 22.5% 29.2 MB/s
zstd (l=12) 198.1s 854,527,714 21.1% 20.5 MB/s
zstd (l=18) 681.6s 789,750,690 19.5% 5.9 MB/s
On compression, zstd for bundle generation delivers:
* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly
faster than gzip
* ability to aggressively tune compression level to achieve
significantly smaller bundles
That last point is important. With clone bundles, a server can
pre-generate a bundle file, upload it to a static file server, and
redirect clients to transparently download it during clone. The server
could choose to produce a zstd bundle with the highest compression
settings possible. This would take a very long time - a magnitude
longer than a typical zstd bundle generation - but the result would
be hundreds of megabytes smaller! For the clone volume we do at
Mozilla, this could translate to petabytes of bandwidth savings
per year and faster clones (due to smaller transfer size).
I don't have detailed numbers to report on decompression. However,
zstd decompression is fast: >1 GB/s output throughput on this machine,
even through the Python bindings. And it can do that regardless of the
compression level of the input. By the time you have enough data to
worry about overhead of decompression, you have plenty of other things
to worry about performance wise.
zstd is wins all around. I can't wait to implement support for it
on the wire protocol and in revlogs.
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Fri, 11 Nov 2016 01:10:07 -0800 |
parents | 6b0741d6d234 |
children | a520aefb96f1 |
line wrap: on
line source
Set up a server $ cat >> $HGRCPATH << EOF > [format] > usegeneraldelta=yes > EOF $ hg init server $ cd server $ cat >> .hg/hgrc << EOF > [extensions] > clonebundles = > EOF $ touch foo $ hg -q commit -A -m 'add foo' $ touch bar $ hg -q commit -A -m 'add bar' $ hg serve -d -p $HGPORT --pid-file hg.pid --accesslog access.log $ cat hg.pid >> $DAEMON_PIDS $ cd .. Missing manifest should not result in server lookup $ hg --verbose clone -U http://localhost:$HGPORT no-manifest requesting all changes adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files $ cat server/access.log * - - [*] "GET /?cmd=capabilities HTTP/1.1" 200 - (glob) * - - [*] "GET /?cmd=batch HTTP/1.1" 200 - x-hgarg-1:cmds=heads+%3Bknown+nodes%3D (glob) * - - [*] "GET /?cmd=getbundle HTTP/1.1" 200 - x-hgarg-1:bundlecaps=HG20%2Cbundle2%3DHG20%250Achangegroup%253D01%252C02%250Adigests%253Dmd5%252Csha1%252Csha512%250Aerror%253Dabort%252Cunsupportedcontent%252Cpushraced%252Cpushkey%250Ahgtagsfnodes%250Alistkeys%250Apushkey%250Aremote-changegroup%253Dhttp%252Chttps&cg=1&common=0000000000000000000000000000000000000000&heads=aaff8d2ffbbf07a46dd1f05d8ae7877e3f56e2a2&listkeys=phases%2Cbookmarks (glob) Empty manifest file results in retrieval (the extension only checks if the manifest file exists) $ touch server/.hg/clonebundles.manifest $ hg --verbose clone -U http://localhost:$HGPORT empty-manifest no clone bundles available on remote; falling back to regular clone requesting all changes adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files Manifest file with invalid URL aborts $ echo 'http://does.not.exist/bundle.hg' > server/.hg/clonebundles.manifest $ hg clone http://localhost:$HGPORT 404-url applying clone bundle from http://does.not.exist/bundle.hg error fetching bundle: (.* not known|getaddrinfo failed|No address associated with hostname) (re) abort: error applying bundle (if this error persists, consider contacting the server operator or disable clone bundles via "--config ui.clonebundles=false") [255] Server is not running aborts $ echo "http://localhost:$HGPORT1/bundle.hg" > server/.hg/clonebundles.manifest $ hg clone http://localhost:$HGPORT server-not-runner applying clone bundle from http://localhost:$HGPORT1/bundle.hg error fetching bundle: (.* refused.*|Protocol not supported) (re) abort: error applying bundle (if this error persists, consider contacting the server operator or disable clone bundles via "--config ui.clonebundles=false") [255] Server returns 404 $ python $TESTDIR/dumbhttp.py -p $HGPORT1 --pid http.pid $ cat http.pid >> $DAEMON_PIDS $ hg clone http://localhost:$HGPORT running-404 applying clone bundle from http://localhost:$HGPORT1/bundle.hg HTTP error fetching bundle: HTTP Error 404: File not found abort: error applying bundle (if this error persists, consider contacting the server operator or disable clone bundles via "--config ui.clonebundles=false") [255] We can override failure to fall back to regular clone $ hg --config ui.clonebundlefallback=true clone -U http://localhost:$HGPORT 404-fallback applying clone bundle from http://localhost:$HGPORT1/bundle.hg HTTP error fetching bundle: HTTP Error 404: File not found falling back to normal clone requesting all changes adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files Bundle with partial content works $ hg -R server bundle --type gzip-v1 --base null -r 53245c60e682 partial.hg 1 changesets found We verify exact bundle content as an extra check against accidental future changes. If this output changes, we could break old clients. $ f --size --hexdump partial.hg partial.hg: size=207 0000: 48 47 31 30 47 5a 78 9c 63 60 60 98 17 ac 12 93 |HG10GZx.c``.....| 0010: f0 ac a9 23 45 70 cb bf 0d 5f 59 4e 4a 7f 79 21 |...#Ep..._YNJ.y!| 0020: 9b cc 40 24 20 a0 d7 ce 2c d1 38 25 cd 24 25 d5 |..@$ ...,.8%.$%.| 0030: d8 c2 22 cd 38 d9 24 cd 22 d5 c8 22 cd 24 cd 32 |..".8.$."..".$.2| 0040: d1 c2 d0 c4 c8 d2 32 d1 38 39 29 c9 34 cd d4 80 |......2.89).4...| 0050: ab 24 b5 b8 84 cb 40 c1 80 2b 2d 3f 9f 8b 2b 31 |.$....@..+-?..+1| 0060: 25 45 01 c8 80 9a d2 9b 65 fb e5 9e 45 bf 8d 7f |%E......e...E...| 0070: 9f c6 97 9f 2b 44 34 67 d9 ec 8e 0f a0 92 0b 75 |....+D4g.......u| 0080: 41 d6 24 59 18 a4 a4 9a a6 18 1a 5b 98 9b 5a 98 |A.$Y.......[..Z.| 0090: 9a 18 26 9b a6 19 98 1a 99 99 26 a6 18 9a 98 24 |..&.......&....$| 00a0: 26 59 a6 25 5a 98 a5 18 a6 24 71 41 35 b1 43 dc |&Y.%Z....$qA5.C.| 00b0: 16 b2 83 f7 e9 45 8b d2 56 c7 a3 1f 82 52 d7 8a |.....E..V....R..| 00c0: 78 ed fc d5 76 f1 36 35 dc 05 00 36 ed 5e c7 |x...v.65...6.^.| $ echo "http://localhost:$HGPORT1/partial.hg" > server/.hg/clonebundles.manifest $ hg clone -U http://localhost:$HGPORT partial-bundle applying clone bundle from http://localhost:$HGPORT1/partial.hg adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files finished applying clone bundle searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files Incremental pull doesn't fetch bundle $ hg clone -r 53245c60e682 -U http://localhost:$HGPORT partial-clone adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files $ cd partial-clone $ hg pull pulling from http://localhost:$HGPORT/ searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files (run 'hg update' to get a working copy) $ cd .. Bundle with full content works $ hg -R server bundle --type gzip-v2 --base null -r tip full.hg 2 changesets found Again, we perform an extra check against bundle content changes. If this content changes, clone bundles produced by new Mercurial versions may not be readable by old clients. $ f --size --hexdump full.hg full.hg: size=396 0000: 48 47 32 30 00 00 00 0e 43 6f 6d 70 72 65 73 73 |HG20....Compress| 0010: 69 6f 6e 3d 47 5a 78 9c 63 60 60 d0 e4 76 f6 70 |ion=GZx.c``..v.p| 0020: f4 73 77 75 0f f2 0f 0d 60 00 02 46 46 76 26 4e |.swu....`..FFv&N| 0030: c6 b2 d4 a2 e2 cc fc 3c 03 a3 bc a4 e4 8c c4 bc |.......<........| 0040: f4 d4 62 23 06 06 e6 19 40 f9 4d c1 2a 31 09 cf |..b#....@.M.*1..| 0050: 9a 3a 52 04 b7 fc db f0 95 e5 a4 f4 97 17 b2 c9 |.:R.............| 0060: 0c 14 00 02 e6 d9 99 25 1a a7 a4 99 a4 a4 1a 5b |.......%.......[| 0070: 58 a4 19 27 9b a4 59 a4 1a 59 a4 99 a4 59 26 5a |X..'..Y..Y...Y&Z| 0080: 18 9a 18 59 5a 26 1a 27 27 25 99 a6 99 1a 70 95 |...YZ&.''%....p.| 0090: a4 16 97 70 19 28 18 70 a5 e5 e7 73 71 25 a6 a4 |...p.(.p...sq%..| 00a0: 28 00 19 20 17 af fa df ab ff 7b 3f fb 92 dc 8b |(.. ......{?....| 00b0: 1f 62 bb 9e b7 d7 d9 87 3d 5a 44 89 2f b0 99 87 |.b......=ZD./...| 00c0: ec e2 54 63 43 e3 b4 64 43 73 23 33 43 53 0b 63 |..TcC..dCs#3CS.c| 00d0: d3 14 23 03 a0 fb 2c 2c 0c d3 80 1e 30 49 49 b1 |..#...,,....0II.| 00e0: 4c 4a 32 48 33 30 b0 34 42 b8 38 29 b1 08 e2 62 |LJ2H30.4B.8)...b| 00f0: 20 03 6a ca c2 2c db 2f f7 2c fa 6d fc fb 34 be | .j..,./.,.m..4.| 0100: fc 5c 21 a2 39 cb 66 77 7c 00 0d c3 59 17 14 58 |.\!.9.fw|...Y..X| 0110: 49 16 06 29 a9 a6 29 86 c6 16 e6 a6 16 a6 26 86 |I..)..).......&.| 0120: c9 a6 69 06 a6 46 66 a6 89 29 86 26 26 89 49 96 |..i..Ff..).&&.I.| 0130: 69 89 16 66 29 86 29 49 5c 20 07 3e 16 fe 23 ae |i..f).)I\ .>..#.| 0140: 26 da 1c ab 10 1f d1 f8 e3 b3 ef cd dd fc 0c 93 |&...............| 0150: 88 75 34 36 75 04 82 55 17 14 36 a4 38 10 04 d8 |.u46u..U..6.8...| 0160: 21 01 9a b1 83 f7 e9 45 8b d2 56 c7 a3 1f 82 52 |!......E..V....R| 0170: d7 8a 78 ed fc d5 76 f1 36 25 81 89 c7 ad ec 90 |..x...v.6%......| 0180: 54 47 75 2b 89 49 b1 00 d2 8a eb 92 |TGu+.I......| $ echo "http://localhost:$HGPORT1/full.hg" > server/.hg/clonebundles.manifest $ hg clone -U http://localhost:$HGPORT full-bundle applying clone bundle from http://localhost:$HGPORT1/full.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Feature works over SSH $ hg clone -U -e "python \"$TESTDIR/dummyssh\"" ssh://user@dummy/server ssh-full-clone applying clone bundle from http://localhost:$HGPORT1/full.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Entry with unknown BUNDLESPEC is filtered and not used $ cat > server/.hg/clonebundles.manifest << EOF > http://bad.entry1 BUNDLESPEC=UNKNOWN > http://bad.entry2 BUNDLESPEC=xz-v1 > http://bad.entry3 BUNDLESPEC=none-v100 > http://localhost:$HGPORT1/full.hg BUNDLESPEC=gzip-v2 > EOF $ hg clone -U http://localhost:$HGPORT filter-unknown-type applying clone bundle from http://localhost:$HGPORT1/full.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Automatic fallback when all entries are filtered $ cat > server/.hg/clonebundles.manifest << EOF > http://bad.entry BUNDLESPEC=UNKNOWN > EOF $ hg clone -U http://localhost:$HGPORT filter-all no compatible clone bundles available on server; falling back to regular clone (you may want to report this to the server operator) requesting all changes adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files URLs requiring SNI are filtered in Python <2.7.9 $ cp full.hg sni.hg $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/sni.hg REQUIRESNI=true > http://localhost:$HGPORT1/full.hg > EOF #if sslcontext Python 2.7.9+ support SNI $ hg clone -U http://localhost:$HGPORT sni-supported applying clone bundle from http://localhost:$HGPORT1/sni.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found #else Python <2.7.9 will filter SNI URLs $ hg clone -U http://localhost:$HGPORT sni-unsupported applying clone bundle from http://localhost:$HGPORT1/full.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found #endif Stream clone bundles are supported $ hg -R server debugcreatestreamclonebundle packed.hg writing 613 bytes for 4 files bundle requirements: generaldelta, revlogv1 No bundle spec should work $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/packed.hg > EOF $ hg clone -U http://localhost:$HGPORT stream-clone-no-spec applying clone bundle from http://localhost:$HGPORT1/packed.hg 4 files to transfer, 613 bytes of data transferred 613 bytes in *.* seconds (*) (glob) finished applying clone bundle searching for changes no changes found Bundle spec without parameters should work $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/packed.hg BUNDLESPEC=none-packed1 > EOF $ hg clone -U http://localhost:$HGPORT stream-clone-vanilla-spec applying clone bundle from http://localhost:$HGPORT1/packed.hg 4 files to transfer, 613 bytes of data transferred 613 bytes in *.* seconds (*) (glob) finished applying clone bundle searching for changes no changes found Bundle spec with format requirements should work $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/packed.hg BUNDLESPEC=none-packed1;requirements%3Drevlogv1 > EOF $ hg clone -U http://localhost:$HGPORT stream-clone-supported-requirements applying clone bundle from http://localhost:$HGPORT1/packed.hg 4 files to transfer, 613 bytes of data transferred 613 bytes in *.* seconds (*) (glob) finished applying clone bundle searching for changes no changes found Stream bundle spec with unknown requirements should be filtered out $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/packed.hg BUNDLESPEC=none-packed1;requirements%3Drevlogv42 > EOF $ hg clone -U http://localhost:$HGPORT stream-clone-unsupported-requirements no compatible clone bundles available on server; falling back to regular clone (you may want to report this to the server operator) requesting all changes adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files Set up manifest for testing preferences (Remember, the TYPE does not have to match reality - the URL is important) $ cp full.hg gz-a.hg $ cp full.hg gz-b.hg $ cp full.hg bz2-a.hg $ cp full.hg bz2-b.hg $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/gz-a.hg BUNDLESPEC=gzip-v2 extra=a > http://localhost:$HGPORT1/bz2-a.hg BUNDLESPEC=bzip2-v2 extra=a > http://localhost:$HGPORT1/gz-b.hg BUNDLESPEC=gzip-v2 extra=b > http://localhost:$HGPORT1/bz2-b.hg BUNDLESPEC=bzip2-v2 extra=b > EOF Preferring an undefined attribute will take first entry $ hg --config ui.clonebundleprefers=foo=bar clone -U http://localhost:$HGPORT prefer-foo applying clone bundle from http://localhost:$HGPORT1/gz-a.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Preferring bz2 type will download first entry of that type $ hg --config ui.clonebundleprefers=COMPRESSION=bzip2 clone -U http://localhost:$HGPORT prefer-bz applying clone bundle from http://localhost:$HGPORT1/bz2-a.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Preferring multiple values of an option works $ hg --config ui.clonebundleprefers=COMPRESSION=unknown,COMPRESSION=bzip2 clone -U http://localhost:$HGPORT prefer-multiple-bz applying clone bundle from http://localhost:$HGPORT1/bz2-a.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Sorting multiple values should get us back to original first entry $ hg --config ui.clonebundleprefers=BUNDLESPEC=unknown,BUNDLESPEC=gzip-v2,BUNDLESPEC=bzip2-v2 clone -U http://localhost:$HGPORT prefer-multiple-gz applying clone bundle from http://localhost:$HGPORT1/gz-a.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Preferring multiple attributes has correct order $ hg --config ui.clonebundleprefers=extra=b,BUNDLESPEC=bzip2-v2 clone -U http://localhost:$HGPORT prefer-separate-attributes applying clone bundle from http://localhost:$HGPORT1/bz2-b.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found Test where attribute is missing from some entries $ cat > server/.hg/clonebundles.manifest << EOF > http://localhost:$HGPORT1/gz-a.hg BUNDLESPEC=gzip-v2 > http://localhost:$HGPORT1/bz2-a.hg BUNDLESPEC=bzip2-v2 > http://localhost:$HGPORT1/gz-b.hg BUNDLESPEC=gzip-v2 extra=b > http://localhost:$HGPORT1/bz2-b.hg BUNDLESPEC=bzip2-v2 extra=b > EOF $ hg --config ui.clonebundleprefers=extra=b clone -U http://localhost:$HGPORT prefer-partially-defined-attribute applying clone bundle from http://localhost:$HGPORT1/gz-b.hg adding changesets adding manifests adding file changes added 2 changesets with 2 changes to 2 files finished applying clone bundle searching for changes no changes found