comparison tests/test-bundle-type.t @ 30442:41a8106789ca

util: implement zstd compression engine

Now that zstd is vendored and being built (in some configurations), we can implement a compression engine for zstd!

The zstd engine is a little different from existing engines. Because it may not always be present, we have to defer loading the module in case importing it fails. We facilitate this via a cached property that holds a reference to the module or None. The "available" method is implemented to reflect reality.

The zstd engine declares its ability to handle bundles using the "zstd" human name and the "ZS" internal name. The latter was chosen because internal names are 2 characters (by convention only, I think) and "ZS" seems reasonable.

The engine, like others, supports specifying the compression level. However, no consumers of this API pass in that argument yet. I have plans to change that, so stay tuned.

Since all we need to do to support bundle generation with a new compression engine is implement and register the compression engine, bundle generation with zstd "just works!" Tests demonstrating this have been added.

How does the performance of zstd for bundle generation compare?
On the mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following on my i7-6700K on Linux:

engine        CPU time     bundle size   vs orig size   throughput
none             97.0s   4,054,405,584         100.0%    41.8 MB/s
bzip2 (l=9)     393.6s     975,343,098          24.0%    10.3 MB/s
gzip (l=6)      184.0s   1,140,533,074          28.1%    22.0 MB/s
zstd (l=1)      108.2s   1,119,434,718          27.6%    37.5 MB/s
zstd (l=2)      111.3s   1,078,328,002          26.6%    36.4 MB/s
zstd (l=3)      113.7s   1,011,823,727          25.0%    35.7 MB/s
zstd (l=4)      116.0s   1,008,965,888          24.9%    35.0 MB/s
zstd (l=5)      121.0s     977,203,148          24.1%    33.5 MB/s
zstd (l=6)      131.7s     927,360,198          22.9%    30.8 MB/s
zstd (l=7)      139.0s     912,808,505          22.5%    29.2 MB/s
zstd (l=12)     198.1s     854,527,714          21.1%    20.5 MB/s
zstd (l=18)     681.6s     789,750,690          19.5%     5.9 MB/s

Bundle sizes are in bytes. Throughput is the uncompressed size (4,054,405,584 bytes) divided by CPU time.

On compression, zstd for bundle generation delivers:

* better compression than gzip with significantly less CPU utilization
* better compression ratios than bzip2 while still being significantly faster than gzip
* the ability to aggressively tune the compression level to achieve significantly smaller bundles

That last point is important. With clone bundles, a server can pre-generate a bundle file, upload it to a static file server, and redirect clients to transparently download it during clone. The server could choose to produce a zstd bundle with the highest compression settings possible. This would take a very long time: level 18 used 681.6s of CPU time, about 6x that of levels 1-4. But the result would be hundreds of megabytes smaller (789,750,690 bytes, versus 1,011,823,727 at level 3 and 1,140,533,074 for gzip)! For the clone volume we do at Mozilla, this could translate to petabytes of bandwidth savings per year and faster clones (due to smaller transfer size).

I don't have detailed numbers to report on decompression. However, zstd decompression is fast: >1 GB/s output throughput on this machine, even through the Python bindings. And it can do that regardless of the compression level of the input. By the time you have enough data to worry about the overhead of decompression, you have plenty of other things to worry about performance-wise.
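The derived columns of the benchmark table, and the bullet-point claims drawn from it, can be rechecked with a short script. This is a sketch only. The raw CPU times and sizes are copied from the table. The formulas assume that "vs orig size" is bundle size over the uncompressed size and that throughput is uncompressed megabytes (10**6 bytes) per CPU second. Those formulas reproduce the table's figures. The function names are hypothetical.

```python
# Raw numbers copied from the benchmark table: (CPU seconds, bundle bytes).
# "vs orig size" and "throughput" are derived from these two columns.
ORIG_SIZE = 4_054_405_584  # the uncompressed ("none") bundle, in bytes

RESULTS = {
    "none": (97.0, 4_054_405_584),
    "bzip2 (l=9)": (393.6, 975_343_098),
    "gzip (l=6)": (184.0, 1_140_533_074),
    "zstd (l=1)": (108.2, 1_119_434_718),
    "zstd (l=2)": (111.3, 1_078_328_002),
    "zstd (l=3)": (113.7, 1_011_823_727),
    "zstd (l=4)": (116.0, 1_008_965_888),
    "zstd (l=5)": (121.0, 977_203_148),
    "zstd (l=6)": (131.7, 927_360_198),
    "zstd (l=7)": (139.0, 912_808_505),
    "zstd (l=12)": (198.1, 854_527_714),
    "zstd (l=18)": (681.6, 789_750_690),
}


def ratio_pct(size):
    """Bundle size as a percentage of the uncompressed bundle."""
    return 100.0 * size / ORIG_SIZE


def throughput_mb_s(cpu_seconds):
    """Uncompressed megabytes (10**6 bytes) processed per CPU second."""
    return ORIG_SIZE / 1e6 / cpu_seconds


def zstd_levels_beating(max_size, max_seconds):
    """zstd rows that are both smaller than max_size and faster than max_seconds."""
    return [
        label
        for label, (seconds, size) in RESULTS.items()
        if label.startswith("zstd") and size < max_size and seconds < max_seconds
    ]


if __name__ == "__main__":
    for label, (seconds, size) in RESULTS.items():
        print(f"{label:<12} {ratio_pct(size):5.1f}%  {throughput_mb_s(seconds):5.1f} MB/s")

    gzip_seconds, gzip_size = RESULTS["gzip (l=6)"]
    bzip2_size = RESULTS["bzip2 (l=9)"][1]
    # First bullet: smaller than gzip while using less CPU time.
    print(zstd_levels_beating(gzip_size, gzip_seconds))
    # Second bullet: smaller than bzip2 while still faster than gzip.
    print(zstd_levels_beating(bzip2_size, gzip_seconds))
    # Bytes saved per clone by a level-18 bundle compared with gzip.
    print(gzip_size - RESULTS["zstd (l=18)"][1])
```

The first filter selects levels 1 through 7. The second selects only levels 6 and 7, because level 5 is slightly larger than bzip2 and level 12 uses more CPU time than gzip. A level-18 bundle saves 350,782,384 bytes per clone relative to gzip. Recomputed this way, bzip2 comes out at 24.06%, which the table lists as 24.0%.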
zstd wins all around. I can't wait to implement support for it in the wire protocol and in revlogs.
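The commit message describes a deferred import: a cached property holds the module or None, and `available()` reports whether the import worked. It also describes registering the engine under the "zstd" bundle name and the "ZS" internal name. The sketch below illustrates that pattern under several stated assumptions. The class names, method names and registry are hypothetical, not Mercurial's actual classes. The message says the bindings are vendored; importing the PyPI `zstandard` package instead is an assumption made only so the sketch runs standalone. The error text is copied from the `#else` branch of the test.

```python
import functools
import importlib


class EngineUnavailable(Exception):
    """Hypothetical error: the engine is registered but its module failed to import."""


class LazyZstdEngine:
    # Assumption: import the PyPI "zstandard" bindings, not Mercurial's vendored copy.
    module_name = "zstandard"

    def name(self):
        return "zstd"

    def bundletype(self):
        # Human-facing bundle name and 2-character internal name.
        return "zstd", "ZS"

    @functools.cached_property
    def _module(self):
        # Import on first access only; cache the module, or None if the import fails.
        try:
            return importlib.import_module(self.module_name)
        except ImportError:
            return None

    def available(self):
        return self._module is not None


class EngineRegistry:
    """Hypothetical registry mapping bundle names to engines."""

    def __init__(self):
        self._engines = {}
        self._bundle_names = {}

    def register(self, engine):
        # Registering never imports anything, so it cannot fail on a missing module.
        human_name, _internal_name = engine.bundletype()
        self._engines[engine.name()] = engine
        self._bundle_names[human_name] = engine.name()

    def for_bundle_name(self, bundle_name):
        # KeyError for unknown names; EngineUnavailable for known-but-unloadable engines.
        engine = self._engines[self._bundle_names[bundle_name]]
        if not engine.available():
            raise EngineUnavailable(
                "compression engine %s could not be loaded" % engine.name())
        return engine
```

Because the import happens inside the cached property, a missing module only surfaces when a bundle of that type is requested. That matches the test's expectation that zstd is a valid engine name even when the module cannot be loaded.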
author Gregory Szorc <gregory.szorc@gmail.com>
date Fri, 11 Nov 2016 01:10:07 -0800
parents e65d33182fd4
children 76104a4899ad
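In the test below, `f -q -B6` prints each bundle's first six bytes: `HG20\x00\x00` for the v2 bundles and `HG10GZ` for gzip-v1. `hg debugbundle` then reports the zstd bundles' `Compression` stream parameter as `ZS`, the engine's internal name. The following rough sketch shows how those bytes could be interpreted. The header layout it relies on is assumed here, not taken from Mercurial's code. It assumes a 2-byte compression code after `HG10`. After `HG20`, it assumes a big-endian int32 length followed by space-separated, URL-quoted `key=value` stream parameters. `sniff_bundle` is a hypothetical name.

```python
import struct
from urllib.parse import unquote


def sniff_bundle(data):
    """Return (magic, stream parameters) for raw bundle bytes.

    Hypothetical helper; the header layout below is an assumption. For v1
    bundles the 2-byte compression code is reported under "Compression".
    """
    magic = data[:4]
    if magic == b"HG10":
        # v1: the two bytes after the magic name the compression (e.g. GZ in HG10GZ).
        return "HG10", {"Compression": data[4:6].decode("ascii")}
    if magic == b"HG20":
        # v2: big-endian signed int32 length of the stream-parameter block.
        (size,) = struct.unpack(">i", data[4:8])
        block = data[8:8 + size].decode("ascii")
        params = {}
        # Parameters are space-separated, URL-quoted key=value pairs.
        for item in (block.split(" ") if block else []):
            key, _, value = item.partition("=")
            params[unquote(key)] = unquote(value)
        return "HG20", params
    raise ValueError("unrecognized bundle header: %r" % (magic,))
```

Under this layout, `Compression=ZS` is 14 bytes, so the length field is `\x00\x00\x00\x0e`. Only its first two zero bytes fall within the six-byte dump, which is consistent with the `HG20\x00\x00` lines in the test output.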
30441:de48d3a0573a 30442:41a8106789ca
@@ -33,21 +33,25 @@
 summary: a
 $ cd ..
 
 test bundle types
 
-$ for t in "None" "bzip2" "gzip" "none-v2" "v2" "v1" "gzip-v1"; do
-> echo % test bundle type $t
-> hg init t$t
+$ testbundle() {
+> echo % test bundle type $1
+> hg init t$1
 > cd t1
-> hg bundle -t $t ../b$t ../t$t
-> f -q -B6 -D ../b$t; echo
-> cd ../t$t
-> hg debugbundle ../b$t
-> hg debugbundle --spec ../b$t
+> hg bundle -t $1 ../b$1 ../t$1
+> f -q -B6 -D ../b$1; echo
+> cd ../t$1
+> hg debugbundle ../b$1
+> hg debugbundle --spec ../b$1
 > echo
 > cd ..
+> }
+
+$ for t in "None" "bzip2" "gzip" "none-v2" "v2" "v1" "gzip-v1"; do
+> testbundle $t
 > done
 % test bundle type None
 searching for changes
 1 changesets found
 HG20\x00\x00 (esc)
@@ -104,10 +108,42 @@
 1 changesets found
 HG10GZ
 c35a0f9217e65d1fdb90c936ffa7dbe679f83ddf
 gzip-v1
 
+#if zstd
+
+$ for t in "zstd" "zstd-v2"; do
+> testbundle $t
+> done
+% test bundle type zstd
+searching for changes
+1 changesets found
+HG20\x00\x00 (esc)
+Stream params: sortdict([('Compression', 'ZS')])
+changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
+c35a0f9217e65d1fdb90c936ffa7dbe679f83ddf
+zstd-v2
+
+% test bundle type zstd-v2
+searching for changes
+1 changesets found
+HG20\x00\x00 (esc)
+Stream params: sortdict([('Compression', 'ZS')])
+changegroup -- "sortdict([('version', '02'), ('nbchanges', '1')])"
+c35a0f9217e65d1fdb90c936ffa7dbe679f83ddf
+zstd-v2
+
+#else
+
+zstd is a valid engine but isn't available
+
+$ hg -R t1 bundle -a -t zstd irrelevant.hg
+abort: compression engine zstd could not be loaded
+[255]
+
+#endif
 
 test garbage file
 
 $ echo garbage > bgarbage
 $ hg init tgarbage