Mercurial > hg
view tests/test-globalopts.t @ 30818:4c0a5a256ae8
localrepo: experimental support for non-zlib revlog compression
The final part of integrating the compression manager APIs into
revlog storage is the plumbing for repositories to advertise they
are using non-zlib storage and for revlogs to instantiate a non-zlib
compression engine.
The main intent of the compression manager work was to zstd all
of the things. Adding zstd to revlogs has proved to be more involved
than other places because revlogs are... special. Very small inputs
and the use of delta chains (which are themselves a form of
compression) are a completely different use case from streaming
compression, which bundles and the wire protocol employ. I've
conducted numerous experiments with zstd in revlogs and have yet
to formalize compression settings and a storage architecture that
I'm confident I won't regret later. In other words, I'm not yet
ready to commit to a new mechanism for using zstd - or any other
compression format - in revlogs.
That being said, having some support for zstd (and other compression
formats) in revlogs in core is beneficial. It can allow others to
conduct experiments.
This patch introduces *highly experimental* support for non-zlib
compression formats in revlogs. Introduced is a config option to
control which compression engine to use. Also introduced is a namespace
of "exp-compression-*" requirements to denote support for non-zlib
compression in revlogs. I've prefixed the namespace with "exp-"
(short for "experimental") because I'm not confident of the
requirements "schema" and in no way want to give the illusion of
supporting these requirements in the future. I fully intend to drop
support for these requirements once we figure out what we're doing
with zstd in revlogs.
A good portion of the patch is teaching the requirements system
about registered compression engines and passing the requested
compression engine as an opener option so revlogs can instantiate
the proper compression engine for new operations.
That's a verbose way of saying "we can now use zstd in revlogs!"
On an `hg pull` conversion of the mozilla-unified repo with no extra
redelta settings (like aggressivemergedeltas), we can see the impact
of zstd vs zlib in revlogs:
$ hg perfrevlogchunks -c
! chunk
! wall 2.032052 comb 2.040000 user 1.990000 sys 0.050000 (best of 5)
! wall 1.866360 comb 1.860000 user 1.820000 sys 0.040000 (best of 6)
! chunk batch
! wall 1.877261 comb 1.870000 user 1.860000 sys 0.010000 (best of 6)
! wall 1.705410 comb 1.710000 user 1.690000 sys 0.020000 (best of 6)
$ hg perfrevlogchunks -m
! chunk
! wall 2.721427 comb 2.720000 user 2.640000 sys 0.080000 (best of 4)
! wall 2.035076 comb 2.030000 user 1.950000 sys 0.080000 (best of 5)
! chunk batch
! wall 2.614561 comb 2.620000 user 2.580000 sys 0.040000 (best of 4)
! wall 1.910252 comb 1.910000 user 1.880000 sys 0.030000 (best of 6)
$ hg perfrevlog -c -d 1
! wall 4.812885 comb 4.820000 user 4.800000 sys 0.020000 (best of 3)
! wall 4.699621 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)
$ hg perfrevlog -m -d 1000
! wall 34.252800 comb 34.250000 user 33.730000 sys 0.520000 (best of 3)
! wall 24.094999 comb 24.090000 user 23.320000 sys 0.770000 (best of 3)
Only modest wins for the changelog. But manifest reading is
significantly faster. What's going on?
One reason might be data volume. zstd decompresses faster. So given
more bytes, it will put more distance between it and zlib.
Another reason is size. In the current design, zstd revlogs are
*larger*:
debugcreatestreamclonebundle (size in bytes)
zlib: 1,638,852,492
zstd: 1,680,601,332
I haven't investigated this fully, but I reckon a significant cause of
larger revlogs is that the zstd frame/header has more bytes than
zlib's. For very small inputs or data that doesn't compress well, we'll
tend to store more uncompressed chunks than with zlib (because the
compressed size isn't smaller than original). This will make revlog
reading faster because it is doing less decompression.
Moving on to bundle performance:
$ hg bundle -a -t none-v2 (total CPU time)
zlib: 102.79s
zstd: 97.75s
So, marginal CPU decrease for reading all chunks in all revlogs
(this is somewhat disappointing).
$ hg bundle -a -t <engine>-v2 (total CPU time)
zlib: 191.59s
zstd: 115.36s
This last test effectively measures the difference between zlib->zlib
and zstd->zstd for revlogs to bundle. This is a rough approximation of
what a server does during `hg clone`.
There are some promising results for zstd. But not enough for me to
feel comfortable advertising it to users. We'll get there...
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Fri, 13 Jan 2017 20:16:56 -0800 |
parents | e520f0f4b1cf |
children | 900996da577a |
line wrap: on
line source
$ hg init a $ cd a $ echo a > a $ hg ci -A -d'1 0' -m a adding a $ cd .. $ hg init b $ cd b $ echo b > b $ hg ci -A -d'1 0' -m b adding b $ cd .. $ hg clone a c updating to branch default 1 files updated, 0 files merged, 0 files removed, 0 files unresolved $ cd c $ cat >> .hg/hgrc <<EOF > [paths] > relative = ../a > EOF $ hg pull -f ../b pulling from ../b searching for changes warning: repository is unrelated requesting all changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files (+1 heads) (run 'hg heads' to see heads, 'hg merge' to merge) $ hg merge 1 files updated, 0 files merged, 0 files removed, 0 files unresolved (branch merge, don't forget to commit) $ cd .. Testing -R/--repository: $ hg -R a tip changeset: 0:8580ff50825a tag: tip user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: a $ hg --repository b tip changeset: 0:b6c483daf290 tag: tip user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: b -R with a URL: $ hg -R file:a identify 8580ff50825a tip $ hg -R file://localhost/`pwd`/a/ identify 8580ff50825a tip -R with path aliases: $ cd c $ hg -R default identify 8580ff50825a tip $ hg -R relative identify 8580ff50825a tip $ echo '[paths]' >> $HGRCPATH $ echo 'relativetohome = a' >> $HGRCPATH $ HOME=`pwd`/../ hg -R relativetohome identify 8580ff50825a tip $ cd .. #if no-outer-repo Implicit -R: $ hg ann a/a 0: a $ hg ann a/a a/a 0: a $ hg ann a/a b/b abort: no repository found in '$TESTTMP' (.hg not found)! [255] $ hg -R b ann a/a abort: a/a not under root '$TESTTMP/b' (glob) (consider using '--cwd b') [255] $ hg log abort: no repository found in '$TESTTMP' (.hg not found)! [255] #endif Abbreviation of long option: $ hg --repo c tip changeset: 1:b6c483daf290 tag: tip parent: -1:000000000000 user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: b earlygetopt with duplicate options (36d23de02da1): $ hg --cwd a --cwd b --cwd c tip changeset: 1:b6c483daf290 tag: tip parent: -1:000000000000 user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: b $ hg --repo c --repository b -R a tip changeset: 0:8580ff50825a tag: tip user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: a earlygetopt short option without following space: $ hg -q -Rb tip 0:b6c483daf290 earlygetopt with illegal abbreviations: $ hg --confi "foo.bar=baz" abort: option --config may not be abbreviated! [255] $ hg --cw a tip abort: option --cwd may not be abbreviated! [255] $ hg --rep a tip abort: option -R has to be separated from other options (e.g. not -qR) and --repository may only be abbreviated as --repo! [255] $ hg --repositor a tip abort: option -R has to be separated from other options (e.g. not -qR) and --repository may only be abbreviated as --repo! [255] $ hg -qR a tip abort: option -R has to be separated from other options (e.g. not -qR) and --repository may only be abbreviated as --repo! [255] $ hg -qRa tip abort: option -R has to be separated from other options (e.g. not -qR) and --repository may only be abbreviated as --repo! [255] Testing --cwd: $ hg --cwd a parents changeset: 0:8580ff50825a tag: tip user: test date: Thu Jan 01 00:00:01 1970 +0000 summary: a Testing -y/--noninteractive - just be sure it is parsed: $ hg --cwd a tip -q --noninteractive 0:8580ff50825a $ hg --cwd a tip -q -y 0:8580ff50825a Testing -q/--quiet: $ hg -R a -q tip 0:8580ff50825a $ hg -R b -q tip 0:b6c483daf290 $ hg -R c --quiet parents 0:8580ff50825a 1:b6c483daf290 Testing -v/--verbose: $ hg --cwd c head -v changeset: 1:b6c483daf290 tag: tip parent: -1:000000000000 user: test date: Thu Jan 01 00:00:01 1970 +0000 files: b description: b changeset: 0:8580ff50825a user: test date: Thu Jan 01 00:00:01 1970 +0000 files: a description: a $ hg --cwd b tip --verbose changeset: 0:b6c483daf290 tag: tip user: test date: Thu Jan 01 00:00:01 1970 +0000 files: b description: b Testing --config: $ hg --cwd c --config paths.quuxfoo=bar paths | grep quuxfoo > /dev/null && echo quuxfoo quuxfoo $ hg --cwd c --config '' tip -q abort: malformed --config option: '' (use --config section.name=value) [255] $ hg --cwd c --config a.b tip -q abort: malformed --config option: 'a.b' (use --config section.name=value) [255] $ hg --cwd c --config a tip -q abort: malformed --config option: 'a' (use --config section.name=value) [255] $ hg --cwd c --config a.= tip -q abort: malformed --config option: 'a.=' (use --config section.name=value) [255] $ hg --cwd c --config .b= tip -q abort: malformed --config option: '.b=' (use --config section.name=value) [255] Testing --debug: $ hg --cwd c log --debug changeset: 1:b6c483daf2907ce5825c0bb50f5716226281cc1a tag: tip phase: public parent: -1:0000000000000000000000000000000000000000 parent: -1:0000000000000000000000000000000000000000 manifest: 1:23226e7a252cacdc2d99e4fbdc3653441056de49 user: test date: Thu Jan 01 00:00:01 1970 +0000 files+: b extra: branch=default description: b changeset: 0:8580ff50825a50c8f716709acdf8de0deddcd6ab phase: public parent: -1:0000000000000000000000000000000000000000 parent: -1:0000000000000000000000000000000000000000 manifest: 0:a0c8bcbbb45c63b90b70ad007bf38961f64f2af0 user: test date: Thu Jan 01 00:00:01 1970 +0000 files+: a extra: branch=default description: a Testing --traceback: $ hg --cwd c --config x --traceback id 2>&1 | grep -i 'traceback' Traceback (most recent call last): Testing --time: $ hg --cwd a --time id 8580ff50825a tip time: real * (glob) Testing --version: $ hg --version -q Mercurial Distributed SCM * (glob) hide outer repo $ hg init Testing -h/--help: $ hg -h Mercurial Distributed SCM list of commands: add add the specified files on the next commit addremove add all new files, delete all missing files annotate show changeset information by line for each file archive create an unversioned archive of a repository revision backout reverse effect of earlier changeset bisect subdivision search of changesets bookmarks create a new bookmark or list existing bookmarks branch set or show the current branch name branches list repository named branches bundle create a changegroup file cat output the current or given revision of files clone make a copy of an existing repository commit commit the specified files or all outstanding changes config show combined config settings from all hgrc files copy mark files as copied for the next commit diff diff repository (or selected files) export dump the header and diffs for one or more changesets files list tracked files forget forget the specified files on the next commit graft copy changes from other branches onto the current branch grep search revision history for a pattern in specified files heads show branch heads help show help for a given topic or a help overview identify identify the working directory or specified revision import import an ordered set of patches incoming show new changesets found in source init create a new repository in the given directory log show revision history of entire repository or files manifest output the current or given revision of the project manifest merge merge another revision into working directory outgoing show changesets not found in the destination paths show aliases for remote repositories phase set or show the current phase name pull pull changes from the specified source push push changes to the specified destination recover roll back an interrupted transaction remove remove the specified files on the next commit rename rename files; equivalent of copy + remove resolve redo merges or set/view the merge status of files revert restore files to their checkout state root print the root (top) of the current working directory serve start stand-alone webserver status show changed files in the working directory summary summarize working directory state tag add one or more tags for the current or given revision tags list repository tags unbundle apply one or more changegroup files update update working directory (or switch revisions) verify verify the integrity of the repository version output version and copyright information additional help topics: config Configuration Files dates Date Formats diffs Diff Formats environment Environment Variables extensions Using Additional Features filesets Specifying File Sets glossary Glossary hgignore Syntax for Mercurial Ignore Files hgweb Configuring hgweb internals Technical implementation topics merge-tools Merge Tools patterns File Name Patterns phases Working with Phases revisions Specifying Revisions scripting Using Mercurial from scripts and automation subrepos Subrepositories templating Template Usage urls URL Paths (use 'hg help -v' to show built-in aliases and global options) $ hg --help Mercurial Distributed SCM list of commands: add add the specified files on the next commit addremove add all new files, delete all missing files annotate show changeset information by line for each file archive create an unversioned archive of a repository revision backout reverse effect of earlier changeset bisect subdivision search of changesets bookmarks create a new bookmark or list existing bookmarks branch set or show the current branch name branches list repository named branches bundle create a changegroup file cat output the current or given revision of files clone make a copy of an existing repository commit commit the specified files or all outstanding changes config show combined config settings from all hgrc files copy mark files as copied for the next commit diff diff repository (or selected files) export dump the header and diffs for one or more changesets files list tracked files forget forget the specified files on the next commit graft copy changes from other branches onto the current branch grep search revision history for a pattern in specified files heads show branch heads help show help for a given topic or a help overview identify identify the working directory or specified revision import import an ordered set of patches incoming show new changesets found in source init create a new repository in the given directory log show revision history of entire repository or files manifest output the current or given revision of the project manifest merge merge another revision into working directory outgoing show changesets not found in the destination paths show aliases for remote repositories phase set or show the current phase name pull pull changes from the specified source push push changes to the specified destination recover roll back an interrupted transaction remove remove the specified files on the next commit rename rename files; equivalent of copy + remove resolve redo merges or set/view the merge status of files revert restore files to their checkout state root print the root (top) of the current working directory serve start stand-alone webserver status show changed files in the working directory summary summarize working directory state tag add one or more tags for the current or given revision tags list repository tags unbundle apply one or more changegroup files update update working directory (or switch revisions) verify verify the integrity of the repository version output version and copyright information additional help topics: config Configuration Files dates Date Formats diffs Diff Formats environment Environment Variables extensions Using Additional Features filesets Specifying File Sets glossary Glossary hgignore Syntax for Mercurial Ignore Files hgweb Configuring hgweb internals Technical implementation topics merge-tools Merge Tools patterns File Name Patterns phases Working with Phases revisions Specifying Revisions scripting Using Mercurial from scripts and automation subrepos Subrepositories templating Template Usage urls URL Paths (use 'hg help -v' to show built-in aliases and global options) Not tested: --debugger