changeset 40918:3764330f76a6
sparse-revlog: enabled by default
The feature provides large benefits. It now seems mature enough to be enabled
by default.
* It solves catastrophic issues regarding delta storage in revlogs.
* It allows for shorter delta chains in all repositories, improving
performance.
Running benchmarks on a wide range of operations did not reveal any
problematic impact. Performance gains are observed where expected.
The format has been supported since Mercurial 4.7, so it seems safe to enable
it by default now.
Here is a reminder of key numbers regarding the effect of this delta strategy
on repository size and performance.
Effect on Size:
===============
For repositories with a lot of branches, sparse-revlog significantly reduces
size by fixing a limitation associated with the span of a delta chain. In
addition, sparse-revlog deals well with limits on the delta chain length.
For large repositories, this allows for a steep reduction of the delta chain
length without a problematic impact on the repository size. This delta chain
length improvement helps all repositories, not just the ones with many
branches.
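The difference between the span of a delta chain and the data that actually belongs to it can be sketched as follows. This is a toy model, not Mercurial's implementation: it assumes each chain member is a hypothetical `(offset, length)` pair in the data file, and the function names and the `min_gap` threshold are made up for illustration. It shows how skipping large gaps at read time keeps the amount of data read close to the payload, even when the span is large.

```python
# Toy model: each chain member is an (offset, length) pair in the data file.
# All names and the min_gap threshold are illustrative assumptions.


def span(segments):
    """Bytes between the start of the first segment and the end of the last."""
    first_offset = segments[0][0]
    last_offset, last_length = segments[-1]
    return last_offset + last_length - first_offset


def payload(segments):
    """Bytes that actually belong to the chain."""
    return sum(length for _offset, length in segments)


def slice_chunks(segments, min_gap):
    """Group segments into chunks, starting a new chunk at each large gap."""
    chunks = [[segments[0]]]
    for offset, length in segments[1:]:
        prev_offset, prev_length = chunks[-1][-1]
        gap = offset - (prev_offset + prev_length)
        if gap >= min_gap:
            chunks.append([(offset, length)])
        else:
            chunks[-1].append((offset, length))
    return chunks


def read_size(chunks):
    """Bytes read when each chunk is read in one go and gaps are skipped."""
    return sum(span(chunk) for chunk in chunks)


# A chain whose members are split by unrelated data from another branch.
chain_segments = [(0, 100), (100, 20), (5000, 30), (5030, 10)]
chunks = slice_chunks(chain_segments, min_gap=1000)
print("payload:", payload(chain_segments))
print("span:", span(chain_segments))
print("read size with gap skipping:", read_size(chunks))
```

In this model, a limit on the span would reject the chain above even though only a small fraction of the span is relevant data.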
As a reminder, here are the default chain limits for each "format":
* no-sparse: none
* sparse: 1000
Mercurial
---------
Manifest Size:
limit | none | 1000
------------|-------------|------------
no-sparse | 6 143 044 | 6 269 496
sparse | 5 798 796 | 5 827 025
Manifest Chain length data
limit || none || 1000
value || average | max || average | max
------------||---------|---------||---------|---------
no-sparse || 429 | 1 397 || 397 | 1 000
sparse || 326 | 1 290 || 313 | 1 000
Full Store Size
limit | none | 1000
------------|-------------|------------
no-sparse | 46 944 775 | 47 166 129
sparse | 46 622 445 | 46 723 774
pypy
----
Manifest Size:
limit | none | 1000
------------|-------------|------------
no-sparse | 52 941 760 | 56 200 970
sparse | 26 348 229 | 27 384 133
Manifest Chain length data
limit || none || 1000
value || average | max || average | max
------------||---------|---------||---------|---------
no-sparse || 769 | 3 889 || 390 | 1 000
sparse || 1 223 | 3 846 || 495 | 1 000
Full Store Size
limit | none | 1000
------------|-------------|------------
no-sparse | 336 050 203 | 339 309 413
sparse | 338 673 985 | 339 709 889
Mozilla
-------
Manifest Size:
limit | none | 1000
------------|----------------|---------------
no-sparse | 215 096 339 | 1 708 853 525
sparse | 188 947 271 | 278 894 170
Manifest Chain length data
limit || none || 1000
value || average | max || average | max
------------||---------|---------||---------|--------
no-sparse || 20 454 | 59 562 || 491 | 1 000
sparse || 23 509 | 69 891 || 489 | 1 000
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 2 377 578 715 | 3 876 258 798
sparse | 2 441 677 137 | 2 535 997 381
Netbeans
--------
Manifest Size:
limit | none | 1000
------------|----------------|---------------
no-sparse | 130 088 982 | 741 590 565
sparse | 118 836 887 | 159 161 207
Manifest Chain length data
limit || none || 1000
value || average | max || average | max
------------||---------|---------||---------|---------
no-sparse || 19 321 | 61 397 || 510 | 1 000
sparse || 21 240 | 61 583 || 503 | 1 000
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 1 160 013 008 | 1 771 514 591
sparse | 1 164 959 988 | 1 205 284 308
Private repo #1
---------------
Manifest Size:
limit | none | 1000
------------|-----------------|---------------
no-sparse | 33 725 285 081 | 33 724 834 190
sparse | 350 542 420 | 423 470 579
Manifest Chain length data
limit || none || 1000
value || average | max || average | max
------------||---------|---------||---------|---------
no-sparse || 282 | 8 885 || 113 | 1 000
sparse || 3 655 | 8 951 || 530 | 1 000
Full Store Size
limit | none | 1000
------------|----------------|---------------
no-sparse | 41 544 149 652 | 41 543 698 761
sparse | 8 448 037 300 | 8 520 965 459
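As a reading aid, the relative change in manifest size between the previous default (no-sparse, no chain limit) and the new default (sparse, limit of 1000) can be derived from the tables above. The snippet below is only a small arithmetic sketch; the byte counts are copied from those tables, while the dictionary layout and the helper name are illustrative choices.

```python
# Manifest sizes in bytes, copied from the tables above:
# (no-sparse without chain limit, sparse with a chain limit of 1000).
manifest_sizes = {
    "mercurial": (6_143_044, 5_827_025),
    "pypy": (52_941_760, 27_384_133),
    "mozilla": (215_096_339, 278_894_170),
    "netbeans": (130_088_982, 159_161_207),
    "private repo #1": (33_725_285_081, 423_470_579),
}


def relative_change(old, new):
    """Return the size change from old to new, in percent."""
    return (new - old) / old * 100


for name, (old, new) in manifest_sizes.items():
    print(f"{name:16} {relative_change(old, new):+7.1f}%")
```

A negative value means the manifest shrinks. Where the value is positive, the manifest grows in exchange for a chain capped at 1000 revisions.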
Effect on speed:
================
Performance is strongly impacted by the delta chain length: longer chains
result in slower revision restoration. For this reason, the 1000 chain limit
introduced by sparse-revlog helps repositories with previously large chains a
lot. In our corpus, this means `netbeans` and `mozilla-central`, which
suffered from unreasonable manifest delta chain lengths.
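Why chain length matters for restoration can be illustrated with a deliberately simplified model. This is a sketch under strong assumptions, not how revlog stores data: here a delta is just text appended to its base, every revision builds on the previous one, and `Entry`, `chain`, `restore` and `build` are hypothetical names. Restoring a revision must read and apply every entry on its chain, so a chain limit trades some storage (more full snapshots) for bounded restoration work.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    base: Optional[int]  # revision the delta applies to; None marks a snapshot
    data: str            # full text for a snapshot, appended text for a delta


def chain(entries: List[Entry], rev: int) -> List[int]:
    """Return the revisions to read, from the snapshot up to rev."""
    path = []
    current: Optional[int] = rev
    while current is not None:
        path.append(current)
        current = entries[current].base
    path.reverse()
    return path


def restore(entries: List[Entry], rev: int) -> str:
    """Rebuild the full text of rev by applying every delta of its chain."""
    return "".join(entries[r].data for r in chain(entries, rev))


def build(nrevs: int, maxchainlen: Optional[int] = None) -> List[Entry]:
    """Store nrevs linear revisions, adding a snapshot when a chain is full."""
    entries: List[Entry] = []
    fulltext = ""
    for rev in range(nrevs):
        line = "line %d\n" % rev
        fulltext += line
        chain_full = (
            maxchainlen is not None
            and rev > 0
            and len(chain(entries, rev - 1)) >= maxchainlen
        )
        if rev == 0 or chain_full:
            entries.append(Entry(None, fulltext))
        else:
            entries.append(Entry(rev - 1, line))
    return entries


unlimited = build(50)
limited = build(50, maxchainlen=10)
print("chain length of tip, no limit:", len(chain(unlimited, 49)))
print("chain length of tip, limit 10:", len(chain(limited, 49)))
```

Both stores restore the same text for every revision; only the number of entries read per restoration and the total stored size differ.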
Another way sparse-revlog helps is by producing better deltas. For
repositories with many branches, the pathological patterns that resulted in
many sub-optimal deltas are gone. Smaller deltas help with operations where
deltas are directly relevant, like bundle.
However, the sparse-revlog logic introduces some extra processing and a more
thorough testing of possible delta candidates, adding an extra cost in some
cases. This cost is usually counterbalanced by the other performance gains.
For smaller repositories not affected by delta chain length issues or
branching-related issues, this might make things a bit slower, but these
are also repositories where revlog performance is dwarfed by other costs.
Below is a summary of some timings from the performance test suite running
at `http://perf.octobus.net/` for a handful of key commands and operations.
It is important to keep in mind that most of these commands work on the tip
part of the repository. The non-sparse and sparse versions produce different
delta chains, and the tip revision can end up at an arbitrary point of these
chains. This will impact some of the performance numbers listed in this
summary.
For the record, here is the delta chain length for the tip revision of the
manifest log in the benchmarked repositories:
| no-sparse | sparse |
mercurial | 94 | 904 |
pypy | 23 | 673 |
netbeans | 4158 | 258 |
mozilla | 63263 | 781 |
As you can see, the chain lengths for mercurial and pypy turn out to be
significantly longer. The netbeans and mozilla ones get shorter because these
repositories benefit from the maximum chain length.
Timing for `hg commit`:
-----------------------
The time taken by `hg commit` does not vary significantly: there is no
drawback to using sparse here.
| no-sparse | sparse |
mercurial | 68.1ms | 66.7ms |
pypy | 95.0ms | 94.1ms |
netbeans | 614.0ms | 611.0ms |
mozilla | 1340.0ms | 1320.0ms |
Check the final section for statistics on a wider array of writes.
Timing for bundling 10 000 changesets
-------------------------------------
The repositories that benefit from better deltas see a good performance boost.
The other ones are not significantly affected.
| no-sparse | sparse |
mercurial | 3.1s | 3.0s |
pypy | 25.1s | 7.5s |
netbeans | 24.2s | 17.0s |
mozilla | 23.7s | 25.0s |
Timing for unbundling 1 000 changesets
--------------------------------------
Mercurial and mozilla are unaffected. The pypy repository benefits well from
the better deltas.
However, the netbeans repository takes a visible hit. Digging into that
difference reveals that it comes from the sparse-revlog bundle having to deal
with a snapshot that was re-encoded in the bundle. The slow path for adding a
new revision had to be triggered for it, slowing things down. The sparse
versions do not have such a snapshot to handle similar cases in the tested
configuration.
| no-sparse | sparse |
mercurial | 519ms | 502ms |
pypy | 1270ms | 886ms |
netbeans | 1370ms | 2250ms |
mozilla | 3230ms | 3210ms |
Netbeans benefits from the better deltas in other dimensions too. For
example, the produced bundle is significantly smaller:
* netbeans-no-sparse.hg: 2.3MB
* netbeans-sparse.hg: 1.9MB
Timing to restore the tip-most manifest entry:
----------------------------------------------
Nothing surprising here. The timings for mercurial and pypy are within a small
range where they won't affect performance much. In our tested case, they are
slower as they use a longer chain.
Timings for netbeans and mozilla improve a lot, removing a significant amount
of time.
| no-sparse | sparse |
mercurial | 1.09ms | 3.15ms |
pypy | 4.11ms | 10.70ms |
netbeans | 239.00ms | 112.00ms |
mozilla | 688.00ms | 198.00ms |
Reading 100 revisions in descending order:
------------------------------------------
We see the same kind of effect when reading the last 100 revisions. Large
boost for netbeans and mozilla, as they use much smaller delta chains.
The longer chains of mercurial and pypy mean slower reads, but nothing gets
out of control.
| no-sparse | sparse |
mercurial | 0.089s | 0.268s |
pypy | 0.259s | 0.698s |
netbeans | 125.000s | 20.600s |
mozilla | 23.000s | 11.400s |
Writing from full text: statistics for the last 30K revisions
-------------------------------------------------------------
This benchmark adds revisions to a revlog from their full text. This is
similar to the work done during a commit, but for a large number of revisions
so that we get a more relevant view.
We see better overall performance with sparse-revlog. The very worst case is
usually slower with sparse-revlog, but does not get out of control. For the
vast majority of the other writes, sparse-revlog is significantly faster for
larger repositories. This is reflected in the accumulated rewrite time for
netbeans and mozilla.
The notable exception is the pypy repository, where things get slower. The
extra processing is not balanced by shorter delta chains. However, this is not
to be seen as a blocking issue. First, the overall time spent dealing with
revlogs for a repository of pypy's size is small compared to the other costs,
so we get slower on operations that matter less than for other, larger
repositories. Second, we still get a nice size benefit from using
sparse-revlog, and a smaller repository size brings other usability and speed
benefits (e.g. bundle size).
max time | no-sparse | sparse |
mercurial | 0.010143s | 0.045280s |
pypy | 0.034924s | 0.243288s |
netbeans | 0.605371s | 2.130876s |
mozilla | 1.478342s | 3.424541s |
99% time | no-sparse | sparse |
mercurial | 0.003774s | 0.003758s |
pypy | 0.017387s | 0.025310s |
netbeans | 0.576913s | 0.271195s |
mozilla | 1.478342s | 0.449661s |
95% time | no-sparse | sparse |
mercurial | 0.002069s | 0.002120s |
pypy | 0.010141s | 0.014797s |
netbeans | 0.540202s | 0.258644s |
mozilla | 0.654830s | 0.243440s |
full time | no-sparse | sparse |
mercurial | 14.15s | 14.87s |
pypy | 90.50s | 137.12s |
netbeans | 6401.06s | 3411.14s |
mozilla | 3086.89s | 1991.97s |
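For readers who want to produce the same four figures (max, 99%, 95% and full time) from their own measurements, here is a minimal sketch. It is an assumption about how such a summary could be computed, not the code used by the benchmark suite: the `summarize` helper is hypothetical and uses the nearest-rank definition of a percentile, and the sample timings are invented.

```python
def summarize(timings):
    """Summarize per-revision write timings (in seconds).

    Percentiles use the nearest-rank method, which is an assumption here;
    the benchmark suite may define them differently.
    """
    ordered = sorted(timings)
    count = len(ordered)

    def percentile(percent):
        # Integer ceiling of percent * count / 100, at least rank 1.
        rank = max(1, (percent * count + 99) // 100)
        return ordered[rank - 1]

    return {
        "max time": ordered[-1],
        "99% time": percentile(99),
        "95% time": percentile(95),
        "full time": sum(ordered),
    }


# Invented sample: mostly fast writes with two slow outliers.
sample = [0.001] * 98 + [0.5, 2.0]
for label, value in summarize(sample).items():
    print(f"{label:10} {value:.6f}s")
```

A summary of this shape makes it visible when a few slow outliers dominate the maximum while the bulk of the writes stays fast, which is the pattern described above for sparse-revlog.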
Differential Revision: https://phab.mercurial-scm.org/D5345
author | Boris Feld <boris.feld@octobus.net> |
---|---|
date | Mon, 12 Nov 2018 01:22:38 +0100 |
parents | e8cd688b2eb1 |
children | a0886a4d6dce |
files | mercurial/configitems.py mercurial/upgrade.py tests/test-upgrade-repo.t |
diffstat | 3 files changed, 48 insertions(+), 34 deletions(-) |
--- a/mercurial/configitems.py Mon Nov 12 01:22:30 2018 +0100
+++ b/mercurial/configitems.py Mon Nov 12 01:22:38 2018 +0100
@@ -694,7 +694,7 @@
     default=None,
 )
 coreconfigitem('format', 'sparse-revlog',
-    default=False,
+    default=True,
 )
 coreconfigitem('format', 'usefncache',
     default=True,
--- a/mercurial/upgrade.py Mon Nov 12 01:22:30 2018 +0100
+++ b/mercurial/upgrade.py Mon Nov 12 01:22:38 2018 +0100
@@ -269,7 +269,7 @@

     _requirement = localrepo.SPARSEREVLOG_REQUIREMENT

-    default = False
+    default = True

     description = _('in order to limit disk reading and memory usage on older '
                     'version, the span of a delta chain from its root to its '
--- a/tests/test-upgrade-repo.t Mon Nov 12 01:22:30 2018 +0100
+++ b/tests/test-upgrade-repo.t Mon Nov 12 01:22:38 2018 +0100
@@ -56,7 +56,7 @@
   fncache: yes
   dotencode: yes
   generaldelta: yes
-  sparserevlog: no
+  sparserevlog: yes
   plain-cl-delta: yes
   compression: zlib
   $ hg debugformat --verbose
@@ -64,7 +64,7 @@
   fncache: yes yes yes
   dotencode: yes yes yes
   generaldelta: yes yes yes
-  sparserevlog: no no no
+  sparserevlog: yes yes yes
   plain-cl-delta: yes yes yes
   compression: zlib zlib zlib
   $ hg debugformat --verbose --config format.usefncache=no
@@ -72,7 +72,7 @@
   fncache: yes no yes
   dotencode: yes no yes
   generaldelta: yes yes yes
-  sparserevlog: no no no
+  sparserevlog: yes yes yes
   plain-cl-delta: yes yes yes
   compression: zlib zlib zlib
   $ hg debugformat --verbose --config format.usefncache=no --color=debug
@@ -80,7 +80,7 @@
   [formatvariant.name.mismatchconfig|fncache: ][formatvariant.repo.mismatchconfig| yes][formatvariant.config.special| no][formatvariant.default| yes]
   [formatvariant.name.mismatchconfig|dotencode: ][formatvariant.repo.mismatchconfig| yes][formatvariant.config.special| no][formatvariant.default| yes]
   [formatvariant.name.uptodate|generaldelta: ][formatvariant.repo.uptodate| yes][formatvariant.config.default| yes][formatvariant.default| yes]
-  [formatvariant.name.uptodate|sparserevlog: ][formatvariant.repo.uptodate| no][formatvariant.config.default| no][formatvariant.default| no]
+  [formatvariant.name.uptodate|sparserevlog: ][formatvariant.repo.uptodate| yes][formatvariant.config.default| yes][formatvariant.default| yes]
   [formatvariant.name.uptodate|plain-cl-delta:][formatvariant.repo.uptodate| yes][formatvariant.config.default| yes][formatvariant.default| yes]
   [formatvariant.name.uptodate|compression: ][formatvariant.repo.uptodate| zlib][formatvariant.config.default| zlib][formatvariant.default| zlib]
   $ hg debugformat -Tjson
@@ -104,10 +104,10 @@
   "repo": true
   },
   {
-  "config": false,
-  "default": false,
+  "config": true,
+  "default": true,
   "name": "sparserevlog",
-  "repo": false
+  "repo": true
   },
   {
   "config": true,
@@ -127,7 +127,7 @@
   performing an upgrade with "--run" will make the following changes:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   additional optimizations are available by specifying "--optimize <name>":

@@ -151,7 +151,7 @@
   performing an upgrade with "--run" will make the following changes:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   redeltaparent
   deltas within internal storage will choose a new base revision if needed
@@ -188,7 +188,7 @@
   fncache: no yes yes
   dotencode: no yes yes
   generaldelta: no yes yes
-  sparserevlog: no no no
+  sparserevlog: no yes yes
   plain-cl-delta: yes yes yes
   compression: zlib zlib zlib
   $ hg debugformat --verbose --config format.usegeneraldelta=no
@@ -196,7 +196,7 @@
   fncache: no yes yes
   dotencode: no yes yes
   generaldelta: no no yes
-  sparserevlog: no no no
+  sparserevlog: no no yes
   plain-cl-delta: yes yes yes
   compression: zlib zlib zlib
   $ hg debugformat --verbose --config format.usegeneraldelta=no --color=debug
@@ -204,7 +204,7 @@
   [formatvariant.name.mismatchconfig|fncache: ][formatvariant.repo.mismatchconfig| no][formatvariant.config.default| yes][formatvariant.default| yes]
   [formatvariant.name.mismatchconfig|dotencode: ][formatvariant.repo.mismatchconfig| no][formatvariant.config.default| yes][formatvariant.default| yes]
   [formatvariant.name.mismatchdefault|generaldelta: ][formatvariant.repo.mismatchdefault| no][formatvariant.config.special| no][formatvariant.default| yes]
-  [formatvariant.name.uptodate|sparserevlog: ][formatvariant.repo.uptodate| no][formatvariant.config.default| no][formatvariant.default| no]
+  [formatvariant.name.mismatchdefault|sparserevlog: ][formatvariant.repo.mismatchdefault| no][formatvariant.config.special| no][formatvariant.default| yes]
   [formatvariant.name.uptodate|plain-cl-delta:][formatvariant.repo.uptodate| yes][formatvariant.config.default| yes][formatvariant.default| yes]
   [formatvariant.name.uptodate|compression: ][formatvariant.repo.uptodate| zlib][formatvariant.config.default| zlib][formatvariant.default| zlib]
   $ hg debugupgraderepo
@@ -219,12 +219,15 @@
   generaldelta
   deltas within internal storage are unable to choose optimal revisions; repository is larger and slower than it could be; interaction with other repositories may require extra network and CPU resources, making "hg push" and "hg pull" slower

+  sparserevlog
+  in order to limit disk reading and memory usage on older version, the span of a delta chain from its root to its end is limited, whatever the relevant data in this span. This can severly limit Mercurial ability to build good chain of delta resulting is much more storage space being taken and limit reusability of on disk delta during exchange.
+

   performing an upgrade with "--run" will make the following changes:

   requirements
   preserved: revlogv1, store
-  added: dotencode, fncache, generaldelta
+  added: dotencode, fncache, generaldelta, sparserevlog

   fncache
   repository will be more resilient to storing certain paths and performance of certain operations should be improved
@@ -235,6 +238,9 @@
   generaldelta
   repository storage will be able to create optimal deltas; new repository data will be smaller and read times should decrease; interacting with other repositories using this storage model should require less network and CPU resources, making "hg push" and "hg pull" faster

+  sparserevlog
+  Revlog supports delta chain with more unused data between payload. These gaps will be skipped at read time. This allows for better delta chains, making a better compression and faster exchange with server.
+
   additional optimizations are available by specifying "--optimize <name>":

   redeltaparent
@@ -259,6 +265,9 @@
   generaldelta
   deltas within internal storage are unable to choose optimal revisions; repository is larger and slower than it could be; interaction with other repositories may require extra network and CPU resources, making "hg push" and "hg pull" slower

+  sparserevlog
+  in order to limit disk reading and memory usage on older version, the span of a delta chain from its root to its end is limited, whatever the relevant data in this span. This can severly limit Mercurial ability to build good chain of delta resulting is much more storage space being taken and limit reusability of on disk delta during exchange.
+
   repository lacks features used by the default config options:

   dotencode
@@ -269,7 +278,7 @@

   requirements
   preserved: revlogv1, store
-  added: fncache, generaldelta
+  added: fncache, generaldelta, sparserevlog

   fncache
   repository will be more resilient to storing certain paths and performance of certain operations should be improved
@@ -277,6 +286,9 @@
   generaldelta
   repository storage will be able to create optimal deltas; new repository data will be smaller and read times should decrease; interacting with other repositories using this storage model should require less network and CPU resources, making "hg push" and "hg pull" faster

+  sparserevlog
+  Revlog supports delta chain with more unused data between payload. These gaps will be skipped at read time. This allows for better delta chains, making a better compression and faster exchange with server.
+
   additional optimizations are available by specifying "--optimize <name>":

   redeltaparent
@@ -301,7 +313,7 @@
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   beginning upgrade...
   repository locked and read-only
@@ -435,7 +447,7 @@
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   beginning upgrade...
   repository locked and read-only
@@ -466,7 +478,7 @@
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   redeltafulladd
   each revision will be added as new content to the internal storage; this will likely drastically slow down execution time, but some extensions might need it
@@ -523,13 +535,14 @@
   generaldelta
   largefiles
   revlogv1
+  sparserevlog
   store

   $ hg debugupgraderepo --run
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, largefiles, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, largefiles, revlogv1, sparserevlog, store

   beginning upgrade...
   repository locked and read-only
@@ -561,6 +574,7 @@
   generaldelta
   largefiles
   revlogv1
+  sparserevlog
   store

   $ cat << EOF >> .hg/hgrc
@@ -581,7 +595,7 @@
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, largefiles, lfs, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, largefiles, lfs, revlogv1, sparserevlog, store

   beginning upgrade...
   repository locked and read-only
@@ -667,16 +681,16 @@
   $ hg config format
   format.maxchainlen=9001
   $ hg debugdeltachain file
-  rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio
-  0 1 1 -1 base 77 182 77 0.42308 77 0 0.00000
-  1 1 2 0 p1 21 191 98 0.51309 98 0 0.00000
-  2 2 1 -1 base 84 200 84 0.42000 84 0 0.00000
+  rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio readsize largestblk rddensity srchunks
+  0 1 1 -1 base 77 182 77 0.42308 77 0 0.00000 77 77 1.00000 1
+  1 1 2 0 p1 21 191 98 0.51309 98 0 0.00000 98 98 1.00000 1
+  2 1 2 0 other 30 200 107 0.53500 128 21 0.19626 128 128 0.83594 1

   $ hg debugupgraderepo --run --optimize redeltaall
   upgrade will perform the following actions:

   requirements
-  preserved: dotencode, fncache, generaldelta, revlogv1, store
+  preserved: dotencode, fncache, generaldelta, revlogv1, sparserevlog, store

   redeltaall
   deltas within internal storage will be fully recomputed; this will likely drastically slow down execution time
@@ -686,14 +700,14 @@
   creating temporary repository to stage migrated data: $TESTTMP/localconfig/.hg/upgrade.* (glob)
   (it is safe to interrupt this process any time before data migration completes)
   migrating 9 total revisions (3 in filelogs, 3 in manifests, 3 in changelog)
-  migrating 1.05 KB in store; 882 bytes tracked data
-  migrating 1 filelogs containing 3 revisions (374 bytes in store; 573 bytes tracked data)
-  finished migrating 3 filelog revisions across 1 filelogs; change in size: -63 bytes
+  migrating 1019 bytes in store; 882 bytes tracked data
+  migrating 1 filelogs containing 3 revisions (320 bytes in store; 573 bytes tracked data)
+  finished migrating 3 filelog revisions across 1 filelogs; change in size: -9 bytes
   migrating 1 manifests containing 3 revisions (333 bytes in store; 138 bytes tracked data)
   finished migrating 3 manifest revisions across 1 manifests; change in size: 0 bytes
   migrating changelog containing 3 revisions (366 bytes in store; 171 bytes tracked data)
   finished migrating 3 changelog revisions; change in size: 0 bytes
-  finished migrating 9 total revisions; total change in store size: -63 bytes
+  finished migrating 9 total revisions; total change in store size: -9 bytes
   copying phaseroots
   data fully migrated to temporary repository
   marking source repository as being upgraded; clients will be unable to read from repository
@@ -706,10 +720,10 @@
   copy of old repository backed up at $TESTTMP/localconfig/.hg/upgradebackup.* (glob)
   the old repository will not be deleted; remove it to free up disk space once the upgraded repository is verified
   $ hg debugdeltachain file
-  rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio
-  0 1 1 -1 base 77 182 77 0.42308 77 0 0.00000
-  1 1 2 0 p1 21 191 98 0.51309 98 0 0.00000
-  2 1 3 1 p1 21 200 119 0.59500 119 0 0.00000
+  rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio readsize largestblk rddensity srchunks
+  0 1 1 -1 base 77 182 77 0.42308 77 0 0.00000 77 77 1.00000 1
+  1 1 2 0 p1 21 191 98 0.51309 98 0 0.00000 98 98 1.00000 1
+  2 1 3 1 p1 21 200 119 0.59500 119 0 0.00000 119 119 1.00000 1
   $ cd ..

   $ cat << EOF >> $HGRCPATH