sparse-revlog: implement algorithm to write sparse delta chains (issue5480)
The classic behavior of revlog._isgooddeltainfo is to consider the span size
of the whole delta chain, and limit it to 4 * textlen.
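As a rough illustration of the classic rule, the check can be sketched as
follows. The names `chain_span` and `classic_is_good_span`, and the
representation of a chain as `(offset, length)` pairs, are assumptions made
for this sketch; this is not the actual revlog code:

```python
# Illustrative sketch only: helper names and the chain representation are
# hypothetical, not Mercurial's actual API.

def chain_span(chain):
    """Return the number of bytes between the start of the first delta
    and the end of the last one.

    `chain` is a non-empty list of (offset, length) pairs sorted by offset.
    """
    first_offset, _ = chain[0]
    last_offset, last_length = chain[-1]
    return last_offset + last_length - first_offset


def classic_is_good_span(chain, textlen, factor=4):
    """Classic rule: the span of the whole chain must fit in 4 * textlen."""
    return chain_span(chain) <= factor * textlen
```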
Once sparse-revlog writing is allowed (and enforced with a requirement),
revlog._isgooddeltainfo considers the span of the largest chunk as the
distance used in the verification, instead of using the span of the whole
delta chain.
In order to compute the span of the largest chunk, we need to slice into
chunks a chain with the new revision at the top of the revlog, and take the
maximal span of these chunks. The sparse read density is a parameter to the
slicing, as it will stop when the global read density reaches this threshold.
For instance, a density of 50% means that 2 of 4 read bytes are actually used
for the reconstruction of the revision (the others are part of other chains).
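The slicing described above might be sketched as below. This is a simplified
model under stated assumptions, not the actual implementation: a chain is a
list of `(offset, length)` pairs, the largest gaps are skipped first, and the
function names (`slice_by_density`, `largest_chunk_span`,
`sparse_is_good_span`) are hypothetical. Keeping the 4 * textlen limit for the
sparse check is also an assumption of this sketch:

```python
# Illustrative sketch only: names and data layout are hypothetical.

def slice_by_density(chain, target_density=0.5):
    """Split `chain` into chunks, skipping the largest gaps first.

    `chain` is a non-empty list of (offset, length) pairs sorted by offset.
    Slicing stops once (used bytes / read bytes) reaches `target_density`.
    """
    payload = sum(length for _, length in chain)
    read = chain[-1][0] + chain[-1][1] - chain[0][0]

    # Collect gaps as (size, index of the delta that follows the gap).
    gaps = []
    for i in range(1, len(chain)):
        prev_end = chain[i - 1][0] + chain[i - 1][1]
        gap = chain[i][0] - prev_end
        if gap > 0:
            gaps.append((gap, i))
    gaps.sort(reverse=True)

    cuts = []
    for gap, idx in gaps:
        if read == 0 or payload / read >= target_density:
            break
        read -= gap  # bytes in a skipped gap are never read
        cuts.append(idx)

    chunks, start = [], 0
    for idx in sorted(cuts):
        chunks.append(chain[start:idx])
        start = idx
    chunks.append(chain[start:])
    return chunks


def largest_chunk_span(chain, target_density=0.5):
    """Return the maximal span among the chunks produced by slicing."""
    return max(c[-1][0] + c[-1][1] - c[0][0]
               for c in slice_by_density(chain, target_density))


def sparse_is_good_span(chain, textlen, target_density=0.5, factor=4):
    """Sparse rule (assumed limit): compare the largest chunk span,
    rather than the whole-chain span, against factor * textlen."""
    return largest_chunk_span(chain, target_density) <= factor * textlen
```

With the chain `[(0, 10), (10, 10), (1000, 10), (1010, 10)]`, the whole chain
spans 1020 bytes, but slicing at the single large gap yields two chunks whose
largest span is 20 bytes.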
This allows a new revision to be potentially stored with a diff against
another revision anywhere in the history, instead of forcing it to be within
the last 4 * textlen bytes. The result is much better compression on
repositories that have many concurrent branches. Here is a comparison between
using deltas from current upstream (aggressive-merge-deltas on by default)
and deltas from a sparse-revlog.
Comparison of `.hg/store/` size:
mercurial (6.74% merges):
before: 46,831,873 bytes
after: 46,795,992 bytes (no relevant change)
pypy (8.30% merges):
before: 333,524,651 bytes
after: 308,417,511 bytes -8%
netbeans (34.21% merges):
before: 1,141,847,554 bytes
after: 1,131,093,161 bytes -1%
mozilla-central (4.84% merges):
before: 2,344,248,850 bytes
after: 2,328,459,258 bytes -1%
large-private-repo-A (19.73% merges):
before: 41,510,550,163 bytes
after: 8,121,763,428 bytes -80%
large-private-repo-B (23.77% merges):
before: 58,702,221,709 bytes
after: 8,351,588,828 bytes -76%
Comparison of `00manifest.d` size:
mercurial (6.74% merges):
before: 6,143,044 bytes
after: 6,107,163 bytes
pypy (8.30% merges):
before: 52,941,780 bytes
after: 27,834,082 bytes -48%
netbeans (34.21% merges):
before: 130,088,982 bytes
after: 119,337,636 bytes -10%
mozilla-central (4.84% merges):
before: 215,096,339 bytes
after: 199,496,863 bytes -8%
large-private-repo-A (19.73% merges):
before: 33,725,285,081 bytes
after: 390,302,545 bytes -99%
large-private-repo-B (23.77% merges):
before: 49,457,701,645 bytes
after: 1,366,752,187 bytes -97%
The better delta chains provide a performance boost in relevant repositories:
pypy, bundling 1000 revisions:
before: 1.670s
after: 1.149s -31%
Unbundling got a bit slower, probably because the sparse algorithm is still
pure Python.
pypy, unbundling 1000 revisions:
before: 4.062s
after: 4.507s +10%
Performance of bundle/unbundle in repositories with few concurrent branches
(e.g. mercurial) is unaffected.
No significant differences have been noticed when timing `hg push` and `hg
pull` locally. More stable timings are being gathered.
As with aggressive-merge-deltas, better deltas come with longer delta chains,
and longer chains have a performance impact. For example, the length of the
chain needed to get the manifest of pypy's tip moves from 82 items to 1929
items. This moves the restore time from 3.88ms to 11.3ms.
Delta chain length is an independent issue that also affects repositories
without this change. It will be dealt with independently.
No significant differences have been observed on repositories where
`sparse-revlog` has little effect (mercurial, unity, netbeans). On pypy,
small differences have been observed on some operations affected by delta
chain building and retrieval.
pypy, perfmanifest:
before: 0.006162s
after: 0.017899s +190%
pypy, commit:
before: 0.382s
after: 0.376s -1%
pypy, status:
before: 0.157s
after: 0.168s +7%
More comprehensive and stable timing comparisons are in progress.
#require darcs
$ echo "[extensions]" >> $HGRCPATH
$ echo "convert=" >> $HGRCPATH
$ DARCS_EMAIL='test@example.org'; export DARCS_EMAIL
initialize darcs repo
$ mkdir darcs-repo
$ cd darcs-repo
$ darcs init -q
$ echo a > a
$ darcs record -a -l -m p0
Finished recording patch 'p0'
$ cd ..
branch and update
$ darcs get -q darcs-repo darcs-clone >/dev/null
$ cd darcs-clone
$ echo c >> a
$ echo c > c
$ darcs record -a -l -m p1.1
Finished recording patch 'p1.1'
$ cd ..
skip if we can't import elementtree
$ if hg convert darcs-repo darcs-dummy 2>&1 | grep ElementTree > /dev/null; then
> echo 'skipped: missing feature: elementtree module'
> exit 80
> fi
update source
$ cd darcs-repo
$ echo b >> a
$ echo b > b
$ darcs record -a -l -m p1.2
Finished recording patch 'p1.2'
$ darcs pull -q -a --no-set-default ../darcs-clone
Backing up ./a(*) (glob)
We have conflicts in the following files:
./a
(?)
$ sleep 1
$ echo e > a
$ echo f > f
$ mkdir dir
$ echo d > dir/d
$ echo d > dir/d2
$ darcs record -a -l -m p2
Finished recording patch 'p2'
test file and directory move
$ darcs mv -q f ff
Test remove + move
$ darcs remove -q dir/d2
$ rm dir/d2
$ darcs mv -q dir dir2
$ darcs record -a -l -m p3
Finished recording patch 'p3'
The converter does not currently handle patch conflicts very well.
When they occur, it reverts *all* changes and moves forward,
letting the conflict resolving patch fix collisions.
Unfortunately, non-conflicting changes, like the addition of the
"c" file in p1.1 patch are reverted too.
Note that the manifest not listing "c" below is a bug.
$ cd ..
$ hg convert darcs-repo darcs-repo-hg
initializing destination darcs-repo-hg repository
scanning source...
sorting...
converting...
4 p0
3 p1.2
2 p1.1
1 p2
0 p3
$ hg log -R darcs-repo-hg -g --template '{rev} "{desc|firstline}" ({author}) files: {files}\n' "$@"
4 "p3" (test@example.org) files: dir/d dir/d2 dir2/d f ff
3 "p2" (test@example.org) files: a dir/d dir/d2 f
2 "p1.1" (test@example.org) files:
1 "p1.2" (test@example.org) files: a b
0 "p0" (test@example.org) files: a
$ hg up -q -R darcs-repo-hg
$ hg -R darcs-repo-hg manifest --debug
7225b30cdf38257d5cc7780772c051b6f33e6d6b 644 a
1e88685f5ddec574a34c70af492f95b6debc8741 644 b
37406831adc447ec2385014019599dfec953c806 644 dir2/d
b783a337463792a5c7d548ad85a7d3253c16ba8c 644 ff
#if no-outer-repo
try converting darcs1 repository
$ hg clone -q "$TESTDIR/bundles/darcs1.hg" darcs
$ hg convert -s darcs darcs/darcs1 2>&1 | grep darcs-1.0
darcs-1.0 repository format is unsupported, please upgrade
#endif