Boris Feld <boris.feld@octobus.net> [Fri, 07 Sep 2018 11:17:29 -0400] rev 39492
snapshot: try intermediate snapshot against parents' base
# Regarding The Series Started By This Changeset
This is the first changeset of a series adjusting the delta chain strategy to
build a useful chain of intermediate snapshots. The series will introduce a
full strategy to produce chains of multiple snapshots on top of which a
"usual" delta chain will be built.
That strategy will have multiple steps to maximize snapshot reuse, avoid
pathological cases, and improve overall compression in very branchy
repositories. An important property of sparse-revlogs using such snapshot
chains is that they can use very short delta chains without a problematic
impact on the resulting compression. Shorter delta chains are important to
achieve good performance.
To make each step clear, we'll introduce them one by one.
See the end of this series for full details.
# Regarding This Changeset
Before this change, if we could not store the current revision as a delta
against a "simple" candidate (p1, p2, prev), we created a new level-0 snapshot
(also called a full snapshot).
As the first step, we introduce a simple strategy: try an intermediate level-1
snapshot against the chain bases of the parents of the "current revision".
The "current revision" is the one we are currently trying to store in the
revlog, triggering this search for a good delta base.
The first item in a delta chain (its base) is always a level-0 snapshot, so a
delta against it produces a level-1 snapshot.
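A minimal sketch of the candidate order described above, using a toy model where `deltaparent` maps each revision to the revision it is stored as a delta against (`-1` for a full snapshot). The function names and the `isgood` predicate are hypothetical stand-ins for the real size and chain checks, and the real code compares deltas within a group rather than taking the first acceptable one.

```python
from typing import Callable, Dict, Iterator, Tuple

NULLREV = -1  # conventional "no revision" value


def chainbase(rev: int, deltaparent: Dict[int, int]) -> int:
    """Follow delta parents back to the level-0 (full) snapshot of the chain."""
    while deltaparent[rev] != NULLREV:
        rev = deltaparent[rev]
    return rev


def candidate_groups(p1: int, p2: int, prev: int,
                     deltaparent: Dict[int, int]) -> Iterator[Tuple[int, ...]]:
    """Yield groups of delta-base candidates, most preferred first."""
    # Group 1: the "simple" candidates tried before this change.
    simple = tuple(r for r in (p1, p2, prev) if r != NULLREV)
    if simple:
        yield simple
    # Group 2 (new): the chain bases of the parents. Each base is a
    # level-0 snapshot, so a delta against it is a level-1 snapshot.
    bases = []
    for p in (p1, p2):
        if p != NULLREV:
            base = chainbase(p, deltaparent)
            if base not in bases:
                bases.append(base)
    if bases:
        yield tuple(bases)


def choose_base(p1: int, p2: int, prev: int, deltaparent: Dict[int, int],
                isgood: Callable[[int], bool]) -> int:
    """Return the first acceptable base, or NULLREV for a new full snapshot."""
    for group in candidate_groups(p1, p2, prev, deltaparent):
        for rev in group:
            if isgood(rev):
                return rev
    return NULLREV
```

With `deltaparent = {0: -1, 1: 0, 2: 1, 3: 0}`, a new revision whose parent is 2 first tries 2 and 3; if both are rejected, it falls back to revision 0 (the chain base of its parent) instead of immediately writing a new full snapshot.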
# Effect On The Test Repository
We can already see the effect on the test repository. Most of the snapshots
have shifted from level 0 to level 1. The overall size has slightly decreased.
(However, keep in mind that this repository only emulates real data.)
# Regarding Statistics
The current series focuses on improving the chains built. Improving the
performance of this logic will be done as a second step. Sparse-revlog is
still experimental and disabled by default.
We'll provide more statistics about the resulting size and delta chains at the
end of this series.
Boris Feld <boris.feld@octobus.net> [Mon, 10 Sep 2018 09:08:24 -0700] rev 39491
sparse-revlog: add a test checking revlog deltas for a churning file
The test repository contains 5000 revisions and is therefore slow to build:
five minutes with CHG, over fifteen minutes without. It is too slow to build
during the test. Bundling all of its content produces a sizeable result, 20MB,
too large to be committed. Instead, we commit a script to build the expected
bundle, and the test checks whether the bundle is available. Every run of the
script produces the same repository content, resulting in the same hashes.
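A minimal sketch of how such a script can stay deterministic, assuming all randomness comes from a private `random.Random` seeded with a fixed value; it is not the committed script, and `generate_revisions` and `fingerprint` are hypothetical names. Repeated runs on the same Python version yield identical file contents. A real script would also need fixed commit metadata, since changeset hashes include the user and date.

```python
import hashlib
import random
from typing import List


def generate_revisions(count: int, seed: int = 0) -> List[bytes]:
    """Produce `count` successive file contents deterministically from `seed`."""
    rng = random.Random(seed)  # private generator: no dependence on global state
    lines = [b"line %d\n" % i for i in range(20)]
    contents = []
    for _ in range(count):
        # Churn: rewrite one randomly chosen line in each revision.
        idx = rng.randrange(len(lines))
        lines[idx] = b"value %d\n" % rng.randrange(10**6)
        contents.append(b"".join(lines))
    return contents


def fingerprint(contents: List[bytes]) -> str:
    """Hash all contents in order, to compare two generated histories."""
    h = hashlib.sha1()
    for c in contents:
        h.update(c)
    return h.hexdigest()
```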
Using smaller repositories was tried; however, they miss most of the cases we
are planning to improve. Having these cases in a 5000-revision repository is
already nice; we usually see them in repositories with on the order of one
million revisions.
This test will be very useful for checking various strategies for building
the deltas stored in a sparse-revlog.
In this series we will focus our attention on the following metrics:
Those that impact the final storage performance (size, space):
* size of the revlog data file (".hg/store/data/*.d")
* chain length info
Those that describe the delta patterns:
* number of snapshot revisions (and their levels)
* size taken by snapshot revisions (and their levels)
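A minimal sketch of how the metrics above could be computed, using a hypothetical in-memory `ToyRevlog` that records each revision's delta parent, parents, and stored size. The snapshot rule is an assumption modeled on sparse-revlog: a revision is a level-0 snapshot if stored in full, and a level-(n+1) snapshot if it is a delta against a level-n snapshot that is not one of its parents.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

NULLREV = -1


class ToyRevlog:
    """Hypothetical model: delta parent, parents, and stored size per revision."""

    def __init__(self, deltaparent: Dict[int, int],
                 parents: Dict[int, Tuple[int, int]], size: Dict[int, int]):
        self.deltaparent = deltaparent
        self.parents = parents
        self.size = size

    def chainlength(self, rev: int) -> int:
        """Number of deltas to apply on top of the full snapshot."""
        n = 0
        while self.deltaparent[rev] != NULLREV:
            rev = self.deltaparent[rev]
            n += 1
        return n

    def snapshotlevel(self, rev: int) -> Optional[int]:
        """Snapshot level of `rev`, or None if it is a regular delta."""
        base = self.deltaparent[rev]
        if base == NULLREV:
            return 0  # stored in full
        if base in self.parents[rev]:
            return None  # ordinary delta against a parent
        level = self.snapshotlevel(base)
        return None if level is None else level + 1


def metrics(rl: ToyRevlog) -> dict:
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for rev in rl.deltaparent:
        level = rl.snapshotlevel(rev)
        if level is not None:
            counts[level] += 1
            sizes[level] += rl.size[rev]
    lengths = [rl.chainlength(r) for r in rl.deltaparent]
    return {
        "total_size": sum(rl.size.values()),
        "max_chain": max(lengths),
        "avg_chain": sum(lengths) / len(lengths),
        "snapshots_by_level": dict(counts),
        "snapshot_size_by_level": dict(sizes),
    }
```

For example, if revision 2 (parent 1) is stored as a delta against revision 0, it counts as a level-1 snapshot, while revision 1 (a delta against its own parent 0) does not count as a snapshot.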
Boris Feld <boris.feld@octobus.net> [Sat, 18 Aug 2018 12:45:44 +0200] rev 39490
tests: add a `tests/artifacts/` directory
That directory is meant to cache large items used by tests that are slow to
generate. See the 'PURPOSE' file for details and the next changesets for a
first user.
Meirambek Omyrzak <meirambek77@gmail.com> [Wed, 05 Sep 2018 01:19:48 +0300] rev 39489
verify: make output less confusing (issue5924)
output before: "500 files, 2035 changesets, 2622 total revisions"
output after: "checked 2035 changesets with 2622 changes to 500 files"
The new wording was suggested in the comments on the issue.
Differential Revision: https://phab.mercurial-scm.org/D4476
Boris Feld <boris.feld@octobus.net> [Tue, 04 Sep 2018 21:28:28 +0200] rev 39488
revlog: clarify the comment attached to delta reuse
The previous version was a bit complicated and referred to a deprecated
configuration option.
Boris Feld <boris.feld@octobus.net> [Tue, 04 Sep 2018 21:05:21 +0200] rev 39487
revlog: drop duplicated code
This code probably got duplicated by a rebase/evolve conflict. We drop the
extra copy; the other copy is right below.
This had no real effect since other logic ensures that we never test the same
revision twice.
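A minimal sketch of the kind of guard that made the duplicate harmless, assuming candidate revisions are produced in groups and a shared set filters out anything already tested. `dedup_groups` is a hypothetical name, not the revlog function.

```python
from typing import Iterable, Iterator, Set, Tuple


def dedup_groups(groups: Iterable[Tuple[int, ...]]) -> Iterator[Tuple[int, ...]]:
    """Yield each candidate group with already-tested revisions removed."""
    tested: Set[int] = set()
    for group in groups:
        fresh = []
        for rev in group:
            if rev not in tested:  # also drops repeats within one group
                tested.add(rev)
                fresh.append(rev)
        if fresh:  # skip groups that contain nothing new
            yield tuple(fresh)
```

With such a filter, a duplicated block that proposes the same candidates again yields nothing new, so no revision is ever tested twice.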
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 05 Sep 2018 09:04:40 -0700] rev 39486
wireprotov2peer: properly format errors
formatrichmessage() expects an iterable containing dicts with
well-defined keys. We were passing in something else. This caused
an exception.
Change the code to call formatrichmessage() with the proper argument.
And add a TODO to potentially emit the proper data structure from
the server in the first place.
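A simplified stand-in illustrating the failure mode. `format_rich_message` is hypothetical, and its key names (`msg` for a format template, optional `args` for its arguments) are assumptions for illustration. It expects a list of dicts; passing a plain bytes string instead makes Python 3 iterate over integers, and subscripting an integer raises `TypeError`.

```python
from typing import Iterable, Mapping


def format_rich_message(atoms: Iterable[Mapping[bytes, object]]) -> bytes:
    """Join message atoms, each a dict with a template and optional args."""
    chunks = []
    for atom in atoms:
        msg = atom[b"msg"]  # fails with TypeError if `atom` is not a mapping
        args = atom.get(b"args")
        if args:
            msg = msg % tuple(args)
        chunks.append(msg)
    return b"".join(chunks)
```

Calling `format_rich_message([{b"msg": b"%s missing", b"args": [b"x"]}])` returns `b"x missing"`, while `format_rich_message(b"oops")` raises `TypeError`.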
Differential Revision: https://phab.mercurial-scm.org/D4441