Fri, 07 Sep 2018 11:17:30 -0400 snapshot: search for unrelated but reusable full-snapshot
Boris Feld <boris.feld@octobus.net> [Fri, 07 Sep 2018 11:17:30 -0400] rev 39493
snapshot: search for unrelated but reusable full-snapshot # New Strategy Step: Reusing Snapshot Outside Of Parents' Chain. If no suitable bases were found in the parent's chains, see if we could reuse a full snapshot not directly related to the current revision. Such search can be expensive, so we only search for snapshots appended to the revlog *after* the bases used by the parents of the current revision (the one we just tested). We assume the parent's bases were created because the previous snapshots were unsuitable, so there are low odds they would be useful now. This search gives a chance to reuse a delta chain unrelated to the current revision. Without this re-use, topological branches would keep reopening new full chains. Creating more and more snapshots as the repository grow. In repositories with many topological branches, the lack of delta reuse can create too many snapshots reducing overall compression to nothing. This results in a very large repository and other usability issues. For now, we still focus on creating level-1 snapshots. However, this principle will play a large part in how we avoid snapshot explosion once we have more snapshot levels. # Effects On The Test Repository In the test repository we created, we can see the beneficial effect of such reuse. We need very few level-0 snapshots and the overall revlog size has decreased. The `hg debugrevlog` call, show a "lvl-2" snapshot. It comes from the existing delta logic using the `prev` revision (revlog's tip) as the base. In this specific case, it turns out the tip was a level-1 snapshot. This is a coincidence that can be ignored. Finding and testing against all these unrelated snapshots can have a performance impact at write time. We currently focus on building good deltas chain we build. Performance concern will be dealt with later in another series.
Fri, 07 Sep 2018 11:17:29 -0400 snapshot: try intermediate snapshot against parents' base
Boris Feld <boris.feld@octobus.net> [Fri, 07 Sep 2018 11:17:29 -0400] rev 39492
snapshot: try intermediate snapshot against parents' base # Regarding The Series Started By This Changeset This is the first changesets of a group adjusting delta chain strategy to build a useful chain of intermediate snapshots. The series will introduce a full strategy to produce chains of multiple snapshots on top of which a "usual" delta chain will be built. That strategy will have multiple steps to maximize snapshot reuse, avoiding pathological cases and improving overall compression in very branchy repositories. An important property of sparse-revlog using such snapshot-chain is that they can use very short delta chain without problematic impact on the resulting compression. Shorter delta chains are important to achieve good performance. To make each step clear, we'll introduce them one by one. See the end of this series for full details. # Regarding This Changeset Before this change, if we cannot store the current revision as a delta against a "simple" candidate (p1, p2, prev), we created a new level-0 snapshot (also called full snapshot). As the first step, we introduce a simple strategy: try an intermediate level-1 snapshot against the chain base of the "current revision" parents. The "current revision" is the one we are currently trying to store in the revlog, triggering this search for a good delta base. The first item in the chain is always a level-0 snapshot. # Effect On The Test Repository We can already see the effect on the test-repository. Most of the snapshots have shifted from level 0 to level 1. The overall size has slightly decreased. (However, keep in mind that this repository only emulates real data) # Regarding Statistic The current series focuses on improving the chain built. Improving the performance of this logic will be done as a second step. Sparse-revlog is still experimental and disabled by default. We'll provide more statistic about resulting size and delta chain at the end of this series.
Mon, 10 Sep 2018 09:08:24 -0700 sparse-revlog: add a test checking revlog deltas for a churning file
Boris Feld <boris.feld@octobus.net> [Mon, 10 Sep 2018 09:08:24 -0700] rev 39491
sparse-revlog: add a test checking revlog deltas for a churning file The test repository contains 5000 revisions and is therefore slow to build: five minutes with CHG, over fifteen minutes without. It is too slow to build during the test. Bundling all content produce a sizeable result, 20BM, too large to be committed. Instead, we commit a script to build the expected bundle and the test checks if the bundle is available. Any run of the script will produce the same repository content, using resulting in the same hashes. Using smaller repositories was tried, however, it misses most of the cases we are planning to improve. Having them in a 5000 repository is already nice, we usually see these case in repositories in the order of magnitude of one million revisions. This test will be very useful to check various changes strategy for building delta to store in a sparse-revlog. In this series we will focus our attention on the following metrics: The ones that will impact the final storage performance (size, space): * size of the revlog data file (".hg/store/data/*.d") * chain length info The ones that describe the deltas patterns: * number of snapshot revision (and their level) * size taken by snapshot revision (and their level)
Sat, 18 Aug 2018 12:45:44 +0200 tests: add a `tests/artifacts/` directory
Boris Feld <boris.feld@octobus.net> [Sat, 18 Aug 2018 12:45:44 +0200] rev 39490
tests: add a `tests/artifacts/` directory That directory is meant to cache large items used by tests that are slow to generate. See 'PURPOSE' file for details and next changesets for a first user.
Wed, 05 Sep 2018 01:19:48 +0300 verify: make output less confusing (issue5924)
Meirambek Omyrzak <meirambek77@gmail.com> [Wed, 05 Sep 2018 01:19:48 +0300] rev 39489
verify: make output less confusing (issue5924) output before: "500 files, 2035 changesets, 2622 total revisions" output after: "checked 2035 changesets with 2622 changes to 500 files" new one was suggested in the comments inside the issue. Differential Revision: https://phab.mercurial-scm.org/D4476
Tue, 04 Sep 2018 21:28:28 +0200 revlog: clarify the comment attached to delta reuse
Boris Feld <boris.feld@octobus.net> [Tue, 04 Sep 2018 21:28:28 +0200] rev 39488
revlog: clarify the comment attached to delta reuse The previous version was a bit complicated and referred to a deprecated configuration option.
(0) -30000 -10000 -3000 -1000 -300 -100 -30 -10 -6 +6 +10 +30 +100 +300 +1000 +3000 +10000 tip