Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 11 Feb 2020 11:18:52 +0100] rev 44397
nodemap: introduce an option to use mmap to read the nodemap mapping
The performance and memory benefit is much greater if we don't have to copy all
the data in memory for each information. So we introduce an option (on by
default) to read the data using mmap.
This changeset is the last one definition the API for index support nodemap
data. (they have to be able to use the mmaping).
Below are some benchmark comparing the best we currently have in 5.3 with the
final step of this series (using the persistent nodemap implementation in
Rust). The benchmark run `hg perfindex` with various revset and the following
variants:
Before:
* do not use the persistent nodemap
* use the CPython implementation of the index for nodemap
* use mmapping of the changelog index
After:
* use the MixedIndex Rust code, with the NodeTree object for nodemap access
(still in review)
* use the persistent nodemap data from disk
* access the persistent nodemap data through mmap
* use mmapping of the changelog index
The persistent nodemap greatly speed up most operation on very large
repositories. Some of the previously very fast lookup end up a bit slower because
the persistent nodemap has to be setup. However the absolute slowdown is very
small and won't matters in the big picture.
Here are some numbers (in seconds) for the reference copy of mozilla-try:
Revset Before After abs-change speedup
-10000: 0.004622 0.005532 0.000910 × 0.83
-10: 0.000050 0.000132 0.000082 × 0.37
tip 0.000052 0.000085 0.000033 × 0.61
0 + (-10000:) 0.028222 0.005337 -0.022885 × 5.29
0 0.023521 0.000084 -0.023437 × 280.01
(-10000:) + 0 0.235539 0.005308 -0.230231 × 44.37
(-10:) + :9 0.232883 0.000180 -0.232703 ×1293.79
(-10000:) + (:99) 0.238735 0.005358 -0.233377 × 44.55
:99 + (-10000:) 0.317942 0.005593 -0.312349 × 56.84
:9 + (-10:) 0.313372 0.000179 -0.313193 ×1750.68
:9 0.316450 0.000143 -0.316307 ×2212.93
On smaller repositories, the cost of nodemap related operation is not as big, so
the win is much more modest. Yet it helps shaving a handful of millisecond here
and there.
Here are some numbers (in seconds) for the reference copy of mercurial:
Revset Before After abs-change speedup
-10: 0.000065 0.000097 0.000032 × 0.67
tip 0.000063 0.000078 0.000015 × 0.80
0 0.000561 0.000079 -0.000482 × 7.10
-10000: 0.004609 0.003648 -0.000961 × 1.26
0 + (-10000:) 0.005023 0.003715 -0.001307 × 1.35
(-10:) + :9 0.002187 0.000108 -0.002079 ×20.25
(-10000:) + 0 0.006252 0.003716 -0.002536 × 1.68
(-10000:) + (:99) 0.006367 0.003707 -0.002660 × 1.71
:9 + (-10:) 0.003846 0.000110 -0.003736 ×34.96
:9 0.003854 0.000099 -0.003755 ×38.92
:99 + (-10000:) 0.007644 0.003778 -0.003866 × 2.02
Differential Revision: https://phab.mercurial-scm.org/D7894
Raphaël Gomès <rgomes@octobus.net> [Fri, 14 Feb 2020 15:03:26 +0100] rev 44396
rust-dirstatemap: directly return `non_normal` and `other_entries`
This cleans up the interface which I previously thought needed to be uglier
than in reality. No performance difference, simple refactoring.
Differential Revision: https://phab.mercurial-scm.org/D8121
Martin von Zweigbergk <martinvonz@google.com> [Thu, 26 Dec 2019 14:12:45 -0800] rev 44395
copy: rename `wctx` to `ctx` since it will not necessarily be working copy
Differential Revision: https://phab.mercurial-scm.org/D8032
Martin von Zweigbergk <martinvonz@google.com> [Fri, 20 Dec 2019 14:03:12 -0800] rev 44394
copy: rewrite walkpat() to depend less on dirstate
I want to add a `hg cp/mv -r <rev>` option to mark files as
copied/moved in an existing commit (amending that commit). The code
needs to not depend on the dirstate for that.
Differential Revision: https://phab.mercurial-scm.org/D8031
Martin von Zweigbergk <martinvonz@google.com> [Thu, 13 Feb 2020 10:12:12 -0800] rev 44393
merge with stable
Yuya Nishihara <yuya@tcha.org> [Sat, 01 Feb 2020 12:57:32 +0900] rev 44392
pathutil: resurrect comment about path auditing order
It was removed at 51c86c6167c1, but expensive symlink traversal isn't the
only reason we should walk path components from the root.
Raphaël Gomès <rgomes@octobus.net> [Wed, 16 Oct 2019 14:12:48 +0200] rev 44391
rust-dirstatemap: remove additional lookup in dirstate.matches
We use the same trick as the Python implementation
Differential Revision: https://phab.mercurial-scm.org/D7119
Georges Racinet <georges.racinet@octobus.net> [Tue, 31 Dec 2019 12:43:57 +0100] rev 44390
rust-nodemap: insert method
In this implementation, we are in direct competition
with the C version: this Rust version will have a clear
startup advantage because it will read the data from disk,
but the insertion happens all in memory for both.
Differential Revision: https://phab.mercurial-scm.org/D7795
Valentin Gatien-Baron <vgatien-baron@janestreet.com> [Wed, 22 Jan 2020 14:21:34 -0500] rev 44389
recover: don't verify by default
The reason is:
- it's not that hard to trigger interrupted transactions: just run out
of disk space
- it takes forever to verify on large repos. Before --no-verify, I
told people to C-c hg recover when the progress bar showed up. Now I
tell them to pass --no-verify.
- I don't remember a single case where the verification step was
useful
This is technically a change of behavior. Perhaps this would be better
suited for tweakdefaults?
Differential Revision: https://phab.mercurial-scm.org/D7972
Augie Fackler <augie@google.com> [Tue, 11 Feb 2020 00:08:28 -0500] rev 44388
context: use manifest.find() instead of two separate calls
I noticed this while debugging an extension that's implementing the manifest
interface. Always nice to save a function call.
Differential Revision: https://phab.mercurial-scm.org/D8109
Raphaël Gomès <rgomes@octobus.net> [Thu, 16 Jan 2020 23:06:01 +0100] rev 44387
rust-matchers: implement `visit_children_set` for `FileMatcher`
As per the removed inline comment, this will become useful in a future patch
in this series as the `IncludeMatcher` is introduced.
Differential Revision: https://phab.mercurial-scm.org/D7914
Augie Fackler <augie@google.com> [Wed, 05 Feb 2020 17:13:51 -0500] rev 44386
manifest: move matches method to be outside the interface
In order to adequately smoke out any legacy consumers of the method, we rename
it to _matches so it's clear that it's class-private. To my amazement, all
consumers of this method really only wanted matching filenames, not a full
filtered manifest.
Differential Revision: https://phab.mercurial-scm.org/D8085
Augie Fackler <augie@google.com> [Mon, 10 Feb 2020 21:02:22 -0500] rev 44385
tags: use modern // operator for division
Fixes a test on Python 3.
# skip-blame only correcting a division operator, not a substantive change
Differential Revision: https://phab.mercurial-scm.org/D8108
Augie Fackler <augie@google.com> [Mon, 10 Feb 2020 20:47:19 -0500] rev 44384
tags: fix some type confusion exposed in python 3
# skip-blame just b-prefix and %-format cleanup, no meaningful change
Differential Revision: https://phab.mercurial-scm.org/D8107
Martin von Zweigbergk <martinvonz@google.com> [Fri, 10 Jan 2020 17:20:12 -0800] rev 44383
rebase: remove some now-unused parent arguments
Differential Revision: https://phab.mercurial-scm.org/D7829
Martin von Zweigbergk <martinvonz@google.com> [Fri, 10 Jan 2020 21:40:01 -0800] rev 44382
rebase: remove some redundant setting of dirstate parents
Since we're now setting the dirstate parents to its correct values
from the beginning (right after `merge.update()`), we usually don't
need to set them again before committing. The only case we need to
care about is when committing collapsed commits. So we can remove the
`setparents()` calls just before committing and add one only for the
collapse case.
Differential Revision: https://phab.mercurial-scm.org/D7828
Martin von Zweigbergk <martinvonz@google.com> [Fri, 10 Jan 2020 14:22:20 -0800] rev 44381
rebase: don't use rebased node as dirstate p2 (BC)
When rebasing a node, we currently use the rebased node as p2 in the
dirstate until just before we commit it (we then change to the desired
parents). This p2 is visible to the user when the rebase gets
interrupted because of merge conflicts. That can be useful to the user
as a reminder of which commit is currently being rebased, but I
believe it's incorrect for a few reasons:
* I think the dirstate parents should be the ones that will be set
when the commit is created.
* I think having two parents means that you're merging those two
commits, but when rebasing, you're generally grafting, not merging.
* When rebasing a merge commit, we should use the two desired parents
as dirstate parents (and we clearly can't have the rebased node as
a third dirstate parent).
* `hg graft` (and `hg update --merge`) sets only one parent and `hg
rebase` should be consistent with that.
I realize that this is a somewhat large user-visible change, but I
think it's worth it because it will simplify things quite a bit.
Differential Revision: https://phab.mercurial-scm.org/D7827
Martin von Zweigbergk <martinvonz@google.com> [Fri, 10 Jan 2020 14:17:56 -0800] rev 44380
rebase: stop relying on having two parents to resume rebase
I'm about to make it so we don't have two parents when a rebase is
interrupted (unless we're just rebasing on a merge commit). The code
for detecting if we're resuming a rebase relied on having two parents,
so this patch rewrites that to instead set a boolean when we resume.
Note that `self.resume` in the new condition implies `not
self.inmemory` (rebase cannot be resumed in memory), so that's why
that part can be omitted.
Differential Revision: https://phab.mercurial-scm.org/D7826
Martin von Zweigbergk <martinvonz@google.com> [Tue, 28 Jan 2020 21:49:50 -0800] rev 44379
graphlog: use '%' for other context in merge conflict
This lets the user more easily find the commit that is involved in the
conflict, such as the source of `hg update -m` or the commit being
grafted by `hg graft`.
Differential Revision: https://phab.mercurial-scm.org/D8043
Martin von Zweigbergk <martinvonz@google.com> [Wed, 29 Jan 2020 14:42:54 -0800] rev 44378
tests: add `hg log -G` output when there are merge conflicts
The next commit will change the behavior for these. I've used slightly
different commands in the different tests to match the surrounding
style.
Differential Revision: https://phab.mercurial-scm.org/D8042
Martin von Zweigbergk <martinvonz@google.com> [Wed, 29 Jan 2020 11:30:35 -0800] rev 44377
revset: add a revset for parents in merge state
This may be particularly useful soon, when I'm going to change how `hg
rebase` sets its parents during conflict resolution.
Differential Revision: https://phab.mercurial-scm.org/D8041
Martin von Zweigbergk <martinvonz@google.com> [Fri, 10 Jan 2020 17:46:10 -0800] rev 44376
tests: add test of rebase with conflict in merge commit
It doesn't seem like we had any tests of this. I think it's pretty
weird that the two parents we're merging are not the working copy
parents during the conflict resolution.
Differential Revision: https://phab.mercurial-scm.org/D7824
Martin von Zweigbergk <martinvonz@google.com> [Thu, 16 Jan 2020 00:03:19 -0800] rev 44375
rebase: always be graft-like, not merge-like, also for merges
Rebase works by updating to a commit and then grafting changes on
top. However, before this patch, it would actually merge in changes
instead of grafting them in in some cases. That is, it would use the
common ancestor as base instead of using one of the parents. That
seems wrong to me, so I'm changing it so `defineparents()` always
returns a value for `base`.
This fixes the bad behavior in test-rebase-newancestor.t, which was
introduced in 65f215ea3e8e (tests: add test for rebasing merges with
ancestors of the rebase destination, 2014-11-30).
The difference in test-rebase-dest.t is because the files in the tip
revision were A, D, E, F before this patch and A, D, F, G after it. I
think both files should ideally be there.
Differential Revision: https://phab.mercurial-scm.org/D7907
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:51:01 +0100] rev 44374
nodemap: update the index with the newly written data (when appropriate)
If we are to use mmap to read the nodemap data, and if the python code is
responsible for the IO, we need to refresh the mmap after each write and provide
it back to the index.
We start this dance without the mmap first.
Differential Revision: https://phab.mercurial-scm.org/D7893
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:52 +0100] rev 44373
nodemap: never read more than the expected data amount
Since we are tracking this number we can use it to detect corrupted rawdata file
and to only read the correct amount of data when possible.
Differential Revision: https://phab.mercurial-scm.org/D7892
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:43 +0100] rev 44372
nodemap: write new data from the expected current data length
If the amount of data in the file exceed the expect amount, we will overwrite
the extra data. This is a simple way to be safer.
Differential Revision: https://phab.mercurial-scm.org/D7891
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:33 +0100] rev 44371
nodemap: double check the source docket when doing incremental update
In theory, the index will have the information we expect it to have. However by
security, it seems safer to double check that the incremental data are generated
from the data currently on disk.
Differential Revision: https://phab.mercurial-scm.org/D7890
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:24 +0100] rev 44370
nodemap: track the total and unused amount of data in the rawdata file
We need to keep that information around:
* total data will allow transaction to start appending new information without
confusing other reader.
* unused data will allow to detect when we should regenerate new rawdata file.
Differential Revision: https://phab.mercurial-scm.org/D7889
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:14 +0100] rev 44369
nodemap: track the maximum revision tracked in the nodemap
We need a simple way to detect when the on disk data contains less revision
than the index we read from disk. The docket file is meant for this, we just had
to start tracking that data.
We should also try to detect strip operation, but we will deal with this in
later changesets. Right now we are focusing on defining the API for index
supporting persistent nodemap.
Differential Revision: https://phab.mercurial-scm.org/D7888
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:04 +0100] rev 44368
nodemap: add a flag to dump the details of the docket
We are about to add more information to the docket. We first introduce a way to
debug its content.
Differential Revision: https://phab.mercurial-scm.org/D7887
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:54 +0100] rev 44367
nodemap: introduce append-only incremental update of the persistent data
Rewriting the full nodemap for each transaction has a cost we would like to
avoid. We introduce a new way to write persistent nodemap data by adding new
information at the end for file. Any new and updated block as added at the end
of the file. The last block is the new root node.
With this method, some of the block already on disk get "dereferenced" and
become dead data. In later changesets, We'll start tracking the amount of dead
data to eventually re-generate a full nodemap.
Differential Revision: https://phab.mercurial-scm.org/D7886
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:45 +0100] rev 44366
nodemap: keep track of the docket for loaded data
To perform incremental update of the on disk data, we need to keep tracks of
some aspect of that data.
Differential Revision: https://phab.mercurial-scm.org/D7885