Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:21:35 +0200] rev 47144
revlog: split the option initialisation in its own method
The part of the code is huge, keeping it separated will keep the `_loadindex`
method simpler and help keeping logic well insulated.
Differential Revision: https://phab.mercurial-scm.org/D10570
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:21:25 +0200] rev 47143
revlog: always "append" full size tuple
Same reasoning as the previous patch.
Differential Revision: https://phab.mercurial-scm.org/D10569
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:21:15 +0200] rev 47142
revlog: make the index always return the same tuple
It is simpler to manage the diferrence in on disk format in the internal index
code itself and lets the rest of the code always handle the same object.
This will become even more important when the data we store will be entirely
different (for example the changelog does not need the "linkrev" field.
We start with item reading, we will deal with item writing in the next
changesets.
Differential Revision: https://phab.mercurial-scm.org/D10568
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:21:05 +0200] rev 47141
revlog: introduce an explicit `format_version` member in the index struct
This will allow for cleaner check than assuming each version has a different
size. Unsurprisingly I am planning to use this to introduce more format variant.
Differential Revision: https://phab.mercurial-scm.org/D10567
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:20:55 +0200] rev 47140
revlog: rename `hdrsize` to `entry_size` in the C code
This is the size of and index entry, so lets make it clearer.
Differential Revision: https://phab.mercurial-scm.org/D10566
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:20:45 +0200] rev 47139
revlog: split the `version` attribute into its two components
The `revlog.version` attribute contained an integer coding 2 different informations:
* the revlog version number
* a bit field defining some specific feature of the revlog
We now explicitly store the two components independently. This avoid exposing
the implementation details all around the code and prepare for future revlog
version that would encode the information in a different way.
In the process we drop the `version` attribute from the interface. It was
flagged for removal when that interface was created.
Differential Revision: https://phab.mercurial-scm.org/D10565
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:20:35 +0200] rev 47138
verify: pass a revlog to `_checkrevlog` in `_verifymanifest`
Since `manifestrevlog` is not a `revlog`, we are passing strange thing to
`_checkrevlog`. We fix this to avoid breakage during future change.
Differential Revision: https://phab.mercurial-scm.org/D10564
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:20:25 +0200] rev 47137
revlog: replace flag check related to generaldelta with attribute check
Same logic as the previous changesets.
Differential Revision: https://phab.mercurial-scm.org/D10563
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:19:09 +0200] rev 47136
revlog: replace REVLOGV2 check related to sidedata with `hassidedata` checks
This is more flexible and semantically more correct. The associated revlog's
attribute exist since
827cb4fe62a3, so well we start linking sidedata to
revlogv2.
Differential Revision: https://phab.mercurial-scm.org/D10562
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:19:05 +0200] rev 47135
revlog: explicitely pass the "indexfile" parameter
Most of this was already done when introducing the `target` parameter, but some
remained. Having "indexfile" passed explicitely will help us to change the way
we address a revlog later in the stack. With the introduction of more generic
`docket`, the entry point will not necessarly be `xxx.i` file, and the actual
index files will have a variable name.
Differential Revision: https://phab.mercurial-scm.org/D10561
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:18:58 +0200] rev 47134
revlog: highlight current incompatibility in `rewrite_sidedata`
See comment for details. We will need to fix the test coverage when this
incompatibility is lifted.
Differential Revision: https://phab.mercurial-scm.org/D10544
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 03 May 2021 12:18:48 +0200] rev 47133
revlog: adjust rewrite_sidedata code to not delete existing revlog content
The "w+" file mode is deleting all the content of the opened file. Which is bad…
This is not caught by the test because the test only check for a full, initial
pull where not pre-existing content exists. So we need to fix our test coverage
here.
However they are another issue that prevent "incremental" pull to work here. See
next changeset for details.
Differential Revision: https://phab.mercurial-scm.org/D10543
Simon Sapin <simon.sapin@octobus.net> [Fri, 07 May 2021 17:33:47 +0200] rev 47132
status: Add tests for some more edge cases
* Size-preserving file contents modification
* Filtering output to a deleted or removed file
Differential Revision: https://phab.mercurial-scm.org/D10701
Simon Sapin <simon.sapin@octobus.net> [Fri, 07 May 2021 16:44:36 +0200] rev 47131
status: Extend issue 6483 test to exclude patterns
With `hg status -X`, not just include pattern with `hg status -I`
Differential Revision: https://phab.mercurial-scm.org/D10700
Simon Sapin <simon.sapin@octobus.net> [Fri, 07 May 2021 16:41:07 +0200] rev 47130
dirstate-tree: Add a test showing that issue 6335 is fixed
… when using the new status algorithm and the tree-based dirstate.
The previous algorithm still has this bug.
Differential Revision: https://phab.mercurial-scm.org/D10699
Simon Sapin <simon.sapin@octobus.net> [Mon, 03 May 2021 20:04:19 +0200] rev 47129
dirstate-tree: Add a dirstate-v1-tree variant of some tests
The `dirstate-v1` variant has the previous behavior.
`dirstate-v1-tree` uses the same format on disk, but uses the new
`DirstateMap` with a tree data structure and the new `status` algorithm.
These were untested so far.
Differential Revision: https://phab.mercurial-scm.org/D10698
Matt Harbison <matt_harbison@yahoo.com> [Fri, 07 May 2021 22:06:25 -0400] rev 47128
merge with stable
Martin von Zweigbergk <martinvonz@google.com> [Fri, 07 May 2021 08:38:17 -0700] rev 47127
rename: add hint about --at-rev if source file doesn't exist
It's quite common that users want to record copy (rename) information
after committing the working copy changes (i.e. an added and a deleted
file). When they try `hg mv [--after] <src> <dst>`, that just fails
because the source file doesn't exist. It seems helpful if we can
point them to `--at-rev=.` in this case.
Differential Revision: https://phab.mercurial-scm.org/D10697
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 20:21:56 +0200] rev 47126
dirstate-tree: Borrow paths from the "on disk" bytes
Use std::borrow::Cow to avoid some memory allocations and copying.
Differential Revision: https://phab.mercurial-scm.org/D10560
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:33:04 +0200] rev 47125
dirstate-tree: Borrow copy source paths from the "on disk" bytes
Use std::borrow::Cow to avoid some memory allocations and copying.
These particular allocations are not visible when profiling (as many files
in a typical repo don’t have a copy source). This change is "warm up"
for doing the same with paths of files themselves, which is more involved
since those paths are used as `HashMap` keys. This gets of the way the
addition of a lifetime parameter to several types.
Differential Revision: https://phab.mercurial-scm.org/D10559
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:57:46 +0200] rev 47124
rust: Use `&HgPath` instead of `&HgPathBuf` in may APIs
Getting the former (through `Deref`) is almost the only useful thing one can
do with the latter anyway. With this changes, API become more flexible for the
"provider" of these paths which may store something else that Deref’s to HgPath,
such as `std::borrow::Cow<HgPath>`. Using `Cow` can help reduce memory alloactions
and copying.
Differential Revision: https://phab.mercurial-scm.org/D10558
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:24:54 +0200] rev 47123
dirstate-tree: Make `DirstateMap` borrow from a bytes buffer
… that has the contents of the `.hg/dirstate` file.
This only applies to the tree-based flavor of `DirstateMap`.
For now only the entire `&[u8]` slice is stored, so this is not useful yet.
Adding a lifetime parameter to the `DirstateMap` struct (in hg-core) makes
Python bindings non-trivial because we keep that struct in a Python object
that has a dynamic lifetime tied to Python’s reference-counting and GC.
As long as we keep the `PyBytes` that owns the borrowed bytes buffer next to
the borrowing struct, the buffer will live long enough for the borrows to stay
valid. However this relationship cannot be expressed in safe Rust code in a
way that would statisfy they borrow-checker. We use `unsafe` code to erase
that lifetime parameter, and encapsulate it in a safe abstraction similar to
the owning-ref crate: https://docs.rs/owning_ref/
Differential Revision: https://phab.mercurial-scm.org/D10557
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:13:31 +0200] rev 47122
rust: Read dirstate from disk in DirstateMap constructor
Before this changeset, Python code first creates an empty `DirstateMap` Rust
object, then immediately calls its `read` method with a byte string of the
contents of the `.hg/dirstate` file.
This makes that byte string available to the constructor of `DirstateMap`
in the hg-cpython crate. This is a first step towards enabling parts of
`DirstateMap` in the hg-core crate to borrow from this buffer without copying.
Differential Revision: https://phab.mercurial-scm.org/D10556
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 15:40:11 +0200] rev 47121
rust: Remove handling of `parents` in `DirstateMap`
The Python wrapper class `dirstatemap` can take care of it.
This removes the need to have both `_rustmap` and `_inner_rustmap`.
Differential Revision: https://phab.mercurial-scm.org/D10555
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 14:22:14 +0200] rev 47120
dirstate-tree: Fold "tracked descendants" counter update in main walk
For the purpose of implementing `has_tracked_dir` (which means "has tracked
descendants) without an expensive sub-tree traversal, we maintaing a counter
of tracked descendants on each "directory" node of the tree-shaped dirstate.
Before this changeset, mutating or inserting a node at a given path would
involve:
* Walking the tree from root through ancestors to find the node or the spot
where to insert it
* Looking at the previous node if any to decide what counter update is needed
* Performing any node mutation
* Walking the tree *again* to update counters in ancestor nodes
When profiling `hg status` on a large repo, this second walk takes times
while loading a the dirstate from disk.
It turns out we have enough information to decide before he first tree walk
what counter update is needed. This changeset merges the two walks, gaining
~10% of the total time for `hg update` (in the same hyperfine benchmark as
the previous changeset).
---
Profiling was done by compiling with this `.cargo/config`:
[profile.release]
debug = true
then running with:
py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \
./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1
then visualizing the recorded JSON file in https://www.speedscope.app/
Differential Revision: https://phab.mercurial-scm.org/D10554
Simon Sapin <simon.sapin@octobus.net> [Thu, 29 Apr 2021 11:32:57 +0200] rev 47119
dirstate-tree: Use HashMap instead of BTreeMap
BTreeMap has the advantage of its "natural" iteration order being the one we need
in the status algorithm. With HashMap however, iteration order is undefined so
we need to allocate a Vec and sort it explicitly.
Unfortunately many BTreeMap operations are slower than in HashMap, and skipping
that extra allocation and sort is not enough to compensate.
Switching to HashMap + sort makes `hg status` 17% faster in one test case,
as measure with hyperfine:
```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 765.0 ms ± 8.8 ms [User: 1.352 s, System: 0.747 s]
Range (min … max): 751.8 ms … 778.7 ms 10 runs
Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 651.8 ms ± 9.9 ms [User: 1.251 s, System: 0.799 s]
Range (min … max): 642.2 ms … 671.8 ms 10 runs
Summary
'./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```
* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
Differential Revision: https://phab.mercurial-scm.org/D10553
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 17:49:38 +0200] rev 47118
dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read`
When running with a `RUST_LOG=trace` environment variable, the `micro_timer`
crate prints the duration taken by each call to functions with that attribute.
Differential Revision: https://phab.mercurial-scm.org/D10552
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 14:20:48 +0200] rev 47117
dirstate-tree: Paralellize the status algorithm with Rayon
The `rayon` crate exposes "parallel iterators" that work like normal iterators
but dispatch work on different items to an implicit global thread pool.
Differential Revision: https://phab.mercurial-scm.org/D10551
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 12:42:21 +0200] rev 47116
dirstate-tree: Avoid BTreeMap double-lookup when inserting a dirstate entry
The child nodes of a given node in the tree-shaped dirstate are kept in a
`BTreeMap` where keys are file names as strings. Finding or inserting a value
in the map takes `O(log(n))` string comparisons, which adds up when constructing
the tree.
The `entry` API allows finding a "spot" in the map that may or may not be
occupied and then access that value or insert a new one without doing map
lookup again. However the current API is limited in that calling `entry`
requires an owned key (and so a memory allocation), even if it ends up not
being used in the case where the map already has a value with an equal key.
This is still a win, with 4% better end-to-end time for `hg status` measured
here with hyperfine:
```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 1.337 s ± 0.018 s [User: 892.9 ms, System: 437.5 ms]
Range (min … max): 1.316 s … 1.373 s 10 runs
Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 1.291 s ± 0.008 s [User: 853.4 ms, System: 431.1 ms]
Range (min … max): 1.283 s … 1.309 s 10 runs
Summary
'./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
1.04 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```
* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
Differential Revision: https://phab.mercurial-scm.org/D10550
Simon Sapin <simon.sapin@octobus.net> [Mon, 26 Apr 2021 19:28:56 +0200] rev 47115
dirstate-tree: Handle I/O errors in status
Errors such as insufficient permissions when listing a directory are logged,
and the algorithm continues without considering that directory.
Differential Revision: https://phab.mercurial-scm.org/D10549