Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 15:40:11 +0200] rev 47135
rust: Remove handling of `parents` in `DirstateMap`
The Python wrapper class `dirstatemap` can take care of it.
This removes the need to have both `_rustmap` and `_inner_rustmap`.
Differential Revision: https://phab.mercurial-scm.org/D10555
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 14:22:14 +0200] rev 47134
dirstate-tree: Fold "tracked descendants" counter update in main walk
For the purpose of implementing `has_tracked_dir` (which means "has tracked
descendants") without an expensive sub-tree traversal, we maintain a counter
of tracked descendants on each "directory" node of the tree-shaped dirstate.
Before this changeset, mutating or inserting a node at a given path would
involve:
* Walking the tree from root through ancestors to find the node or the spot
where to insert it
* Looking at the previous node if any to decide what counter update is needed
* Performing any node mutation
* Walking the tree *again* to update counters in ancestor nodes
When profiling `hg status` on a large repo, this second walk takes noticeable
time while loading the dirstate from disk.
It turns out we have enough information to decide before the first tree walk
what counter update is needed. This changeset merges the two walks, gaining
~10% of the total time for `hg status` (in the same hyperfine benchmark as
the previous changeset).
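
A minimal sketch of the merged walk, not the actual Mercurial code. The `Node` type, the `set_tracked` function, and the idea that the caller passes in the previous tracked status as `was_tracked` are all assumptions made for illustration; the changeset itself does not say how the counter update is determined up front.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified tree node (the real dirstate node stores more).
#[derive(Default)]
struct Node {
    /// `Some(..)` for a file entry, `None` for a pure directory node.
    tracked: Option<bool>,
    /// Number of tracked file entries below this node.
    tracked_descendants: u32,
    children: HashMap<String, Node>,
}

/// Sets the tracked status of `path` in one walk from the root.
/// Assumption: the caller already knows `was_tracked` (false if the entry
/// did not exist), so the counter change is known before walking.
fn set_tracked(root: &mut Node, path: &str, was_tracked: bool, tracked: bool) {
    let mut node = root;
    for component in path.split('/') {
        // `node` is an ancestor of the target here: fix its counter on the
        // way down instead of in a second walk afterwards.
        if tracked && !was_tracked {
            node.tracked_descendants += 1;
        } else if was_tracked && !tracked {
            node.tracked_descendants -= 1;
        }
        node = node.children.entry(component.to_owned()).or_default();
    }
    node.tracked = Some(tracked);
}
```

With this shape, every ancestor counter is already correct when the walk reaches the target node, so no second traversal is needed.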
---
Profiling was done by compiling with this `.cargo/config`:

    [profile.release]
    debug = true

then running with:

    py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \
        ./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1

then visualizing the recorded JSON file in https://www.speedscope.app/
Differential Revision: https://phab.mercurial-scm.org/D10554
Simon Sapin <simon.sapin@octobus.net> [Thu, 29 Apr 2021 11:32:57 +0200] rev 47133
dirstate-tree: Use HashMap instead of BTreeMap
BTreeMap has the advantage of its "natural" iteration order being the one we need
in the status algorithm. With HashMap however, iteration order is undefined so
we need to allocate a Vec and sort it explicitly.
Unfortunately many BTreeMap operations are slower than in HashMap, and skipping
that extra allocation and sort is not enough to compensate.
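
As an illustration of the "HashMap + sort" approach, here is a small sketch under stated assumptions: `sorted_children` is a hypothetical helper, and keys are plain `String`s rather than the path types used in Mercurial.

```rust
use std::collections::HashMap;

/// Collects the entries of an unordered map into a `Vec` and sorts them by
/// key, which is the extra allocation and sort mentioned above.
fn sorted_children<V>(children: &HashMap<String, V>) -> Vec<(&String, &V)> {
    let mut sorted: Vec<_> = children.iter().collect();
    // Keys in a map are unique, so an unstable sort is sufficient.
    sorted.sort_unstable_by(|(a, _), (b, _)| a.cmp(b));
    sorted
}
```

A `BTreeMap` would yield the same order directly from `iter()`, at the cost of slower lookups and insertions.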
Switching to HashMap + sort makes `hg status` 17% faster in one test case,
as measured with hyperfine:
```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 765.0 ms ± 8.8 ms [User: 1.352 s, System: 0.747 s]
Range (min … max): 751.8 ms … 778.7 ms 10 runs
Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 651.8 ms ± 9.9 ms [User: 1.251 s, System: 0.799 s]
Range (min … max): 642.2 ms … 671.8 ms 10 runs
Summary
'./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```
* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
Differential Revision: https://phab.mercurial-scm.org/D10553
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 17:49:38 +0200] rev 47132
dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read`
When running with a `RUST_LOG=trace` environment variable, the `micro_timer`
crate prints the duration taken by each call to functions with that attribute.
Differential Revision: https://phab.mercurial-scm.org/D10552
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 14:20:48 +0200] rev 47131
dirstate-tree: Parallelize the status algorithm with Rayon
The `rayon` crate exposes "parallel iterators" that work like normal iterators
but dispatch work on different items to an implicit global thread pool.
Differential Revision: https://phab.mercurial-scm.org/D10551
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 12:42:21 +0200] rev 47130
dirstate-tree: Avoid BTreeMap double-lookup when inserting a dirstate entry
The child nodes of a given node in the tree-shaped dirstate are kept in a
`BTreeMap` where keys are file names as strings. Finding or inserting a value
in the map takes `O(log(n))` string comparisons, which adds up when constructing
the tree.
The `entry` API allows finding a "spot" in the map that may or may not be
occupied, and then accessing that value or inserting a new one without doing
the map lookup again. However, the current API is limited in that calling
`entry` requires an owned key (and so a memory allocation), even if it ends up
not being used in the case where the map already has a value with an equal key.
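
The difference can be sketched with the standard library alone. The `Node` type and both function names below are hypothetical; only `BTreeMap::entry`, `contains_key`, `insert`, and `get_mut` are real APIs.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct Node {
    visits: u32,
}

/// Repeated lookups: each of `contains_key`, `insert`, and `get_mut`
/// searches the map again, but a key is only allocated when inserting.
fn get_or_insert_repeated<'a>(
    children: &'a mut BTreeMap<String, Node>,
    name: &str,
) -> &'a mut Node {
    if !children.contains_key(name) {
        children.insert(name.to_owned(), Node::default());
    }
    children.get_mut(name).unwrap()
}

/// Single lookup through `entry`, at the price of always allocating an
/// owned key, even when the map already contains an equal one.
fn get_or_insert_once<'a>(
    children: &'a mut BTreeMap<String, Node>,
    name: &str,
) -> &'a mut Node {
    children.entry(name.to_owned()).or_default()
}
```

Both functions leave the map in the same state; they differ only in how many searches and allocations they perform.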
This is still a win, with 4% better end-to-end time for `hg status` measured
here with hyperfine:
```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 1.337 s ± 0.018 s [User: 892.9 ms, System: 437.5 ms]
Range (min … max): 1.316 s … 1.373 s 10 runs
Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 1.291 s ± 0.008 s [User: 853.4 ms, System: 431.1 ms]
Range (min … max): 1.283 s … 1.309 s 10 runs
Summary
'./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
1.04 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```
* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
Differential Revision: https://phab.mercurial-scm.org/D10550
Simon Sapin <simon.sapin@octobus.net> [Mon, 26 Apr 2021 19:28:56 +0200] rev 47129
dirstate-tree: Handle I/O errors in status
Errors such as insufficient permissions when listing a directory are logged,
and the algorithm continues without considering that directory.
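
A rough sketch of this "record the error and keep going" behavior, using only `std::fs`. The `walk` function is hypothetical, and collecting errors into a `Vec` is a stand-in for the logging done in the real code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively lists files under `dir`. A directory that cannot be read is
/// recorded in `errors` and skipped; the walk continues elsewhere.
fn walk(dir: &Path, files: &mut Vec<PathBuf>, errors: &mut Vec<(PathBuf, io::Error)>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) => {
            // For example: insufficient permissions to list this directory.
            errors.push((dir.to_owned(), e));
            return;
        }
    };
    for entry in entries {
        let entry = match entry {
            Ok(entry) => entry,
            Err(e) => {
                errors.push((dir.to_owned(), e));
                continue;
            }
        };
        let path = entry.path();
        match entry.file_type() {
            Ok(t) if t.is_dir() => walk(&path, files, errors),
            Ok(_) => files.push(path),
            Err(e) => errors.push((path, e)),
        }
    }
}
```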
Differential Revision: https://phab.mercurial-scm.org/D10549
Simon Sapin <simon.sapin@octobus.net> [Mon, 26 Apr 2021 19:16:23 +0200] rev 47128
dirstate-tree: Ignore FIFOs etc. in the status algorithm
If a filesystem directory contains anything that is not:
* a "normal" file
* a symbolic link
* or a directory
… act as if that directory entry was not there. For example, if that path was
previously a tracked file, mark it as deleted or removed.
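
The filtering rule can be sketched as a small classification function. `Kind` and `classify` are hypothetical names; `std::fs::FileType` and its `is_file`, `is_symlink`, and `is_dir` methods are real standard-library APIs.

```rust
use std::fs::FileType;

#[derive(Debug, PartialEq)]
enum Kind {
    File,
    Symlink,
    Dir,
}

/// Returns `None` for FIFOs, sockets, device nodes and anything else that
/// is not a normal file, a symbolic link, or a directory. A caller would
/// then treat that directory entry as absent.
fn classify(file_type: FileType) -> Option<Kind> {
    if file_type.is_symlink() {
        Some(Kind::Symlink)
    } else if file_type.is_dir() {
        Some(Kind::Dir)
    } else if file_type.is_file() {
        Some(Kind::File)
    } else {
        None
    }
}
```

The file type should come from `symlink_metadata` or `DirEntry::file_type`, which do not follow symbolic links.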
Differential Revision: https://phab.mercurial-scm.org/D10548
Simon Sapin <simon.sapin@octobus.net> [Fri, 16 Apr 2021 12:12:41 +0200] rev 47127
dirstate-tree: Add the new `status()` algorithm
With the dirstate organized in a tree that mirrors the structure of the
filesystem tree, we can traverse both trees at the same time in order to
compare them. This is hopefully more efficient than building multiple
big hashmaps for all of the repository’s contents.
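
To illustrate the idea of walking both trees together, here is a sketch of the comparison for a single directory level, assuming both lists of names are already sorted. `Diff` and `compare_sorted` are hypothetical; the real algorithm also recurses into sub-directories and handles ignore rules, which this omits.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq)]
enum Diff<'a> {
    OnlyInDirstate(&'a str),
    OnlyOnDisk(&'a str),
    InBoth(&'a str),
}

/// Merge-style comparison of two sorted name lists: each list is read once
/// and no hash map of the whole repository is built.
fn compare_sorted<'a>(dirstate: &[&'a str], disk: &[&'a str]) -> Vec<Diff<'a>> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < dirstate.len() && j < disk.len() {
        match dirstate[i].cmp(disk[j]) {
            Ordering::Less => {
                out.push(Diff::OnlyInDirstate(dirstate[i]));
                i += 1;
            }
            Ordering::Greater => {
                out.push(Diff::OnlyOnDisk(disk[j]));
                j += 1;
            }
            Ordering::Equal => {
                out.push(Diff::InBoth(dirstate[i]));
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains exists on one side only.
    out.extend(dirstate[i..].iter().map(|&n| Diff::OnlyInDirstate(n)));
    out.extend(disk[j..].iter().map(|&n| Diff::OnlyOnDisk(n)));
    out
}
```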
Differential Revision: https://phab.mercurial-scm.org/D10547
Simon Sapin <simon.sapin@octobus.net> [Fri, 16 Apr 2021 12:12:04 +0200] rev 47126
dirstate-tree: Give `status()` mutable access to the `DirstateMap`
Differential Revision: https://phab.mercurial-scm.org/D10546
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 15:49:01 +0200] rev 47125
rust: Add doc-comments to DirstateStatus fields
Differential Revision: https://phab.mercurial-scm.org/D10495
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 15:14:19 +0200] rev 47124
rust: Move "lookup" a.k.a. "unsure" paths into `DirstateStatus` struct
Instead of having `status()` return a tuple of those paths and
`DirstateStatus`.
Differential Revision: https://phab.mercurial-scm.org/D10494
Simon Sapin <simon.sapin@octobus.net> [Tue, 13 Apr 2021 17:02:58 +0200] rev 47123
rust: Remove DirstateMap::file_fold_map
This was a HashMap constructed on demand and then cached in the DirstateMap
struct to avoid reconstructing at the next access. However the only use is
in Python bindings converting it to a PyDict. That method in turn is wrapped
in a @cachedproperty in Python code.
These were two redundant layers of caching. This changeset removes the Rust-level
one to keep the Python dict cache, and have bindings create a PyDict by
iterating.
Differential Revision: https://phab.mercurial-scm.org/D10493
Simon Sapin <simon.sapin@octobus.net> [Fri, 09 Apr 2021 13:13:19 +0200] rev 47122
dirstate-tree: Add "non normal" and "from other parent" sets
Unlike the other DirstateMap implementation, these sets are not materialized
separately in memory. Instead we traverse the main tree.
Differential Revision: https://phab.mercurial-scm.org/D10492
Simon Sapin <simon.sapin@octobus.net> [Fri, 09 Apr 2021 12:55:35 +0200] rev 47121
dirstate-tree: Add add_file, remove_file, and drop_file
Again, various counters need to be kept up to date.
Differential Revision: https://phab.mercurial-scm.org/D10491
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 19:46:24 +0200] rev 47120
dirstate-tree: Add has_dir and has_tracked_dir
A node without a `DirstateMap` entry represents a directory.
Only some values of `EntryState` represent tracked files.
A directory is considered "tracked" if it contains any descendant file that
is tracked. To avoid a sub-tree traversal in `has_tracked_dir` we add a
counter for this. A boolean flag would become insufficient when we implement
remove_file and drop_file.
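
A small sketch of why a counter is needed rather than a flag. The enum variants and the choice of which ones count as tracked are assumptions modeled on Mercurial's usual dirstate states; `is_tracked` and `update_counter` are hypothetical helpers, not the functions added by this changeset.

```rust
#[derive(Clone, Copy)]
enum EntryState {
    Normal,
    Added,
    Removed,
    Merged,
}

/// Assumption: removed files are no longer tracked, the other states are.
fn is_tracked(state: EntryState) -> bool {
    matches!(
        state,
        EntryState::Normal | EntryState::Added | EntryState::Merged
    )
}

/// Adjusts one directory's counter when a descendant changes from `old`
/// to `new` (`None` meaning "no entry").
fn update_counter(counter: &mut u32, old: Option<EntryState>, new: Option<EntryState>) {
    let was = old.map_or(false, is_tracked);
    let is = new.map_or(false, is_tracked);
    match (was, is) {
        (false, true) => *counter += 1,
        (true, false) => *counter -= 1,
        _ => {}
    }
}
```

With two tracked files in a directory, removing one leaves the counter at 1 and the directory still tracked. A boolean could not tell that case apart from removing the last tracked file without re-scanning the sub-tree.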
`add_file_node` is more general than needed here, in anticipation of adding
the `add_file` and `remove_file` methods.
Differential Revision: https://phab.mercurial-scm.org/D10490