Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 20:21:56 +0200] rev 47126
dirstate-tree: Borrow paths from the "on disk" bytes
Use std::borrow::Cow to avoid some memory allocations and copying.
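As a rough illustration of the idea (not the actual hg-core code), the sketch below keys a `HashMap` with `Cow<[u8]>` so that entries parsed from a buffer borrow their key while entries added later own it. The `PathMap` type, its methods, and the newline-separated input format are all hypothetical stand-ins; the real dirstate format is different.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical stand-in for a path-keyed map; not the hg-core type.
struct PathMap<'on_disk> {
    entries: HashMap<Cow<'on_disk, [u8]>, u8>,
}

impl<'on_disk> PathMap<'on_disk> {
    /// Parses newline-separated paths (a made-up format for this sketch).
    /// Each key borrows from `on_disk`; no path bytes are copied.
    fn parse(on_disk: &'on_disk [u8]) -> Self {
        let entries = on_disk
            .split(|&byte| byte == b'\n')
            .filter(|path| !path.is_empty())
            .map(|path| (Cow::Borrowed(path), b'n'))
            .collect();
        PathMap { entries }
    }

    /// Paths added after parsing have no backing buffer, so they are owned.
    fn add(&mut self, path: Vec<u8>) {
        self.entries.insert(Cow::Owned(path), b'a');
    }

    /// `Cow<[u8]>` implements `Borrow<[u8]>`, so lookups take a plain slice.
    fn state(&self, path: &[u8]) -> Option<u8> {
        self.entries.get(path).copied()
    }
}
```

Lookups behave the same for borrowed and owned keys, which is what lets the two kinds coexist in one map.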
Differential Revision: https://phab.mercurial-scm.org/D10560
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:33:04 +0200] rev 47125
dirstate-tree: Borrow copy source paths from the "on disk" bytes
Use std::borrow::Cow to avoid some memory allocations and copying.
These particular allocations are not visible when profiling (as many files
in a typical repo don’t have a copy source). This change is a "warm up"
for doing the same with paths of files themselves, which is more involved
since those paths are used as `HashMap` keys. This gets the addition of a
lifetime parameter to several types out of the way.
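A minimal sketch of what an optional, borrowed copy source might look like, and of how the lifetime parameter ends up on the containing type. The `Entry` struct and the NUL-separated record layout are assumptions made for illustration only.

```rust
use std::borrow::Cow;

/// Hypothetical entry type; the `'on_disk` lifetime ties it to the buffer.
struct Entry<'on_disk> {
    copy_source: Option<Cow<'on_disk, [u8]>>,
}

/// Parses a made-up `path\0copy_source` record. When a copy source is
/// present it is borrowed from `record` rather than copied.
fn parse_entry(record: &[u8]) -> Entry<'_> {
    let mut parts = record.splitn(2, |&byte| byte == 0);
    let _path = parts.next();
    Entry {
        copy_source: parts.next().map(Cow::Borrowed),
    }
}

/// A copy source set at run time has no backing buffer, so it is owned.
fn set_copy_source(entry: &mut Entry<'_>, source: &[u8]) {
    entry.copy_source = Some(Cow::Owned(source.to_vec()));
}
```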
Differential Revision: https://phab.mercurial-scm.org/D10559
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:57:46 +0200] rev 47124
rust: Use `&HgPath` instead of `&HgPathBuf` in many APIs
Getting the former (through `Deref`) is almost the only useful thing one can
do with the latter anyway. With this change, APIs become more flexible for the
"provider" of these paths, which may store something else that Derefs to HgPath,
such as `std::borrow::Cow<HgPath>`. Using `Cow` can help reduce memory allocations
and copying.
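The same relationship exists between `Path` and `PathBuf` in the standard library, so the sketch below uses those as stand-ins for `HgPath` and `HgPathBuf`. The `is_hidden` and `count_hidden` functions are hypothetical; the point is only that a parameter of the borrowed type accepts every provider through deref coercion.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Takes the borrowed form, so callers are free to store paths however
/// they like as long as the storage derefs to `Path`.
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}

/// Both `&PathBuf` and `&Cow<Path>` coerce to `&Path` at the call sites.
fn count_hidden(owned: &[PathBuf], cows: &[Cow<'_, Path>]) -> usize {
    owned.iter().filter(|path| is_hidden(path)).count()
        + cows.iter().filter(|path| is_hidden(path)).count()
}
```

Had `is_hidden` taken `&PathBuf`, the `Cow` callers would need to allocate an owned path just to make the call.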
Differential Revision: https://phab.mercurial-scm.org/D10558
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:24:54 +0200] rev 47123
dirstate-tree: Make `DirstateMap` borrow from a bytes buffer
… that has the contents of the `.hg/dirstate` file.
This only applies to the tree-based flavor of `DirstateMap`.
For now only the entire `&[u8]` slice is stored, so this is not useful yet.
Adding a lifetime parameter to the `DirstateMap` struct (in hg-core) makes
Python bindings non-trivial because we keep that struct in a Python object
that has a dynamic lifetime tied to Python’s reference-counting and GC.
As long as we keep the `PyBytes` that owns the borrowed bytes buffer next to
the borrowing struct, the buffer will live long enough for the borrows to stay
valid. However this relationship cannot be expressed in safe Rust code in a
way that would satisfy the borrow-checker. We use `unsafe` code to erase
that lifetime parameter, and encapsulate it in a safe abstraction similar to
the owning-ref crate: https://docs.rs/owning_ref/
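The pattern can be sketched roughly as follows. This is not the code from this changeset: `BorrowingMap` stands in for the tree-based `DirstateMap`, a `Vec<u8>` stands in for the `PyBytes` owner, and the names are hypothetical. The safety argument relies on the owner's heap buffer never being mutated or freed while the borrowing struct exists.

```rust
/// Hypothetical stand-in for the borrowing `DirstateMap<'on_disk>`.
struct BorrowingMap<'on_disk> {
    on_disk: &'on_disk [u8],
}

/// Keeps the buffer owner next to the struct that borrows from it.
struct OwningMap {
    // Declared first so that it is dropped before `_owner`.
    map: BorrowingMap<'static>,
    // Never mutated and never exposed, so its heap buffer stays put.
    _owner: Vec<u8>,
}

impl OwningMap {
    fn new(owner: Vec<u8>) -> Self {
        let (ptr, len) = (owner.as_ptr(), owner.len());
        // SAFETY: `ptr` points at the Vec's heap buffer, which does not move
        // when the Vec itself is moved. The Vec is stored in `self`, is never
        // mutated, and outlives `map`. The fake `'static` lifetime never
        // escapes: `get` shortens it to the borrow of `self`.
        let on_disk: &'static [u8] = unsafe { std::slice::from_raw_parts(ptr, len) };
        OwningMap {
            map: BorrowingMap { on_disk },
            _owner: owner,
        }
    }

    /// Hands out the map with its lifetime tied back to `&self`.
    fn get<'a>(&'a self) -> &'a BorrowingMap<'a> {
        &self.map
    }
}
```

Only `new` and `get` are public entry points in this sketch, which is what keeps the `unsafe` block encapsulated.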
Differential Revision: https://phab.mercurial-scm.org/D10557
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:13:31 +0200] rev 47122
rust: Read dirstate from disk in DirstateMap constructor
Before this changeset, Python code first creates an empty `DirstateMap` Rust
object, then immediately calls its `read` method with a byte string of the
contents of the `.hg/dirstate` file.
This changeset makes that byte string available to the constructor of `DirstateMap`
in the hg-cpython crate. This is a first step towards enabling parts of
`DirstateMap` in the hg-core crate to borrow from this buffer without copying.
Differential Revision: https://phab.mercurial-scm.org/D10556
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 15:40:11 +0200] rev 47121
rust: Remove handling of `parents` in `DirstateMap`
The Python wrapper class `dirstatemap` can take care of it.
This removes the need to have both `_rustmap` and `_inner_rustmap`.
Differential Revision: https://phab.mercurial-scm.org/D10555
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 14:22:14 +0200] rev 47120
dirstate-tree: Fold "tracked descendants" counter update in main walk
For the purpose of implementing `has_tracked_dir` (which means "has tracked
descendants") without an expensive sub-tree traversal, we maintain a counter
of tracked descendants on each "directory" node of the tree-shaped dirstate.
Before this changeset, mutating or inserting a node at a given path would
involve:
* Walking the tree from root through ancestors to find the node or the spot
where to insert it
* Looking at the previous node if any to decide what counter update is needed
* Performing any node mutation
* Walking the tree *again* to update counters in ancestor nodes
When profiling `hg status` on a large repo, this second walk takes a visible
share of the time spent loading the dirstate from disk.
It turns out we have enough information to decide before the first tree walk
what counter update is needed. This changeset merges the two walks, gaining
~10% of the total time for `hg status` (in the same hyperfine benchmark as
the previous changeset).
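A simplified sketch of the merged walk, assuming the caller already knows whether the file was tracked before (the `was_tracked` parameter). The `Node` layout, `String` path components, and function names are hypothetical and much simpler than the real tree.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    tracked: bool,
    /// Number of tracked files anywhere below this node.
    tracked_descendants: usize,
    children: HashMap<String, Node>,
}

/// Sets the tracked flag of `path`, updating ancestor counters on the way
/// down. `was_tracked` must reflect the file's previous state.
fn set_tracked(root: &mut Node, path: &str, was_tracked: bool, is_tracked: bool) {
    // The counter update is decided before walking the tree.
    let delta: i8 = match (was_tracked, is_tracked) {
        (false, true) => 1,
        (true, false) => -1,
        _ => 0,
    };
    let mut node = root;
    for name in path.split('/') {
        // At this point `node` is an ancestor of the target.
        match delta {
            1 => node.tracked_descendants += 1,
            -1 => node.tracked_descendants -= 1,
            _ => {}
        }
        node = node.children.entry(name.to_owned()).or_default();
    }
    node.tracked = is_tracked;
}

/// Answers "has tracked descendants" without traversing the sub-tree.
fn has_tracked_dir(root: &Node, dir: &str) -> bool {
    let mut node = root;
    for name in dir.split('/') {
        match node.children.get(name) {
            Some(child) => node = child,
            None => return false,
        }
    }
    node.tracked_descendants > 0
}
```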
---
Profiling was done by compiling with this `.cargo/config`:
    [profile.release]
    debug = true
then running with:
    py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \
        ./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1
then visualizing the recorded JSON file in https://www.speedscope.app/
Differential Revision: https://phab.mercurial-scm.org/D10554
Simon Sapin <simon.sapin@octobus.net> [Thu, 29 Apr 2021 11:32:57 +0200] rev 47119
dirstate-tree: Use HashMap instead of BTreeMap
BTreeMap has the advantage of its "natural" iteration order being the one we need
in the status algorithm. With HashMap however, iteration order is undefined so
we need to allocate a Vec and sort it explicitly.
Unfortunately many BTreeMap operations are slower than in HashMap, and skipping
that extra allocation and sort is not enough to compensate.
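The "HashMap + sort" approach can be sketched as below. The `sorted_children` helper and the `Vec<u8>` key type are assumptions for illustration; the actual node and path types differ.

```rust
use std::collections::HashMap;

/// Collects the entries of a `HashMap` into a `Vec` and sorts them by key,
/// giving the ordered iteration that `BTreeMap` would provide for free.
fn sorted_children<V>(children: &HashMap<Vec<u8>, V>) -> Vec<(&[u8], &V)> {
    let mut sorted: Vec<(&[u8], &V)> = children
        .iter()
        .map(|(name, node)| (name.as_slice(), node))
        .collect();
    // Keys are unique, so an unstable sort loses nothing.
    sorted.sort_unstable_by(|a, b| a.0.cmp(b.0));
    sorted
}
```

The allocation and sort are paid only where ordered iteration is needed, while every other operation gets the cheaper `HashMap` lookup.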
Switching to HashMap + sort makes `hg status` 17% faster in one test case,
as measured with hyperfine:
```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 765.0 ms ± 8.8 ms [User: 1.352 s, System: 0.747 s]
Range (min … max): 751.8 ms … 778.7 ms 10 runs
Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
Time (mean ± σ): 651.8 ms ± 9.9 ms [User: 1.251 s, System: 0.799 s]
Range (min … max): 642.2 ms … 671.8 ms 10 runs
Summary
'./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```
* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
Differential Revision: https://phab.mercurial-scm.org/D10553