Fri, 30 Apr 2021 18:24:54 +0200 dirstate-tree: Make `DirstateMap` borrow from a bytes buffer
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:24:54 +0200] rev 47137
dirstate-tree: Make `DirstateMap` borrow from a bytes buffer … that has the contents of the `.hg/dirstate` file. This only applies to the tree-based flavor of `DirstateMap`. For now only the entire `&[u8]` slice is stored, so this is not useful yet. Adding a lifetime parameter to the `DirstateMap` struct (in hg-core) makes Python bindings non-trivial because we keep that struct in a Python object that has a dynamic lifetime tied to Python’s reference-counting and GC. As long as we keep the `PyBytes` that owns the borrowed bytes buffer next to the borrowing struct, the buffer will live long enough for the borrows to stay valid. However, this relationship cannot be expressed in safe Rust code in a way that would satisfy the borrow-checker. We use `unsafe` code to erase that lifetime parameter, and encapsulate it in a safe abstraction similar to the owning-ref crate: https://docs.rs/owning_ref/ Differential Revision: https://phab.mercurial-scm.org/D10557
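The following is a minimal sketch of the owning-ref style pattern described above. It makes these assumptions: `Borrowing` and `OwningBorrowing` are hypothetical names rather than hg-core or hg-cpython types, and `Rc<[u8]>` stands in for the reference-counted `PyBytes` owner. The `unsafe` block erases the borrow's lifetime to `'static`. The safe `get` method then shrinks it back to the lifetime of `&self`.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a struct (like the tree-based `DirstateMap`)
/// that borrows from the `.hg/dirstate` bytes.
pub struct Borrowing<'on_disk> {
    on_disk: &'on_disk [u8],
}

impl<'on_disk> Borrowing<'on_disk> {
    pub fn new(on_disk: &'on_disk [u8]) -> Self {
        Borrowing { on_disk }
    }

    pub fn on_disk(&self) -> &'on_disk [u8] {
        self.on_disk
    }
}

/// Keeps the owner of the bytes next to the struct that borrows from them,
/// with the borrow's lifetime erased to `'static` (hypothetical type).
pub struct OwningBorrowing {
    // Struct fields are dropped in declaration order, so the borrower
    // is dropped before the owner.
    borrowing: Borrowing<'static>,
    // `Rc<[u8]>` (standing in for `PyBytes`) keeps its bytes at a fixed heap
    // address even when `OwningBorrowing` itself is moved.
    #[allow(dead_code)] // only held to keep the bytes alive
    owner: Rc<[u8]>,
}

impl OwningBorrowing {
    pub fn new(owner: Rc<[u8]>) -> Self {
        let ptr: *const [u8] = &*owner;
        // SAFETY: the bytes live as long as `owner`, which is stored in the
        // same struct, never mutated, and dropped after `borrowing`.
        let bytes: &'static [u8] = unsafe { &*ptr };
        OwningBorrowing {
            borrowing: Borrowing::new(bytes),
            owner,
        }
    }

    /// Returns the borrower with its lifetime shrunk to that of `&self`,
    /// so no borrow obtained through it can outlive the owner.
    pub fn get<'a>(&'a self) -> &'a Borrowing<'a> {
        // Relies on `Borrowing` being covariant in its lifetime.
        &self.borrowing
    }
}
```

Only shared access is exposed here. Handing out `&mut Borrowing<'static>` would let callers store shorter-lived borrows in it, and this sketch does not guard against that.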
Fri, 30 Apr 2021 18:13:31 +0200 rust: Read dirstate from disk in DirstateMap constructor
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:13:31 +0200] rev 47136
rust: Read dirstate from disk in DirstateMap constructor Before this changeset, Python code first creates an empty `DirstateMap` Rust object, then immediately calls its `read` method with a byte string of the contents of the `.hg/dirstate` file. With this changeset, the file is read when the `DirstateMap` is constructed, which makes that byte string available to the constructor of `DirstateMap` in the hg-cpython crate. This is a first step towards enabling parts of `DirstateMap` in the hg-core crate to borrow from this buffer without copying. Differential Revision: https://phab.mercurial-scm.org/D10556
Fri, 30 Apr 2021 15:40:11 +0200 rust: Remove handling of `parents` in `DirstateMap`
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 15:40:11 +0200] rev 47135
rust: Remove handling of `parents` in `DirstateMap` The Python wrapper class `dirstatemap` can take care of it. This removes the need to have both `_rustmap` and `_inner_rustmap`. Differential Revision: https://phab.mercurial-scm.org/D10555
Fri, 30 Apr 2021 14:22:14 +0200 dirstate-tree: Fold "tracked descendants" counter update in main walk
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 14:22:14 +0200] rev 47134
dirstate-tree: Fold "tracked descendants" counter update in main walk For the purpose of implementing `has_tracked_dir` (which means "has tracked descendants") without an expensive sub-tree traversal, we maintain a counter of tracked descendants on each "directory" node of the tree-shaped dirstate.

Before this changeset, mutating or inserting a node at a given path would involve:

* Walking the tree from the root through ancestors to find the node, or the spot where to insert it
* Looking at the previous node, if any, to decide what counter update is needed
* Performing any node mutation
* Walking the tree *again* to update counters in ancestor nodes

When profiling `hg status` on a large repo, this second walk takes significant time while loading the dirstate from disk. It turns out we have enough information to decide before the first tree walk what counter update is needed. This changeset merges the two walks, saving ~10% of the total time for `hg status` (in the same hyperfine benchmark as the previous changeset).

---

Profiling was done by compiling with this `.cargo/config`:

    [profile.release]
    debug = true

then running with:

    py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \
      ./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1

then visualizing the recorded JSON file in https://www.speedscope.app/

Differential Revision: https://phab.mercurial-scm.org/D10554
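The single-walk idea can be sketched as follows. This is a hypothetical illustration, not hg-core's code. The `Node` type, its fields, `set_tracked` and `has_tracked_dir` are invented for the example. The previous tracked state (`was_tracked`) is assumed to be known by the caller before the walk. That knowledge is what lets the counter change be applied to each ancestor on the way down, instead of in a second walk.

```rust
use std::collections::HashMap;

/// Hypothetical tree-shaped dirstate node.
#[derive(Default)]
struct Node {
    /// Whether the path ending at this node is a tracked file.
    tracked: bool,
    /// Number of tracked files strictly below this node.
    tracked_descendants: usize,
    children: HashMap<String, Node>,
}

impl Node {
    /// Walks once from `self` (the root) to the node for `path`, creating
    /// missing nodes and updating every ancestor's counter on the way down.
    /// `was_tracked` is assumed to be correct; a wrong value would corrupt
    /// the counters (or underflow on removal).
    fn set_tracked(&mut self, path: &str, was_tracked: bool, tracked: bool) {
        let mut node = self;
        for component in path.split('/') {
            // `node` is an ancestor of the target: apply the change now.
            match (was_tracked, tracked) {
                (false, true) => node.tracked_descendants += 1,
                (true, false) => node.tracked_descendants -= 1,
                _ => {}
            }
            node = node.children.entry(component.to_owned()).or_default();
        }
        // `node` is now the target itself, whose own counter is unchanged.
        node.tracked = tracked;
    }

    /// Answers "has tracked descendants" from the counter alone, without
    /// traversing the sub-tree.
    fn has_tracked_dir(&self, path: &str) -> bool {
        let mut node = self;
        for component in path.split('/') {
            match node.children.get(component) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.tracked_descendants > 0
    }
}
```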
Thu, 29 Apr 2021 11:32:57 +0200 dirstate-tree: Use HashMap instead of BTreeMap
Simon Sapin <simon.sapin@octobus.net> [Thu, 29 Apr 2021 11:32:57 +0200] rev 47133
dirstate-tree: Use HashMap instead of BTreeMap BTreeMap has the advantage of its "natural" iteration order being the one we need in the status algorithm. With HashMap, however, iteration order is unspecified, so we need to allocate a Vec and sort it explicitly. Unfortunately, many BTreeMap operations are slower than their HashMap counterparts, and skipping that extra allocation and sort is not enough to compensate. Switching to HashMap + sort makes `hg status` 17% faster in one test case, as measured with hyperfine:

```
Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):     765.0 ms ±   8.8 ms    [User: 1.352 s, System: 0.747 s]
  Range (min … max):   751.8 ms … 778.7 ms    10 runs

Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):     651.8 ms ±   9.9 ms    [User: 1.251 s, System: 0.799 s]
  Range (min … max):   642.2 ms … 671.8 ms    10 runs

Summary
  './hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
    1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'
```

* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central

Differential Revision: https://phab.mercurial-scm.org/D10553
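The trade-off described above might look like the minimal sketch below: a `HashMap` is iterated in a defined order by collecting its entries into a `Vec` and sorting them. The `sorted_children` helper is hypothetical and not an hg-core function.

```rust
use std::collections::HashMap;

/// Returns a node's children sorted by name: the order a `BTreeMap` would
/// yield for free, obtained here with an extra allocation and sort.
fn sorted_children<V>(children: &HashMap<String, V>) -> Vec<(&String, &V)> {
    // Collecting allocates a new Vec on every call.
    let mut entries: Vec<(&String, &V)> = children.iter().collect();
    // Keys are unique, so an unstable sort gives the same result as a stable one.
    entries.sort_unstable_by(|a, b| a.0.cmp(b.0));
    entries
}
```

An unstable sort is sufficient here because map keys are unique, so sort stability cannot change the result.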
Tue, 27 Apr 2021 17:49:38 +0200 dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read`
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 17:49:38 +0200] rev 47132
dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read` When running with a `RUST_LOG=trace` environment variable, the `micro_timer` crate prints the duration taken by each call to functions with that attribute. Differential Revision: https://phab.mercurial-scm.org/D10552
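As an illustration, here is a hand-written approximation of the behavior described above, using only the standard library: it times a call and reports the duration when `RUST_LOG=trace` is set. This is not the `micro_timer` crate's actual expansion. The `timed` helper, its message format, and the exact `RUST_LOG` check are all assumptions.

```rust
use std::env;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a `#[timed]` function: runs `f`, measures how
/// long it took, and prints the duration to stderr only when the
/// `RUST_LOG` environment variable is exactly "trace".
fn timed<T>(name: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    let elapsed: Duration = start.elapsed();
    let tracing = env::var("RUST_LOG").map(|v| v == "trace").unwrap_or(false);
    if tracing {
        eprintln!("Duration of `{}`: {:?}", name, elapsed);
    }
    result
}
```

The real crate reports through Rust's logging machinery, so its output is filtered by the configured logger rather than by a direct string comparison as above.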