Fri, 07 May 2021 08:38:17 -0700 rename: add hint about --at-rev if source file doesn't exist
Martin von Zweigbergk <martinvonz@google.com> [Fri, 07 May 2021 08:38:17 -0700] rev 47141
rename: add hint about --at-rev if source file doesn't exist It's quite common that users want to record copy (rename) information after committing the working copy changes (i.e. an added and a deleted file). When they try `hg mv [--after] <src> <dst>`, that just fails because the source file doesn't exist. It seems helpful if we can point them to `--at-rev=.` in this case. Differential Revision: https://phab.mercurial-scm.org/D10697
Fri, 30 Apr 2021 20:21:56 +0200 dirstate-tree: Borrow paths from the "on disk" bytes
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 20:21:56 +0200] rev 47140
dirstate-tree: Borrow paths from the "on disk" bytes Use std::borrow::Cow to avoid some memory allocations and copying. Differential Revision: https://phab.mercurial-scm.org/D10560
Fri, 30 Apr 2021 19:33:04 +0200 dirstate-tree: Borrow copy source paths from the "on disk" bytes
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:33:04 +0200] rev 47139
dirstate-tree: Borrow copy source paths from the "on disk" bytes Use std::borrow::Cow to avoid some memory allocations and copying. These particular allocations are not visible when profiling (as many files in a typical repo don’t have a copy source). This change is "warm up" for doing the same with paths of files themselves, which is more involved since those paths are used as `HashMap` keys. This gets of the way the addition of a lifetime parameter to several types. Differential Revision: https://phab.mercurial-scm.org/D10559
Fri, 30 Apr 2021 19:57:46 +0200 rust: Use `&HgPath` instead of `&HgPathBuf` in may APIs
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 19:57:46 +0200] rev 47138
rust: Use `&HgPath` instead of `&HgPathBuf` in may APIs Getting the former (through `Deref`) is almost the only useful thing one can do with the latter anyway. With this changes, API become more flexible for the "provider" of these paths which may store something else that Deref’s to HgPath, such as `std::borrow::Cow<HgPath>`. Using `Cow` can help reduce memory alloactions and copying. Differential Revision: https://phab.mercurial-scm.org/D10558
Fri, 30 Apr 2021 18:24:54 +0200 dirstate-tree: Make `DirstateMap` borrow from a bytes buffer
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:24:54 +0200] rev 47137
dirstate-tree: Make `DirstateMap` borrow from a bytes buffer … that has the contents of the `.hg/dirstate` file. This only applies to the tree-based flavor of `DirstateMap`. For now only the entire `&[u8]` slice is stored, so this is not useful yet. Adding a lifetime parameter to the `DirstateMap` struct (in hg-core) makes Python bindings non-trivial because we keep that struct in a Python object that has a dynamic lifetime tied to Python’s reference-counting and GC. As long as we keep the `PyBytes` that owns the borrowed bytes buffer next to the borrowing struct, the buffer will live long enough for the borrows to stay valid. However this relationship cannot be expressed in safe Rust code in a way that would statisfy they borrow-checker. We use `unsafe` code to erase that lifetime parameter, and encapsulate it in a safe abstraction similar to the owning-ref crate: https://docs.rs/owning_ref/ Differential Revision: https://phab.mercurial-scm.org/D10557
Fri, 30 Apr 2021 18:13:31 +0200 rust: Read dirstate from disk in DirstateMap constructor
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 18:13:31 +0200] rev 47136
rust: Read dirstate from disk in DirstateMap constructor Before this changeset, Python code first creates an empty `DirstateMap` Rust object, then immediately calls its `read` method with a byte string of the contents of the `.hg/dirstate` file. This makes that byte string available to the constructor of `DirstateMap` in the hg-cpython crate. This is a first step towards enabling parts of `DirstateMap` in the hg-core crate to borrow from this buffer without copying. Differential Revision: https://phab.mercurial-scm.org/D10556
Fri, 30 Apr 2021 15:40:11 +0200 rust: Remove handling of `parents` in `DirstateMap`
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 15:40:11 +0200] rev 47135
rust: Remove handling of `parents` in `DirstateMap` The Python wrapper class `dirstatemap` can take care of it. This removes the need to have both `_rustmap` and `_inner_rustmap`. Differential Revision: https://phab.mercurial-scm.org/D10555
Fri, 30 Apr 2021 14:22:14 +0200 dirstate-tree: Fold "tracked descendants" counter update in main walk
Simon Sapin <simon.sapin@octobus.net> [Fri, 30 Apr 2021 14:22:14 +0200] rev 47134
dirstate-tree: Fold "tracked descendants" counter update in main walk For the purpose of implementing `has_tracked_dir` (which means "has tracked descendants) without an expensive sub-tree traversal, we maintaing a counter of tracked descendants on each "directory" node of the tree-shaped dirstate. Before this changeset, mutating or inserting a node at a given path would involve: * Walking the tree from root through ancestors to find the node or the spot where to insert it * Looking at the previous node if any to decide what counter update is needed * Performing any node mutation * Walking the tree *again* to update counters in ancestor nodes When profiling `hg status` on a large repo, this second walk takes times while loading a the dirstate from disk. It turns out we have enough information to decide before he first tree walk what counter update is needed. This changeset merges the two walks, gaining ~10% of the total time for `hg update` (in the same hyperfine benchmark as the previous changeset). --- Profiling was done by compiling with this `.cargo/config`: [profile.release] debug = true then running with: py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \ ./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1 then visualizing the recorded JSON file in https://www.speedscope.app/ Differential Revision: https://phab.mercurial-scm.org/D10554
Thu, 29 Apr 2021 11:32:57 +0200 dirstate-tree: Use HashMap instead of BTreeMap
Simon Sapin <simon.sapin@octobus.net> [Thu, 29 Apr 2021 11:32:57 +0200] rev 47133
dirstate-tree: Use HashMap instead of BTreeMap BTreeMap has the advantage of its "natural" iteration order being the one we need in the status algorithm. With HashMap however, iteration order is undefined so we need to allocate a Vec and sort it explicitly. Unfortunately many BTreeMap operations are slower than in HashMap, and skipping that extra allocation and sort is not enough to compensate. Switching to HashMap + sort makes `hg status` 17% faster in one test case, as measure with hyperfine: ``` Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1 Time (mean ± σ): 765.0 ms ± 8.8 ms [User: 1.352 s, System: 0.747 s] Range (min … max): 751.8 ms … 778.7 ms 10 runs Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1 Time (mean ± σ): 651.8 ms ± 9.9 ms [User: 1.251 s, System: 0.799 s] Range (min … max): 642.2 ms … 671.8 ms 10 runs Summary './hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran 1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ``` * ./hg is this revision * ../hg2/hg is its parent * $REPO is an old snapshot of mozilla-central Differential Revision: https://phab.mercurial-scm.org/D10553
Tue, 27 Apr 2021 17:49:38 +0200 dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read`
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 17:49:38 +0200] rev 47132
dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read` When running with a `RUST_LOG=trace` environment variable, the `micro_timer` crate prints the duration taken by each call to functions with that attribute. Differential Revision: https://phab.mercurial-scm.org/D10552
Tue, 27 Apr 2021 14:20:48 +0200 dirstate-tree: Paralellize the status algorithm with Rayon
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 14:20:48 +0200] rev 47131
dirstate-tree: Paralellize the status algorithm with Rayon The `rayon` crate exposes "parallel iterators" that work like normal iterators but dispatch work on different items to an implicit global thread pool. Differential Revision: https://phab.mercurial-scm.org/D10551
Tue, 27 Apr 2021 12:42:21 +0200 dirstate-tree: Avoid BTreeMap double-lookup when inserting a dirstate entry
Simon Sapin <simon.sapin@octobus.net> [Tue, 27 Apr 2021 12:42:21 +0200] rev 47130
dirstate-tree: Avoid BTreeMap double-lookup when inserting a dirstate entry The child nodes of a given node in the tree-shaped dirstate are kept in a `BTreeMap` where keys are file names as strings. Finding or inserting a value in the map takes `O(log(n))` string comparisons, which adds up when constructing the tree. The `entry` API allows finding a "spot" in the map that may or may not be occupied and then access that value or insert a new one without doing map lookup again. However the current API is limited in that calling `entry` requires an owned key (and so a memory allocation), even if it ends up not being used in the case where the map already has a value with an equal key. This is still a win, with 4% better end-to-end time for `hg status` measured here with hyperfine: ``` Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1 Time (mean ± σ): 1.337 s ± 0.018 s [User: 892.9 ms, System: 437.5 ms] Range (min … max): 1.316 s … 1.373 s 10 runs Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1 Time (mean ± σ): 1.291 s ± 0.008 s [User: 853.4 ms, System: 431.1 ms] Range (min … max): 1.283 s … 1.309 s 10 runs Summary './hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran 1.04 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ``` * ./hg is this revision * ../hg2/hg is its parent * $REPO is an old snapshot of mozilla-central Differential Revision: https://phab.mercurial-scm.org/D10550
Mon, 26 Apr 2021 19:28:56 +0200 dirstate-tree: Handle I/O errors in status
Simon Sapin <simon.sapin@octobus.net> [Mon, 26 Apr 2021 19:28:56 +0200] rev 47129
dirstate-tree: Handle I/O errors in status Errors such as insufficient permissions when listing a directory are logged, and the algorithm continues without considering that directory. Differential Revision: https://phab.mercurial-scm.org/D10549
Mon, 26 Apr 2021 19:16:23 +0200 dirstate-tree: Ignore FIFOs etc. in the status algorithm
Simon Sapin <simon.sapin@octobus.net> [Mon, 26 Apr 2021 19:16:23 +0200] rev 47128
dirstate-tree: Ignore FIFOs etc. in the status algorithm If a filesystem directory contains anything that is not: * a "normal" file * a symbolic link * or a directory … act as if that directory entry was not there. For example, if that path was previously a tracked file, mark it as deleted or removed. Differential Revision: https://phab.mercurial-scm.org/D10548
Fri, 16 Apr 2021 12:12:41 +0200 dirstate-tree: Add the new `status()` algorithm
Simon Sapin <simon.sapin@octobus.net> [Fri, 16 Apr 2021 12:12:41 +0200] rev 47127
dirstate-tree: Add the new `status()` algorithm With the dirstate organized in a tree that mirrors the structure of the filesystem tree, we can traverse both trees at the same time in order to compare them. This is hopefully more efficient that building multiple big hashmaps for all of the repository’s contents. Differential Revision: https://phab.mercurial-scm.org/D10547
Fri, 16 Apr 2021 12:12:04 +0200 dirstate-tree: Give to `status()` mutable access to the `DirstateMap`
Simon Sapin <simon.sapin@octobus.net> [Fri, 16 Apr 2021 12:12:04 +0200] rev 47126
dirstate-tree: Give to `status()` mutable access to the `DirstateMap` Differential Revision: https://phab.mercurial-scm.org/D10546
Tue, 06 Apr 2021 15:49:01 +0200 rust: Add doc-comments to DirstateStatus fields
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 15:49:01 +0200] rev 47125
rust: Add doc-comments to DirstateStatus fields Differential Revision: https://phab.mercurial-scm.org/D10495
Tue, 06 Apr 2021 15:14:19 +0200 rust: Move "lookup" a.k.a. "unsure" paths into `DirstateStatus` struct
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 15:14:19 +0200] rev 47124
rust: Move "lookup" a.k.a. "unsure" paths into `DirstateStatus` struct Instead of having `status()` returning a tuple of those paths and `DirstateStatus`. Differential Revision: https://phab.mercurial-scm.org/D10494
Tue, 13 Apr 2021 17:02:58 +0200 rust: Remove DirstateMap::file_fold_map
Simon Sapin <simon.sapin@octobus.net> [Tue, 13 Apr 2021 17:02:58 +0200] rev 47123
rust: Remove DirstateMap::file_fold_map This was a HashMap constructed on demand and then cached in the DirstateMap struct to avoid reconstructing at the next access. However the only use is in Python bindings converting it to a PyDict. That method in turn is wrapped in a @cachedproperty in Python code. This was two redudant layers of caching. This changeset removes the Rust-level one to keep the Python dict cache, and have bindings create a PyDict by iterating. Differential Revision: https://phab.mercurial-scm.org/D10493
Fri, 09 Apr 2021 13:13:19 +0200 dirstate-tree: Add "non normal" and "from other parent" sets
Simon Sapin <simon.sapin@octobus.net> [Fri, 09 Apr 2021 13:13:19 +0200] rev 47122
dirstate-tree: Add "non normal" and "from other parent" sets Unlike the other DirstateMap implementation, these sets are not materialized separately in memory. Instead we traverse the main tree. Differential Revision: https://phab.mercurial-scm.org/D10492
Fri, 09 Apr 2021 12:55:35 +0200 dirstate-tree: Add add_file, remove_file, and drop_file
Simon Sapin <simon.sapin@octobus.net> [Fri, 09 Apr 2021 12:55:35 +0200] rev 47121
dirstate-tree: Add add_file, remove_file, and drop_file Again, various counters need to be kept up to date. Differential Revision: https://phab.mercurial-scm.org/D10491
Mon, 12 Apr 2021 19:46:24 +0200 dirstate-tree: Add has_dir and has_tracked_dir
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 19:46:24 +0200] rev 47120
dirstate-tree: Add has_dir and has_tracked_dir A node without a `DirstateMap` entry represents a directory. Only some values of `EntryState` represent tracked files. A directory is considered "tracked" if it contains any descendant file that is tracked. To avoid a sub-tree traversal in `has_tracked_dir` we add a counter for this. A boolean flag would become insufficent when we implement remove_file and drop_file. `add_file_node` is more general than needed here, in anticipation of adding the `add_file` and `remove_file` methods. Differential Revision: https://phab.mercurial-scm.org/D10490
Mon, 12 Apr 2021 18:42:51 +0200 dirstate-tree: Add clear_ambiguous_times in the new DirstateMap
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 18:42:51 +0200] rev 47119
dirstate-tree: Add clear_ambiguous_times in the new DirstateMap Also drive-by refactor it in the other DirstateMap Differential Revision: https://phab.mercurial-scm.org/D10489
Mon, 12 Apr 2021 17:53:37 +0200 dirstate-tree: Add copy_map_insert and copy_map_remove
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 17:53:37 +0200] rev 47118
dirstate-tree: Add copy_map_insert and copy_map_remove Differential Revision: https://phab.mercurial-scm.org/D10488
Mon, 12 Apr 2021 17:29:55 +0200 dirstate-tree: Maintain a counter of DirstateEntry’s and copy sources
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 17:29:55 +0200] rev 47117
dirstate-tree: Maintain a counter of DirstateEntry’s and copy sources This allows implementing __len__ for DirstateMap and CopyMap efficiently, without traversing the tree. Differential Revision: https://phab.mercurial-scm.org/D10487
Mon, 12 Apr 2021 14:21:47 +0200 dirstate-tree: Serialize to disk
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 14:21:47 +0200] rev 47116
dirstate-tree: Serialize to disk The existing `pack_dirstate` function relies on implementation details of `DirstateMap`, so extract some parts of it as separate functions for us in the tree-based `DirstateMap`. The `bytes-cast` crate is updated to a version that has an `as_bytes` method, not just `from_bytes`: https://docs.rs/bytes-cast/0.2.0/bytes_cast/trait.BytesCast.html#method.as_bytes Drive-by refactor `clear_ambiguous_times` which does part of the same thing. Differential Revision: https://phab.mercurial-scm.org/D10486
Mon, 12 Apr 2021 14:43:45 +0200 rust: Add a Timestamp struct instead of abusing Duration
Simon Sapin <simon.sapin@octobus.net> [Mon, 12 Apr 2021 14:43:45 +0200] rev 47115
rust: Add a Timestamp struct instead of abusing Duration `SystemTime` would be the standard library type semantically appropriate instead of `Duration`. But since the value is coming from Python as a plain integer and used in dirstate packing code as an integer, let’s make a type that contains a single integer instead of using one with sub-second precision. Differential Revision: https://phab.mercurial-scm.org/D10485
Tue, 06 Apr 2021 21:07:12 +0200 dirstate-tree: Add tree traversal/iteration
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 21:07:12 +0200] rev 47114
dirstate-tree: Add tree traversal/iteration Like Python’s, Rust’s iterators are "external" in that they are driven by a caller who calls a `next` method. This is as opposed to "internal" iterators who drive themselves and call a callback for each item. Writing an internal iterator traversing a tree is easy with recursion, but internal iterators cannot rely on the call stack in that way, they must save in an explicit object all state that they need to be preserved across two `next` calls. This algorithm uses a `Vec` as a stack that contains what would be local variables on the call stack if we could use recursion. Differential Revision: https://phab.mercurial-scm.org/D10370
Tue, 06 Apr 2021 14:35:39 +0200 dirstate-tree: Add map `get` and `contains_key` methods
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 14:35:39 +0200] rev 47113
dirstate-tree: Add map `get` and `contains_key` methods Differential Revision: https://phab.mercurial-scm.org/D10369
Tue, 06 Apr 2021 14:29:05 +0200 dirstate-tree: Add parsing only dirstate parents from disk
Simon Sapin <simon.sapin@octobus.net> [Tue, 06 Apr 2021 14:29:05 +0200] rev 47112
dirstate-tree: Add parsing only dirstate parents from disk Differential Revision: https://phab.mercurial-scm.org/D10368
Wed, 31 Mar 2021 18:59:49 +0200 dirstate-tree: Implement DirstateMap::read
Simon Sapin <simon.sapin@octobus.net> [Wed, 31 Mar 2021 18:59:49 +0200] rev 47111
dirstate-tree: Implement DirstateMap::read This reads the on-disk dirstate in its current format (a flat sequence of entries) and creates a tree in memory. Differential Revision: https://phab.mercurial-scm.org/D10367
Thu, 08 Apr 2021 20:12:24 +0200 dirstate-tree: Add `WithBasename` wrapper for `HgPath`
Simon Sapin <simon.sapin@octobus.net> [Thu, 08 Apr 2021 20:12:24 +0200] rev 47110
dirstate-tree: Add `WithBasename` wrapper for `HgPath` In the tree-shaped dirstate we want to have nodes representing files or directories, where directory nodes contain a map associating "base" names to child nodes for child files and directories. Many dirstate operations expect a full path from the repository root, but re-concatenating string from nested map keys all the time might be expensive. Instead, `WithBasename` stores a full path for these operations but behaves as its base name (last path component) for equality and comparison. Additionally `inclusive_ancestors` provides the successive map keys that are needed when inserting a new dirstate node at a given full path. Differential Revision: https://phab.mercurial-scm.org/D10365
(0) -30000 -10000 -3000 -1000 -300 -100 -50 -32 +32 +50 +100 +300 +1000 +3000 tip