Tue, 09 Apr 2024 02:54:19 +0200 phases: rework the logic of _pushdiscoveryphase to bound complexity
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 02:54:19 +0200] rev 51611
phases: rework the logic of _pushdiscoveryphase to bound complexity This rework the various graph traversal in _pushdiscoveryphase to keep the complexity in check. This is done though a couple of things: - first, limiting the space we have to explore, for example, if we are not in publishing push, we don't need to consider remote draft roots that are also draft locally, as there is nothing to be moved there. - avoid unbounded descendant computation, and use the faster "rev between" computation. This provide a massive boost to performance when exchanging with repository with a massive amount of draft, like mozilla-try: ### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog # benchmark.name = hg.command.push # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.reuse-external-delta-parent = default ## benchmark.variants.revs = any-1-extra-rev before: 20.346590 seconds after: 11.232059 seconds (-38.15%, -7.48 seconds) ## benchmark.variants.revs = any-100-extra-rev before: 24.752051 seconds after: 15.367412 seconds (-37.91%, -9.38 seconds) After this changes, the push operation is still quite too slow. Some of this can be attributed to general phases slowness (reading all the roots from disk for example) and other know slowness (not using persistent-nodemap, branchmap, tags, etc. We are also working on them, but with this series, phase discovery during push no longer showing up in profile and this is a pretty nice and bit low-hanging fruit out of the way. ### (same case as the above) # benchmark.variants.revs = any-1-extra-rev pre-%ln-change: 44.235070 this-changeset: 11.232059 seconds (-74.61%, -33.00 seconds) # benchmark.variants.revs = any-100-extra-rev pre-%ln-change: 49.234697 this-changeset: 15.367412 seconds (-68.79%, -33.87 seconds) Note that with this change, the `hg push` performance is now much closer to the `hg pull` performance, even it still lagging behind a bit. (and the overall performance are still too slow). ### data-env-vars.name = mozilla-try-2023-03-22-ds2-pnm # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.pulled-delta-reuse-policy = default # bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 6.517450 hg.command.push: 11.219888 ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 10.160991 hg.command.push: 14.251107 ### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog # bin-env-vars.hg.py-re2-module = default # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.pulled-delta-reuse-policy = default ## bin-env-vars.hg.flavor = default ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 8.577772 hg.command.push: 11.232059 ## bin-env-vars.hg.flavor = default ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 13.152976 hg.command.push: 15.367412 ## bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-1-extra-rev hg.command.pull: 8.731982 hg.command.push: 11.178751 ## bin-env-vars.hg.flavor = rust ## benchmark.variants.revs = any-100-extra-rev hg.command.pull: 13.184236 hg.command.push: 15.620843
Fri, 05 Apr 2024 22:47:44 +0200 phases: introduce a performant efficient way to access revision in a set
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 22:47:44 +0200] rev 51610
phases: introduce a performant efficient way to access revision in a set This will be useful in the next changesets.
Fri, 05 Apr 2024 14:13:47 +0200 phases: use revision number in `_pushdiscoveryphase`
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 14:13:47 +0200] rev 51609
phases: use revision number in `_pushdiscoveryphase` We now reach our target checkpoint in terms of rev-num conversion. The `_pushdiscoveryphase` function is now performing graph computation based on revision number only. Avoiding repeated conversion from node-id to rev-num. See previous changeset updated `new_heads` for rationnal. Again, time saved in the 100 milliseconds order of magnitude for the mozilla-try benchmark I have been using. However, wow that the logic is done using revision number, we can look into having better logic in the next changesets, which will provide a much bigger speedup.
Fri, 05 Apr 2024 14:11:02 +0200 phases: move RemotePhasesSummary to revision number
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 14:11:02 +0200] rev 51608
phases: move RemotePhasesSummary to revision number This continue our quest to align more logic on revision number instead of node-ids. The motivation is similar to the change to `new_heads` and `analyze_remote_phases` a few changeset earlier. Again, we take this as an opportunity to rename the class, and the attribute to the new naming scheme. This will highlight the need for code update for any code using it an expecting node-ids. Many of the rev-num → node-id conversion we had to introduce in the previous changesets can now be removed. More will be removed in the future as we continue to align code toward rev-num usage. time saved in the 100 milliseconds order of magnitude for the mozilla-try benchmark I have been using.
Fri, 05 Apr 2024 12:24:47 +0200 phases: stop using `repo.set` in `remotephasessummary`
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 12:24:47 +0200] rev 51607
phases: stop using `repo.set` in `remotephasessummary` The `repository.set` create changectx on the fly, an expensive operation. Using `repo.revs` and a direct rev-num → node-id translation will be significantly faster. This is especially true as we prepare ourself to no longer do the rev-num → node-id transalation there. The speedup is a bit lost in the overall noisyness of the slow phase discovery algorithm, but it save a small amount of time in my benchmark.
Fri, 05 Apr 2024 12:02:43 +0200 phases: use revision number in analyze_remote_phases
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 12:02:43 +0200] rev 51606
phases: use revision number in analyze_remote_phases Same logic as the previous change to `new_heads`, see rationnal there. This avoids a small number of `nodes -> revs` conversion speeding thing up in the 100 milliseconds order of magnitude for the worses cases. However, the rest of the logic is noisy enough that it hardly matters for now.
Fri, 05 Apr 2024 11:33:47 +0200 phases: use revision number in new_heads
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 11:33:47 +0200] rev 51605
phases: use revision number in new_heads All graph operations will be done using revision numbers, so passing nodes only means they will eventually get converted to revision numbers internally. As part of an effort to align the code on using revision number we make the `phases.newheads` function operated on revision number, taking them as input and using them in returns, instead of the node-id it used to consume and produce. This is part of multiple changesets effort to translate more part of the logic, but is done step by step to facilitate the identification of issue that might arise in mercurial core and extensions. To make the change simpler to handle for third party extensions, we also rename the function, using a more modern form. This will help detecting the different between the node-id version and the rev-num version. I also take this as an opportunity to add some comment about possible performance improvement for the future. They don't matter too much now, but they are worse exploring in a while.
Mon, 08 Apr 2024 15:11:49 +0200 phases: convert remote phase root to node while reading them
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 08 Apr 2024 15:11:49 +0200] rev 51604
phases: convert remote phase root to node while reading them This is currently a bit silly as we will convert them back to node right after, but that is an intermediate step before doing more disruptive changes.
Fri, 05 Apr 2024 11:17:25 +0200 phases: more compact error handling in analyzeremotephases
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 11:17:25 +0200] rev 51603
phases: more compact error handling in analyzeremotephases using an intermediate variable result in more readable code, so let us use it.
Tue, 09 Apr 2024 02:54:12 +0200 push: rework the computation of fallbackheads to be correct
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 02:54:12 +0200] rev 51602
push: rework the computation of fallbackheads to be correct The previous computation tried to be smart but ended up being wrong. This was caught by phase movement test while reworking the phase discovery logic to be faster. The previous logic was failing to catch case where the pushed set was not based on a common heads (i.e. when the discovery seemed to have "over discovered" content, outside the pushed set) In the following graph, `e` is a common head and we `hg push -r f`. We need to detect `c` as a fallback heads and we previous failed to do so:: e | d f |/ c | b | a The performance impact of the change seems minimal. On the most impacted repository at hand (mozilla-try), the slowdown seems mostly mixed in the overall noise `hg push` but seems to be in the hundred of milliseconds order of magnitude. When using rust, we seems to be a bit faster, probably because we leverage more accelaratd internals. I added a couple of performance related common for further investigation later on.
Fri, 05 Apr 2024 11:05:54 +0200 revset: stop serializing node when using "%ln"
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 05 Apr 2024 11:05:54 +0200] rev 51601
revset: stop serializing node when using "%ln" Turning hundred of thousand of node from node to hex and back can be slow… what about we stop doing it? In many case were we are using node id we should be using revision id. However this is not a good reason to have a stupidly slow implementation of "%ln". This caught my attention again because the phase discovery during push make an extensive use of "%ln" or huge set. In absolute, that phase discovery probably should use "%ld" and need to improves its algorithmic complexity, but improving "%ln" seems simple and long overdue. This greatly speeds up `hg push` on repository with many drafts. Here are some relevant poulpe benchmarks: ### data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog # benchmark.name = hg.command.push # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.explicit-rev = all-out-heads # benchmark.variants.issue6528 = disabled # benchmark.variants.protocol = ssh # benchmark.variants.reuse-external-delta-parent = default ## benchmark.variants.revs = any-1-extra-rev before: 44.235070 after: 20.416329 (-53.85%, -23.82) ## benchmark.variants.revs = any-100-extra-rev before: 49.234697 after: 26.519829 (-46.14%, -22.71) ### benchmark.name = hg.command.bundle # bin-env-vars.hg.flavor = default # bin-env-vars.hg.py-re2-module = default # benchmark.variants.revs = all # benchmark.variants.type = none-streamv2 ## data-env-vars.name = heptapod-public-2024-03-25-zstd-sparse-revlog before: 10.138396 after: 7.750458 (-23.55%, -2.39) ## data-env-vars.name = mercurial-public-2024-03-22-zstd-sparse-revlog before: 1.263859 after: 0.700229 (-44.60%, -0.56) ## data-env-vars.name = mozilla-try-2023-03-22-zstd-sparse-revlog before: 399.484481 after: 346.5089 (-13.26%, -52.98) ## data-env-vars.name = pypy-2024-03-22-zstd-sparse-revlog before: 4.540080 after: 3.401700 (-25.07%, -1.14) ## data-env-vars.name = tryton-public-2024-03-22-zstd-sparse-revlog before: 2.975765 after: 1.870798 (-37.13%, -1.10)
Tue, 09 Apr 2024 14:41:48 +0200 bundlespec: drop unused _bundlespecvariants dictionary
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 14:41:48 +0200] rev 51600
bundlespec: drop unused _bundlespecvariants dictionary Why do we have a `_bundlespecvariants`?
Tue, 09 Apr 2024 14:37:24 +0200 bundlespec: type the _bundlespeccontentopts dictionary
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 14:37:24 +0200] rev 51599
bundlespec: type the _bundlespeccontentopts dictionary If only we had a tool to detect the kind of stupid error we just fixed… ho wait.
Tue, 09 Apr 2024 14:36:01 +0200 bundlespec: fix the "streamv2" and "streamv3-exp" variant
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 09 Apr 2024 14:36:01 +0200] rev 51598
bundlespec: fix the "streamv2" and "streamv3-exp" variant In c4aab3661f25, we broken this feature by adding unicode instead of bytes to the dictionary. On the other hand, this feature was never tested, so augment the tests to tests this.
Wed, 03 Apr 2024 15:33:25 +0200 perf: create the temporary target next to the source in stream-consume
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 03 Apr 2024 15:33:25 +0200] rev 51597
perf: create the temporary target next to the source in stream-consume See inline comment for rational.
Tue, 02 Apr 2024 21:53:17 +0200 bundlespec: rationalize the way we specify stream bundle version
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 02 Apr 2024 21:53:17 +0200] rev 51596
bundlespec: rationalize the way we specify stream bundle version Instead of having weird dedicated option for each version (v2, v3, etc) we reuse the same "stream" parameters. This is consistent with the ability to request a stream clone using "none-v2;stream=v2". This changeset introduce no user visible change, this is pure internal cleaning.
Tue, 02 Apr 2024 17:02:39 +0200 bundle: do no check the changegroup version if no changegroup is included
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 02 Apr 2024 17:02:39 +0200] rev 51595
bundle: do no check the changegroup version if no changegroup is included We don't need to check the compatibility of something we will not use. In practice this was getting in the was of `streamv2` bundles on a narrow repository as the 'cg.version=02' value was rejected by this checks.
Wed, 27 Mar 2024 18:51:33 +0000 perf-stream-consume: use the source repository config when applying
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 27 Mar 2024 18:51:33 +0000] rev 51594
perf-stream-consume: use the source repository config when applying This might contains critical configuration for the benchmark, like enabling of extensions like narrow.
Wed, 27 Mar 2024 17:46:23 +0000 unbundle: move most of the logic on cmdutil to help debug::unbundle reuse
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 27 Mar 2024 17:46:23 +0000] rev 51593
unbundle: move most of the logic on cmdutil to help debug::unbundle reuse This make sure `hg debug::unbundle` focus on the core logic.
Wed, 27 Mar 2024 17:29:48 +0000 postincoming: move to cmdutil
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 27 Mar 2024 17:29:48 +0000] rev 51592
postincoming: move to cmdutil This looks like a good place for it to live.
Wed, 27 Mar 2024 17:21:46 +0000 postincoming: avoid computing branchhead if no report will be posted
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 27 Mar 2024 17:21:46 +0000] rev 51591
postincoming: avoid computing branchhead if no report will be posted This otherwise defeat some of the branch v3 optimization.
Tue, 26 Mar 2024 13:46:44 +0000 streamclone: stop listing files for entries that have no volatile files
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 26 Mar 2024 13:46:44 +0000] rev 51590
streamclone: stop listing files for entries that have no volatile files This will save a lot of python related time. This significantly boost performance. The following number comes from a large private repository using perf::stream-locked-section: base-line: 35.04 seconds prev-change: 24.51 seconds (-30%) prev-change: 20.88 seconds (-40%) prev-change: 14.22 seconds (-60%) this-change: 11.58 seconds (-67% from baseline; -18% from prev)
Tue, 26 Mar 2024 13:34:05 +0000 stream-clone: disable gc for the initial section for the v3 format
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 26 Mar 2024 13:34:05 +0000] rev 51589
stream-clone: disable gc for the initial section for the v3 format The number of small container created turn Python in a gc-frenzy that seriously impact performance. This significantly boost performance. The following number comes from a large private repository using perf::stream-locked-section: base-line: 35.04 seconds prev-change: 24.51 seconds (-30%) prev-change: 20.88 seconds (-40%) this-change: 14.22 seconds (-60% from baseline; -31% from prev)
Tue, 26 Mar 2024 13:32:46 +0000 stream-clone: disable gc for `_entries_walk` duration
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 26 Mar 2024 13:32:46 +0000] rev 51588
stream-clone: disable gc for `_entries_walk` duration The number of small container created turn Python in a gc-frenzy that seriously impact performance. This significantly boost performance. The following number comes from a large private repository using perf::stream-locked-section: base-line: 35.04 seconds prev-change: 24.51 seconds (-30%) this-change: 20.88 seconds (-40% from baseline; -15% from previous changes)
(0) -30000 -10000 -3000 -1000 -300 -100 -50 -24 +24 +50 tip