Sat, 01 Apr 2023 05:57:09 +0200 match: sort patterns before compiling them into a regex stable
Pierre-Yves David <pierre-yves.david@octobus.net> [Sat, 01 Apr 2023 05:57:09 +0200] rev 50337
match: sort patterns before compiling them into a regex

While investigating crippling performance for `hg cat` in some contexts, I
discovered that, for large inputs, building a regex from out-of-order patterns
may result in a *much* slower regex and much slower performance of the
associated matcher.

So we are now sorting the patterns to help the regex engine.

There is more to the story, as we rely on regexps more than we should. See the
next changeset for details.

Benchmarks
==========

In the following benchmark we are comparing the `hg cat` and `hg files` run
time when matching against the full list of files in the repository. They are
run:

- without the rust extensions
- with the standard python engine (so without re2)

sort vs non-sorted - Before this changeset (3f5137543773)
---------------------------------------------------------

###### hg files ###############################################################
### mercurial-2018-08-01-zstd-sparse-revlog
sorted: 0.230092 seconds
shuffled: 0.234235 seconds (+1.80%)
### pypy-2018-08-01-zstd-sparse-revlog
sorted: 0.613567 seconds
shuffled: 0.801880 seconds (+30.69%)
### mozilla-central-2018-08-01-zstd-sparse-revlog
sorted: 62.474221 seconds
shuffled: 1364.180218 seconds (+2083.59%)
### netbeans-2018-08-01-zstd-sparse-revlog
sorted: 21.541828 seconds
shuffled: 172.759857 seconds (+701.97%)
###### hg cat #################################################################
### mercurial-2018-08-01-zstd-sparse-revlog
sorted: 0.764407 seconds
shuffled: 0.768924 seconds
### pypy-2018-08-01-zstd-sparse-revlog
sorted: 2.065220 seconds
shuffled: 2.276388 seconds (+10.22%)
### netbeans-2018-08-01-zstd-sparse-revlog
sorted: 40.967983 seconds
shuffled: 216.388709 seconds (+428.19%)
### mozilla-central-2018-08-01-zstd-sparse-revlog
sorted: 105.228510 seconds
shuffled: 1448.722784 seconds (+1276.74%)

sort vs non-sorted - With this changeset
----------------------------------------

###### hg files ###############################################################
### mercurial-2018-08-01-zstd-sparse-revlog
all-list-pattern-sorted: 0.230069
all-list-pattern-shuffled: 0.231165
### pypy-2018-08-01-zstd-sparse-revlog
all-list-pattern-sorted: 0.616799
all-list-pattern-shuffled: 0.616393
### netbeans-2018-08-01-zstd-sparse-revlog
all-list-pattern-sorted: 21.586773
all-list-pattern-shuffled: 21.908197
### mozilla-central-2018-08-01-zstd-sparse-revlog
all-list-pattern-sorted: 61.279490
all-list-pattern-shuffled: 62.473549
###### hg cat #################################################################
### mercurial-2018-08-01-zstd-sparse-revlog
sorted: 0.763883 seconds
shuffled: 0.765848 seconds
### pypy-2018-08-01-zstd-sparse-revlog
sorted: 2.070498 seconds
shuffled: 2.069197 seconds
### netbeans-2018-08-01-zstd-sparse-revlog
sorted: 41.392423 seconds
shuffled: 41.648689 seconds
### mozilla-central-2018-08-01-zstd-sparse-revlog
sorted: 103.315670 seconds
shuffled: 104.369358 seconds
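
The idea behind the change can be illustrated with a minimal Python sketch. It is not Mercurial's matcher code: `build_matcher` and the generated file names are hypothetical, and the patterns are treated as literal paths for simplicity. It only shows that sorting the patterns before joining them into one alternation makes the compiled regex independent of the order in which the patterns were supplied.

```python
import random
import re


def build_matcher(paths):
    """Compile literal paths into a single alternation regex.

    Hypothetical helper for illustration: the paths are sorted first, so
    the resulting pattern does not depend on the caller's input order.
    """
    ordered = sorted(paths)
    pattern = "|".join(re.escape(p) for p in ordered)
    return re.compile(pattern)


# Made-up file list standing in for "the full list of files in the repository".
paths = ["src/%03d/file_%d.py" % (i % 50, i) for i in range(1000)]

# Same paths, deterministic shuffle.
shuffled = paths[:]
random.Random(0).shuffle(shuffled)

sorted_matcher = build_matcher(paths)
shuffled_matcher = build_matcher(shuffled)

# Both inputs compile to the same regex, so both matchers behave identically.
print(sorted_matcher.pattern == shuffled_matcher.pattern)
print(bool(shuffled_matcher.fullmatch("src/001/file_1.py")))
```

With the sort in place, a shuffled input can no longer produce a differently ordered alternation, which is consistent with the sorted and shuffled timings converging in the second set of benchmarks.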
Mon, 27 Mar 2023 17:30:14 -0400 chg: populate CHGHG if not set stable
Arun Kulshreshtha <akulshreshtha@janestreet.com> [Mon, 27 Mar 2023 17:30:14 -0400] rev 50336
chg: populate CHGHG if not set

Normally, chg determines which `hg` executable to use by first consulting the
`$CHGHG` and `$HG` environment variables, and if neither is present, defaults
to the `hg` found in the user's `$PATH`. If built with the `HGPATHREL`
compiler flag, chg will instead assume that there exists an `hg` executable in
the same directory as the `chg` binary and attempt to use that.

This can cause problems in situations where there are multiple actively-used
Mercurial installations on the same system. When a `chg` client connects to a
running command server, the server process performs some basic validation to
determine whether a new command server needs to be spawned. These checks
include things like checking certain "sensitive" environment variables and
config sections, as well as checking whether the mtime of the extensions, hg's
`__version__.py` module, and the Python interpreter have changed.

Crucially, the command server doesn't explicitly check whether the executable
it is running from matches the executable that the `chg` client would have
otherwise invoked had there been no existing command server process. Without
`HGPATHREL`, this still gets implicitly checked during the validation step,
because the only way to specify an alternate hg executable (apart from
`$PATH`) is via the `$CHGHG` and `$HG` environment variables, both of which
are checked.

With `HGPATHREL`, however, the command server has no way of knowing which hg
executable the client would have run. This means that a client located at
`/version_B/bin/chg` will happily connect to a command server running
`/version_A/bin/hg` instead of `/version_B/bin/hg` as expected. A simple
solution is to have the client set `$CHGHG` itself, which then allows the
command server's environment validation to work as intended.

I have tested this manually using two locally built hg installations and it
seems to work with no ill effects. That said, I'm not sure how to write an
automated test for this since the `chg` available to the tests isn't even
built with the `HGPATHREL` compiler flag to begin with.
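
The lookup order and the proposed fix can be modelled with a short Python sketch. This is not chg's actual C implementation: `resolve_hg`, `populate_chghg`, and the `hgpathrel` parameter are hypothetical names that stand in for the behaviour described above, and a variable is assumed to count as "set" whenever it is present in the environment.

```python
import posixpath


def resolve_hg(env, chg_path, hgpathrel=False):
    """Return the hg executable a chg client would use (illustrative model).

    Order follows the description above: $CHGHG, then $HG, then either the
    `hg` next to the chg binary (HGPATHREL builds) or plain `hg` from $PATH.
    """
    for name in ("CHGHG", "HG"):
        if name in env:
            return env[name]
    if hgpathrel:
        return posixpath.join(posixpath.dirname(chg_path), "hg")
    return "hg"


def populate_chghg(env, chg_path, hgpathrel=False):
    """Return a copy of env with CHGHG filled in when it is not already set."""
    new_env = dict(env)
    new_env.setdefault("CHGHG", resolve_hg(env, chg_path, hgpathrel))
    return new_env


# Two HGPATHREL-style installations now end up with distinct environments,
# so a check on $CHGHG can tell them apart.
env_a = populate_chghg({}, "/version_A/bin/chg", hgpathrel=True)
env_b = populate_chghg({}, "/version_B/bin/chg", hgpathrel=True)
print(env_a["CHGHG"])
print(env_b["CHGHG"])
```

Because the command server already validates `$CHGHG`, making the value explicit on the client side is enough for the existing check to distinguish the two installations in this model.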