Matt Harbison <matt_harbison@yahoo.com> [Thu, 26 Sep 2024 18:04:31 -0400] rev 51921
interfaces: convert the dirstate zope interface to a Protocol class
This is a small trial run for converting the repository interfaces enmasse, in
the same series of steps. I'm not sure that this current code is valid (it has
zope attribute fields, and it's missing all of the `self` args on its functions,
but that was the previous state of things, and made PyCharm really unhappy).
But it will be easier to review the repository interface changes if this change
is separate from adding `self` and dropping the zope attributes all over.
Having an empty constructor in a protocol is weird. I'm not sure if these args
should be converted to fields that all subclasses would have, and comments
around existing attributes say some should be going away. Comment it out for
now so that it's not in the way, but also not forgotten.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 26 Sep 2024 17:47:39 -0400] rev 51920
tests: disable `test-check-interfaces.py` while converting to protocols
The goal is to convert everything, so get it all out of the way. The interfaces
don't get that much maintenance that this needs to be tested right now.
Arseniy Alekseyev <aalekseyev@janestreet.com> [Wed, 02 Oct 2024 14:51:56 +0100] rev 51919
tests: always access the mercurial repo through `helpers-testrepo.sh`
In some contexts the mercurial repo needs to be accessed through system hg.
That's what `helpers-testrepo.sh` enforces, but some tests incorrectly
use the mercurial repo without going through that script.
This patch fixes those tests.
Arseniy Alekseyev <aalekseyev@janestreet.com> [Wed, 02 Oct 2024 14:49:07 +0100] rev 51918
tests: in helpers-testrepo.sh switch from shell aliases to functions
The reason is that I want this script to work in non-interactive shells too.
(will be used in the next commit)
Arseniy Alekseyev <aalekseyev@janestreet.com> [Fri, 27 Sep 2024 17:25:15 +0100] rev 51917
rust: fix the deprecation warning in NaiveDateTime::from_timestamp
This warning appears between chrono 0.4.34 and 0.4.38, so
isn't affecting the current lock file, but it would come
when we upgraded the version.
Pierre-Yves David <pierre-yves.david@octobus.net> [Fri, 27 Sep 2024 15:53:56 +0200] rev 51916
run-tests: ensure that --no-rust do not use rust
Since having a valid value in HGWITHRUSTEXT is enough to trigger the use of
rust, we need to unset it before install to be sure.
Joerg Sonnenberger <joerg@bec.de> [Sat, 20 Jul 2024 03:04:48 +0200] rev 51915
revlogutils: teach
issue6528 filtering about grandparents
During dynamic filtering, we should assume that the current repository
is correct. Therefore the parents of the delta base can tell us if that
parent has metadata without having to build the whole text.
Joerg Sonnenberger <joerg@bec.de> [Sat, 20 Jul 2024 00:43:08 +0200] rev 51914
revlogutils: remember known metadata parents for
issue6528
In the cases where the parent revs tell us for sure that the parent has
metadata, remember this fact to avoid content recomputations later.
Joerg Sonnenberger <joerg@bec.de> [Sat, 20 Jul 2024 00:44:59 +0200] rev 51913
revlogutils: for
issue6528 fix, pre-cache nullrev as metadata-free
Joerg Sonnenberger <joerg@bec.de> [Sat, 20 Jul 2024 00:59:50 +0200] rev 51912
revlogutils: for
issue6528 fix, cache results for null changes
Joerg Sonnenberger <joerg@bec.de> [Sat, 20 Jul 2024 00:41:37 +0200] rev 51911
revlogutils: fix _chunk() reference
_chunk is only found in the inner revlog object and not directly exposed
outside.
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 02 Sep 2024 22:14:38 +0200] rev 51910
rev-branch-cache: reenable memory mapping of the revision data
Now that we are no longer truncating it, we can mmap it again.
This provide a sizeable speedup on repository with a very large amount of
revision for example for a mozilla-try clone with 5 793 383 revisions, this
provide a speedup of 5ms - 10ms. Since they happens within the "critical" locked
path during push. These miliseconds are important.
In addition, the v3 branchmap format is use the rev-branch-cache more than the
v2 branchmap cache so this will be important.
On smaller repository we consistently see an improvement of one or two percents,
but the gain in absolute time is usually < 10 ms.
#### benchmark.name = hg.command.unbundle
# benchmark.variants.
issue6528 = disabled
# benchmark.variants.reuse-external-delta-parent = yes
# benchmark.variants.revs = any-1-extra-rev
# benchmark.variants.source = unbundle
# benchmark.variants.verbosity = quiet
### data-env-vars.name = mozilla-try-2024-03-26-zstd-sparse-revlog
## bin-env-vars.hg.flavor = default
e51161b12c7e: 3.527923
ebdcfe85b070: 3.468178 (-1.69%, -0.06)
## bin-env-vars.hg.flavor = rust
e51161b12c7e: 3.580158
ebdcfe85b070: 3.480564 (-2.78%, -0.10)
### data-env-vars.name = mozilla-try-2024-03-26-ds2-pnm
## bin-env-vars.hg.flavor = rust
e51161b12c7e: 3.527923
ebdcfe85b070: 3.468178 (-1.69%, -0.06)
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 25 Sep 2024 12:42:47 +0200] rev 51909
rev-branch-cache: have debugupdatecache warm rbc too
Since the "v2" format can be more performant than the "v1" format (thanks to
mmap), it is useful to be able to make sure it is present
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 25 Sep 2024 12:49:32 +0200] rev 51908
rev-branch-cache: schedule a write of the "v2" format if we read from "v1"
The new file can be memorymapped, while the old one cannot. So there is value in
having the v2 format around as soon a possible.
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 24 Sep 2024 15:44:10 +0200] rev 51907
rev-branch-cache: fallback on "v1" data if no v2 is found
This will help smooth the transition to the v2 format for existing large
repository.
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 24 Sep 2024 03:16:35 +0200] rev 51906
rev-branch-cache: increment the version to "v2"
We want to ensure no older clients will truncate the file under us. So we need to
change their name. We don't change the rest of the format (unfortunaly).
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 24 Sep 2024 00:16:23 +0200] rev 51905
rev-branch-cache: stop truncating cache file
Truncating the file prevent the safe use of mmap. So instead of overwrite the
existing data. If more than 20% of the file is to be overwritten, we rewrite the
whole file instead.
Such whole rewrite is done by replacing the old one with a new one, so mmap of
the old file would be affected.
This prepare a more aggressive use of mmap in later patches.
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 24 Sep 2024 00:16:04 +0200] rev 51904
rev-branch-cache: make sure we close the name file we open
We were various opening without with or try. Adding a try would not hurt.
Pierre-Yves David <pierre-yves.david@octobus.net> [Mon, 23 Sep 2024 23:52:45 +0200] rev 51903
rev-branch-cache: add a way to force rewrite of the cache
This seems useful to be able to do this, for example during strip.
This align with the intended expressed in the `test-branches.t` test. This will
help use being more confident about future changes in the series.
Pierre-Yves David <pierre-yves.david@octobus.net> [Tue, 24 Sep 2024 00:01:30 +0200] rev 51902
rev-branch-cache: issue more truthful "truncating" message
First, don't pretend it truncate to 40 when it actually truncate to 0. Second,
don't pretend to truncate to 0 when the file is already empty/missing.
Pierre-Yves David <pierre-yves.david@octobus.net> [Sun, 22 Sep 2024 15:55:46 +0200] rev 51901
rev-branch-cache: move the code in a dedicated module
The branchmap module is getting huge and the rev branch cache is fully
independent, lets move it elsewhere.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 25 Sep 2024 01:16:47 -0400] rev 51900
statichttprepo: stop shadowing the `bytes` builtin
PyCharm flagged it, but I also misunderstood when looking at the code, because
the name implied a byte string, not a number.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 25 Sep 2024 01:12:39 -0400] rev 51899
statichttprepo: fix `httprangereader.read()` for py3
It looks like there were a bunch of problems, not all of them py3 related:
1) The signature of BinaryIO.read() is -1, not None
2) The `end` variable can't be bytes and interpolate into str with "%s"
3) The `end` variable can't be an int and interpolate into str with "%s"
4) The result slicing could be out of bounds if more is requested than
returned
I guess if somebody would have called `read(-1)` (either directly or because a
wrapper defaults to that), it wouldn't have been handled correctly. The fact
that it is a valid value meaning to read everything requires some additional
changes later in the method around when it slices the byte string that was read,
but that seems to have already been broken.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 25 Sep 2024 00:52:44 -0400] rev 51898
statichttprepo: use a context manager to handle a file descriptor
I'm not sure if this should be reduced to `vfs.exists()`. That would seem to be
equivalent code (since the result of the read is ignored, so we can't tell if
the file actually has content, which has been the state of things going back to
98b6c3dde237), but this is at least safer file descriptor handling.
Matt Harbison <matt_harbison@yahoo.com> [Thu, 26 Sep 2024 02:58:50 +0200] rev 51897
profiling: pass bytes to `_()` and `error.Abort()`
And of course `other_tool_name` is str too, so that needs to be converted. The
type hints from PyCharm say `sys.monitoring.get_tool()` can return None, so
handle that case explicitly before it trips up pytype.
Joerg Sonnenberger <joerg@bec.de> [Mon, 08 Jul 2024 22:46:04 +0200] rev 51896
exchange: improve computation of relevant markers for large repos
Compute the candidate nodes with relevant markers directly
from keys of the predecessors/successors/children dictionaries of
obsstore. This is faster than iterating over all nodes directly.
This test could be further improved for repositories with relative
few markers compared to the repository size, but this is no longer
hot already. With the current loop structure, the obshashrange use
works as well as before as it passes lists with a single node.
Adjust the interface by allowing revision lists as well as node lists.
This helps cases that computes ancestors as it reduces the
materialisation cost. Use this in _pushdiscoveryobsmarker and
_getbundleobsmarkerpart. Improve the latter further by directly using
ancestors().
Performance benchmarks show notable and welcome improvement to no-op push and
pull (that would also apply to other push/pull). This apply to push and pull
done without evolve.
### push/pull Benchmark parameter
# bin-env-vars.hg.flavor = default
# benchmark.variants.explicit-rev = none
# benchmark.variants.protocol = ssh
# benchmark.variants.revs = none
## benchmark.name = hg.command.pull
# data-env-vars.name = mercurial-devel-2024-03-22-zstd-sparse-revlog
before: 5.968537 seconds
after: 5.668507 seconds (-5.03%, -0.30)
# data-env-vars.name = tryton-devel-2024-03-22-zstd-sparse-revlog
before: 1.446232 seconds
after: 0.835553 seconds (-42.23%, -0.61)
# data-env-vars.name = netbsd-src-draft-2024-09-19-zstd-sparse-revlog
before: 5.777412 seconds
after: 2.523454 seconds (-56.32%, -3.25)
## benchmark.name = hg.command.push
# data-env-vars.name = mercurial-devel-2024-03-22-zstd-sparse-revlog
before: 6.155501 seconds
after: 5.885072 seconds (-4.39%, -0.27)
# data-env-vars.name = tryton-devel-2024-03-22-zstd-sparse-revlog
before: 1.491054 seconds
after: 0.934882 seconds (-37.30%, -0.56)
# data-env-vars.name = netbsd-src-draft-2024-09-19-zstd-sparse-revlog
before: 5.902494 seconds
after: 2.957644 seconds (-49.89%, -2.94)
There is not notable different in these result using the "rust" flavor instead
of the "default". The performance impact on the same operation when using
evolve were also tested and no impact was noted.
Matt Harbison <matt_harbison@yahoo.com> [Fri, 20 Sep 2024 21:31:58 -0400] rev 51895
typing: make the localrepo classes known to pytype
9d4ad05bc91c and
1b17309cdaab both mentioned making `bundlerepository` and
`unionrepository` subclass `localrepository` during the type checking phase, but
that didn't apply to pytype in practice. See
bcaa5d408657 and friends for how
the zope interfaces confuse pytype, and end up converting the classes they
decorate into `Any`.
This commit is slightly more complex though, because `localrepository` has mixin
classes applied to it when it is instantiated. Specifically, `RevlogFileStorage`
is added, which adds `def file(f)` (which isn't defined on `localrepository`).
Therefore a list of `localrepository` superclasses is provided during type
checking to account for the mixins. Without this, the `bundlerepository` class
gets flagged when it attempts to call its superclass implementation of `file()`.
Note that pytype doesn't understand these mixin superclasses (it marks the
superclass of `localrepository` as `Any`, because they are zope interfaces it
doesn't understand), but that's enough to get it to not flag `bundlerepository`.
PyCharm also stops flagging it as a missing function, though it seems like it is
able to handle the zope interfaces.
Matt Harbison <matt_harbison@yahoo.com> [Mon, 23 Sep 2024 14:58:37 -0400] rev 51894
typing: add a handful more annotations to `mercurial/vfs.py`
These came out of refactoring into a protocol class, but they can stand on their
own.
The `audit` callback is kinda screwy because the internal lambda and the callable
for `pathutil.pathauditor` have different args and a different return type. It's
conditionalized where it is called, and can be cleaned up later if desired.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 21 Sep 2024 13:53:05 -0400] rev 51893
typing: make `vfs.isfileorlink_checkdir()` path arg required
The only caller to this is `merge._checkunknownfile()`, which supplies a value.
That's good, because `util.localpath()` immediately uses the value to call a
method on it on Windows. The posix implementation returns the value unaltered,
but then `pathutil.finddirs_rev_noroot()` would have exploded.
Matt Harbison <matt_harbison@yahoo.com> [Fri, 20 Sep 2024 20:16:12 -0400] rev 51892
typing: manually add type annotations to `mercurial/vfs.py`
This isn't everything, but hopefully it's close enough to hack on a protocol
class.