Matt Harbison <matt_harbison@yahoo.com> [Fri, 04 Oct 2024 23:21:41 -0400] rev 51938
interfaces: introduce and use a protocol class for the `base85` module
See
f2832de2a46c for details when this was done for the `bdiff` module.
It looks like PEP-688 removed the special casing of `bytes` being a standin
for any type of `ByteString`, and defines a `typing.Buffer` class (with a
backport in `typing_extensions` for Python prior to 3.12). There's been a lot
of churn in this area with pytype, but recent versions of pytype and PyCharm
recognize this, and e.g. have `mercurial.node.hex()` defined as:
from typing_extensions import Buffer
def hex(data: Buffer, sep: str | bytes = ..., bytes_per_sep: int = ...) -> bytes
This covers `bytes`, `bytearray`, and `memoryview` by default. Both of the C
functions here use `y#` to parse the arguments, which means the arg is a
byte-like object[2], so the args would appear to be better typed as `Buffer`.
However, pytype has a bug that prevents using this from `typing_extensions`[3],
and mypy complained `Unsupported left operand type for + ("memoryview")` in the
pure module on line 37 (meaning it's only a subset of `Buffer`). So hold off on
changing any of that for now.
[1] https://peps.python.org/pep-0688/#no-special-meaning-for-bytes
[2] https://docs.python.org/3/glossary.html#term-bytes-like-object
[3] https://github.com/google/pytype/issues/1772
Matt Harbison <matt_harbison@yahoo.com> [Fri, 04 Oct 2024 23:09:56 -0400] rev 51937
base85: avoid a spurious use-before-initialized warning in `pure` module
The error wasn't possible because the only way for `acc` to not be initialized
was if `len(text) == 0`. But then `0 % 5 == 0`, so no attempt at padding was
done. It's a simple enough fix to not have PyCharm flag this though. The value
needs to be reset on each loop iteration, so it's a line copy, not a line move.
Matt Harbison <matt_harbison@yahoo.com> [Mon, 30 Sep 2024 19:40:14 -0400] rev 51936
typing: add type annotations to `mercurial/mdiff.py`
We'll leave converting `diffopts` to `attrs` as another project.
Matt Harbison <matt_harbison@yahoo.com> [Mon, 30 Sep 2024 23:50:40 -0400] rev 51935
mdiff: convert a few block definitions from lists to tuples
These were flagged by adding type hints. Some places were using a tuple of 4
ints to define a block, and others were using a list of 4. A tuple is better
for typing, because we can define the length and the type of each entry. One of
the places had to redefine the tuple, since writing to a tuple at an index isn't
supported.
This change spills out into the tests, and archeology says it was added to the
repo in this state. There was no reason given for the divergence, and I suspect
it wasn't intentional.
It looks like `splitblock()` is completely unused in the codebase.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 29 Sep 2024 02:03:20 -0400] rev 51934
interfaces: add the optional `bdiff.xdiffblocks()` method
PyCharm flagged where this was called on the protocol class in `mdiff.py` in the
previous commit, but pytype completely missed it. PyCharm is correct here, but
I'm committing this separately to highlight this potential problem- some of the
implementations don't implement _all_ of the methods the others do, and there's
not a great way to indicate on a protocol class that a method or attribute is
optional- that's kinda the opposite of what static typing is about.
Making the method an `Optional[Callable]` attribute works here, and keeps both
PyCharm and pytype happy, and the generated `mdiff.pyi` and `modules.pyi` look
reasonable. We might be getting a little lucky, because the method isn't
invoked directly- it is returned from another method that selects which block
function to use. Except since it is declared on the protocol class, every
module needs this attribute (in theory, but in practice this doesn't seem to be
checked), so the check for it on the module has to change from `hasattr()` to
`getattr(..., None)`. We defer defining the optional attrs to the type checking
phase as an extra precaution- that way it isn't an attr with a `None` value at
runtime if someone is still using `hasattr()`.
As to why pytype missed this, I have no clue. The generated `mdiff.pyi` even
has the global variable typed as `bdiff: intmod.BDiff`, so uses of it really
should comply with what is on the class, protocol class or not.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 28 Sep 2024 19:12:18 -0400] rev 51933
interfaces: introduce and use a protocol class for the `bdiff` module
This is allowed by PEP 544[1], and we basically follow the example there. The
class here is copied from `mercurial.pure.bdiff`, and the implementation
removed.
There are several modules that have a few different implementations, and the
implementation chosen is controlled by `HGMODULEPOLICY`. The module is loaded
via `mercurial/policy.py`, and has been inferred by pytype as `Any` up to this
point. Therefore it and PyCharm were blind to all functions on the module, and
their signatures. Also, having multiple instances of the same module allows
their signatures to get out of sync.
Introducing a protocol class allows the loaded module that is stored in a
variable to be given type info, which cascades through the various places it is
used. This change alters 11 *.pyi files, for example. In theory, this would
also allow us to ensure the various implementations of the same module are kept
in alignment- simply import the module in a test module, attempt to pass it to a
function that uses the corresponding protocol as an argument, and run pytype on
it.
In practice, this doesn't work (yet). PyCharm (erroneously) flags imported
modules being passed where a protocol class is used[2]. Pytype has problems the
other way- it fails to detect when a module that doesn't adhere to the protocol
is passed to a protocol argument. The good news is that mypy properly detects
this case. The bad news is that mypy spews a bunch of other errors when
importing even simple modules, like the various `bdiff` modules. Therefore I'm
punting on the tests for now because the type info around a loaded module in
PyCharm is a clear win by itself.
[1] https://peps.python.org/pep-0544/#modules-as-implementations-of-protocols
[2] https://youtrack.jetbrains.com/issue/PY-58679/Support-modules-implementing-protocols
Matt Harbison <matt_harbison@yahoo.com> [Sat, 28 Sep 2024 19:11:39 -0400] rev 51932
mdiff: tweak calls into `bdiff.fixws` to match its type hints
It turns out that protocol classes can be used for modules too, which is great
because all of the dynamically loaded modules (and their attributes) are
currently inferred as `Any`. See the next commit for details.
A protocol class for the `bdiff` module detected this (trivial) mismatch, so
correct it first. The various implementations of this method are typed as
taking a `bool`. The `cext` implementation parses its arguments with
`PyArg_ParseTuple(args, "Sb:fixws", &s, &allws)`, which wants an `int`. But
experimenting in `hg debugshell` under py38, passing `True` or `False` to
`cext.fixws()` also works. We can change the implementation to use "p" (which
was introduced in py33) instead of "b", but that's beyond the scope of this.
Matt Harbison <matt_harbison@yahoo.com> [Tue, 01 Oct 2024 15:04:06 -0400] rev 51931
util: minor copy editing of the documentation for `mmapread()`
Matt Harbison <matt_harbison@yahoo.com> [Tue, 01 Oct 2024 15:00:39 -0400] rev 51930
util: make `mmapread()` work on Windows again
522b4d729e89 started referencing `mmap.MAP_PRIVATE`, but that's not available on
Windows, so `hg version` worked, but `make local` did not. That commit also
started calling the constructor with the fine-grained `flags` and `prot` args,
but those aren't available on Windows either[1] (though the backing C code
doesn't seem conditionalized to disallow usage of them).
I assume the change away from from the `access` arg was to provide the same
options, plus `MAP_POPULATE`. Looking at the source code[2], they're not quite
the same- `ACCESS_READ` is equivalent to `flags = MAP_SHARED` and `prot = PROT_READ`.
`MAP_PRIVATE` is only used with `ACCESS_COPY`, which allows read and write.
Therefore, we can't quite get the same baseline flags on Windows, but this was
the status quo ante and `MAP_POPULATE` is a Linux thing, so presumably it works.
I realize that typically the OS differences are abstracted into the platform
modules, but I'm leaving it here so that it is obvious what the differences are
between the platforms.
[1] https://docs.python.org/3/library/mmap.html#mmap.mmap
[2] https://github.com/python/cpython/blob/
5e0abb47886bc665eefdcc19fde985f803e49d4c/Modules/mmapmodule.c#L1539
Matt Harbison <matt_harbison@yahoo.com> [Fri, 27 Sep 2024 12:30:37 -0400] rev 51929
typing: add type annotations to the dirstate classes
The basic procedure here was to use `merge-pyi` to merge the `git/dirstate.pyi`
file in (after renaming the interface class to match), cleaning up the import
statement mess, and then repeating the procedure for `mercurial/dirstate.pyi`.
Surprisingly, git's dirstate had more hints inferred in its *.pyi file.
After that, it was a manual examination of each method in the interface, and how
they were implemented in the core and git classes to verify what was inferred by
pytype, and fill in the missing gaps. Since this involved jumping around
between three different files, I applied the same type info to all three at the
same time. Complex types I rolled up into type aliases in the interface module,
and used that as needed. That way if it changes, there's one place to edit.
There are some hints still missing, and some documentation that doesn't match
the signatures. They should all be marked with TODOs. There are also a bunch
of methods on the core class that aren't on the Protocol class that seem like
maybe they should be (like `set_tracked()`). There are even more methods
missing from the git class. But that's a project for another time.