Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 07:23:29 +0200] rev 45027
convert: handle percent-encoded bytes in file URLs like Subversion
75b59d221aa3 added most of the code that gets removed by this patch. It helped
making progress on Python 3, but the reasoning was wrong in many ways. I tried
to retract it while it was queued, but it was too late.
Back then, I was asssuming that what happened on Python 2 (preserving bytes) is
correct and my Python 3 change is a hack. However it turned out that Subversion
interprets percent-encoded bytes as UTF-8. Accepting the same format as
Subversion is a good idea.
Consistency with urlreq.pathname2url() (as described in the removed comment)
doesn’t matter because that function is only used for passing paths to urllib.
This is not a backwards-incompatible change because before
5c0d5b48e58c,
non-ASCII filenames didn’t work at all on Python 2.
When the locale encoding is ISO-8859-15, `svn` accepts `file:///tmp/a%E2%82%AC`
for `/tmp/a€`. Before this patch, this was the case for this extension on
Python 3, but not on Python 2. This patch makes it work like with `svn` on both
Python 2 and Python 3.
Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 16:39:45 +0200] rev 45026
convert: add docstring on convert.subversion.geturl()
The function is unusual for a bytes-handling function in Mercurial because it
can’t handle arbitrary bytes. Therefore we should document this fact.
Pointed out by Yuya Nishihara while reviewing
e3b19004087a.
Joerg Sonnenberger <joerg@bec.de> [Thu, 18 Jun 2020 15:13:38 +0200] rev 45025
ui: add option to timestamp status and diagnostic messages
Differential Revision: https://phab.mercurial-scm.org/D8640
Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 01:32:17 +0200] rev 45024
tests: use path inside test dir
This will make the diff for the next patch less noisy.
Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 05:30:47 +0200] rev 45023
convert: convert URLs to UTF-8 for Subversion
Preamble: for comprehension, note that the `path` of geturl() would better be
called `path_or_url` (the argument of the call of getsvn() is called `url`).
For HTTP(S) URLs, the changes don’t make a difference, as they are restricted to
ASCII.
For file URLs, the reasoning is the same as for paths: we have to roundtrip with
what Subversion is doing.
When the locale encoding is ISO-8859-15, trying to convert a SVN repo
`file:///tmp/a€` failed before like this:
file:///tmp/a%A4 does not look like a Subversion repository to libsvn version 1.14.0
Decoding the path using the locale encoding can fail. In this case, we have to
bail out, as Subversion won’t be able to do anything useful with the path.
Manuel Jacob <me@manueljacob.de> [Mon, 29 Jun 2020 15:03:36 +0200] rev 45022
convert: correctly convert paths to UTF-8 for Subversion
The previous code using encoding.tolocal() only worked by chance in these
situations:
* The string is ASCII: The fast path was triggered and the string was returned
unmodified.
* The local encoding is UTF-8: The source and target encoding is the same.
* The string is not valid UTF-8 and the native encoding is ISO-8859-1: If the
string doesn’t decode using UTF-8, ISO-8859-1 is tried as a fallback. During
`hg convert`, the local encoding is always UTF-8. The irony is that in this
case, encoding.tolocal() behaves like what someone would expect the reverse
function, encoding.fromlocal(), to do.
When the locale encoding is ISO-8859-15, trying to convert a SVN repo `/tmp/a€`
failed before like this:
file:///tmp/a%C2%A4 does not look like a Subversion repository to libsvn version 1.14.0
The correct URL is `file:///tmp/a%E2%82%AC`.
Unlike previously (with the ISO-8859-1 fallback), decoding the path using the
locale encoding can fail. In this case, we have to bail out, as Subversion
won’t be able to do anything useful with the path.
Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 05:04:36 +0200] rev 45021
py3: pass URL as str
Before the patch, HTTP(S) URLs were never recognized as a Subversion repository
on Python 3.
Manuel Jacob <me@manueljacob.de> [Tue, 30 Jun 2020 04:55:52 +0200] rev 45020
convert: bail out in Subversion source if encountering non-ASCII HTTP(S) URL
Before this patch, in the tested case, urllib raised `httplib.InvalidURL: URL
can't contain control characters. '/\xff/!svn/ver/0/.svn' (found at least
'\xff')`, which resulted in that the URL was never recognized as a Subversion
repository.
This patch adds a check that bails out if the URL contains non-ASCII characters.
The warning is not overly user-friendly, but giving the user something to type
into a search engine is definitively better than not explaining why the
repository was not recognized.
We could support non-ASCII chracters by quoting them before passing them to
urllib. However, we would want to be compatible with what the `svn` command
does, which converts the URL from the locale encoding to UTF-8, percent-encodes
it and sends it to the server. If the locale encoding is not UTF-8, the
behavior is IMHO not very intuitive, as the `svn` command may send different
(percent-encoded) octets than what was passed on the console. Instead of
copying this behavior, we better leave it forbidden.