convert: convert URLs to UTF-8 for Subversion
Preamble: to follow this change, note that the `path` argument of geturl()
would be better named `path_or_url` (the argument in the getsvn() call is named
`url`).
For HTTP(S) URLs, the changes make no difference, as such URLs are restricted
to ASCII.
For file URLs, the reasoning is the same as for paths: we have to roundtrip with
what Subversion is doing.
When the locale encoding is ISO-8859-15, trying to convert an SVN repo at
`file:///tmp/a€` previously failed with:
file:///tmp/a%A4 does not look like a Subversion repository to libsvn version 1.14.0
Decoding the path using the locale encoding can fail. In this case, we have to
bail out, as Subversion won’t be able to do anything useful with the path.
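The recoding described above can be sketched in a few lines of standalone
Python. This is only an illustration of the idea, not the code in
hgext/convert/subversion.py: the helper name `recode_file_url` is hypothetical,
and it assumes the URL is a plain `file://` URL whose percent-encoded bytes are
in the locale encoding.

```python
from urllib.parse import quote, unquote_to_bytes


def recode_file_url(url: bytes, fsencoding: str) -> bytes:
    """Recode a file URL's percent-encoded bytes from fsencoding to UTF-8.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    when the bytes are not valid in fsencoding, which corresponds to the
    case where the conversion has to bail out.
    """
    prefix = b'file://'
    if not url.startswith(prefix):
        raise ValueError('not a file URL')
    # Undo the percent-encoding to get the raw bytes as the OS sees them.
    raw_path = unquote_to_bytes(url[len(prefix):])
    # Interpret those bytes in the locale encoding (this step may fail).
    text = raw_path.decode(fsencoding)
    # Re-encode as UTF-8 and percent-encode again, keeping '/' unescaped.
    return prefix + quote(text.encode('utf-8')).encode('ascii')


# 0xA4 is the euro sign in ISO-8859-15; in UTF-8 it is E2 82 AC.
print(recode_file_url(b'file:///tmp/a%A4', 'iso-8859-15'))
```

With an ASCII locale, a URL such as `file:///tmp/%FF` makes the decode step
raise UnicodeDecodeError, which mirrors the warning-and-return-False path added
in this change.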
--- a/hgext/convert/subversion.py Mon Jun 29 15:03:36 2020 +0200
+++ b/hgext/convert/subversion.py Tue Jun 30 05:30:47 2020 +0200
@@ -65,10 +65,10 @@
svn = None
-# In Subversion, paths are Unicode (encoded as UTF-8), which Subversion
-# converts from / to native strings when interfacing with the OS. When passing
-# paths to Subversion, we have to recode them such that it roundstrips with
-# what Subversion is doing.
+# In Subversion, paths and URLs are Unicode (encoded as UTF-8), which
+# Subversion converts from / to native strings when interfacing with the OS.
+# When passing paths and URLs to Subversion, we have to recode them such that
+# it roundstrips with what Subversion is doing.
fsencoding = None
@@ -141,7 +141,9 @@
def geturl(path):
try:
- return svn.client.url_from_path(svn.core.svn_path_canonicalize(path))
+ return svn.client.url_from_path(
+ svn.core.svn_path_canonicalize(fs2svn(path))
+ )
except svn.core.SubversionException:
# svn.client.url_from_path() fails with local repositories
pass
@@ -358,6 +360,19 @@
and path[2:6].lower() == b'%3a/'
):
path = path[:2] + b':/' + path[6:]
+ try:
+ path.decode(fsencoding)
+ except UnicodeDecodeError:
+ ui.warn(
+ _(
+ b'Subversion requires that file URLs can be converted '
+ b'to Unicode using the current locale encoding (%s)\n'
+ )
+ % pycompat.sysbytes(fsencoding)
+ )
+ return False
+ # FIXME: The following reasoning and logic is wrong and will be
+ # fixed in a following changeset.
# pycompat.fsdecode() / pycompat.fsencode() are used so that bytes
# in the URL roundtrip correctly on Unix. urlreq.url2pathname() on
# py3 will decode percent-encoded bytes using the utf-8 encoding
--- a/tests/test-convert-svn-encoding.t Mon Jun 29 15:03:36 2020 +0200
+++ b/tests/test-convert-svn-encoding.t Tue Jun 30 05:30:47 2020 +0200
@@ -182,6 +182,20 @@
cannot find required "p4" tool
abort: \xff: missing or unsupported repository (glob) (esc)
[255]
+ $ hg convert file://$TESTTMP/$XFF test
+ initializing destination test repository
+ Subversion requires that file URLs can be converted to Unicode using the current locale encoding (ascii)
+ file:/*/$TESTTMP/\xff does not look like a CVS checkout (glob) (esc)
+ $TESTTMP/file:$TESTTMP/\xff does not look like a Git repository (esc)
+ file:/*/$TESTTMP/\xff does not look like a Subversion repository (glob) (esc)
+ file:/*/$TESTTMP/\xff is not a local Mercurial repository (glob) (esc)
+ file:/*/$TESTTMP/\xff does not look like a darcs repository (glob) (esc)
+ file:/*/$TESTTMP/\xff does not look like a monotone repository (glob) (esc)
+ file:/*/$TESTTMP/\xff does not look like a GNU Arch repository (glob) (esc)
+ file:/*/$TESTTMP/\xff does not look like a Bazaar repository (glob) (esc)
+ file:/*/$TESTTMP/\xff does not look like a P4 repository (glob) (esc)
+ abort: file:/*/$TESTTMP/\xff: missing or unsupported repository (glob) (esc)
+ [255]
#if py3
For now, on Python 3, we abort when encountering non-UTF-8 percent-encoded