changeset 31400:fb1f70331ee6

pycompat: custom implementation of urllib.parse.quote()

urllib.parse.quote() accepts either str or bytes and returns str. There exists a urllib.parse.quote_from_bytes() which only accepts bytes. We should probably use that to retain strong typing and avoid surprises.

In addition, since nearly all strings in Mercurial are bytes, we probably don't want quote() returning unicode.

So, this patch implements a custom quote() that only accepts bytes and returns bytes. The quoted URL should only contain URL-safe characters, which are a strict subset of ASCII. So `.encode('ascii', 'strict')` should be safe.
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 13 Mar 2017 12:16:47 -0700
parents 1ed169c5e235
children ed23f929af38
files mercurial/pycompat.py
diffstat 1 files changed, 9 insertions(+), 1 deletions(-)
--- a/mercurial/pycompat.py	Mon Mar 13 12:14:17 2017 -0700
+++ b/mercurial/pycompat.py	Mon Mar 13 12:16:47 2017 -0700
@@ -269,7 +269,6 @@
 else:
     import urllib.parse
     urlreq._registeraliases(urllib.parse, (
-        "quote",
         "splitattr",
         "splitpasswd",
         "splitport",
@@ -313,3 +312,12 @@
         "SimpleHTTPRequestHandler",
         "CGIHTTPRequestHandler",
     ))
+
+    # urllib.parse.quote() accepts both str and bytes, decodes bytes
+    # (if necessary), and returns str. This is wonky. We provide a custom
+    # implementation that only accepts bytes and emits bytes.
+    def quote(s, safe=r'/'):
+        s = urllib.parse.quote_from_bytes(s, safe=safe)
+        return s.encode('ascii', 'strict')
+
+    urlreq.quote = quote
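
The behavior described in the commit message can be checked with a short standalone sketch for Python 3. It is not the Mercurial code itself: the function is copied out of the `pycompat` module context, and the sample input `b'a b/c\xff'` is chosen here purely for illustration.

```python
import urllib.parse


def quote(s, safe='/'):
    """Percent-encode bytes and return bytes (Python 3 only)."""
    # quote_from_bytes() rejects str input with TypeError, so the
    # wrapper is bytes-only on the way in.
    quoted = urllib.parse.quote_from_bytes(s, safe=safe)
    # The quoted form contains only ASCII characters, so a strict
    # ASCII encode is not expected to fail.
    return quoted.encode('ascii', 'strict')


# The stdlib quote() takes bytes but hands back str.
stdlib_result = urllib.parse.quote(b'a b/c\xff')

# The wrapper takes bytes and hands back bytes.
wrapped_result = quote(b'a b/c\xff')

# Passing str to the wrapper fails loudly instead of being accepted.
try:
    quote('a b')
    str_rejected = False
except TypeError:
    str_rejected = True
```

Here `stdlib_result` is a `str` while `wrapped_result` is `bytes` with the same percent-encoded content, and a `str` argument raises `TypeError` rather than being silently encoded.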