Gregory Szorc <gregory.szorc@gmail.com> [Sat, 10 Mar 2018 10:20:51 -0800] rev 36814
hgweb: parse WSGI request into a data structure
Currently, our WSGI applications (hgweb_mod and hgwebdir_mod) process
the raw WSGI request instance themselves. This means they have to
talk in terms of system strings, and they need to know details
about what's in the WSGI request. In the case of hgweb_mod, it
is also doing some very funky things with URL parsing to influence
dispatching. The code is difficult to read and maintain.
This commit introduces parsing of the WSGI request into a higher-level
and easier-to-reason-about data structure.
To prove it works, we hook it up to hgweb_mod and use it for populating
the relative URL on the request instance.
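A minimal sketch of the idea, assuming Python 3. The names ParsedRequest and parserequestfromenv and the chosen fields are hypothetical, picked for illustration rather than taken from this commit. The point is that the environ is converted to bytes once, in one place, so downstream code never handles system strings or raw environ keys.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl


@dataclass(frozen=True)
class ParsedRequest:
    """Hypothetical parsed view of a WSGI request; text fields are bytes."""
    method: bytes
    scriptname: bytes  # SCRIPT_NAME: where the application is mounted
    pathinfo: bytes    # PATH_INFO: the path below the mount point
    querystring: bytes
    queryargs: tuple   # ((name, value), ...) with bytes names and values


def _tobytes(value):
    # Native str values are latin-1 decodable under PEP 3333, so encoding
    # with latin-1 recovers the original bytes exactly.
    return value.encode("latin-1")


def parserequestfromenv(env):
    """Convert the raw environ into a ParsedRequest in one place."""
    qs = env.get("QUERY_STRING", "")
    # Decoding percent-escapes as latin-1 maps each escaped byte to one
    # code point, so _tobytes() turns them back into the raw bytes.
    args = parse_qsl(qs, keep_blank_values=True, encoding="latin-1")
    return ParsedRequest(
        method=_tobytes(env.get("REQUEST_METHOD", "")),
        scriptname=_tobytes(env.get("SCRIPT_NAME", "")),
        pathinfo=_tobytes(env.get("PATH_INFO", "")),
        querystring=_tobytes(qs),
        queryargs=tuple((_tobytes(k), _tobytes(v)) for k, v in args),
    )
```

Freezing the dataclass keeps the parsed request read-only, so handlers cannot quietly rewrite request state the way ad hoc URL munging can.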
We hold off on using it in more places because the logic in hgweb_mod
is crazy and I don't want to entangle those changes with review of
the parsing code.
The URL construction code has variations that use the HTTP Host header
(the canonical WSGI way of reconstructing the URL) and variations that
use SERVER_NAME. We need to differentiate because hgweb currently
uses SERVER_NAME for URL construction.
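A sketch of the two variants, following the URL reconstruction recipe in PEP 3333. The function name reconstruct_url and the use_host_header flag are hypothetical; they only illustrate how the two paths diverge when the client addressed the server by a different name than the one in SERVER_NAME.

```python
from urllib.parse import quote

DEFAULT_PORTS = {"http": "80", "https": "443"}


def _servername_authority(env):
    """Authority from SERVER_NAME/SERVER_PORT, omitting the scheme's default port."""
    authority = env["SERVER_NAME"]
    if env["SERVER_PORT"] != DEFAULT_PORTS.get(env["wsgi.url_scheme"]):
        authority += ":" + env["SERVER_PORT"]
    return authority


def reconstruct_url(env, use_host_header=True):
    """Rebuild the request URL per PEP 3333's URL reconstruction recipe.

    use_host_header=True is the canonical variant (prefers the Host header);
    use_host_header=False always uses SERVER_NAME.
    """
    if use_host_header and env.get("HTTP_HOST"):
        authority = env["HTTP_HOST"]
    else:
        authority = _servername_authority(env)
    url = env["wsgi.url_scheme"] + "://" + authority
    url += quote(env.get("SCRIPT_NAME", ""))
    url += quote(env.get("PATH_INFO", ""))
    if env.get("QUERY_STRING"):
        url += "?" + env["QUERY_STRING"]
    return url
```

Only the authority part differs between the variants; the path and query components are assembled identically.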
Differential Revision: https://phab.mercurial-scm.org/D2734
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:14:32 -0800] rev 36813
hgweb: always use "?" when writing session vars
This code resolves a string to insert in URLs as part of a
query string. Essentially, it resolves the {sessionvars}
template keyword, which is used by hgweb templates to build
a URL as a string.
The whole approach here feels wrong because, when this code runs,
there's no way of knowing how the final URL will look. There
could be additional URL fragments added before this template
keyword that add a query string component.
Furthermore, I don't think there's *any* way for req.url to have
a query string. That's because the code that populates this
variable only takes SCRIPT_NAME and REPO_NAME into account. The
"?" character it is searching for would only be added if some
code attempted to add QUERY_STRING to the URL. Hacking the code
up to raise if "?" is present in the URL yields a clean test
suite run. I'm not sure if we broke this code or if it has
always been broken.
Anyway, this commit removes support for emitting "&" as the
first character in {sessionvars} and makes it always emit "?",
which AFAICT is what it always did before.
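The resulting behavior can be sketched as follows. render_sessionvars is a hypothetical helper, not hgweb's API; it assumes the session vars arrive as a dict and emits them sorted by name, with the first separator always "?" and every later one "&".

```python
from urllib.parse import quote


def render_sessionvars(sessionvars):
    """Render session vars as a query-string fragment (hypothetical helper)."""
    parts = []
    # Always start with "?": whether the URL already has a query string
    # cannot be known when this runs, so no leading "&" is attempted.
    separator = "?"
    for name, value in sorted(sessionvars.items()):
        parts.append(separator + quote(name, safe="") + "=" + quote(str(value), safe=""))
        separator = "&"
    return "".join(parts)
```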
Differential Revision: https://phab.mercurial-scm.org/D2733
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:15:59 -0800] rev 36812
hgweb: rename req to wsgireq
We will soon introduce a parsed WSGI request object so we don't
have to concern ourselves with low-level WSGI matters. Prepare
for multiple request objects by renaming the existing one so it
is clear it deals with WSGI.
We also remove a symbol import to avoid even more naming confusion.
# no-check-commit because of some new foo_bar naming that's required
Differential Revision: https://phab.mercurial-scm.org/D2732
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 09:44:27 -0800] rev 36811
hgweb: validate WSGI environment dict
The wsgiref.validate module contains useful functions for validating
that various WSGI data structures are well-formed.
This commit adds validation of the environment dict to our built-in
HTTP server, which turns an HTTP request into an environment dict.
The check discovered that we weren't always setting QUERY_STRING,
which would cause the cgi module to fall back to sys.argv. So we
change things to always set QUERY_STRING.
The check passes on Python 2 and 3.
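A sketch of the kind of check described above, using the standard library's wsgiref.validate.validator and wsgiref.util.setup_testing_defaults. The trivial app and the helper names are hypothetical; this is not the code the commit added to the built-in server. With QUERY_STRING absent, the validator emits a WSGIWarning that mentions it; with QUERY_STRING set (even to an empty string), it does not.

```python
import warnings
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import WSGIWarning, validator


def hello_app(environ, start_response):
    """Trivial WSGI app; only here so the validator has something to wrap."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def make_environ(query_string=None):
    """Build a minimal environ; QUERY_STRING is present only when given."""
    environ = {}
    setup_testing_defaults(environ)
    environ.pop("QUERY_STRING", None)
    if query_string is not None:
        environ["QUERY_STRING"] = query_string
    return environ


def validate_environ(environ):
    """Run hello_app behind the validator; return (body, WSGIWarning messages).

    Hard violations raise AssertionError; softer ones, such as a missing
    QUERY_STRING, are reported as WSGIWarning.
    """
    app = validator(hello_app)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = app(environ, lambda status, headers, exc_info=None: None)
        try:
            body = b"".join(result)
        finally:
            result.close()  # the validator complains if the iterable is never closed
    messages = [str(w.message) for w in caught if issubclass(w.category, WSGIWarning)]
    return body, messages
```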
Differential Revision: https://phab.mercurial-scm.org/D2731
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 09:26:51 -0800] rev 36810
hgweb: ensure all wsgi environment values are str
Previously, we had a few entries that were bytes on Python 3.
PEP 3333 states that such entries must be the native str type
(bytes on Python 2, str on Python 3).
This required a number of changes to hgweb_mod to unbreak
things on Python 3. I suspect there may still be some regressions.
I'm going to introduce a data structure that represents a parsed
WSGI request in upcoming commits. This will hold bytes and will
allow us to stop using raw literals throughout the WSGI code.
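A minimal sketch of the invariant, not the commit's implementation. to_native_environ is a hypothetical helper: it decodes any bytes values of CGI-style keys with latin-1, which PEP 3333 uses for native strings on Python 3, so each byte maps to exactly one code point and nothing is lost. Keys containing a "." (wsgi.* and server extensions) are left untouched.

```python
def to_native_environ(environ):
    """Return a copy of environ whose CGI-style values are native str."""
    native = {}
    for key, value in environ.items():
        # Dotted keys (wsgi.input, wsgi.errors, ...) may hold arbitrary objects.
        if "." not in key and isinstance(value, bytes):
            value = value.decode("latin-1")
        native[key] = value
    return native
```

Because the latin-1 decoding is lossless, the bytes can be recovered later with value.encode("latin-1"), which is what a parsed request object holding bytes would do.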
Differential Revision: https://phab.mercurial-scm.org/D2730
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 07 Mar 2018 16:18:52 -0800] rev 36809
wireproto: formalize permissions checking as part of protocol interface
An inline comment asked for permissions checking to be formalized
in the protocol interface. This commit does that.
I'm not convinced this is the best way to go about things. I would love
for there to be, e.g., a better exception for denoting permissions
problems. But it does feel strictly better than sniffing attributes
on the proto instance.
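A minimal sketch of the pattern, assuming an abc-based interface. ProtocolHandler, PermissionDenied, HTTPHandler, SSHHandler, and dispatch are hypothetical names, and the "push" rule is illustrative only. The idea is that checkperm() becomes a required part of the interface, so callers invoke it unconditionally instead of probing the handler for optional attributes; PermissionDenied stands in for the kind of dedicated exception the message wishes for.

```python
import abc


class PermissionDenied(Exception):
    """Hypothetical dedicated exception for permission problems."""


class ProtocolHandler(abc.ABC):
    """Hypothetical protocol interface; checkperm() is mandatory."""

    @abc.abstractmethod
    def checkperm(self, perm):
        """Raise PermissionDenied if the client may not perform `perm`."""


class HTTPHandler(ProtocolHandler):
    def __init__(self, allowpush):
        self._allowpush = allowpush

    def checkperm(self, perm):
        # Illustrative rule: pushes are refused unless explicitly allowed.
        if perm == "push" and not self._allowpush:
            raise PermissionDenied("push not allowed")


class SSHHandler(ProtocolHandler):
    def checkperm(self, perm):
        # Assumed: access control for this transport happens elsewhere.
        pass


def dispatch(proto, command, perm):
    # No getattr() probing: every handler implements checkperm().
    proto.checkperm(perm)
    return "ran " + command
```

Making the method abstract means a new transport that forgets to implement it fails at instantiation time rather than silently skipping the check.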
Differential Revision: https://phab.mercurial-scm.org/D2719