Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:37:05 -0800] rev 36819
hgweb: use parsed request to construct query parameters
The way hgweb routes requests is kind of bonkers. If PATH_INFO is
set, we take the URL path after the repository. Otherwise, we take
the first part of the query string before "&" and the part before
";" in that.
We then kinda/sorta treat this as a path and route based on that.
This commit ports that code to use the parsed request object. This
required a new attribute on the parsed request to indicate whether
there is any PATH_INFO.
The new code still feels a bit convoluted for my liking. But we'll
need to rewrite more of the code before a better solution becomes
apparant. This code feels strictly better since we're no longer
doing low-level WSGI manipulation during routing.
Differential Revision: https://phab.mercurial-scm.org/D2739
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 11:33:33 -0800] rev 36818
hgweb: only recognize wire protocol commands from query string (BC)
Previously, we attempted to parse the wire protocol command from
`req.form`. Data could have come from the query string or POST
form data.
The wire protocol states that the command must be declared in the
query string. And AFAICT all Mercurial releases from at least 1.0
send the command in the query string.
So let's actual require this behavior.
This is technically BC. But I'm not sure how anyone in the wild
would encounter this. POST has historically been used for sending
bundle data. So there's no opportunity to encode arguments there.
And the experimental HTTP POST args also takes over the body. So
the only way someone would be impacted by this is if they wrote
a custom client that both used POST for everything and sent arguments
via the HTTP body. I don't believe such a client exists.
.. bc::
The HTTP wire protocol server no longer accepts the ``cmd``
argument to control which command to run via HTTP POST bodies.
The ``cmd`` argument must be specified on the URL query string.
Differential Revision: https://phab.mercurial-scm.org/D2738
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 11:21:46 -0800] rev 36817
hgweb: teach WSGI parser about query strings
Currently, req.form uses cgi.parse() to populate form data. Depending
on the request, form data can come from POST multipart/form-data,
application/x-www-form-urlencoded, or the URL query string.
Putting all these things into one data structure makes it difficult
to reason about how exactly parameters got to the request. It can
lead to wonkiness such as pulling parameters from both the URL and
POST data.
This commit teaches our WSGI request parser about argument data
in query strings. We populate fields containing the query string
data and only the query string data so it can't be confused with
POST data.
Differential Revision: https://phab.mercurial-scm.org/D2737
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:08:20 -0800] rev 36816
hgweb: use the parsed application path directly
Previously, we assigned a custom system string with a trailing slash
to wsgirequest.url.
The addition of the trailing slash felt arbitrary and seems to go
against how things typically work in WSGI.
We also want our URLs to be bytes, not system strings.
And, assigning a custom attribute to wsgirequest felt wrong.
This commit fixes all those things by removing the trailing
slash from the app path, changing consumers to use that variable
and to use it without a trailing slash, and removing the custom
attribute from wsgirequest.
We preserve the trailing slash on {url}. Also, makebreadcrumb
strips the trailing slash. So no change to it was needed.
Differential Revision: https://phab.mercurial-scm.org/D2736
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 12:59:25 -0800] rev 36815
hgweb: use computed base URL from parsed request
Let's not reinvent URL construction in a function that runs the
templater.
Differential Revision: https://phab.mercurial-scm.org/D2735
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 10 Mar 2018 10:20:51 -0800] rev 36814
hgweb: parse WSGI request into a data structure
Currently, our WSGI applications (hgweb_mod and hgwebdir_mod) process
the raw WSGI request instance themselves. This means they have to
talk in terms of system strings. And they need to know details
about what's in the WSGI request. And in the case of hgweb_mod, it
is doing some very funky things with URL parsing to impact
dispatching. The code is difficult to read and maintain.
This commit introduces parsing of the WSGI request into a higher-level
and easier-to-reason-about data structure.
To prove it works, we hook it up to hgweb_mod and use it for populating
the relative URL on the request instance.
We hold off on using it in more places because the logic in hgweb_mod
is crazy and I don't want to involve those changes with review of
the parsing code.
The URL construction code has variations that use the HTTP: Host header
(the canonical WSGI way of reconstructing the URL) and with the use
of SERVER_NAME. We need to differentiate because hgweb is currently
using SERVER_NAME for URL construction.
Differential Revision: https://phab.mercurial-scm.org/D2734
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:14:32 -0800] rev 36813
hgweb: always use "?" when writing session vars
This code resolves a string to insert in URLs as part of a
query string. Essentially, it resolves the {sessionvars}
template keyword, which is used by hgweb templates to build
a URL as a string.
The whole approach here feels wrong because there's no way of
knowing when this code runs how the final URL will look. There
could be additional URL fragments added before this template
keyword that add a query string component.
Furthermore, I don't think there's *any* for req.url to have
a query string. That's because the code that populates this
variable only takes SCRIPT_NAME and REPO_NAME into account. The
"?" character it is searching for would only be added if some
code attempted to add QUERY_STRING to the URL. Hacking the code
up to raise if "?" is present in the URL yields a clean test
suite run. I'm not sure if we broke this code or if it has
always been broken.
Anyway, this commit removes support for emitting "&" as the
first character in {sessionvars} and makes it always emit "?",
which is what it was always doing before AFAICT.
Differential Revision: https://phab.mercurial-scm.org/D2733
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 15:15:59 -0800] rev 36812
hgweb: rename req to wsgireq
We will soon introduce a parsed WSGI request object so we don't
have to concern ourselves with low-level WSGI matters. Prepare
for multiple request objects by renaming the existing one so it
is clear it deals with WSGI.
We also remove a symbol import to avoid even more naming confusion.
# no-check-commit because of some new foo_bar naming that's required
Differential Revision: https://phab.mercurial-scm.org/D2732
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 09:44:27 -0800] rev 36811
hgweb: validate WSGI environment dict
The wsgiref.validate module contains useful functions for validating
that various WSGI data structures are proper.
This commit adds validation of the environment dict to our built-in
HTTP server, which turns an HTTP request into an environment dict.
The check discovered that we weren't always setting QUERY_STRING,
which would cause the cgi module to fall back to sys.argv. So we
change things to always set QUERY_STRING.
The check passes on Python 2 and 3.
Differential Revision: https://phab.mercurial-scm.org/D2731
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 09:26:51 -0800] rev 36810
hgweb: ensure all wsgi environment values are str
Previously, we had a few entries that were bytes on Python 3.
PEP-0333 states that all entries must be the native str type
(bytes on Python 2, str on Python 3).
This required a number of changes to hgweb_mod to unbreak
things on Python 3. I suspect there still may be some regressions.
I'm going to introduce a data structure that represents a parsed
WSGI request in upcoming commits. This will hold bytes and will
allow us to stop using raw literals throughout the WSGI code.
Differential Revision: https://phab.mercurial-scm.org/D2730
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 07 Mar 2018 16:18:52 -0800] rev 36809
wireproto: formalize permissions checking as part of protocol interface
Per the inline comment desiring to formalize permissions checking
in the protocol interface, we do that.
I'm not convinced this is the best way to go about things. I would love
for there to e.g. be a better exception for denoting permissions
problems. But it does feel strictly better than snipping attributes
on the proto instance.
Differential Revision: https://phab.mercurial-scm.org/D2719
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 07 Mar 2018 16:02:24 -0800] rev 36808
wireproto: declare permissions requirements in @wireprotocommand (API)
With the security patches from 4.5.2 merged into default, we now
have a per-command attribute defining what permissions are needed
to run that command. We now have a richer @wireprotocommand that
can be extended to record additional command metadata. So we
port the permissions mechanism to be based on @wireprotocommand.
.. api::
hgweb_mod.perms and wireproto.permissions have been removed. Wire
protocol commands should declare their required permissions in the
@wireprotocommand decorator.
Differential Revision: https://phab.mercurial-scm.org/D2718
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 06 Mar 2018 15:08:33 -0800] rev 36807
wireprotoserver: check permissions in main dispatch function
The permissions checking code merged from stable is out of place
in the refactored hgweb_mod module.
This commit moves the main call to wireprotoserver. We still have
some lingering code in hgweb_mod. This will get addressed later.
Differential Revision: https://phab.mercurial-scm.org/D2717
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 06 Mar 2018 15:02:53 -0800] rev 36806
wireprotoserver: check if command available before calling it
The previous behavior was just plain wrong. I have no clue how it
landed. My guess is a merge conflict resolution gone wrong on my
end a few weeks ago.
Differential Revision: https://phab.mercurial-scm.org/D2716
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:43:17 -0600] rev 36805
py3: drop encoding.strio()
Its buffered nature makes TextIOWrapper unsuitable for temporarily wrapping
bytes I/O.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:42:37 -0600] rev 36804
ui: adjust Windows workaround to new _readline() code
It's only needed when rawinput() is called. Also made it Py3 compatible.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:38:53 -0600] rev 36803
ui: do not use rawinput() when we have to replace sys.stdin/stdout
See the inline comment for why. The current Python3 hack doesn't work if
more than one user inputs are expected because TextIOWrapper fills its
internal buffer at the first read() request.
Maybe we could write an unbuffered TextIOWrapper, but I don't want to make
things more complicated. Instead, this patch reinvents raw_input(' ') of
no readline support.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:32:26 -0600] rev 36802
ui: do not try readline support if fin/fout aren't standard streams
It's unlikely for a non-stdio stream to be a tty. Minimizing readline support
makes it much simpler to work around the unicode input() function of Python 3.
This also works on chg which duplicates client's tty to stdio fds.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:28:59 -0600] rev 36801
util: add public isstdin/isstdout() functions
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 03:05:49 -0600] rev 36800
ui: add debug commands to test interactive prompt
Interactive operations aren't easily covered by tests. So let's add commands
to test them manually.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:14:11 -0600] rev 36799
ui: inline util.bytesinput() into ui._readline()
Prepares for rework of Python 3 support, which is currently broken due to
read-ahead buffer of TextIOWrapper.
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 02:05:25 -0600] rev 36798
hgk: stop using util.bytesinput() to read a single line from stdin
Appears that the stdio here is an IPC channel between hg and hgk (tk)
processes, which shouldn't need a fancy readline support.
Augie Fackler <augie@google.com> [Mon, 29 Aug 2016 10:42:58 -0400] rev 36797
bookmarks: test for exchanging long bookmark names (issue5165)
As far as I can tell a test for a long bookmark name never actually
got added when this was fixed. Let's document that a 300 byte bookmark
is something we're supporting (this patch started out life over a year
ago as a way for me to validate the problem, and I recently found it.)
I think the nonzero exits from the push operations are only because no
new changesets get exchanged. Please correct me if I'm wrong on that. :)
Differential Revision: https://phab.mercurial-scm.org/D2727
Augie Fackler <augie@google.com> [Sun, 04 Mar 2018 11:46:03 -0500] rev 36796
phabricator: follow-up phab auth improvements with backwards compat mode
We'll rip this out before we ship 4.6, but this gives people a window
to migrate.
Differential Revision: https://phab.mercurial-scm.org/D2703
Tom Prince <mozilla@hocat.ca> [Sat, 20 Jan 2018 02:41:10 -0700] rev 36795
phabricator: specify API tokens per host, rather than per repo
Differential Revision: https://phab.mercurial-scm.org/D1919
Yuya Nishihara <yuya@tcha.org> [Sun, 04 Mar 2018 18:47:07 -0500] rev 36794
py3: drop b'' from generate-working-copy-states.py output
I could make everything bytes, but decoding filename seemed easier and we
don't have to worry about non-ascii filenames here.
Yuya Nishihara <yuya@tcha.org> [Sun, 04 Mar 2018 18:41:09 -0500] rev 36793
py3: make test-commit-multiple.t byte-safe
Yuya Nishihara <yuya@tcha.org> [Sun, 04 Mar 2018 18:34:46 -0500] rev 36792
py3: fix type of default username
Yuya Nishihara <yuya@tcha.org> [Sun, 04 Mar 2018 18:21:16 -0500] rev 36791
py3: read/write plain lock file in binary mode
A lock file shouldn't contain '\n', so this isn't a BC.
Augie Fackler <augie@google.com> [Mon, 05 Mar 2018 12:31:08 -0500] rev 36790
util: stop calling os.stat_float_times()
It had Python-wide side effects, and it disappears in 3.7.0.
As of this change, we're mostly working on 3.7.0b2. There are a few
worrying failures, mostly around regular expressions, but we'll have
to tackle those separately.
Differential Revision: https://phab.mercurial-scm.org/D2697
Augie Fackler <augie@google.com> [Mon, 05 Mar 2018 12:30:20 -0500] rev 36789
cleanup: use stat_result[stat.ST_MTIME] instead of stat_result.st_mtime
The latter is floating point by default, and we've been doing
os.stat_float_times(False). Unfortunately, os.stat_float_times was
removed between Python 3.7.0a1 and 3.7.0b2, so we have to stop using
it.
Differential Revision: https://phab.mercurial-scm.org/D2696
Augie Fackler <augie@google.com> [Mon, 05 Mar 2018 15:07:32 -0500] rev 36788
osutil: implement minimal __getitem__ compatibility on our custom listdir type
We previously declined to do this, but the removal of the deprecated
os.stat_float_times() method in Python 3.7 forces our hand.
Differential Revision: https://phab.mercurial-scm.org/D2695