Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:53:30 -0700] rev 37058
wireproto: support for receiving multiple requests
Now that we have request IDs on each frame and a specification
that allows multiple requests to be issued simultaneously,
possibly interleaved, let's teach the server to deal with that.
Instead of tracking the state for *the* active command request,
we now track the state of each command being received by its
request ID. The multiple states in our state machine for processing
each command's state have been collapsed into a single state for
"receiving commands."
Tests have been added so our branch coverage covers all meaningful
branches.
However, we did lose some logical coverage. The implementation
of this new feature opens up the door to a server having partial
command requests when end of input is reached. We will probably
want a mechanism to deal with partial requests. For now, I've
tracked that as a known issue in the class docstring. I've
also noted an abuse vector that becomes a little bit easier to
exploit with this feature.
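As a rough illustration of tracking receive state per request ID, here
is a minimal sketch. It is not the reactor added by this commit: the
class name, the new/eos arguments, and the action names are all
hypothetical and chosen only for the example.

```python
class MultiRequestReceiver:
    """Toy receiver that tracks partially received commands by request ID."""

    def __init__(self):
        # Maps request ID -> list of payload chunks received so far.
        self._receiving = {}

    def on_frame(self, request_id, payload, new=False, eos=False):
        """Feed one frame; return an (action, data) pair for the caller."""
        if new:
            if request_id in self._receiving:
                return ('error', 'request ID %d is already active' % request_id)
            self._receiving[request_id] = []
        elif request_id not in self._receiving:
            return ('error', 'no active request with ID %d' % request_id)

        self._receiving[request_id].append(payload)
        if not eos:
            return ('wantframe', None)

        # The request is complete: drop its state and hand it to the caller.
        chunks = self._receiving.pop(request_id)
        return ('runcommand', (request_id, b''.join(chunks)))

    def pending(self):
        """Request IDs that are still incomplete (partial requests)."""
        return sorted(self._receiving)
```

Frames for different request IDs can be fed in any interleaving. Any
IDs still reported by pending() once input ends correspond to the
partial requests mentioned above.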
Differential Revision: https://phab.mercurial-scm.org/D2870
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:51:34 -0700] rev 37057
wireproto: add request IDs to frames
One of my primary goals with the new wire protocol is to make
operations faster and enable both client and server-side
operations to scale to multiple CPU cores.
One of the ways we make server interactions faster is by reducing
the number of round trips to that server.
With the existing wire protocol, the "batch" command facilitates
executing multiple commands from a single request payload. The
requests for multiple commands are serialized together. The
server executes those commands sequentially, then serializes all
their results. As an optimization for reducing round trips, this
is very effective. The technical implementation, however, is pretty
bad and suffers from a number of deficiencies. For example, it
creates a new place where authorization to run a command must be
checked. (The lack of this checking in older Mercurial releases
was CVE-2018-1000132.)
The principles behind the "batch" command are sound. However, the
execution is not. Therefore, I want to ditch "batch" in the
new wire protocol and have protocol level support for issuing
multiple requests in a single round trip.
This commit introduces support in the frame-based wire protocol to
facilitate this. We do this by adding a "request ID" to each frame.
If a server sees frames associated with different "request IDs," it
handles them as separate requests. All of this can happen
as part of the same message from client to server (the same request
body in the case of HTTP).
We /could/ model the exchange the way pipelined HTTP requests do,
where the server processes requests in the order they are issued and
received. But this artificially constrains scalability. A better
model is to allow multi-requests to be executed concurrently and
for responses to be sent and handled concurrently. So the
specification explicitly allows this. There is some work to be done
around specifying dependencies between multi-requests. We take
the easy road for now and punt on this problem, declaring that
if order is important, clients must not issue a request until
responses to the requests it depends on have been received.
This commit focuses on the boilerplate of implementing the request
ID. The server reactor still can't manage multiple, in-flight
request IDs. This will be addressed in a subsequent commit.
Because the wire semantics have changed, we bump the version of the
media type.
Differential Revision: https://phab.mercurial-scm.org/D2869
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 14:01:16 -0700] rev 37056
wireproto: buffer output frames when in half duplex mode
Previously, when told that a response was ready, the server reactor
would instruct the caller to send frames immediately. This was OK
as an initial implementation. But it would not work for half-duplex
connections where the sender can't receive until all data has been
transmitted, such as with httplib-based clients.
In this commit, we teach the reactor that output frames should
be buffered until end of input is seen. This required a new
event to inform the reactor of end of input. The result from that
event will instruct the consumer to send all buffered frames.
The HTTP server is buffered by default.
This change effectively hides the complexity of buffering within
the reactor so that transports need not be concerned about it.
This helps keep the transports "dumb" and will make implementing
multiple requests-responses per atomic exchange (like an HTTP
request) much simpler.
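The buffering behavior described above could look roughly like the
following sketch. The class, method, and action names are hypothetical
stand-ins, not the names used in the actual reactor.

```python
class BufferingReactor:
    """Toy reactor that holds output frames until end of input is seen."""

    def __init__(self, buffer_output=True):
        self._buffer_output = buffer_output
        self._buffered = []

    def on_response_ready(self, frames):
        """Called when a response is ready; frames is an iterable of bytes."""
        if self._buffer_output:
            # Half-duplex peer: it cannot read until it is done sending.
            self._buffered.extend(frames)
            return ('noop', [])
        return ('sendframes', list(frames))

    def on_input_eof(self):
        """Called by the transport once all input has been consumed."""
        frames, self._buffered = self._buffered, []
        return ('sendframes', frames)
```

The transport only acts on the returned action, so it does not need to
know whether buffering is enabled.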
Differential Revision: https://phab.mercurial-scm.org/D2860
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:57:52 -0700] rev 37055
wireproto: define and implement responses in framing protocol
Previously, we only had client-side frame types defined. This commit
defines and implements basic support for server-side frame types.
We introduce two frame types - one for representing the raw bytes
result of a command and another for representing error results.
The types are quite primitive and behavior will expand over time.
But you have to start somewhere.
Our server reactor gains methods to react to an intent to send a
response. Again, following the "sans I/O" pattern, the reactor
doesn't actually send the data. Instead, it gives the caller a
generator of frames that the caller can send out over the wire.
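A minimal sketch of the "hand back a generator of frames" idea follows.
The header layout, frame type value, flag value, and function name are
assumptions made for illustration and do not reflect the real frame
encoding.

```python
import struct

# Hypothetical values and layout, for illustration only:
# 4-byte little-endian payload length, 1-byte type, 1-byte flags.
FRAME_TYPE_BYTES_RESPONSE = 1
FLAG_END_OF_STREAM = 1
HEADER = struct.Struct('<IBB')


def bytes_response_frames(data, max_payload=4):
    """Yield encoded frames carrying data; performs no I/O itself."""
    offset = 0
    while True:
        chunk = data[offset:offset + max_payload]
        offset += len(chunk)
        done = offset >= len(data)
        flags = FLAG_END_OF_STREAM if done else 0
        yield HEADER.pack(len(chunk), FRAME_TYPE_BYTES_RESPONSE, flags) + chunk
        if done:
            break
```

The caller decides when and how to write each yielded frame, for
example by looping over the generator and writing to a socket or a
WSGI response.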
Differential Revision: https://phab.mercurial-scm.org/D2858
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:32:31 -0700] rev 37054
wireproto: implement basic command dispatching for HTTPv2
Now that we can ingest frames and decode them to requests to run
commands, we are able to actually run those commands. So this
commit starts to implement that.
There are numerous shortcomings. We can't operate on commands
with "*" arguments. We can only emit bytesresponse results. We
don't yet issue a response in the unified framing protocol.
But it's a start.
Differential Revision: https://phab.mercurial-scm.org/D2857
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 08:18:15 -0700] rev 37053
wireproto: nominally don't expose "batch" to version 2 wire transports
The unified frame-based protocol will (eventually) support
multiple requests per client transmission. This means that the
[very hacky] "batch" command has no purpose existing in this protocol.
This commit marks the command as applying to v1 transports only.
But because SSHv2 == SSHv1 currently, we had to hack it back in
for the SSHv2 transport. Bleh.
Tests changed because the capabilities string changed. The order of
tokens in the string is not important.
Differential Revision: https://phab.mercurial-scm.org/D2856
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 15:25:06 -0700] rev 37052
wireproto: implement basic frame reading and processing
We just implemented support for writing frames. Now let's implement
support for reading them.
The bulk of the new code is for a class that maintains the state of
a server. Essentially, you construct an instance, feed frames to it,
and it tells you what you should do next. The design is inspired by
the "sans I/O" movement and the reactor pattern. We don't want to
perform I/O or any major blocking event during frame ingestion because
this arbitrarily limits ways that server pieces can be implemented.
For example, it makes it much harder to swap in an alternate
implementation based on asyncio or do crazy things like have requests
dispatch to other processes.
We do still implement readframe() which does I/O. But it is decoupled
from the server reactor. And important parsing of frame headers is
a standalone function. So I/O is only needed to obtain frame data.
Because testing server-side ingest is useful and difficult on running
servers, we create a new "debugreflect" endpoint that will echo back
to the client what was received and how it was interpreted. This could
be useful for a server admin or someone implementing a client. But
immediately, it is useful for testing: we're able to demonstrate that
frames are parsed correctly and turned into requests to run commands
without having to implement command dispatch on the server!
In addition, we implement Python level unit tests for the reactor.
This is vastly more efficient than sending requests to the
"debugreflect" endpoint and vastly more powerful for advanced
testing.
Differential Revision: https://phab.mercurial-scm.org/D2852
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 19 Mar 2018 16:49:53 -0700] rev 37051
wireproto: define and implement protocol for issuing requests
The existing HTTP and SSH wire protocols suffer from a host of flaws
and shortcomings. I've been wanting to rewrite the protocol for a while
now. Supporting partial clone - which will require new wire protocol
commands and capabilities - and other advanced server functionality
will be much easier if we start from a clean slate and don't have
to be constrained by limitations of the existing wire protocol.
This commit starts to introduce a new data exchange format for
use over the wire protocol.
The new protocol is built on top of "frames," which are atomic
units of metadata + data. Frames will make it easier to implement
proxies and other mechanisms that want to inspect data without
having to maintain state. The existing frame metadata is very
minimal and it will evolve heavily. (We will eventually support
things like concurrent requests, out-of-order responses,
compression, side-channels for status updates, etc. Some of
these will require additions to the frame header.)
Another benefit of frames is that every read has a known size.
A reader works by consuming a frame header, extracting the payload
length, then reading that many bytes. No lookahead, buffering, or
memory reallocations are needed.
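The read loop described above can be sketched as follows. The header
layout here is invented for the example (the real layout is defined by
the protocol specification), and the function names are hypothetical.
The sketch also assumes a buffered file object whose read(n) returns n
bytes unless end of input is reached.

```python
import io
import struct

# Hypothetical header: 4-byte little-endian payload length, 1-byte frame type.
FRAME_HEADER = struct.Struct('<IB')


def parse_frame_header(data):
    """Decode a header from bytes. Pure function: performs no I/O."""
    length, frametype = FRAME_HEADER.unpack(data)
    return length, frametype


def read_frame(fh):
    """Read one frame from fh. Returns None at a clean end of input."""
    header = fh.read(FRAME_HEADER.size)
    if not header:
        return None
    if len(header) != FRAME_HEADER.size:
        raise ValueError('truncated frame header')
    length, frametype = parse_frame_header(header)
    # The payload size is known from the header, so one sized read suffices.
    payload = fh.read(length)
    if len(payload) != length:
        raise ValueError('truncated frame payload')
    return frametype, payload


def make_frame(frametype, payload):
    """Encode a frame; used here only to build sample input."""
    return FRAME_HEADER.pack(len(payload), frametype) + payload
```

For example, wrapping two encoded frames in io.BytesIO and calling
read_frame() repeatedly returns each (type, payload) pair in turn and
then None.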
The new protocol attempts to be transport agnostic. I want all that's
required to use the new protocol to be a pair of unidirectional,
half-duplex pipes. (Yes, we will eventually make use of full-duplex
pipes, but that's for another commit.) Notably, when the SSH
transport switches to this new protocol, stderr will be unused.
This is by design: the lack of stderr on HTTP harms protocol
behavior there. By shoehorning everything into a pair of pipes,
we can have more consistent behavior across transports.
We currently only define the client side parts of the new protocol,
specifically the bits for requesting that a command run. This keeps
the new code and feature small and somewhat easy to review.
We add support to `hg debugwireproto` for writing frames into
HTTP request bodies. Our tests that issue commands to the new
HTTP endpoint have been updated to transmit frames. The server
bits haven't been touched to consume the frames yet. This will
occur in the next commit...
Astute readers may notice that the command name is transmitted in
both the HTTP request URL and the command request frame. This is
partially a kludge from me initially implementing the frame-based
protocol for SSH first. But it is also a feature: I intend to
eventually support issuing multiple commands per HTTP request. This
will allow us to replace the abomination that is the "batch" wire
protocol command with a protocol-level mechanism for performing
multi-dispatch. Because I want the frame-based protocol to be
as similar as possible across transports, I'd rather we (redundantly)
include the command name in the frame than differ behavior between
transports that have out-of-band routing information (like HTTP)
readily available.
Differential Revision: https://phab.mercurial-scm.org/D2851
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 19:44:59 -0700] rev 37050
wireproto: define content negotiation for HTTPv2
HTTP messages communicate their media types and what media types
they can understand via the Content-Type and Accept headers,
respectively.
While I don't want the wire protocol to lean too heavily on HTTP
because I'm aiming for the wire protocol to be as transport
agnostic as possible, it is nice to play by the spec if possible.
This commit defines our media negotiation mechanism for version
2 of the HTTP protocol. Essentially, we mandate the use of a
new media type and specify how clients and servers should react
to various headers or the lack thereof.
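To make the idea concrete, a server-side check might look something
like this sketch. The media type name is made up, and the choice of
406 and 415 status codes is an assumption based on their standard HTTP
meanings, not a statement of what the specification mandates.

```python
# Made-up media type name, for illustration only.
MEDIA_TYPE = 'application/x-example-frames'


def check_media_types(headers, has_body):
    """Return an (HTTP status, message) pair for a request's headers.

    headers maps header names to values; names are matched
    case-insensitively. Only exact media type matches are accepted.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get('accept') != MEDIA_TYPE:
        # Assumed reaction: 406 Not Acceptable.
        return 406, 'client must accept %s' % MEDIA_TYPE
    if has_body and lowered.get('content-type') != MEDIA_TYPE:
        # Assumed reaction: 415 Unsupported Media Type.
        return 415, 'request body must be %s' % MEDIA_TYPE
    return 200, 'OK'
```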
The name of the media type is a placeholder. We purposefully don't
yet define the format of the new media type because that's a lot
of work.
I feel pretty strongly that we should use Content-Type. I feel
less strongly about Accept. I think it is reasonable for servers
to return the media type that was submitted to them. So we may
strike this header before the protocol is finished...
Differential Revision: https://phab.mercurial-scm.org/D2850
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 14:15:10 -0700] rev 37049
hgweb: also set Content-Type header
Our HTTP/WSGI server may convert the Content-Type HTTP request
header to the CONTENT_TYPE WSGI environment key and not set
HTTP_CONTENT_TYPE. Other WSGI server implementations
do this, so I think the behavior is acceptable.
So assuming this HTTP request header could get "lost" by the WSGI
server, let's restore it on the request object like we do for
Content-Length.
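A sketch of the restoration step, assuming only what the WSGI
specification (PEP 3333) says about CONTENT_TYPE and CONTENT_LENGTH
being environment keys without the HTTP_ prefix. The function name and
the returned dict are hypothetical and are not hgweb's actual request
object.

```python
def request_headers(environ):
    """Build a header dict from a WSGI environ.

    Restores Content-Type and Content-Length, which WSGI servers may
    expose only as CONTENT_TYPE and CONTENT_LENGTH.
    """
    headers = {}
    for key, value in environ.items():
        if key.startswith('HTTP_'):
            # HTTP_USER_AGENT -> User-Agent
            name = key[len('HTTP_'):].replace('_', '-').title()
            headers[name] = value

    for key, name in (('CONTENT_LENGTH', 'Content-Length'),
                      ('CONTENT_TYPE', 'Content-Type')):
        if key in environ and name not in headers:
            headers[name] = environ[key]
    return headers
```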
FWIW, the WSGI server may also *invent* a Content-Type value. By
default, Python's RFC 822 message class returns a default
media type if Content-Type isn't defined. This is kind of annoying.
But RFC 7231 section 3.1.1.5 does say the recipient may assume a media
type of application/octet-stream. Python defaults to
text/plain (given we're using an RFC 822 parser). But whatever.
Differential Revision: https://phab.mercurial-scm.org/D2849