Gregory Szorc <gregory.szorc@gmail.com> [Thu, 15 Mar 2018 16:09:58 -0700] rev 37062
wireproto: use named arguments when passing around frame data
Named arguments are easier to reason about than positional
arguments, especially when there are many of them.
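
A minimal sketch of the idea, not the actual Mercurial code: the function
name `make_frame` and its field names are hypothetical, chosen only to show
how keyword-only arguments keep call sites readable.

```python
def make_frame(*, request_id, frame_type, flags, payload):
    """Build a frame as a plain dict; every field must be passed by name."""
    return {
        "request_id": request_id,
        "frame_type": frame_type,
        "flags": flags,
        "payload": payload,
    }


# A positional call such as make_frame(1, 2, 0, b"heads") would hide which
# integer means what; the bare "*" in the signature rejects it outright.
frame = make_frame(request_id=1, frame_type=2, flags=0, payload=b"heads")
```
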
Differential Revision: https://phab.mercurial-scm.org/D2900
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 15 Mar 2018 16:03:14 -0700] rev 37061
wireproto: define attr-based classes for representing frames
When frames only had 3 attributes, it was reasonable to
represent them as a tuple. With them growing more attributes,
it will be easier to pass them around as a more formal type.
So let's define attr-based classes to represent frame headers and
full frames.
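
As an illustration only: the commit uses attr-based classes, but the sketch
below swaps in the standard library's `dataclasses` so it runs without
third-party packages. The class and field names are assumptions and may not
match the real definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHeader:
    """Hypothetical header: metadata that precedes a frame's payload."""
    length: int
    request_id: int
    type_id: int
    flags: int


@dataclass(frozen=True)
class Frame:
    """Hypothetical full frame: header fields plus the payload bytes."""
    request_id: int
    type_id: int
    flags: int
    payload: bytes


header = FrameHeader(length=5, request_id=1, type_id=2, flags=0)
frame = Frame(
    request_id=header.request_id,
    type_id=header.type_id,
    flags=header.flags,
    payload=b"heads",
)
```

Compared to a bare tuple, fields are accessed by name (`frame.payload`), so
adding another attribute later does not shift the meaning of existing indexes.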
Differential Revision: https://phab.mercurial-scm.org/D2899
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 22:19:00 -0700] rev 37060
wireproto: define human output side channel frame
Currently, the SSH protocol delivers output tailored for people over
the stderr file descriptor. The HTTP protocol doesn't have this
file descriptor (because it only has an input and output pipe). So
it encodes textual output intended for humans within the protocol
responses. Some response types have a facility for capturing output
to be printed to users. Some don't. And sometimes the implementation
of how that output is conveyed is super hacky.
On top of that, bundle2 has an "output" part that is used to store
output that should be printed when this part is encountered.
bundle2 also has the concept of "interrupt" chunks, which can be
used to signal that the regular bundle2 stream is to be
preempted by an out-of-band part that should be processed immediately.
This "interrupt" part can be an "output" part and can be used to
print data on the receiver.
The status quo is inconsistent and insane. We can do better.
This commit introduces a dedicated frame type on the frame-based
protocol for denoting textual data that should be printed on the
receiver. This frame type effectively constitutes a side-channel
by which textual data can be printed on the receiver without
interfering with other in-progress transmissions, such as the
transmission of command responses.
But wait - there's more! Previous implementations that transferred
textual data basically instructed the client to "print these bytes."
This suffered from a few problems.
First, the text data that was transmitted and eventually printed
originated from a server with a specific i18n configuration. This
meant that clients would see text using whatever the i18n settings
were on the server. Someone in France could connect to a server in
Japan and see illegible Japanese glyphs - or maybe even mojibake.
Second, the normalization of all text data originating on servers
resulted in the loss of the ability to apply formatting to that
data. Local Mercurial clients can apply specific formatting
settings to individual atoms of text. For example, a revision can
be colored differently from a commit message. With data over the
wire, the potential for this rich formatting was lost. The best you
could do (without parsing the text to be printed) was apply a
universal label to it and e.g. color it specially.
The new mechanism for instructing the peer to print data does
not have these limitations.
Frames instructing the peer to print text are composed of a
formatting string plus arguments. In other words, receivers can
plug the formatting string into the i18n database to see if a local
translation is available. In addition, each atom being instructed
to print has a series of "labels" associated with it. These labels
can be mapped to the Mercurial UI's labels so locally configured
coloring, styling, etc settings can be applied.
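
A rough sketch of how a receiver might process such a frame, under stated
assumptions: the atom layout `(format string, arguments, labels)`, the
`CATALOG` dict standing in for the i18n database, and the bracket-based
`style` function standing in for the UI's label handling are all
hypothetical, not the real encoding or API.

```python
# Hypothetical local translation catalog: format string -> translation.
CATALOG = {"%d changesets found": "%d conjuntos de cambios encontrados"}


def translate(fmt):
    """Look up a local translation; fall back to the server's string."""
    return CATALOG.get(fmt, fmt)


def style(text, labels):
    """Stand-in for label-based styling: wrap the text in its label names."""
    if not labels:
        return text
    return "[%s]%s[/]" % (" ".join(labels), text)


def render_atoms(atoms):
    """Translate each format string, substitute arguments, apply labels."""
    parts = []
    for fmt, args, labels in atoms:
        text = translate(fmt) % tuple(args)
        parts.append(style(text, labels))
    return "".join(parts)


output = render_atoms([("%d changesets found", [3], ["status"])])
# output == "[status]3 conjuntos de cambios encontrados[/]"
```

Because translation happens before argument substitution, the lookup key is
the untranslated format string, which is what makes client-side localization
possible.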
What this all means is that textual messages originating on servers
can be localized on the client and richly formatted, all while
respecting the client's settings. This is slightly more complicated
than "print these bytes." But it is vastly more user friendly.
FWIW, I'm not aware of other protocols that attempt to encode
i18n and textual styling in this manner. You could argue
that this feature is over-engineered. However, if I were to
sit in the shoes of a non-English speaker learning how to use
version control, I think I would *love* this feature because
it would enable me to see richly formatted text in my chosen
locale.
Anyway, we only implement support for encoding frames of this
type and basic tests for that encoding. We'll still need to
hook up the server and its ui instance to emit these frames.
I recognize this feature may be a bit more controversial than
other aspects of the wire protocol because it is a bit
"radical." So I figured I'd start small to test the waters and
see if others feel this feature is worthwhile.
Differential Revision: https://phab.mercurial-scm.org/D2872
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 19 Mar 2018 16:55:07 -0700] rev 37059
wireproto: service multiple command requests per HTTP request
Now that our new frame-based protocol server can understand how
to ingest multiple, possibly interleaved, command requests, let's
hook it up to the HTTP server.
The code on the HTTP side of things is still a bit hacky. We need
a bit of work around error handling, content types, etc. But it's
a start.
Among the added tests, we demonstrate that a client can send frames
for multiple commands interleaved with each other and that a later
issued command can respond before the first one has finished
sending. This makes our multi-request model technically superior
to the previous "batch" command.
Differential Revision: https://phab.mercurial-scm.org/D2871
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:53:30 -0700] rev 37058
wireproto: support for receiving multiple requests
Now that we have request IDs on each frame and a specification
that allows multiple requests to be issued simultaneously,
possibly interleaved, let's teach the server to deal with that.
Instead of tracking the state for *the* active command request,
we now track the state of each command being received by its
request ID. The multiple states in our state machine for processing
each command's state have been collapsed into a single state for
"receiving commands."
Tests have been added so our branch coverage covers all meaningful
branches.
However, we did lose some logical coverage. The implementation
of this new feature opens up the door to a server having partial
command requests when end of input is reached. We will probably
want a mechanism to deal with partial requests. For now, I've
tracked that as a known issue in the class docstring. I've
also noted an abuse vector that becomes a little bit easier to
exploit with this feature.
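
To make the bookkeeping concrete, here is a minimal sketch assuming a
hypothetical `ReceivingState` class and a simplified frame made of a request
ID, a payload chunk, and a "last frame" flag. It is not the actual server
reactor.

```python
class ReceivingState:
    """Track in-progress command requests keyed by request ID."""

    def __init__(self):
        self._receiving = {}  # request_id -> list of payload chunks

    def on_frame(self, request_id, payload, is_last):
        """Record a frame; return the full payload once a request completes."""
        chunks = self._receiving.setdefault(request_id, [])
        chunks.append(payload)
        if is_last:
            del self._receiving[request_id]
            return b"".join(chunks)
        return None

    def partial_requests(self):
        """Request IDs still incomplete, e.g. when end of input is reached."""
        return sorted(self._receiving)


state = ReceivingState()
first = state.on_frame(1, b"he", is_last=False)      # request 1 starts
second = state.on_frame(3, b"known", is_last=True)   # request 3 completes first
pending = state.partial_requests()                   # request 1 is still partial
third = state.on_frame(1, b"ads", is_last=True)      # request 1 completes
```

The `partial_requests` helper illustrates the known issue described above: if
input ends while entries remain in the dict, the server holds requests it can
never finish. The dict can also grow without bound unless the number of open
requests is capped.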
Differential Revision: https://phab.mercurial-scm.org/D2870
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:51:34 -0700] rev 37057
wireproto: add request IDs to frames
One of my primary goals with the new wire protocol is to make
operations faster and enable both client and server-side
operations to scale to multiple CPU cores.
One of the ways we make server interactions faster is by reducing
the number of round trips to that server.
With the existing wire protocol, the "batch" command facilitates
executing multiple commands from a single request payload. The way
it works is the requests for multiple commands are serialized. The
server executes those commands sequentially then serializes all
their results. As an optimization for reducing round trips, this
is very effective. The technical implementation, however, is pretty
bad and suffers from a number of deficiencies. For example, it
creates a new place where authorization to run a command must be
checked. (The lack of this checking in older Mercurial releases
was CVE-2018-1000132.)
The principles behind the "batch" command are sound. However, the
execution is not. Therefore, I want to ditch "batch" in the
new wire protocol and have protocol level support for issuing
multiple requests in a single round trip.
This commit introduces support in the frame-based wire protocol to
facilitate this. We do this by adding a "request ID" to each frame.
If a server sees frames associated with different "request IDs," it
handles them as separate requests. All of this can happen as part
of the same message from client to server (the same request
body in the case of HTTP).
We /could/ model the exchange the way pipelined HTTP requests do,
where the server processes requests in the order they are issued and
received. But this artificially constrains scalability. A better
model is to allow multi-requests to be executed concurrently and
for responses to be sent and handled concurrently. So the
specification explicitly allows this. There is some work to be done
around specifying dependencies between multi-requests. We take
the easy road for now and punt on this problem, declaring that
if order is important, clients must not issue a request until
responses to the requests it depends on have been received.
This commit focuses on the boilerplate of implementing the request
ID. The server reactor still can't manage multiple, in-flight
request IDs. This will be addressed in a subsequent commit.
Because the wire semantics have changed, we bump the version of the
media type.
Differential Revision: https://phab.mercurial-scm.org/D2869
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 14:01:16 -0700] rev 37056
wireproto: buffer output frames when in half duplex mode
Previously, when told that a response was ready, the server reactor
would instruct the caller to send frames immediately. This was OK
as an initial implementation. But it would not work for half-duplex
connections where the sender can't receive until all data has been
transmitted - such as httplib-based clients.
In this commit, we teach the reactor that output frames should
be buffered until end of input is seen. This required a new
event to inform the reactor of end of input. The result from that
event will instruct the consumer to send all buffered frames.
The HTTP server is buffered by default.
This change effectively hides the complexity of buffering within
the reactor so that transports need not be concerned about it.
This helps keep the transports "dumb" and will make implementing
multiple requests-responses per atomic exchange (like an HTTP
request) much simpler.
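
The buffering behavior can be sketched roughly as follows. The class name
`BufferingReactor`, the method names, and the `"noop"` / `"sendframes"`
action strings are assumptions made for illustration and may differ from the
real reactor.

```python
class BufferingReactor:
    """Hold output frames until end of input when in buffered mode."""

    def __init__(self, buffered=True):
        self._buffered = buffered
        self._pending = []

    def on_response_ready(self, frames):
        """Either queue the frames or tell the caller to send them now."""
        if self._buffered:
            self._pending.extend(frames)
            return ("noop", [])
        return ("sendframes", list(frames))

    def on_input_eof(self):
        """End of input was seen: release everything that was buffered."""
        frames, self._pending = self._pending, []
        return ("sendframes", frames)


reactor = BufferingReactor(buffered=True)
action_ready = reactor.on_response_ready([b"frame-a", b"frame-b"])
action_eof = reactor.on_input_eof()
```

The transport only acts on the returned action, so it never needs to know
whether the connection is half duplex.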
Differential Revision: https://phab.mercurial-scm.org/D2860
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:57:52 -0700] rev 37055
wireproto: define and implement responses in framing protocol
Previously, we only had client-side frame types defined. This commit
defines and implements basic support for server-side frame types.
We introduce two frame types - one for representing the raw bytes
result of a command and another for representing error results.
The types are quite primitive and behavior will expand over time.
But you have to start somewhere.
Our server reactor gains methods to react to an intent to send a
response. Again, following the "sans I/O" pattern, the reactor
doesn't actually send the data. Instead, it gives the caller a
generator of frames that it can send out over the wire.
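
A small sketch of the "sans I/O" idea, assuming a hypothetical
`response_frames` helper and a simplified frame tuple of
`(request_id, chunk, is_last)`. The real frame layout and payload size limit
are not shown here.

```python
def response_frames(request_id, data, max_payload=4):
    """Yield frames for a bytes response; performs no I/O itself."""
    if not data:
        yield (request_id, b"", True)
        return
    for offset in range(0, len(data), max_payload):
        chunk = data[offset:offset + max_payload]
        is_last = offset + max_payload >= len(data)
        yield (request_id, chunk, is_last)


# The caller owns the transport; here a list stands in for a socket write.
sent = []
for frame in response_frames(1, b"abcdefghij"):
    sent.append(frame)
# sent == [(1, b"abcd", False), (1, b"efgh", False), (1, b"ij", True)]
```

Keeping the generator free of I/O means the same code can feed an HTTP
response body, an SSH pipe, or a test harness.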
Differential Revision: https://phab.mercurial-scm.org/D2858
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:32:31 -0700] rev 37054
wireproto: implement basic command dispatching for HTTPv2
Now that we can ingest frames and decode them to requests to run
commands, we are able to actually run those commands. So this
commit starts to implement that.
There are numerous shortcomings. We can't operate on commands
with "*" arguments. We can only emit bytesresponse results. We
don't yet issue a response in the unified framing protocol.
But it's a start.
Differential Revision: https://phab.mercurial-scm.org/D2857
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 08:18:15 -0700] rev 37053
wireproto: nominally don't expose "batch" to version 2 wire transports
The unified frame-based protocol will (eventually) support
multiple requests per client transmission. This means that the
[very hacky] "batch" command has no purpose existing in this protocol.
This commit marks the command as applying to v1 transports only.
But because SSHv2 == SSHv1 currently, we had to hack it back in
for the SSHv2 transport. Bleh.
Tests changed because the capabilities string changed. The order of
tokens in the string is not important.
Differential Revision: https://phab.mercurial-scm.org/D2856