Daniel Ploch <dploch@google.com> [Thu, 22 Mar 2018 17:08:25 -0700] rev 37091
fancyopts: fix rendering of customopt defaults in help text
Differential Revision: https://phab.mercurial-scm.org/D2935
Pulkit Goyal <7895pulkit@gmail.com> [Mon, 12 Mar 2018 18:38:26 +0530] rev 37090
remotenames: show remote bookmarks in `hg bookmarks`
This patch adds functionality to show the list of remote bookmarks in the
`hg bookmarks` command.
There is an indentation problem in the test output because the current bookmark
printing code in core gracefully handles only bookmark names of up to 25
characters.
The idea is taken from the hgremotenames extension, which has --remote and --all
flags to show remote bookmarks. However, this patch by default supports showing
the list of remote bookmarks if the remotenames extension is enabled and
remotebookmarks are turned on.
Differential Revision: https://phab.mercurial-scm.org/D2808
Pulkit Goyal <7895pulkit@gmail.com> [Sun, 11 Mar 2018 16:17:51 +0530] rev 37089
remotenames: add functionality to hoist remotebookmarks
This patch adds the functionality to hoist remotebookmarks to the top level
namespace. The peer whose bookmarks should be hoisted can be set using the
`remotenames.hoistedpeer` config option. Only bookmarks can be hoisted. If a
hoisted name and a local bookmark have the same name, the local bookmark
takes precedence.
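The precedence rule can be sketched as follows. This is only an illustration:
the function name, argument names, and data layout are hypothetical and are not
taken from the remotenames extension.

```python
def resolve_bookmark(name, local_bookmarks, remote_bookmarks, hoisted_peer):
    """Resolve ``name`` to a node, preferring local bookmarks.

    ``remote_bookmarks`` maps peer name -> {bookmark name -> node}.
    Every name in this sketch is hypothetical.
    """
    # A local bookmark always wins over a hoisted remote bookmark.
    if name in local_bookmarks:
        return local_bookmarks[name]
    # Otherwise fall back to the bookmarks of the hoisted peer, if any.
    hoisted = remote_bookmarks.get(hoisted_peer, {})
    return hoisted.get(name)


local = {"stable": "aaa111"}
remote = {"default": {"stable": "bbb222", "feature": "ccc333"}}
```

With `hoisted_peer="default"`, the name `feature` resolves through the hoisted
peer, while `stable` resolves to the local bookmark.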
While I was here, I documented the default values of two other remotenames
config options.
Differential Revision: https://phab.mercurial-scm.org/D2807
Kyle Lippincott <spectral@google.com> [Thu, 08 Mar 2018 11:44:03 -0800] rev 37088
copyfile: preserve stat info (mtime, etc.) when doing copies/renames
Differential Revision: https://phab.mercurial-scm.org/D2729
Matt Harbison <matt_harbison@yahoo.com> [Thu, 22 Mar 2018 22:58:31 -0400] rev 37087
merge: add 'isknown=True' to a dirstate.normalize() in _unknowndirschecker
Per the docstring for dirstate.normalize().
Matt Harbison <matt_harbison@yahoo.com> [Thu, 22 Mar 2018 22:56:29 -0400] rev 37086
merge: pconvert paths in _unknowndirschecker before dirstate-normalizing
This fixes the failure in test-pathconflicts-basic.t on Windows. The test was
passing in 'a\b', which was getting normalized to 'A\B', which isn't in
dirstate. (The filesystem path is all lowercase anyway.)
This isn't the only case of calling dirstate.normalize(), but other methods here
(util.finddirs()) seem to assume the input paths are already using '/'. I think
the backslash comes from wvfs.reljoin() (in this case), but could also come from
wvfs.walk(), so this is the only case that needs it.
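A toy illustration of why the separator has to be converted first. Both helpers
below are simplified, hypothetical stand-ins and not the real
dirstate.normalize() or path conversion code.

```python
def to_slashes(path):
    """Convert Windows-style separators to '/' (simplified stand-in)."""
    return path.replace("\\", "/")


def toy_normalize(path, tracked):
    """Return the tracked spelling of ``path``, ignoring case.

    Paths that do not match any tracked file are returned unchanged.
    """
    folded = {p.lower(): p for p in tracked}
    return folded.get(path.lower(), path)


tracked = ["a/b"]
```

In this toy model, `toy_normalize("a\\b", tracked)` finds no match because the
tracked names use '/', whereas `toy_normalize(to_slashes("a\\b"), tracked)`
returns the tracked name.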
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 22:39:43 +0900] rev 37085
util: enable deprecation warning for stringutil proxy (API)
.. api::
Several generic string helper functions have been moved to utils.stringutil
module.
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:56:20 +0900] rev 37084
stringutil: bulk-replace call sites to point to new module
This might conflict with other patches floating around, sorry.
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:19:31 +0900] rev 37083
stringutil: move generic string helpers to new module
Per https://phab.mercurial-scm.org/D2903#46738
URL and file paths functions are left since they are big enough to make
separate modules.
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:32:19 +0900] rev 37082
util: remove unused private constant '_hextochr'
The only user, _urlunquote(), was removed by
81d38478fced.
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:20:47 +0900] rev 37081
util: mark internal constants of escapedata() as private
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:14:12 +0900] rev 37080
util: adjust indent level in wrap()
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:13:31 +0900] rev 37079
util: mark MBTextWrapper as private
Makes porting slightly easier.
Yuya Nishihara <yuya@tcha.org> [Thu, 22 Mar 2018 21:10:42 +0900] rev 37078
util: add helper to define proxy functions to utils.*
Kyle Lippincott <spectral@google.com> [Wed, 21 Mar 2018 12:36:29 -0700] rev 37077
filemerge: make the 'local' path match the format that 'base' and 'other' use
If we pass a separate '$output' arg to the merge tool, we produce four files:
local, base, other, and output. In this situation, 'output' will be the
original filename, 'base' and 'other' are temporary files, and previously
'local' would be the backup file (so if 'output' was foo.txt, 'local' would be
foo.txt.orig).
This change makes it so that 'local' follows the same pattern as 'base' and
'other' - it will be a temporary file either in the
`experimental.mergetempdirprefix`-controlled directory with a name like
foo~local.txt, or in the normal system-wide temp dir with a name like
foo~local.RaNd0m.txt.
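The naming pattern described above could be produced along these lines. The
helper is a hypothetical sketch, not the filemerge implementation, and the
random suffix is passed in explicitly to keep the example deterministic.

```python
import os


def labeled_name(path, label, random_suffix=None):
    """Build a name like foo~local.txt or foo~local.RaNd0m.txt."""
    base, ext = os.path.splitext(os.path.basename(path))
    name = "%s~%s" % (base, label)
    if random_suffix:
        # Used when the file lands in the shared system temp directory.
        name += "." + random_suffix
    return name + ext
```

For example, `labeled_name("foo.txt", "local")` gives `foo~local.txt`.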
For the cases where the merge tool does not use an '$output' arg, 'local' is
still the destination filename, and 'base' and 'other' are unchanged.
The hope is that this is much easier for people to reason about; rather than
having a tool like Meld pop up with three panes, one of them with the filename
"foo.txt.orig", one with the filename "foo.txt", and one with
"foo~other.StuFf2.txt", we can (when the merge temp dir stuff is enabled) make
it show up as "foo~local.txt", "foo.txt" and "foo~other.txt", respectively.
This also opens the door to future customization, such as getting the
operation-provided labels and a hash prefix into the filenames (so we see
something like "foo~dest.abc123", "foo.txt", and "foo~src.d4e5f6").
Differential Revision: https://phab.mercurial-scm.org/D2889
Matt Harbison <matt_harbison@yahoo.com> [Wed, 21 Mar 2018 22:36:26 -0400] rev 37076
test-strip-narrow: adjust bundle removal for Windows test stability
MSYS was mangling $TESTTMP to C:\\Users\\...\\test-narrow-strip.t-flat/, which
caused `rm` to fail. The -f was suppressing -ENOENT, so the only clue something
was wrong was when 2 bundles were applied via `hg unbundle` on line 91, instead
of just 1. This changed the text output of `hg unbundle`.
The first `rm` wasn't causing an issue, but is changed for consistency with the
rest of the file.
Yuya Nishihara <yuya@tcha.org> [Thu, 15 Mar 2018 21:38:57 +0900] rev 37075
templater: drop symbols which should be overridden by new 'ctx' (issue5612)
This problem is caused by impedance mismatch between the templater and the
formatter interface, which is that the template keywords are generally
evaluated dynamically, but the formatter puts static values into a template
mapping.
This patch avoids the problem by removing conflicting values from a mapping
dict when a 'ctx' is switched.
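A minimal sketch of the idea, assuming a plain dict mapping and a known set of
keywords that depend on 'ctx'. The function and parameter names are
hypothetical and do not reflect the templater's real interface.

```python
def overlay_map(orig, new, ctx_keywords):
    """Return a mapping for nested evaluation.

    If ``new`` switches 'ctx', static values in ``orig`` that were computed
    for the old context are dropped so they get re-evaluated dynamically.
    """
    merged = dict(orig)
    if "ctx" in new:
        for key in ctx_keywords:
            merged.pop(key, None)
    merged.update(new)
    return merged


outer = {"ctx": "ctx-1", "rev": 1, "label": "log"}
inner = overlay_map(outer, {"ctx": "ctx-2"}, ctx_keywords={"rev", "node"})
```

Here `inner` no longer carries the stale `rev` value, while the unrelated
`label` entry is kept and `outer` is left untouched.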
Yuya Nishihara <yuya@tcha.org> [Thu, 15 Mar 2018 21:22:52 +0900] rev 37074
templater: factor out function to create mapping dict for nested evaluation
overlaymap() is the hook point to drop mapping items conflicting with the
default keywords which have to be re-evaluated with new 'ctx' resource.
Yuya Nishihara <yuya@tcha.org> [Thu, 15 Mar 2018 20:43:39 +0900] rev 37073
templater: introduce resourcemapper class
A couple more functions will be added later to work around nested mapping
bugs such as the issue 5612.
Yuya Nishihara <yuya@tcha.org> [Thu, 15 Mar 2018 20:27:38 +0900] rev 37072
log: do not expect templateresources() returning a dict
The resources dict will be replaced with new resource mapper object, which
won't implement __getitem__(key). Share the whole resources object with
_graphnodeformatter() to make porting easier.
Yuya Nishihara <yuya@tcha.org> [Fri, 16 Mar 2018 23:11:55 +0900] rev 37071
templatekw: mark _showlist() as deprecated (API)
.. api::
``templatekw._showlist()`` is deprecated in favor of
``templateutil._showcompatlist()``, which takes ``context`` in place of
``templ``.
Yuya Nishihara <yuya@tcha.org> [Fri, 16 Mar 2018 23:09:21 +0900] rev 37070
templater: drop 'templ' from resources dict
Partially resolves cycle, templ -> context -> templ. This will make it easier
to replace the resources dict with new immutable resource mapper interface.
Yuya Nishihara <yuya@tcha.org> [Fri, 16 Mar 2018 23:01:51 +0900] rev 37069
templatekw: stop using _showlist() which is about to be deprecated
Use the new context-based API instead.
Yuya Nishihara <yuya@tcha.org> [Fri, 16 Mar 2018 22:47:15 +0900] rev 37068
templater: use template context to render old-style list template
Prepares for dropping the 'templ' resource.
This means old-style list templates are processed by the same engine class
as the one for the list node. I think that's fine since templates for the
same list should be tightly coupled, and I believe the extension point for
the engine classes isn't actually used.
Now templatekw._showlist() is a compatibility wrapper for _showcompatlist(),
and will be deprecated soon. The function is still marked as private since
I plan to change the interface to get rid of closures capturing context and
mapping.
Yuya Nishihara <yuya@tcha.org> [Fri, 16 Mar 2018 22:36:40 +0900] rev 37067
templater: add context.preload(t) to test if the specified template exists
I'm going to remove 'templ' from the resources dict because it is the only
resource that the caller can't provide. This also implies that putting
'templ' into the resources dict creates a reference cycle.
context.preload(t) will be used in place of templater.__contains__().
Yuya Nishihara <yuya@tcha.org> [Sun, 18 Mar 2018 12:28:19 +0900] rev 37066
annotate: pack line content into annotateline object (API)
Just for code readability. We can do that since the annotateline type is
no longer used while computing the history.
Yuya Nishihara <yuya@tcha.org> [Tue, 13 Mar 2018 22:18:06 +0900] rev 37065
annotate: drop linenumber flag from fctx.annotate() (API)
Now linenumber=True is fast enough to be enabled by default.
Yuya Nishihara <yuya@tcha.org> [Mon, 12 Mar 2018 20:45:10 +0900] rev 37064
annotate: do not construct attr.s object per line while computing history
Unfortunately, good abstraction has a cost. It's way slower to construct
an annotateline() object than creating a plain tuple or a list. This patch
changes the internal data structure from row-based to columnar, so the
decorate() function can be instant (i.e. no Python in hot loop.)
For code readability, the outermost tuple is switched to an attr.s object
instead.
(original, row-based attr.s)
$ hg annot mercurial/commands.py --time > /dev/null
time: real 11.470 secs (user 11.400+0.000 sys 0.070+0.000)
$ hg annot mercurial/commands.py --time --line-number > /dev/null
time: real 39.590 secs (user 39.500+0.000 sys 0.080+0.000)
(this patch, columnar)
$ hg annot mercurial/commands.py --time > /dev/null
time: real 11.780 secs (user 11.710+0.000 sys 0.070+0.000)
$ hg annot mercurial/commands.py --time --line-number > /dev/null
time: real 12.240 secs (user 12.170+0.000 sys 0.090+0.000)
(cf. 4.3.3, row-based tuple)
$ hg annot mercurial/commands.py --time --line-number > /dev/null
time: real 19.540 secs (user 19.460+0.000 sys 0.080+0.000)
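The difference between the two layouts can be sketched as follows. This is an
illustration only: a namedtuple stands in for the attr.s class, and the
function names are hypothetical.

```python
from collections import namedtuple

AnnotateLine = namedtuple("AnnotateLine", ["fctx", "lineno"])


def decorate_rows(fctx, text):
    """Row-based: one object is constructed per line."""
    n = len(text.splitlines())
    return [AnnotateLine(fctx, i) for i in range(1, n + 1)]


def decorate_columns(fctx, text):
    """Columnar: one list per field, built without a per-line object."""
    n = len(text.splitlines())
    return [fctx] * n, list(range(1, n + 1))
```

The columnar variant builds its lists with list repetition and `range()`, so no
Python-level object construction happens per line.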
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 15 Mar 2018 18:05:49 -0700] rev 37063
wireproto: explicitly track which requests are active
We previously only tracked which requests are receiving. A
misbehaving client could accidentally have multiple requests with
the same ID in flight.
We now explicitly track which request IDs are currently active.
We make it illegal to receive a frame associated with a request
ID that has already been dispatched.
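A rough sketch of this bookkeeping, assuming frames arrive as
(request ID, payload, last-frame flag). The class and method names are
hypothetical and do not mirror the actual server reactor.

```python
class RequestTracker:
    """Track which request IDs are receiving and which are active."""

    def __init__(self):
        self.receiving = {}  # request ID -> payload chunks seen so far
        self.active = set()  # request IDs already dispatched

    def on_frame(self, request_id, payload, last):
        # A frame for an already dispatched request is a protocol error.
        if request_id in self.active:
            raise ValueError("request %d is already active" % request_id)
        self.receiving.setdefault(request_id, []).append(payload)
        if not last:
            return None
        # The request is complete: mark it active and hand back its data.
        self.active.add(request_id)
        return b"".join(self.receiving.pop(request_id))

    def on_response_sent(self, request_id):
        self.active.discard(request_id)
```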
Differential Revision: https://phab.mercurial-scm.org/D2901
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 15 Mar 2018 16:09:58 -0700] rev 37062
wireproto: use named arguments when passing around frame data
Named arguments are easier to reason about than positional
arguments, especially when you have many positional arguments.
Differential Revision: https://phab.mercurial-scm.org/D2900
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 15 Mar 2018 16:03:14 -0700] rev 37061
wireproto: define attr-based classes for representing frames
When frames only had 3 attributes, it was reasonable to
represent them as a tuple. With them growing more attributes,
it will be easier to pass them around as a more formal type.
So let's define attr-based classes to represent frame headers and
full frames.
Differential Revision: https://phab.mercurial-scm.org/D2899
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 22:19:00 -0700] rev 37060
wireproto: define human output side channel frame
Currently, the SSH protocol delivers output tailored for people over
the stderr file descriptor. The HTTP protocol doesn't have this
file descriptor (because it only has an input and output pipe). So
it encodes textual output intended for humans within the protocol
responses. Some response types have a facility for capturing output
to be printed to users. Some don't. And sometimes the implementation
of how that output is conveyed is super hacky.
On top of that, bundle2 has an "output" part that is used to store
output that should be printed when this part is encountered.
bundle2 also has the concept of "interrupt" chunks, which can be
used to signal that the regular bundle2 stream is to be
preempted by an out-of-band part that should be processed immediately.
This "interrupt" part can be an "output" part and can be used to
print data on the receiver.
The status quo is inconsistent and insane. We can do better.
This commit introduces a dedicated frame type on the frame-based
protocol for denoting textual data that should be printed on the
receiver. This frame type effectively constitutes a side-channel
by which textual data can be printed on the receiver without
interfering with other in-progress transmissions, such as the
transmission of command responses.
But wait - there's more! Previous implementations that transferred
textual data basically instructed the client to "print these bytes."
This suffered from a few problems.
First, the text data that was transmitted and eventually printed
originated from a server with a specific i18n configuration. This
meant that clients would see text using whatever the i18n settings
were on the server. Someone in France could connect to a server in
Japan and see illegible Japanese glyphs - or maybe even mojibake.
Second, the normalization of all text data originated on servers
resulted in the loss of the ability to apply formatting to that
data. Local Mercurial clients can apply specific formatting
settings to individual atoms of text. For example, a revision can
be colored differently from a commit message. With data over the
wire, the potential for this rich formatting was lost. The best you
could do (without parsing the text to be printed), was apply a
universal label to it and e.g. color it specially.
The new mechanism for instructing the peer to print data does
not have these limitations.
Frames instructing the peer to print text are composed of a
formatting string plus arguments. In other words, receivers can
plug the formatting string into the i18n database to see if a local
translation is available. In addition, each atom being instructed
to print has a series of "labels" associated with it. These labels
can be mapped to the Mercurial UI's labels so locally configured
coloring, styling, etc settings can be applied.
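How a receiver might process such atoms can be sketched as follows. Everything
here is hypothetical: the atom layout (format string, arguments, labels), the
translation callback, and the style table are assumptions for illustration and
are not the frame encoding itself.

```python
def render_atoms(atoms, translate, styles):
    """Render (format string, args, labels) atoms using local settings."""
    out = []
    for fmt, args, labels in atoms:
        # Look up a local translation of the format string, then fill it in.
        text = translate(fmt) % tuple(args)
        # Apply locally configured styling for each label on the atom.
        for label in labels:
            prefix, suffix = styles.get(label, ("", ""))
            text = prefix + text + suffix
        out.append(text)
    return "".join(out)


catalog = {"pushed %d changesets\n": "%d changesets enviados\n"}


def translate(message):
    return catalog.get(message, message)


styles = {"ui.status": ("<b>", "</b>")}
```

The server sends only the untranslated format string, its arguments, and the
labels. The translation catalog and the styles both live on the client.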
What this all means is that textual messages originating on servers
can be localized on the client and richly formatted, all while
respecting the client's settings. This is slightly more complicated
than "print these bytes." But it is vastly more user friendly.
FWIW, I'm not aware of other protocols that attempt to encode
i18n and textual styling in this manner. You could make the
claim that this feature is over-engineered. However, if I were to
sit in the shoes of a non-English speaker learning how to use
version control, I think I would *love* this feature because
it would enable me to see richly formatted text in my chosen
locale.
Anyway, we only implement support for encoding frames of this
type and basic tests for that encoding. We'll still need to
hook up the server and its ui instance to emit these frames.
I recognize this feature may be a bit more controversial than
other aspects of the wire protocol because it is a bit
"radical." So I figured I'd start small to test the waters and
see if others feel this feature is worthwhile.
Differential Revision: https://phab.mercurial-scm.org/D2872
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 19 Mar 2018 16:55:07 -0700] rev 37059
wireproto: service multiple command requests per HTTP request
Now that our new frame-based protocol server can understand how
to ingest multiple, possibly interleaved, command requests, let's
hook it up to the HTTP server.
The code on the HTTP side of things is still a bit hacky. We need
a bit of work around error handling, content types, etc. But it's
a start.
Among the added tests, we demonstrate that a client can send frames
for multiple commands interleaved with each other and that a later
issued command can respond before the first one has finished
sending. This makes our multi-request model technically superior
to the previous "batch" command.
Differential Revision: https://phab.mercurial-scm.org/D2871
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:53:30 -0700] rev 37058
wireproto: support for receiving multiple requests
Now that we have request IDs on each frame and a specification
that allows multiple requests to be issued simultaneously,
possibly interleaved, let's teach the server to deal with that.
Instead of tracking the state for *the* active command request,
we instead track the state of each receiving command by its
request ID. The multiple states in our state machine for processing
each command's state have been collapsed into a single state for
"receiving commands."
Tests have been added so our branch coverage covers all meaningful
branches.
However, we did lose some logical coverage. The implementation
of this new feature opens up the door to a server having partial
command requests when end of input is reached. We will probably
want a mechanism to deal with partial requests. For now, I've
tracked that as a known issue in the class docstring. I've
also noted an abuse vector that becomes a little bit easier to
exploit with this feature.
Differential Revision: https://phab.mercurial-scm.org/D2870
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 16:51:34 -0700] rev 37057
wireproto: add request IDs to frames
One of my primary goals with the new wire protocol is to make
operations faster and enable both client and server-side
operations to scale to multiple CPU cores.
One of the ways we make server interactions faster is by reducing
the number of round trips to that server.
With the existing wire protocol, the "batch" command facilitates
executing multiple commands from a single request payload. The way
it works is the requests for multiple commands are serialized. The
server executes those commands sequentially then serializes all
their results. As an optimization for reducing round trips, this
is very effective. The technical implementation, however, is pretty
bad and suffers from a number of deficiencies. For example, it
creates a new place where authorization to run a command must be
checked. (The lack of this checking in older Mercurial releases
was CVE-2018-1000132.)
The principles behind the "batch" command are sound. However, the
execution is not. Therefore, I want to ditch "batch" in the
new wire protocol and have protocol level support for issuing
multiple requests in a single round trip.
This commit introduces support in the frame-based wire protocol to
facilitate this. We do this by adding a "request ID" to each frame.
If a server sees frames associated with different "request IDs," it
handles them as separate requests. All of this can happen
as part of the same message from client to server (the same request
body in the case of HTTP).
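As a sketch of the idea, interleaved frames can be split back into separate
requests purely by their request ID. The (request ID, payload) tuple layout and
the function name are assumptions made for this example.

```python
def split_requests(frames):
    """Group interleaved (request ID, payload) frames by request ID."""
    requests = {}
    for request_id, payload in frames:
        requests.setdefault(request_id, []).append(payload)
    return {rid: b"".join(chunks) for rid, chunks in requests.items()}


# Two requests whose frames arrive interleaved in one message.
frames = [(1, b"he"), (3, b"fo"), (1, b"llo"), (3, b"o")]
```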
We /could/ model the exchange the way pipelined HTTP requests do,
where the server processes requests in the order they are issued and
received. But this artificially constrains scalability. A better
model is to allow multi-requests to be executed concurrently and
for responses to be sent and handled concurrently. So the
specification explicitly allows this. There is some work to be done
around specifying dependencies between multi-requests. We take
the easy road for now and punt on this problem, declaring that
if order is important, clients must not issue the request until
responses to dependent requests have been received.
This commit focuses on the boilerplate of implementing the request
ID. The server reactor still can't manage multiple, in-flight
request IDs. This will be addressed in a subsequent commit.
Because the wire semantics have changed, we bump the version of the
media type.
Differential Revision: https://phab.mercurial-scm.org/D2869
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 14:01:16 -0700] rev 37056
wireproto: buffer output frames when in half duplex mode
Previously, when told that a response was ready, the server reactor
would instruct the caller to send frames immediately. This was OK
as an initial implementation. But it would not work for half-duplex
connections where the sender can't receive until all data has been
transmitted - such as httplib based clients.
In this commit, we teach the reactor that output frames should
be buffered until end of input is seen. This required a new
event to inform the reactor of end of input. The result from that
event will instruct the consumer to send all buffered frames.
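The buffering behavior can be sketched like this. The class name, method names,
and action names are hypothetical stand-ins, not the reactor's actual
interface.

```python
class BufferingReactor:
    """Hold output frames until end of input when buffering is enabled."""

    def __init__(self, buffered=True):
        self.buffered = buffered
        self.pending = []

    def on_response_ready(self, frames):
        if self.buffered:
            # Half duplex: the peer can't read yet, so hold the frames.
            self.pending.extend(frames)
            return ("noop", [])
        return ("sendframes", list(frames))

    def on_input_eof(self):
        # End of input: tell the caller to flush everything buffered.
        frames, self.pending = self.pending, []
        return ("sendframes", frames)
```

The reactor performs no I/O itself. It only tells the caller what to send and
when.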
The HTTP server is buffered by default.
This change effectively hides the complexity of buffering within
the reactor so that transports need not be concerned about it.
This helps keep the transports "dumb" and will make implementing
multiple requests-responses per atomic exchange (like an HTTP
request) much simpler.
Differential Revision: https://phab.mercurial-scm.org/D2860
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:57:52 -0700] rev 37055
wireproto: define and implement responses in framing protocol
Previously, we only had client-side frame types defined. This commit
defines and implements basic support for server-side frame types.
We introduce two frame types - one for representing the raw bytes
result of a command and another for representing error results.
The types are quite primitive and behavior will expand over time.
But you have to start somewhere.
Our server reactor gains methods to react to an intent to send a
response. Again, following the "sans I/O" pattern, the reactor
doesn't actually send the data. Instead, it gives the caller a
generator of frames that it can send out over the wire.
Differential Revision: https://phab.mercurial-scm.org/D2858
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 13:32:31 -0700] rev 37054
wireproto: implement basic command dispatching for HTTPv2
Now that we can ingest frames and decode them to requests to run
commands, we are able to actually run those commands. So this
commit starts to implement that.
There are numerous shortcomings. We can't operate on commands
with "*" arguments. We can only emit bytesresponse results. We
don't yet issue a response in the unified framing protocol.
But it's a start.
Differential Revision: https://phab.mercurial-scm.org/D2857
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 08:18:15 -0700] rev 37053
wireproto: nominally don't expose "batch" to version 2 wire transports
The unified frame-based protocol will (eventually) support
multiple requests per client transmission. This means that the
[very hacky] "batch" command has no purpose existing in this protocol.
This commit marks the command as applying to v1 transports only.
But because SSHv2 == SSHv1 currently, we had to hack it back in
for the SSHv2 transport. Bleh.
Tests changed because the capabilities string changed. The order of
tokens in the string is not important.
Differential Revision: https://phab.mercurial-scm.org/D2856
Gregory Szorc <gregory.szorc@gmail.com> [Wed, 14 Mar 2018 15:25:06 -0700] rev 37052
wireproto: implement basic frame reading and processing
We just implemented support for writing frames. Now let's implement
support for reading them.
The bulk of the new code is for a class that maintains the state of
a server. Essentially, you construct an instance, feed frames to it,
and it tells you what you should do next. The design is inspired by
the "sans I/O" movement and the reactor pattern. We don't want to
perform I/O or any major blocking event during frame ingestion because
this arbitrarily limits ways that server pieces can be implemented.
For example, it makes it much harder to swap in an alternate
implementation based on asyncio or do crazy things like have requests
dispatch to other processes.
We do still implement readframe() which does I/O. But it is decoupled
from the server reactor. And important parsing of frame headers is
a standalone function. So I/O is only needed to obtain frame data.
Because testing server-side ingest is useful and difficult on running
servers, we create a new "debugreflect" endpoint that will echo back
to the client what was received and how it was interpreted. This could
be useful for a server admin or someone implementing a client. But
immediately, it is useful for testing: we're able to demonstrate that
frames are parsed correctly and turned into requests to run commands
without having to implement command dispatch on the server!
In addition, we implement Python level unit tests for the reactor.
This is vastly more efficient than sending requests to the
"debugreflect" endpoint and vastly more powerful for advanced
testing.
Differential Revision: https://phab.mercurial-scm.org/D2852
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 19 Mar 2018 16:49:53 -0700] rev 37051
wireproto: define and implement protocol for issuing requests
The existing HTTP and SSH wire protocols suffer from a host of flaws
and shortcomings. I've been wanting to rewrite the protocol for a while
now. Supporting partial clone - which will require new wire protocol
commands and capabilities - and other advanced server functionality
will be much easier if we start from a clean slate and don't have
to be constrained by limitations of the existing wire protocol.
This commit starts to introduce a new data exchange format for
use over the wire protocol.
The new protocol is built on top of "frames," which are atomic
units of metadata + data. Frames will make it easier to implement
proxies and other mechanisms that want to inspect data without
having to maintain state. The existing frame metadata is very
minimal and it will evolve heavily. (We will eventually support
things like concurrent requests, out-of-order responses,
compression, side-channels for status updates, etc. Some of
these will require additions to the frame header.)
Another benefit of frames is that all reads are of a fixed size.
A reader works by consuming a frame header, extracting the payload
length, then reading that many bytes. No lookahead, buffering, or
memory reallocations are needed.
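A reader of this kind might look as follows. The header layout used here (a
4-byte little-endian payload length followed by a 1-byte frame type) is a
made-up example and is not the header format defined by the protocol.

```python
import io
import struct

# Hypothetical header: 4-byte little-endian length, 1-byte frame type.
HEADER = struct.Struct("<IB")


def make_frame(frame_type, payload):
    return HEADER.pack(len(payload), frame_type) + payload


def parse_header(data):
    """Standalone header parsing; needs no I/O."""
    length, frame_type = HEADER.unpack(data)
    return length, frame_type


def read_frame(fh):
    """Read one frame: a fixed-size header, then exactly `length` bytes."""
    header = fh.read(HEADER.size)
    if not header:
        return None  # clean end of stream
    if len(header) != HEADER.size:
        raise ValueError("truncated frame header")
    length, frame_type = parse_header(header)
    payload = fh.read(length)
    if len(payload) != length:
        raise ValueError("truncated frame payload")
    return frame_type, payload


stream = io.BytesIO(make_frame(1, b"heads") + make_frame(2, b""))
```

Each call to `read_frame(stream)` knows in advance how many bytes to read, so
no lookahead is required.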
The new protocol attempts to be transport agnostic. I want all that's
required to use the new protocol to be a pair of unidirectional,
half-duplex pipes. (Yes, we will eventually make use of full-duplex
pipes, but that's for another commit.) Notably, when the SSH
transport switches to this new protocol, stderr will be unused.
This is by design: the lack of stderr on HTTP harms protocol
behavior there. By shoehorning everything into a pair of pipes,
we can have more consistent behavior across transports.
We currently only define the client side parts of the new protocol,
specifically the bits for requesting that a command run. This keeps
the new code and feature small and somewhat easy to review.
We add support to `hg debugwireproto` for writing frames into
HTTP request bodies. Our tests that issue commands to the new
HTTP endpoint have been updated to transmit frames. The server
bits haven't been touched to consume the frames yet. This will
occur in the next commit...
Astute readers may notice that the command name is transmitted in
both the HTTP request URL and the command request frame. This is
partially a kludge from me initially implementing the frame-based
protocol for SSH first. But it is also a feature: I intend to
eventually support issuing multiple commands per HTTP request. This
will allow us to replace the abomination that is the "batch" wire
protocol command with a protocol-level mechanism for performing
multi-dispatch. Because I want the frame-based protocol to be
as similar as possible across transports, I'd rather we (redundantly)
include the command name in the frame than differ behavior between
transports that have out-of-band routing information (like HTTP)
readily available.
Differential Revision: https://phab.mercurial-scm.org/D2851
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 19:44:59 -0700] rev 37050
wireproto: define content negotiation for HTTPv2
HTTP messages communicate their media types and what media types
they can understand via the Content-Type and Accept headers,
respectively.
While I don't want the wire protocol to lean too heavily on HTTP
because I'm aiming for the wire protocol to be as transport
agnostic as possible, it is nice to play by the spec if possible.
This commit defines our media negotiation mechanism for version
2 of the HTTP protocol. Essentially, we mandate the use of a
new media type and how clients and servers should react to
various headers or lack thereof.
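A server-side check might look roughly like this. The media type name is an
invented placeholder, and both the strict handling of a missing header and the
choice of 406/415 status codes are assumptions based on general HTTP semantics
rather than details stated in this commit message.

```python
# Invented placeholder name; the real media type is not given here.
MEDIA_TYPE = "application/x-example-frames-v0"


def check_media_types(headers):
    """Return an HTTP status code for the request's media type headers."""
    # Assumption: the client must say it accepts the new media type.
    if headers.get("Accept") != MEDIA_TYPE:
        return 406  # Not Acceptable
    # Assumption: the request body must use the new media type.
    if headers.get("Content-Type") != MEDIA_TYPE:
        return 415  # Unsupported Media Type
    return 200
```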
The name of the media type is a placeholder. We purposefully don't
yet define the format of the new media type because that's a lot
of work.
I feel pretty strongly that we should use Content-Type. I feel
less strongly about Accept. I think it is reasonable for servers
to return the media type that was submitted to them. So we may
strike this header before the protocol is finished...
Differential Revision: https://phab.mercurial-scm.org/D2850
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 14:15:10 -0700] rev 37049
hgweb: also set Content-Type header
Our HTTP/WSGI server may convert the Content-Type HTTP request
header to the CONTENT_TYPE WSGI environment key and not set
HTTP_CONTENT_TYPE. Other WSGI server implementations
do this, so I think the behavior is acceptable.
So assuming this HTTP request header could get "lost" by the WSGI
server, let's restore it on the request object like we do for
Content-Length.
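The restoration can be sketched as follows. Under WSGI, the CONTENT_TYPE and
CONTENT_LENGTH environment keys carry no HTTP_ prefix, so a naive scan for
HTTP_* keys misses them. The function name is hypothetical and this is not the
hgweb request parsing code.

```python
def request_headers(environ):
    """Rebuild a header dict from a WSGI environ."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_X_FOO_BAR -> X-Foo-Bar
            headers[key[5:].replace("_", "-").title()] = value
    # These two headers live under unprefixed keys, so restore them.
    for key, name in (
        ("CONTENT_LENGTH", "Content-Length"),
        ("CONTENT_TYPE", "Content-Type"),
    ):
        if key in environ and name not in headers:
            headers[name] = environ[key]
    return headers
```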
FWIW, the WSGI server may also *invent* a Content-Type value. The
default behavior of Python's RFC 822 message class returns a default
media type if Content-Type isn't defined. This is kind of annoying.
But RFC 7231 section 3.1.1.5 does say the recipient may assume a media
type of application/octet-stream. Python's defaults are for
text/plain (given we're using an RFC 822 parser). But whatever.
Differential Revision: https://phab.mercurial-scm.org/D2849
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 11:57:43 -0700] rev 37048
wireproto: require POST for all HTTPv2 requests
Wire protocol version 1 transfers argument data via request
headers by default. This has historically caused problems because
servers institute limits on the length of individual HTTP headers
as well as the total size of all request headers. Mercurial servers
can advertise the maximum length of an individual header. But
there's no guarantee any intermediate HTTP agents will accept
headers up to that length.
In the existing wire protocol, server operators typically also
key off the HTTP request method to implement authentication.
For example, GET requests translate to read-only requests and
can be allowed. But read-write commands must use POST and require
authentication. This has typically worked because the only wire
protocol commands that use POST modify the repo (e.g. the
"unbundle" command).
There is an experimental feature to enable clients to transmit
argument data via POST request bodies. This is technically a
better and more robust solution. But we can't enable it by default
because of servers assuming POST means write access.
In version 2 of the wire protocol, the permissions of a request
are encoded in the URL. And with it being a new protocol in a new
URL space, we're not constrained by backwards compatibility
requirements.
This commit adopts the technically superior mechanism of using
HTTP request bodies to send argument data by requiring POST for
all commands. Strictly speaking, it may be possible to send
request bodies on GET requests. But my experience is that not all
HTTP stacks support this. POST pretty much always works. Using POST
for read-only operations does sacrifice some RESTful design
purity. But this API cares about practicality, not about being
in Roy T. Fielding's REST ivory tower.
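A minimal sketch of what such a check could look like in a WSGI handler follows. The helper name `require_post`, the 405 status, the `Allow` header, and the response text are assumptions for illustration; the commit message does not state which status the server returns.

```python
def require_post(environ, start_response):
    """Reject non-POST requests (sketch).

    Returns a WSGI response body when the request is rejected, or None
    when the caller should continue processing the request.
    """
    if environ.get('REQUEST_METHOD') != 'POST':
        start_response('405 Method Not Allowed',
                       [('Allow', 'POST'),
                        ('Content-Type', 'text/plain')])
        return [b'commands require POST requests\n']
    return None
```

With this shape, command arguments are always read from the request body, so header length limits no longer apply to them.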
There's a chance we may relax this restriction in the future. But
for now, I want to see how far we can get with a POST-only API.
Differential Revision: https://phab.mercurial-scm.org/D2837
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 19 Mar 2018 16:43:47 -0700] rev 37047
wireproto: define permissions-based routing of HTTPv2 wire protocol
Now that we have a scaffolding for serving version 2 of the HTTP
protocol, let's start implementing it.
A good place to start is URL routing and basic request processing
semantics. We can focus on content types, capabilities detection, etc.
later.
Version 2 of the HTTP wire protocol encodes the needed permissions
of the request in the URL path. The reasons for this are documented
in the added documentation. In short, a) it makes it really easy and
fail-proof for server administrators to implement path-based
authentication and b) it will enable clients to realize very early in
a server exchange that authentication will be required to complete
the operation. This latter point avoids all kinds of complexity and
problems, like dealing with Expect: 100-continue and clients finding
out later during `hg push` that they need to provide authentication.
This will avoid the current badness where clients send a full bundle,
get an HTTP 403, provide authentication, then retransmit the bundle.
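The routing idea can be sketched as follows. Everything concrete here is an assumption for illustration: the `<permission>/<command>` path layout, the `ro`/`rw` permission names, the `COMMAND_PERMISSIONS` table, and the status codes are hypothetical and are not taken from the commit.

```python
# Hypothetical table mapping each command to the permission it needs.
COMMAND_PERMISSIONS = {
    'heads': 'ro',      # read-only command
    'unbundle': 'rw',   # command that modifies the repo
}


def route(path):
    """Map a '<permission>/<command>' path to (status, message) (sketch)."""
    parts = path.strip('/').split('/')
    if len(parts) != 2:
        return 404, 'malformed path'
    permission, command = parts
    if permission not in ('ro', 'rw'):
        return 404, 'unknown permission: %s' % permission
    needed = COMMAND_PERMISSIONS.get(command)
    if needed is None:
        return 404, 'unknown command: %s' % command
    # A command that writes must not be reachable via a read-only URL.
    if needed == 'rw' and permission == 'ro':
        return 403, 'insufficient permissions for %s' % command
    return 200, 'ok'
```

Because the permission is in the path, an administrator can require authentication for one path prefix in the web server configuration, and a client learns that it needs credentials before it transmits any bundle data.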
In order to implement command checking, we needed to implement a
protocol handler for the new wire protocol. Our handler is just
small enough to run the code we've implemented.
Tests for the defined functionality have been added.
I very much want to refactor the permissions checking code and define
a better response format. But this can be done later. Nothing is
covered by backwards compatibility at this point.
Differential Revision: https://phab.mercurial-scm.org/D2836
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 16:53:21 -0700] rev 37046
wireproto: support /api/* URL space for exposing APIs
I will soon be introducing a new version of the HTTP wire protocol.
One of the things I want to change with it is the URL routing.
I want to rely on URL paths to define endpoints rather than the
"cmd" query string argument. That should be pretty straightforward.
I was thinking about what URL space to reserve for the new protocol.
We /could/ put everything at a top-level path. e.g.
/wireproto/* or /http-v2-wireproto/*. However, these constrain us
a bit because they assume there will only be 1 API: version 2 of
the HTTP wire protocol. I think there is room to grow multiple
APIs. For example, there may someday be a proper JSON API to query
or even manipulate the repository. And I don't think we should have
to create a new top-level URL space for each API nor should we
attempt to shoehorn each future API into the same shared URL space:
that would just be too chaotic.
This commit reserves the /api/* URL space for all our future API
needs. Essentially, all requests to /api/* get routed to a new WSGI
handler. By default, it 404's the entire URL space unless the
"api server" feature is enabled. When enabled, requests to "/api"
list available APIs. URLs of the form /api/<name>/* are reserved for
a particular named API. Behavior within each API is left up to that
API. So, we can grow new APIs easily without worrying about URL
space conflicts.
APIs can be registered by adding entries to a global dict. This allows
extensions to provide their own APIs should they choose to do so.
This is probably a premature feature. But IMO the code is easier
to read if we're not dealing with API-specific behavior like config
option querying inline.
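A minimal sketch of the dispatch described above, assuming a module-level dict as the registry. The names `API_HANDLERS`, `register_api`, `dispatch`, and the `example-api` entry are hypothetical, as are the handler signature and the response bodies.

```python
# Hypothetical global registry: API name -> handler callable.
# An extension could add its own entry to expose an API.
API_HANDLERS = {}


def register_api(name, handler):
    API_HANDLERS[name] = handler


def dispatch(path, enabled=True):
    """Route a request path under /api (sketch).

    Returns a (status, text) tuple; responses are plain text.
    """
    parts = [p for p in path.split('/') if p]
    if not enabled or not parts or parts[0] != 'api':
        # The whole URL space 404s unless the feature is enabled.
        return 404, 'not found'
    if len(parts) == 1:
        # "/api" lists the available APIs.
        return 200, '\n'.join(sorted(API_HANDLERS))
    handler = API_HANDLERS.get(parts[1])
    if handler is None:
        return 404, 'unknown API: %s' % parts[1]
    # Behavior below /api/<name>/ is left up to that API.
    return handler(parts[2:])


register_api('example-api', lambda rest: (200, '/'.join(rest)))
```

Keeping per-API logic behind the registry means the dispatcher never needs to know about any one API's config options.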
To prove it works, we implement a very basic API for version 2
of the HTTP wire protocol. It does nothing of value except
facilitate testing of the /api/* URL space.
We currently emit plain text responses for all /api/* endpoints.
There's definitely room to look at Accept and other request headers
to vary the response format. But we have to start somewhere.
Differential Revision: https://phab.mercurial-scm.org/D2834
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 10:34:36 -0700] rev 37045
url: support suppressing Accept header
Sending this header automatically could interfere with future
testing and client behavior. Let's add a knob to disable the
behavior.
We don't have a control for User-Agent because urllib will send
it if we don't set something. I don't feel like hacking into the
bowels of urllib to figure out how to suppress that. UA shouldn't
be used for anything meaningful. So it shouldn't pose any problems
beyond non-determinism (since the header has the Mercurial version in
it).
Differential Revision: https://phab.mercurial-scm.org/D2843
Gregory Szorc <gregory.szorc@gmail.com> [Tue, 13 Mar 2018 11:20:07 -0700] rev 37044
util: don't log low-level I/O calls for HTTP peer
`hg debugwireproto` is useful for testing HTTP interactions. Possibly
more useful than `get-with-headers.py`. But one thing that makes it
annoying for mid-level tests is that it logs all API calls, such
as readline(). This makes output - especially headers - overly
verbose.
This commit teaches our file and socket observers to not log API
calls on functions dealing with data.
We change the behavior of `hg debugwireproto` to enable this mode
by default. --debug can be added to restore the previous behavior.
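The effect can be sketched with a small file wrapper. The class name `ObservedFile`, the `logdataapis` flag name, and the log line format are assumptions for illustration, not the observer code from the commit; the sketch also assumes the transferred data itself is still logged when the API call line is suppressed.

```python
import io


class ObservedFile:
    """Wrap a file object and record reads into a list (sketch)."""

    def __init__(self, fh, log, logdataapis=True):
        self._fh = fh
        self._log = log
        self._logdataapis = logdataapis

    def readline(self):
        data = self._fh.readline()
        if self._logdataapis:
            # Verbose mode: record the API call and its result length.
            self._log.append('readline() -> %d' % len(data))
        # The data is recorded in both modes.
        self._log.append('    %r' % data)
        return data


quiet, verbose = [], []
ObservedFile(io.BytesIO(b'HTTP/1.1 200 OK\r\n'), quiet,
             logdataapis=False).readline()
ObservedFile(io.BytesIO(b'HTTP/1.1 200 OK\r\n'), verbose).readline()
```

Here `quiet` holds one entry per line read while `verbose` holds two, which is the difference in test output size that the commit is after.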
As the test changes demonstrate, this makes tests much easier to
read. As a bonus, it also removes some required (glob) over lengths
in system call results.
One thing that's lacking is knowing which side sent data. But we can
fix this in a follow-up once it becomes a problem.
Differential Revision: https://phab.mercurial-scm.org/D2842