hg-stable: Changelog

httppeer: advertise and support application/mercurial-0.2 Now that servers expose a capability indicating they support application/mercurial-0.2 and compression, clients can key off this to say they support responses that are compressed with various compression formats. After this commit, the HTTP wire protocol client now sends an "X-HgProto-<N>" request header indicating its support for "application/mercurial-0.2" media type and various compression formats. This commit also implements support for handling "application/mercurial-0.2" responses. It simply reads the header compression engine identifier then routes the remainder of the response to the appropriate decompressor. There were some test changes, but only to logging. That points to an obvious gap in our test coverage. This will be addressed in a subsequent commit once server support is in place (it is hard to test without server support).

wireproto: advertise supported media types and compression formats This commit introduces support for advertising a server's support for media types and compression formats in accordance with the spec defined in internals.wireproto. The bulk of the new code is a helper function in wireproto.py to obtain a prioritized list of compression engines available to the wire protocol. While not utilized yet, we implement support for obtaining the list of compression engines advertised by the client. The upcoming HTTP protocol enhancements are a bit lower-level than existing tests (most existing tests are command centric). So, this commit establishes a new test file that will be appropriate for holding tests around the functionality of the HTTP protocol itself. Rounding out this change, `hg debuginstall` now prints compression engines available to the server.

util: declare wire protocol support of compression engines This patch implements a new compression engine API allowing compression engines to declare support for the wire protocol. Support is declared by returning a compression format string identifier that will be added to payloads to signal the compression type of data that follows and default integer priorities of the engine. Accessor methods have been added to the compression engine manager class to facilitate use. Note that the "none" and "bz2" engines declare wire protocol support but aren't enabled by default due to their priorities being 0. It is essentially free from a coding perspective to support these compression formats, so we do it in case anyone may derive use from it.

internals: document compression negotiation As part of adding zstd support to all of the things, we'll need to teach the wire protocol to support non-zlib compression formats. This commit documents how we'll implement that. To understand how we arrived at this proposal, let's look at how things are done today. The wire protocol today doesn't have a unified format. Instead, there is a limited facility for differentiating replies as successful or not. And, each command essentially defines its own response format. A significant deficiency in the current protocol is the lack of payload framing over the SSH transport. In the HTTP transport, chunked transfer is used and the end of an HTTP response body (and the end of a Mercurial command response) can be identified by a 0 length chunk. This is how HTTP chunked transfer works. But in the SSH transport, there is no such framing, at least for certain responses (notably the response to "getbundle" requests). Clients can't simply read until end of stream because the socket is persistent and reused for multiple requests. Clients need to know when they've encountered the end of a request but there is nothing simple for them to key off of to detect this. So what happens is the client must decode the payload (as opposed to being dumb and forwarding frames/packets). This means the payload itself needs to support identifying end of stream. In some cases (bundle2), it also means the payload can encode "error" or "interrupt" events telling the client to e.g. abort processing. The lack of framing on the SSH transport and the transfer of its responsibilities to e.g. bundle2 is a massive layering violation and a wart on the protocol architecture. It needs to be fixed someday by inventing a proper framing protocol. So about compression. The client transport abstractions have a "_callcompressable()" API. This API is called to invoke a remote command that will send a compressible response. The response is essentially a "streaming" response (no framing data at the Mercurial layer) that is fed into a decompressor. On the HTTP transport, the decompressor is zlib and only zlib. There is currently no mechanism for the client to specify an alternate compression format. And, clients don't advertise what compression formats they support or ask the server to send a specific compression format. Instead, it is assumed that non-error responses to "compressible" commands are zlib compressed. On the SSH transport, there is no compression at the Mercurial protocol layer. Instead, compression must be handled by SSH itself (e.g. `ssh -C`) or within the payload data (e.g. bundle compression). For the HTTP transport, adding new compression formats is pretty straightforward. Once you know what decompressor to use, you can stream data into the decompressor until you reach a 0 size HTTP chunk, at which point you are at end of stream. So our wire protocol changes for the HTTP transport are pretty straightforward: the client and server advertise what compression formats they support and an appropriate compression format is chosen. We introduce a new HTTP media type to hold compressed payloads. The header of the payload defines the compression format being used. Whoever is on the receiving end can sniff the first few bytes route to an appropriate decompressor. Support for multiple compression formats is advertised on both server and client. The server advertises a "compression" capability saying which compression formats it supports and in what order they are preferred. Clients advertise their support for multiple compression formats and media types via the introduced "X-HgProto" request header. Strictly speaking, servers don't need to advertise which compression formats they support. But doing so allows clients to fail fast if they don't support any of the formats the server does. This is useful in situations like sending bundles, where the client may have to perform expensive computation before sending data to the server. Rather than simply advertise a list of supported compression formats, we introduce an additional "httpmediatype" server capability advertising which media types the server supports. This means servers are explicit about what formats they exchange. IMO, this is superior to inferring support from other capabilities (like "compression"). By advertising compression support on each request in the "X-HgProto" header and media type and direction at the server level, we are able to gradually transition existing commands/responses to the new media type and possibly compression. Contrast with the old world, where we only supported a single media type and the use of compression was built-in to the semantics of the command on both client and server. In the new world, if "application/mercurial-0.2" is supported, compression is supported. It's that simple. It's worth noting that we explicitly don't use "Accept," "Accept-Encoding," "Content-Encoding," or "Transfer-Encoding" for content negotiation and compression. People knowledgeable of the HTTP specifications will say that we should use these because that's what they are designed to be used for. They have a point and I sympathize with the argument. Earlier versions of this commit even defined supported media types in the "Accept" header. However, my years of experience rolling out services leveraging HTTP has taught me to not trust the HTTP layer, especially if you are going outside the normal spec (such as using a custom "Content-Encoding" value to represent zstd streams). I've seen load balancers, proxies, and other network devices do very bad and unexpected things to HTTP messages (like insisting zlib compressed content is decoded and then re-encoded at a different compression level or even stripping compression completely). I've found that the best way to avoid surprises when writing protocols on top of HTTP is to use HTTP as a dumb transport as much as possible to minimize the chances that an "intelligent" agent between endpoints will muck with your data. While the widespread use of TLS is mitigating many intermediate network agents interfering with HTTP, there are still problems at the edges, with e.g. the origin HTTP server needing to convert HTTP to and from WSGI and buggy or feature-lacking HTTP client implementations. I've found the best way to avoid these problems is to avoid using headers like "Content-Encoding" and to bake as much logic as possible into media types and HTTP message bodies. The protocol changes in this commit do rely on a custom HTTP request header and the "Content-Type" headers. But we used them before, so we shouldn't be increasing our exposure to "bad" HTTP agents. For the SSH transport, we can't easily implement content negotiation to determine compression formats because the SSH transport has no content negotiation capabilities today. And without a framing protocol, we don't know how much data to feed into a decompressor. So in order to implement compression support on the SSH transport, we'd need to invent a mechanism to represent content types and an outer framing protocol to stream data robustly. While I'm fully capable of doing that, it is a lot of work and not something that should be undertaken lightly. My opinion is that if we're going to change the SSH transport protocol, we should take a long hard look at implementing a grand unified protocol that attempts to address all the deficiencies with the existing protocol. While I want this to happen, that would be massive scope bloat standing in the way of zstd support. So, I've decided to take the easy solution: the SSH transport will not gain support for multiple compression formats. Keep in mind it doesn't support *any* compression today. So essentially nothing is changing on the SSH front.

httppeer: extract code for HTTP header spanning A second consumer of HTTP header spanning will soon be introduced. Factor out the code to do this so it can be reused.

commands: config option to control bundle compression level Currently, bundle compression uses the default compression level for the active compression engine. The default compression level is tuned as a compromise between speed and size. Some scenarios may call for a different compression level. For example, with clone bundles, bundles are generated once and used several times. Since the cost to generate is paid infrequently, server operators may wish to trade extra CPU time for better compression ratios. This patch introduces an experimental and undocumented config option to control the bundle compression level. As the inline comment says, this approach is a bit hacky. I'd prefer for the compression level to be encoded in the bundle spec. e.g. "zstd-v2;complevel=15." However, given that the 4.1 freeze is imminent, I'm not comfortable implementing this user-facing change without much time to test and consider the implications. So, we're going with the quick and dirty solution for now. Having this option in the 4.1 release will enable Mozilla to easily produce and test zlib and zstd bundles with non-default compression levels in production. This will help drive future development of the feature and zstd integration with Mercurial.

bundle2: allow compression options to be passed to compressor Compression engines allow options to be passed to them to control behavior. This patch exposes an argument to bundle2.writebundle() that passes options to the compression engine when writing compressed bundles. The argument is honored for both bundle1 and bundle2, the latter requiring a bit of plumbing to pass the value around.

chg: check snprintf result strictly This makes the program more robust when somebody changes hgclient's maxdatasize in the future.

rebase: provide detailed hint to abort message if working dir is not clean Detailed hint message is now provided when 'pull --rebase' operation detects unclean working dir, for example: abort: uncommitted changes (cannot pull with rebase: please commit or shelve your changes first) Added tests for uncommitted merge, and for subrepo support verifying that same hint is also passed to subrepo state check.

revset: parse variable-length arguments of followlines() by getargsdict()

parser: extend buildargsdict() to support variable-length positional args This can simplify the argument parsing of followlines(). Tests are added by the next patch.

parser: make buildargsdict() precompute position where keyword args start This prepares for adding *varargs support. See the next patch.

chg: change server's process title This patch uses the newly introduced "setprocname" interface to update the process title server-side, to make it easier to tell what a worker is actually doing. The new title is "chg[worker/$PID]", where PID is the process ID of the connected client. It can be directly observed using "ps -AF" under Linux, or "ps -A" under FreeBSD.

chgserver: add the setprocname interface This allows clients to change its process title freely.

hgweb: use archivespecs for links on repo index page too Moving archivespecs to the module level allows using it from other modules (such as hgwebdir_mod), and keeping a reference to it in requestcontext allows current code to just work.

hgweb: use util.sortdict for archivespecs Thus we allow dict-like indexing and "in" checks, and also preserve the order of archive types and can generate links in a certain order (so requestcontext.archives is no longer needed).

hgweb: test the order of archive links

revlog: REVIDX_EXTSTORED flag This flag will be used by the lfs extension to mark the revision data as stored externally.

revlog: flag processor Add the ability for revlog objects to process revision flags and apply registered transforms on read/write operations. This patch introduces: - the 'revlog._processflags()' method that looks at revision flags and applies flag processors registered on them. Due to the need to handle non-commutative operations, flag transforms are applied in stable order but the order in which the transforms are applied is reversed between read and write operations. - the 'addflagprocessor()' method allowing to register processors on flags. Flag processors are defined as a 3-tuple of (read, write, raw) functions to be applied depending on the operation being performed. - an update on 'revlog.addrevision()' behavior. The current flagprocessor design relies on extensions to wrap around 'addrevision()' to set flags on revision data, and on the flagprocessor to perform the actual transformation of its contents. In the lfs case, this means we need to process flags before we meet the 2GB size check, leading to performing some operations before it happens: - if flags are set on the revision data, we assume some extensions might be modifying the contents using the flag processor next, and we compute the node for the original revision data (still allowing extension to override the node by wrapping around 'addrevision()'). - we then invoke the flag processor to apply registered transforms (in lfs's case, drastically reducing the size of large blobs). - finally, we proceed with the 2GB size check. Note: In the case a cachedelta is passed to 'addrevision()' and we detect the flag processor modified the revision data, we chose to trust the flag processor and drop the cachedelta.

revlog: pass revlog flags to addrevision Adding the ability to passing flags to addrevision instead of simply passing default flags to _addrevision will allow extensions relying on flag transforms to wrap around addrevision() in order to update revlog flags. The first use case of this patch will be the lfs extension marking nodes as stored externally when the contents are larger than the defined threshold. One of the reasons leading to setting flags in addrevision() wrappers in the flag processor design is that it allows to detect files larger than the 2GB limit before the check is performed, which allows lfs to transform the contents into metadata.

revlog: add 'raw' argument to revision and _addrevision This patch introduces a new 'raw' argument (defaults to False) to revlog's revision() and _addrevision() methods. When the 'raw' argument is set to True, it indicates the revision data should be handled as raw data by the flagprocessor. Note: Given revlog.addgroup() calls are restricted to changegroup generation, we can always set raw to True when calling revlog._addrevision() from revlog.addgroup().

pager: do not special case chg Since chg has its own _runpager implementation, it's no longer necessary to special-case chg in the pager extension. This will effectively enable the new chg pager code path that runs inside runcommand.

chg: remove getpager support We have enough bits to switch to the new chg pager code path in runcommand. So just remove the legacy getpager support. This is a red-only patch, and will break chg's pager support temporarily.

chgserver: implement chgui._runpager This patch implements chgui._runpager in a relatively simple way. A more clean way is to move the core logic of "attachio" to "ui", which will be done later after chg runs uisetup per request.