changeset 39558:b0e0db1565d1

internals: extract frame-based protocol docs to own document wireprotocol.txt is quite long and difficult to digest. The frame-based protocol is effectively a standalone concept (and could even be used outside of Mercurial). So this commit extracts its docs to a standalone file. The first few paragraphs were rewritten as part of the extraction. Sections headers were adjusted accordingly. Existing referalls in wireprotocol.txt were updated to refer to the new doc / concept, which I've started referring to as `hgrpc`. I'm on the fence as to whether to move the HTTP and SSH transport details to the new doc as well. For now, I'm leaving them in wireprotocol.txt. Differential Revision: https://phab.mercurial-scm.org/D4443
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 27 Aug 2018 13:30:44 -0700
parents 623081f2abc2
children 07b58266bce3
files contrib/wix/help.wxs mercurial/help/internals/wireprotocol.txt mercurial/help/internals/wireprotocolrpc.txt
diffstat 3 files changed, 528 insertions(+), 508 deletions(-) [+]
line wrap: on
line diff
--- a/contrib/wix/help.wxs	Wed Sep 12 22:19:29 2018 +0900
+++ b/contrib/wix/help.wxs	Mon Aug 27 13:30:44 2018 -0700
@@ -51,6 +51,7 @@
             <File Id="internals.requirements.txt" Name="requirements.txt" />
             <File Id="internals.revlogs.txt"      Name="revlogs.txt" />
             <File Id="internals.wireprotocol.txt" Name="wireprotocol.txt" />
+            <File Id="internals.wireprotocolrpc.txt" Name="wireprotocolrpc.txt" />
             <File Id="internals.wireprotocolv2.txt" Name="wireprotocolv2.txt" />
           </Component>
         </Directory>
--- a/mercurial/help/internals/wireprotocol.txt	Wed Sep 12 22:19:29 2018 +0900
+++ b/mercurial/help/internals/wireprotocol.txt	Mon Aug 27 13:30:44 2018 -0700
@@ -220,9 +220,10 @@
 Requests to unknown commands or URLS result in an HTTP 404.
 TODO formally define response type, how error is communicated, etc.
 
-HTTP request and response bodies use the *Unified Frame-Based Protocol*
-(defined below) for media exchange. The entirety of the HTTP message
-body is 0 or more frames as defined by this protocol.
+HTTP request and response bodies use the ``hgrpc`` protocol for media
+exchange.` (See :hg:`help internals.wireprotocolrpc` for details of
+the protocol.) The entirety of the HTTP message body is 0 or more frames
+as defined by this protocol.
 
 Clients and servers MUST advertise the ``TBD`` media type via the
 ``Content-Type`` request and response headers. In addition, clients MUST
@@ -236,11 +237,10 @@
 Servers receiving requests with an invalid ``Content-Type`` header SHOULD
 respond with an HTTP 415.
 
-The command to run is specified in the POST payload as defined by the
-*Unified Frame-Based Protocol*. This is redundant with data already
-encoded in the URL. This is by design, so server operators can have
-better understanding about server activity from looking merely at
-HTTP access logs.
+The command to run is specified in the POST payload as defined by ``hgrpc``.
+This is redundant with data already encoded in the URL. This is by design,
+so server operators can have better understanding about server activity from
+looking merely at HTTP access logs.
 
 In most circumstances, the command specified in the URL MUST match
 the command specified in the frame-based payload or the server will
@@ -254,9 +254,9 @@
 *any* command and allow the execution of multiple commands. If the
 HTTP request issues multiple commands across multiple frames, all
 issued commands will be processed by the server. Per the defined
-behavior of the *Unified Frame-Based Protocol*, commands may be
-issued interleaved and responses may come back in a different order
-than they were issued. Clients MUST be able to deal with this.
+behavior of ``hgrpc```, commands may be issued interleaved and responses
+may come back in a different order than they were issued. Clients MUST
+be able to deal with this.
 
 SSH Protocol
 ============
@@ -513,503 +513,6 @@
 Following capabilities advertisement, the peers communicate using version
 1 of the SSH transport.
 
-Unified Frame-Based Protocol
-============================
-
-**Experimental and under development**
-
-The *Unified Frame-Based Protocol* is a communications protocol between
-Mercurial peers. The protocol aims to be mostly transport agnostic
-(works similarly on HTTP, SSH, etc).
-
-To operate the protocol, a bi-directional, half-duplex pipe supporting
-ordered sends and receives is required. That is, each peer has one pipe
-for sending data and another for receiving.
-
-All data is read and written in atomic units called *frames*. These
-are conceptually similar to TCP packets. Higher-level functionality
-is built on the exchange and processing of frames.
-
-All frames are associated with a *stream*. A *stream* provides a
-unidirectional grouping of frames. Streams facilitate two goals:
-content encoding and parallelism. There is a dedicated section on
-streams below.
-
-The protocol is request-response based: the client issues requests to
-the server, which issues replies to those requests. Server-initiated
-messaging is not currently supported, but this specification carves
-out room to implement it.
-
-All frames are associated with a numbered request. Frames can thus
-be logically grouped by their request ID.
-
-Frames begin with an 8 octet header followed by a variable length
-payload::
-
-    +------------------------------------------------+
-    |                 Length (24)                    |
-    +--------------------------------+---------------+
-    |         Request ID (16)        | Stream ID (8) |
-    +------------------+-------------+---------------+
-    | Stream Flags (8) |
-    +-----------+------+
-    | Type (4)  |
-    +-----------+
-    | Flags (4) |
-    +===========+===================================================|
-    |                     Frame Payload (0...)                    ...
-    +---------------------------------------------------------------+
-
-The length of the frame payload is expressed as an unsigned 24 bit
-little endian integer. Values larger than 65535 MUST NOT be used unless
-given permission by the server as part of the negotiated capabilities
-during the handshake. The frame header is not part of the advertised
-frame length. The payload length is the over-the-wire length. If there
-is content encoding applied to the payload as part of the frame's stream,
-the length is the output of that content encoding, not the input.
-
-The 16-bit ``Request ID`` field denotes the integer request identifier,
-stored as an unsigned little endian integer. Odd numbered requests are
-client-initiated. Even numbered requests are server-initiated. This
-refers to where the *request* was initiated - not where the *frame* was
-initiated, so servers will send frames with odd ``Request ID`` in
-response to client-initiated requests. Implementations are advised to
-start ordering request identifiers at ``1`` and ``0``, increment by
-``2``, and wrap around if all available numbers have been exhausted.
-
-The 8-bit ``Stream ID`` field denotes the stream that the frame is
-associated with. Frames belonging to a stream may have content
-encoding applied and the receiver may need to decode the raw frame
-payload to obtain the original data. Odd numbered IDs are
-client-initiated. Even numbered IDs are server-initiated.
-
-The 8-bit ``Stream Flags`` field defines stream processing semantics.
-See the section on streams below.
-
-The 4-bit ``Type`` field denotes the type of frame being sent.
-
-The 4-bit ``Flags`` field defines special, per-type attributes for
-the frame.
-
-The sections below define the frame types and their behavior.
-
-Command Request (``0x01``)
---------------------------
-
-This frame contains a request to run a command.
-
-The payload consists of a CBOR map defining the command request. The
-bytestring keys of that map are:
-
-name
-   Name of the command that should be executed (bytestring).
-args
-   Map of bytestring keys to various value types containing the named
-   arguments to this command.
-
-   Each command defines its own set of argument names and their expected
-   types.
-
-This frame type MUST ONLY be sent from clients to servers: it is illegal
-for a server to send this frame to a client.
-
-The following flag values are defined for this type:
-
-0x01
-   New command request. When set, this frame represents the beginning
-   of a new request to run a command. The ``Request ID`` attached to this
-   frame MUST NOT be active.
-0x02
-   Command request continuation. When set, this frame is a continuation
-   from a previous command request frame for its ``Request ID``. This
-   flag is set when the CBOR data for a command request does not fit
-   in a single frame.
-0x04
-   Additional frames expected. When set, the command request didn't fit
-   into a single frame and additional CBOR data follows in a subsequent
-   frame.
-0x08
-   Command data frames expected. When set, command data frames are
-   expected to follow the final command request frame for this request.
-
-``0x01`` MUST be set on the initial command request frame for a
-``Request ID``.
-
-``0x01`` or ``0x02`` MUST be set to indicate this frame's role in
-a series of command request frames.
-
-If command data frames are to be sent, ``0x08`` MUST be set on ALL
-command request frames.
-
-Command Data (``0x02``)
------------------------
-
-This frame contains raw data for a command.
-
-Most commands can be executed by specifying arguments. However,
-arguments have an upper bound to their length. For commands that
-accept data that is beyond this length or whose length isn't known
-when the command is initially sent, they will need to stream
-arbitrary data to the server. This frame type facilitates the sending
-of this data.
-
-The payload of this frame type consists of a stream of raw data to be
-consumed by the command handler on the server. The format of the data
-is command specific.
-
-The following flag values are defined for this type:
-
-0x01
-   Command data continuation. When set, the data for this command
-   continues into a subsequent frame.
-
-0x02
-   End of data. When set, command data has been fully sent to the
-   server. The command has been fully issued and no new data for this
-   command will be sent. The next frame will belong to a new command.
-
-Command Response Data (``0x03``)
---------------------------------
-
-This frame contains response data to an issued command.
-
-Response data ALWAYS consists of a series of 1 or more CBOR encoded
-values. A CBOR value may be using indefinite length encoding. And the
-bytes constituting the value may span several frames.
-
-The following flag values are defined for this type:
-
-0x01
-   Data continuation. When set, an additional frame containing response data
-   will follow.
-0x02
-   End of data. When set, the response data has been fully sent and
-   no additional frames for this response will be sent.
-
-The ``0x01`` flag is mutually exclusive with the ``0x02`` flag.
-
-Error Occurred (``0x05``)
--------------------------
-
-Some kind of error occurred.
-
-There are 3 general kinds of failures that can occur:
-
-* Command error encountered before any response issued
-* Command error encountered after a response was issued
-* Protocol or stream level error
-
-This frame type is used to capture the latter cases. (The general
-command error case is handled by the leading CBOR map in
-``Command Response`` frames.)
-
-The payload of this frame contains a CBOR map detailing the error. That
-map has the following bytestring keys:
-
-type
-   (bytestring) The overall type of error encountered. Can be one of the
-   following values:
-
-   protocol
-      A protocol-level error occurred. This typically means someone
-      is violating the framing protocol semantics and the server is
-      refusing to proceed.
-
-   server
-      A server-level error occurred. This typically indicates some kind of
-      logic error on the server, likely the fault of the server.
-
-   command
-      A command-level error, likely the fault of the client.
-
-message
-   (array of maps) A richly formatted message that is intended for
-   human consumption. See the ``Human Output Side-Channel`` frame
-   section for a description of the format of this data structure.
-
-Human Output Side-Channel (``0x06``)
-------------------------------------
-
-This frame contains a message that is intended to be displayed to
-people. Whereas most frames communicate machine readable data, this
-frame communicates textual data that is intended to be shown to
-humans.
-
-The frame consists of a series of *formatting requests*. Each formatting
-request consists of a formatting string, arguments for that formatting
-string, and labels to apply to that formatting string.
-
-A formatting string is a printf()-like string that allows variable
-substitution within the string. Labels allow the rendered text to be
-*decorated*. Assuming use of the canonical Mercurial code base, a
-formatting string can be the input to the ``i18n._`` function. This
-allows messages emitted from the server to be localized. So even if
-the server has different i18n settings, people could see messages in
-their *native* settings. Similarly, the use of labels allows
-decorations like coloring and underlining to be applied using the
-client's configured rendering settings.
-
-Formatting strings are similar to ``printf()`` strings or how
-Python's ``%`` operator works. The only supported formatting sequences
-are ``%s`` and ``%%``. ``%s`` will be replaced by whatever the string
-at that position resolves to. ``%%`` will be replaced by ``%``. All
-other 2-byte sequences beginning with ``%`` represent a literal
-``%`` followed by that character. However, future versions of the
-wire protocol reserve the right to allow clients to opt in to receiving
-formatting strings with additional formatters, hence why ``%%`` is
-required to represent the literal ``%``.
-
-The frame payload consists of a CBOR array of CBOR maps. Each map
-defines an *atom* of text data to print. Each *atom* has the following
-bytestring keys:
-
-msg
-   (bytestring) The formatting string. Content MUST be ASCII.
-args (optional)
-   Array of bytestrings defining arguments to the formatting string.
-labels (optional)
-   Array of bytestrings defining labels to apply to this atom.
-
-All data to be printed MUST be encoded into a single frame: this frame
-does not support spanning data across multiple frames.
-
-All textual data encoded in these frames is assumed to be line delimited.
-The last atom in the frame SHOULD end with a newline (``\n``). If it
-doesn't, clients MAY add a newline to facilitate immediate printing.
-
-Progress Update (``0x07``)
---------------------------
-
-This frame holds the progress of an operation on the peer. Consumption
-of these frames allows clients to display progress bars, estimated
-completion times, etc.
-
-Each frame defines the progress of a single operation on the peer. The
-payload consists of a CBOR map with the following bytestring keys:
-
-topic
-   Topic name (string)
-pos
-   Current numeric position within the topic (integer)
-total
-   Total/end numeric position of this topic (unsigned integer)
-label (optional)
-   Unit label (string)
-item (optional)
-   Item name (string)
-
-Progress state is created when a frame is received referencing a
-*topic* that isn't currently tracked. Progress tracking for that
-*topic* is finished when a frame is received reporting the current
-position of that topic as ``-1``.
-
-Multiple *topics* may be active at any given time.
-
-Rendering of progress information is not mandated or governed by this
-specification: implementations MAY render progress information however
-they see fit, including not at all.
-
-The string data describing the topic SHOULD be static strings to
-facilitate receivers localizing that string data. The emitter
-MUST normalize all string data to valid UTF-8 and receivers SHOULD
-validate that received data conforms to UTF-8. The topic name
-SHOULD be ASCII.
-
-Stream Encoding Settings (``0x08``)
------------------------------------
-
-This frame type holds information defining the content encoding
-settings for a *stream*.
-
-This frame type is likely consumed by the protocol layer and is not
-passed on to applications.
-
-This frame type MUST ONLY occur on frames having the *Beginning of Stream*
-``Stream Flag`` set.
-
-The payload of this frame defines what content encoding has (possibly)
-been applied to the payloads of subsequent frames in this stream.
-
-The payload begins with an 8-bit integer defining the length of the
-encoding *profile*, followed by the string name of that profile, which
-must be an ASCII string. All bytes that follow can be used by that
-profile for supplemental settings definitions. See the section below
-on defined encoding profiles.
-
-Stream States and Flags
------------------------
-
-Streams can be in two states: *open* and *closed*. An *open* stream
-is active and frames attached to that stream could arrive at any time.
-A *closed* stream is not active. If a frame attached to a *closed*
-stream arrives, that frame MUST have an appropriate stream flag
-set indicating beginning of stream. All streams are in the *closed*
-state by default.
-
-The ``Stream Flags`` field denotes a set of bit flags for defining
-the relationship of this frame within a stream. The following flags
-are defined:
-
-0x01
-   Beginning of stream. The first frame in the stream MUST set this
-   flag. When received, the ``Stream ID`` this frame is attached to
-   becomes ``open``.
-
-0x02
-   End of stream. The last frame in a stream MUST set this flag. When
-   received, the ``Stream ID`` this frame is attached to becomes
-   ``closed``. Any content encoding context associated with this stream
-   can be destroyed after processing the payload of this frame.
-
-0x04
-   Apply content encoding. When set, any content encoding settings
-   defined by the stream should be applied when attempting to read
-   the frame. When not set, the frame payload isn't encoded.
-
-Streams
--------
-
-Streams - along with ``Request IDs`` - facilitate grouping of frames.
-But the purpose of each is quite different and the groupings they
-constitute are independent.
-
-A ``Request ID`` is essentially a tag. It tells you which logical
-request a frame is associated with.
-
-A *stream* is a sequence of frames grouped for the express purpose
-of applying a stateful encoding or for denoting sub-groups of frames.
-
-Unlike ``Request ID``s which span the request and response, a stream
-is unidirectional and stream IDs are independent from client to
-server.
-
-There is no strict hierarchical relationship between ``Request IDs``
-and *streams*. A stream can contain frames having multiple
-``Request IDs``. Frames belonging to the same ``Request ID`` can
-span multiple streams.
-
-One goal of streams is to facilitate content encoding. A stream can
-define an encoding to be applied to frame payloads. For example, the
-payload transmitted over the wire may contain output from a
-zstandard compression operation and the receiving end may decompress
-that payload to obtain the original data.
-
-The other goal of streams is to facilitate concurrent execution. For
-example, a server could spawn 4 threads to service a request that can
-be easily parallelized. Each of those 4 threads could write into its
-own stream. Those streams could then in turn be delivered to 4 threads
-on the receiving end, with each thread consuming its stream in near
-isolation. The *main* thread on both ends merely does I/O and
-encodes/decodes frame headers: the bulk of the work is done by worker
-threads.
-
-In addition, since content encoding is defined per stream, each
-*worker thread* could perform potentially CPU bound work concurrently
-with other threads. This approach of applying encoding at the
-sub-protocol / stream level eliminates a potential resource constraint
-on the protocol stream as a whole (it is common for the throughput of
-a compression engine to be smaller than the throughput of a network).
-
-Having multiple streams - each with their own encoding settings - also
-facilitates the use of advanced data compression techniques. For
-example, a transmitter could see that it is generating data faster
-and slower than the receiving end is consuming it and adjust its
-compression settings to trade CPU for compression ratio accordingly.
-
-While streams can define a content encoding, not all frames within
-that stream must use that content encoding. This can be useful when
-data is being served from caches and being derived dynamically. A
-cache could pre-compressed data so the server doesn't have to
-recompress it. The ability to pick and choose which frames are
-compressed allows servers to easily send data to the wire without
-involving potentially expensive encoding overhead.
-
-Content Encoding Profiles
--------------------------
-
-Streams can have named content encoding *profiles* associated with
-them. A profile defines a shared understanding of content encoding
-settings and behavior.
-
-The following profiles are defined:
-
-TBD
-
-Command Protocol
-----------------
-
-A client can request that a remote run a command by sending it
-frames defining that command. This logical stream is composed of
-1 or more ``Command Request`` frames and and 0 or more ``Command Data``
-frames.
-
-All frames composing a single command request MUST be associated with
-the same ``Request ID``.
-
-Clients MAY send additional command requests without waiting on the
-response to a previous command request. If they do so, they MUST ensure
-that the ``Request ID`` field of outbound frames does not conflict
-with that of an active ``Request ID`` whose response has not yet been
-fully received.
-
-Servers MAY respond to commands in a different order than they were
-sent over the wire. Clients MUST be prepared to deal with this. Servers
-also MAY start executing commands in a different order than they were
-received, or MAY execute multiple commands concurrently.
-
-If there is a dependency between commands or a race condition between
-commands executing (e.g. a read-only command that depends on the results
-of a command that mutates the repository), then clients MUST NOT send
-frames issuing a command until a response to all dependent commands has
-been received.
-TODO think about whether we should express dependencies between commands
-to avoid roundtrip latency.
-
-A command is defined by a command name, 0 or more command arguments,
-and optional command data.
-
-Arguments are the recommended mechanism for transferring fixed sets of
-parameters to a command. Data is appropriate for transferring variable
-data. Thinking in terms of HTTP, arguments would be headers and data
-would be the message body.
-
-It is recommended for servers to delay the dispatch of a command
-until all argument have been received. Servers MAY impose limits on the
-maximum argument size.
-TODO define failure mechanism.
-
-Servers MAY dispatch to commands immediately once argument data
-is available or delay until command data is received in full.
-
-Once a ``Command Request`` frame is sent, a client must be prepared to
-receive any of the following frames associated with that request:
-``Command Response``, ``Error Response``, ``Human Output Side-Channel``,
-``Progress Update``.
-
-The *main* response for a command will be in ``Command Response`` frames.
-The payloads of these frames consist of 1 or more CBOR encoded values.
-The first CBOR value on the first ``Command Response`` frame is special
-and denotes the overall status of the command. This CBOR map contains
-the following bytestring keys:
-
-status
-   (bytestring) A well-defined message containing the overall status of
-   this command request. The following values are defined:
-
-   ok
-      The command was received successfully and its response follows.
-   error
-      There was an error processing the command. More details about the
-      error are encoded in the ``error`` key.
-
-error (optional)
-   A map containing information about an encountered error. The map has the
-   following keys:
-
-   message
-      (array of maps) A message describing the error. The message uses the
-      same format as those in the ``Human Output Side-Channel`` frame.
-
 Capabilities
 ============
 
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mercurial/help/internals/wireprotocolrpc.txt	Mon Aug 27 13:30:44 2018 -0700
@@ -0,0 +1,516 @@
+**Experimental and under development**
+
+This document describe's Mercurial's transport-agnostic remote procedure
+call (RPC) protocol which is used to perform interactions with remote
+servers. This protocol is also referred to as ``hgrpc``.
+
+The protocol has the following high-level features:
+
+* Concurrent request and response support (multiple commands can be issued
+  simultaneously and responses can be streamed simultaneously).
+* Supports half-duplex and full-duplex connections.
+* All data is transmitted within *frames*, which have a well-defined
+  header and encode their length.
+* Side-channels for sending progress updates and printing output. Text
+  output from the remote can be localized locally.
+* Support for simultaneous and long-lived compression streams, even across
+  requests.
+* Uses CBOR for data exchange.
+
+The protocol is not specific to Mercurial and could be used by other
+applications.
+
+High-level Overview
+===================
+
+To operate the protocol, a bi-directional, half-duplex pipe supporting
+ordered sends and receives is required. That is, each peer has one pipe
+for sending data and another for receiving. Full-duplex pipes are also
+supported.
+
+All data is read and written in atomic units called *frames*. These
+are conceptually similar to TCP packets. Higher-level functionality
+is built on the exchange and processing of frames.
+
+All frames are associated with a *stream*. A *stream* provides a
+unidirectional grouping of frames. Streams facilitate two goals:
+content encoding and parallelism. There is a dedicated section on
+streams below.
+
+The protocol is request-response based: the client issues requests to
+the server, which issues replies to those requests. Server-initiated
+messaging is not currently supported, but this specification carves
+out room to implement it.
+
+All frames are associated with a numbered request. Frames can thus
+be logically grouped by their request ID.
+
+Frames
+======
+
+Frames begin with an 8 octet header followed by a variable length
+payload::
+
+    +------------------------------------------------+
+    |                 Length (24)                    |
+    +--------------------------------+---------------+
+    |         Request ID (16)        | Stream ID (8) |
+    +------------------+-------------+---------------+
+    | Stream Flags (8) |
+    +-----------+------+
+    | Type (4)  |
+    +-----------+
+    | Flags (4) |
+    +===========+===================================================|
+    |                     Frame Payload (0...)                    ...
+    +---------------------------------------------------------------+
+
+The length of the frame payload is expressed as an unsigned 24 bit
+little endian integer. Values larger than 65535 MUST NOT be used unless
+given permission by the server as part of the negotiated capabilities
+during the handshake. The frame header is not part of the advertised
+frame length. The payload length is the over-the-wire length. If there
+is content encoding applied to the payload as part of the frame's stream,
+the length is the output of that content encoding, not the input.
+
+The 16-bit ``Request ID`` field denotes the integer request identifier,
+stored as an unsigned little endian integer. Odd numbered requests are
+client-initiated. Even numbered requests are server-initiated. This
+refers to where the *request* was initiated - not where the *frame* was
+initiated, so servers will send frames with odd ``Request ID`` in
+response to client-initiated requests. Implementations are advised to
+start ordering request identifiers at ``1`` and ``0``, increment by
+``2``, and wrap around if all available numbers have been exhausted.
+
+The 8-bit ``Stream ID`` field denotes the stream that the frame is
+associated with. Frames belonging to a stream may have content
+encoding applied and the receiver may need to decode the raw frame
+payload to obtain the original data. Odd numbered IDs are
+client-initiated. Even numbered IDs are server-initiated.
+
+The 8-bit ``Stream Flags`` field defines stream processing semantics.
+See the section on streams below.
+
+The 4-bit ``Type`` field denotes the type of frame being sent.
+
+The 4-bit ``Flags`` field defines special, per-type attributes for
+the frame.
+
+The sections below define the frame types and their behavior.
+
+Command Request (``0x01``)
+--------------------------
+
+This frame contains a request to run a command.
+
+The payload consists of a CBOR map defining the command request. The
+bytestring keys of that map are:
+
+name
+   Name of the command that should be executed (bytestring).
+args
+   Map of bytestring keys to various value types containing the named
+   arguments to this command.
+
+   Each command defines its own set of argument names and their expected
+   types.
+
+This frame type MUST ONLY be sent from clients to servers: it is illegal
+for a server to send this frame to a client.
+
+The following flag values are defined for this type:
+
+0x01
+   New command request. When set, this frame represents the beginning
+   of a new request to run a command. The ``Request ID`` attached to this
+   frame MUST NOT be active.
+0x02
+   Command request continuation. When set, this frame is a continuation
+   from a previous command request frame for its ``Request ID``. This
+   flag is set when the CBOR data for a command request does not fit
+   in a single frame.
+0x04
+   Additional frames expected. When set, the command request didn't fit
+   into a single frame and additional CBOR data follows in a subsequent
+   frame.
+0x08
+   Command data frames expected. When set, command data frames are
+   expected to follow the final command request frame for this request.
+
+``0x01`` MUST be set on the initial command request frame for a
+``Request ID``.
+
+``0x01`` or ``0x02`` MUST be set to indicate this frame's role in
+a series of command request frames.
+
+If command data frames are to be sent, ``0x08`` MUST be set on ALL
+command request frames.
+
+Command Data (``0x02``)
+-----------------------
+
+This frame contains raw data for a command.
+
+Most commands can be executed by specifying arguments. However,
+arguments have an upper bound to their length. For commands that
+accept data that is beyond this length or whose length isn't known
+when the command is initially sent, they will need to stream
+arbitrary data to the server. This frame type facilitates the sending
+of this data.
+
+The payload of this frame type consists of a stream of raw data to be
+consumed by the command handler on the server. The format of the data
+is command specific.
+
+The following flag values are defined for this type:
+
+0x01
+   Command data continuation. When set, the data for this command
+   continues into a subsequent frame.
+
+0x02
+   End of data. When set, command data has been fully sent to the
+   server. The command has been fully issued and no new data for this
+   command will be sent. The next frame will belong to a new command.
+
+Command Response Data (``0x03``)
+--------------------------------
+
+This frame contains response data to an issued command.
+
+Response data ALWAYS consists of a series of 1 or more CBOR encoded
+values. A CBOR value may be using indefinite length encoding. And the
+bytes constituting the value may span several frames.
+
+The following flag values are defined for this type:
+
+0x01
+   Data continuation. When set, an additional frame containing response data
+   will follow.
+0x02
+   End of data. When set, the response data has been fully sent and
+   no additional frames for this response will be sent.
+
+The ``0x01`` flag is mutually exclusive with the ``0x02`` flag.
+
+Error Occurred (``0x05``)
+-------------------------
+
+Some kind of error occurred.
+
+There are 3 general kinds of failures that can occur:
+
+* Command error encountered before any response issued
+* Command error encountered after a response was issued
+* Protocol or stream level error
+
+This frame type is used to capture the latter cases. (The general
+command error case is handled by the leading CBOR map in
+``Command Response`` frames.)
+
+The payload of this frame contains a CBOR map detailing the error. That
+map has the following bytestring keys:
+
+type
+   (bytestring) The overall type of error encountered. Can be one of the
+   following values:
+
+   protocol
+      A protocol-level error occurred. This typically means someone
+      is violating the framing protocol semantics and the server is
+      refusing to proceed.
+
+   server
+      A server-level error occurred. This typically indicates some kind of
+      logic error on the server, likely the fault of the server.
+
+   command
+      A command-level error, likely the fault of the client.
+
+message
+   (array of maps) A richly formatted message that is intended for
+   human consumption. See the ``Human Output Side-Channel`` frame
+   section for a description of the format of this data structure.
+
+Human Output Side-Channel (``0x06``)
+------------------------------------
+
+This frame contains a message that is intended to be displayed to
+people. Whereas most frames communicate machine readable data, this
+frame communicates textual data that is intended to be shown to
+humans.
+
+The frame consists of a series of *formatting requests*. Each formatting
+request consists of a formatting string, arguments for that formatting
+string, and labels to apply to that formatting string.
+
+A formatting string is a printf()-like string that allows variable
+substitution within the string. Labels allow the rendered text to be
+*decorated*. Assuming use of the canonical Mercurial code base, a
+formatting string can be the input to the ``i18n._`` function. This
+allows messages emitted from the server to be localized. So even if
+the server has different i18n settings, people could see messages in
+their *native* settings. Similarly, the use of labels allows
+decorations like coloring and underlining to be applied using the
+client's configured rendering settings.
+
+Formatting strings are similar to ``printf()`` strings or how
+Python's ``%`` operator works. The only supported formatting sequences
+are ``%s`` and ``%%``. ``%s`` will be replaced by whatever the string
+at that position resolves to. ``%%`` will be replaced by ``%``. All
+other 2-byte sequences beginning with ``%`` represent a literal
+``%`` followed by that character. However, future versions of the
+wire protocol reserve the right to allow clients to opt in to receiving
+formatting strings with additional formatters, hence why ``%%`` is
+required to represent the literal ``%``.
+
+The frame payload consists of a CBOR array of CBOR maps. Each map
+defines an *atom* of text data to print. Each *atom* has the following
+bytestring keys:
+
+msg
+   (bytestring) The formatting string. Content MUST be ASCII.
+args (optional)
+   Array of bytestrings defining arguments to the formatting string.
+labels (optional)
+   Array of bytestrings defining labels to apply to this atom.
+
+All data to be printed MUST be encoded into a single frame: this frame
+does not support spanning data across multiple frames.
+
+All textual data encoded in these frames is assumed to be line delimited.
+The last atom in the frame SHOULD end with a newline (``\n``). If it
+doesn't, clients MAY add a newline to facilitate immediate printing.
+
+Progress Update (``0x07``)
+--------------------------
+
+This frame holds the progress of an operation on the peer. Consumption
+of these frames allows clients to display progress bars, estimated
+completion times, etc.
+
+Each frame defines the progress of a single operation on the peer. The
+payload consists of a CBOR map with the following bytestring keys:
+
+topic
+   Topic name (string)
+pos
+   Current numeric position within the topic (integer)
+total
+   Total/end numeric position of this topic (unsigned integer)
+label (optional)
+   Unit label (string)
+item (optional)
+   Item name (string)
+
+Progress state is created when a frame is received referencing a
+*topic* that isn't currently tracked. Progress tracking for that
+*topic* is finished when a frame is received reporting the current
+position of that topic as ``-1``.
+
+Multiple *topics* may be active at any given time.
+
+Rendering of progress information is not mandated or governed by this
+specification: implementations MAY render progress information however
+they see fit, including not at all.
+
+The string data describing the topic SHOULD be static strings to
+facilitate receivers localizing that string data. The emitter
+MUST normalize all string data to valid UTF-8 and receivers SHOULD
+validate that received data conforms to UTF-8. The topic name
+SHOULD be ASCII.
+
+Stream Encoding Settings (``0x08``)
+-----------------------------------
+
+This frame type holds information defining the content encoding
+settings for a *stream*.
+
+This frame type is likely consumed by the protocol layer and is not
+passed on to applications.
+
+This frame type MUST ONLY occur on frames having the *Beginning of Stream*
+``Stream Flag`` set.
+
+The payload of this frame defines what content encoding has (possibly)
+been applied to the payloads of subsequent frames in this stream.
+
+The payload begins with an 8-bit integer defining the length of the
+encoding *profile*, followed by the string name of that profile, which
+must be an ASCII string. All bytes that follow can be used by that
+profile for supplemental settings definitions. See the section below
+on defined encoding profiles.
+
+Stream States and Flags
+=======================
+
+Streams can be in two states: *open* and *closed*. An *open* stream
+is active and frames attached to that stream could arrive at any time.
+A *closed* stream is not active. If a frame attached to a *closed*
+stream arrives, that frame MUST have an appropriate stream flag
+set indicating beginning of stream. All streams are in the *closed*
+state by default.
+
+The ``Stream Flags`` field denotes a set of bit flags for defining
+the relationship of this frame within a stream. The following flags
+are defined:
+
+0x01
+   Beginning of stream. The first frame in the stream MUST set this
+   flag. When received, the ``Stream ID`` this frame is attached to
+   becomes ``open``.
+
+0x02
+   End of stream. The last frame in a stream MUST set this flag. When
+   received, the ``Stream ID`` this frame is attached to becomes
+   ``closed``. Any content encoding context associated with this stream
+   can be destroyed after processing the payload of this frame.
+
+0x04
+   Apply content encoding. When set, any content encoding settings
+   defined by the stream should be applied when attempting to read
+   the frame. When not set, the frame payload isn't encoded.
+
+Streams
+=======
+
+Streams - along with ``Request IDs`` - facilitate grouping of frames.
+But the purpose of each is quite different and the groupings they
+constitute are independent.
+
+A ``Request ID`` is essentially a tag. It tells you which logical
+request a frame is associated with.
+
+A *stream* is a sequence of frames grouped for the express purpose
+of applying a stateful encoding or for denoting sub-groups of frames.
+
+Unlike ``Request ID``s which span the request and response, a stream
+is unidirectional and stream IDs are independent from client to
+server.
+
+There is no strict hierarchical relationship between ``Request IDs``
+and *streams*. A stream can contain frames having multiple
+``Request IDs``. Frames belonging to the same ``Request ID`` can
+span multiple streams.
+
+One goal of streams is to facilitate content encoding. A stream can
+define an encoding to be applied to frame payloads. For example, the
+payload transmitted over the wire may contain output from a
+zstandard compression operation and the receiving end may decompress
+that payload to obtain the original data.
+
+The other goal of streams is to facilitate concurrent execution. For
+example, a server could spawn 4 threads to service a request that can
+be easily parallelized. Each of those 4 threads could write into its
+own stream. Those streams could then in turn be delivered to 4 threads
+on the receiving end, with each thread consuming its stream in near
+isolation. The *main* thread on both ends merely does I/O and
+encodes/decodes frame headers: the bulk of the work is done by worker
+threads.
+
+In addition, since content encoding is defined per stream, each
+*worker thread* could perform potentially CPU bound work concurrently
+with other threads. This approach of applying encoding at the
+sub-protocol / stream level eliminates a potential resource constraint
+on the protocol stream as a whole (it is common for the throughput of
+a compression engine to be smaller than the throughput of a network).
+
+Having multiple streams - each with their own encoding settings - also
+facilitates the use of advanced data compression techniques. For
+example, a transmitter could see that it is generating data faster
+and slower than the receiving end is consuming it and adjust its
+compression settings to trade CPU for compression ratio accordingly.
+
+While streams can define a content encoding, not all frames within
+that stream must use that content encoding. This can be useful when
+data is being served from caches and being derived dynamically. A
+cache could pre-compressed data so the server doesn't have to
+recompress it. The ability to pick and choose which frames are
+compressed allows servers to easily send data to the wire without
+involving potentially expensive encoding overhead.
+
+Content Encoding Profiles
+=========================
+
+Streams can have named content encoding *profiles* associated with
+them. A profile defines a shared understanding of content encoding
+settings and behavior.
+
+The following profiles are defined:
+
+TBD
+
+Command Protocol
+================
+
+A client can request that a remote run a command by sending it
+frames defining that command. This logical stream is composed of
+1 or more ``Command Request`` frames and and 0 or more ``Command Data``
+frames.
+
+All frames composing a single command request MUST be associated with
+the same ``Request ID``.
+
+Clients MAY send additional command requests without waiting on the
+response to a previous command request. If they do so, they MUST ensure
+that the ``Request ID`` field of outbound frames does not conflict
+with that of an active ``Request ID`` whose response has not yet been
+fully received.
+
+Servers MAY respond to commands in a different order than they were
+sent over the wire. Clients MUST be prepared to deal with this. Servers
+also MAY start executing commands in a different order than they were
+received, or MAY execute multiple commands concurrently.
+
+If there is a dependency between commands or a race condition between
+commands executing (e.g. a read-only command that depends on the results
+of a command that mutates the repository), then clients MUST NOT send
+frames issuing a command until a response to all dependent commands has
+been received.
+TODO think about whether we should express dependencies between commands
+to avoid roundtrip latency.
+
+A command is defined by a command name, 0 or more command arguments,
+and optional command data.
+
+Arguments are the recommended mechanism for transferring fixed sets of
+parameters to a command. Data is appropriate for transferring variable
+data. Thinking in terms of HTTP, arguments would be headers and data
+would be the message body.
+
+It is recommended for servers to delay the dispatch of a command
+until all argument have been received. Servers MAY impose limits on the
+maximum argument size.
+TODO define failure mechanism.
+
+Servers MAY dispatch to commands immediately once argument data
+is available or delay until command data is received in full.
+
+Once a ``Command Request`` frame is sent, a client must be prepared to
+receive any of the following frames associated with that request:
+``Command Response``, ``Error Response``, ``Human Output Side-Channel``,
+``Progress Update``.
+
+The *main* response for a command will be in ``Command Response`` frames.
+The payloads of these frames consist of 1 or more CBOR encoded values.
+The first CBOR value on the first ``Command Response`` frame is special
+and denotes the overall status of the command. This CBOR map contains
+the following bytestring keys:
+
+status
+   (bytestring) A well-defined message containing the overall status of
+   this command request. The following values are defined:
+
+   ok
+      The command was received successfully and its response follows.
+   error
+      There was an error processing the command. More details about the
+      error are encoded in the ``error`` key.
+
+error (optional)
+   A map containing information about an encountered error. The map has the
+   following keys:
+
+   message
+      (array of maps) A message describing the error. The message uses the
+      same format as those in the ``Human Output Side-Channel`` frame.