wireprotov2: update stream encoding specification
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 04 Oct 2018 14:05:16 -0700
changeset 40125 e2fe1074024c
parent 40124 b638219a23c3
child 40126 327d40b94bed
The encoding of data within streams in the frame-based protocol is not yet defined or implemented. This means that all data in wire protocol version 2 is currently being sent out raw, without compression. That's obviously not ideal.

This commit formalizes the beginnings of stream encoding support in the protocol. I suspect we'll change behavior substantially in the future. My goal is to get something landed so we can use compression. We can build out more robust support later.

Because the frame type ID changed, this is strictly a backwards-compatibility (BC) break. But existing code wasn't using the frame. I'll bump the framing protocol version later once code is introduced to use the new frame.

Differential Revision: https://phab.mercurial-scm.org/D4915
mercurial/help/internals/wireprotocolrpc.txt
mercurial/wireprotoframing.py
--- a/mercurial/help/internals/wireprotocolrpc.txt	Thu Oct 04 15:08:42 2018 -0700
+++ b/mercurial/help/internals/wireprotocolrpc.txt	Thu Oct 04 14:05:16 2018 -0700
@@ -336,7 +336,53 @@
 validate that received data conforms to UTF-8. The topic name
 SHOULD be ASCII.
 
-Stream Encoding Settings (``0x08``)
+Sender Protocol Settings (``0x08``)
+-----------------------------------
+
+This frame type advertises the sender's support for various protocol and
+stream level features. The data advertised in this frame is used to influence
+subsequent behavior of the current frame exchange channel.
+
+The frame payload consists of a CBOR map. It may contain the following
+bytestring keys:
+
+contentencodings
+   (array of bytestring) A list of content encodings supported by the
+   sender, in order of most to least preferred.
+
+   Peers are allowed to encode stream data using any of the listed
+   encodings.
+
+   See the ``Content Encoding Profiles`` section for an enumeration
+   of supported content encodings.
+
+   If not defined, the value is assumed to be a list with the single value
+   ``identity``, meaning only the no-op encoding is supported.
+
+   Senders MAY filter the set of advertised encodings against what it
+   knows the receiver supports (e.g. if the receiver advertised encodings
+   via the capabilities descriptor). However, doing so will prevent
+   servers from gaining an understanding of the aggregate capabilities
+   of clients. So clients are discouraged from doing so.
+
+When this frame is not sent/received, the receiver assumes default values
+for all keys.
+
+If encountered, this frame type MUST be sent before any other frame type
+in a channel.
+
+The following flag values are defined for this frame type:
+
+0x01
+   Data continuation. When set, an additional frame containing more protocol
+   settings immediately follows.
+0x02
+   End of data. When set, the protocol settings data has been completely
+   sent.
+
+The ``0x01`` flag is mutually exclusive with the ``0x02`` flag.
+
+Stream Encoding Settings (``0x09``)
 -----------------------------------
 
 This frame type holds information defining the content encoding
@@ -351,11 +397,25 @@
 The payload of this frame defines what content encoding has (possibly)
 been applied to the payloads of subsequent frames in this stream.
 
-The payload begins with an 8-bit integer defining the length of the
-encoding *profile*, followed by the string name of that profile, which
-must be an ASCII string. All bytes that follow can be used by that
-profile for supplemental settings definitions. See the section below
-on defined encoding profiles.
+The payload consists of a series of CBOR values. The first value is a
+bytestring denoting the content encoding profile of the data in this
+stream. Subsequent CBOR values supplement this simple value in a
+profile-specific manner. See the ``Content Encoding Profiles`` section
+for more.
+
+In the absence of this frame on a stream, it is assumed the stream is
+using the ``identity`` content encoding.
+
+The following flag values are defined for this frame type:
+
+0x01
+   Data continuation. When set, an additional frame containing more encoding
+   settings immediately follows.
+0x02
+   End of data. When set, the encoding settings data has been completely
+   sent.
+
+The ``0x01`` flag is mutually exclusive with the ``0x02`` flag.
 
 Stream States and Flags
 =======================
@@ -387,6 +447,11 @@
    defined by the stream should be applied when attempting to read
    the frame. When not set, the frame payload isn't encoded.
 
+TODO consider making stream opening and closing communicated via
+explicit frame types (e.g. a "stream state change" frame) rather than
+flags on all frames. This would make stream state changes more explicit,
+as they could only occur on specific frame types.
+
 Streams
 =======
 
@@ -452,9 +517,35 @@
 them. A profile defines a shared understanding of content encoding
 settings and behavior.
 
-The following profiles are defined:
+Profiles are described in the following sections.
+
+identity
+--------
+
+The ``identity`` profile is a no-op encoding: the encoded bytes are
+exactly the input bytes.
+
+This profile MUST be supported by all peers.
+
+In the absence of an identified profile, the ``identity`` profile is
+assumed.
 
-TBD
+zstd-8mb
+--------
+
+Zstandard encoding (RFC 8478). Zstandard is a fast and effective lossless
+compression format.
+
+This profile allows decompressor window sizes of up to 8 MB.
+
+zlib
+----
+
+zlib compressed data (RFC 1950). zlib is a widely-used and supported
+lossless compression format.
+
+It isn't as fast as zstandard and it is recommended to use zstandard instead,
+if possible.
 
 Command Protocol
 ================
--- a/mercurial/wireprotoframing.py	Thu Oct 04 15:08:42 2018 -0700
+++ b/mercurial/wireprotoframing.py	Thu Oct 04 14:05:16 2018 -0700
@@ -49,7 +49,8 @@
 FRAME_TYPE_ERROR_RESPONSE = 0x05
 FRAME_TYPE_TEXT_OUTPUT = 0x06
 FRAME_TYPE_PROGRESS = 0x07
-FRAME_TYPE_STREAM_SETTINGS = 0x08
+FRAME_TYPE_SENDER_PROTOCOL_SETTINGS = 0x08
+FRAME_TYPE_STREAM_SETTINGS = 0x09
 
 FRAME_TYPES = {
     b'command-request': FRAME_TYPE_COMMAND_REQUEST,
@@ -58,6 +59,7 @@
     b'error-response': FRAME_TYPE_ERROR_RESPONSE,
     b'text-output': FRAME_TYPE_TEXT_OUTPUT,
     b'progress': FRAME_TYPE_PROGRESS,
+    b'sender-protocol-settings': FRAME_TYPE_SENDER_PROTOCOL_SETTINGS,
     b'stream-settings': FRAME_TYPE_STREAM_SETTINGS,
 }
 
@@ -89,6 +91,22 @@
     b'eos': FLAG_COMMAND_RESPONSE_EOS,
 }
 
+FLAG_SENDER_PROTOCOL_SETTINGS_CONTINUATION = 0x01
+FLAG_SENDER_PROTOCOL_SETTINGS_EOS = 0x02
+
+FLAGS_SENDER_PROTOCOL_SETTINGS = {
+    b'continuation': FLAG_SENDER_PROTOCOL_SETTINGS_CONTINUATION,
+    b'eos': FLAG_SENDER_PROTOCOL_SETTINGS_EOS,
+}
+
+FLAG_STREAM_ENCODING_SETTINGS_CONTINUATION = 0x01
+FLAG_STREAM_ENCODING_SETTINGS_EOS = 0x02
+
+FLAGS_STREAM_ENCODING_SETTINGS = {
+    b'continuation': FLAG_STREAM_ENCODING_SETTINGS_CONTINUATION,
+    b'eos': FLAG_STREAM_ENCODING_SETTINGS_EOS,
+}
+
 # Maps frame types to their available flags.
 FRAME_TYPE_FLAGS = {
     FRAME_TYPE_COMMAND_REQUEST: FLAGS_COMMAND_REQUEST,
@@ -97,7 +115,8 @@
     FRAME_TYPE_ERROR_RESPONSE: {},
     FRAME_TYPE_TEXT_OUTPUT: {},
     FRAME_TYPE_PROGRESS: {},
-    FRAME_TYPE_STREAM_SETTINGS: {},
+    FRAME_TYPE_SENDER_PROTOCOL_SETTINGS: FLAGS_SENDER_PROTOCOL_SETTINGS,
+    FRAME_TYPE_STREAM_SETTINGS: FLAGS_STREAM_ENCODING_SETTINGS,
 }
 
 ARGUMENT_RECORD_HEADER = struct.Struct(r'<HH')
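
The payload layouts described in the specification text above can be illustrated with a small sketch. This is not code from the changeset: the helper names (cbor_encode, sender_protocol_settings_payload, stream_settings_payload, check_settings_flags, choose_encoding) are hypothetical, the tiny CBOR encoder handles only bytestrings, arrays and maps, and the selection policy in choose_encoding is an assumption, since the specification only says peers are allowed to use any of the listed encodings.

```python
import struct

# Flag values as defined for both settings frame types in this change.
FLAG_CONTINUATION = 0x01
FLAG_EOS = 0x02


def _head(major, length):
    """Encode a CBOR initial byte plus its length argument."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 0x100:
        return bytes([(major << 5) | 24, length])
    if length < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack('>H', length)
    return bytes([(major << 5) | 26]) + struct.pack('>I', length)


def cbor_encode(value):
    """Minimal CBOR encoder (illustration only): bytes, list, dict."""
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, list):
        return _head(4, len(value)) + b''.join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            out += cbor_encode(key) + cbor_encode(item)
        return out
    raise TypeError('unsupported type: %r' % type(value))


def sender_protocol_settings_payload(encodings):
    """Payload of a 0x08 frame: a CBOR map with a ``contentencodings`` key."""
    return cbor_encode({b'contentencodings': list(encodings)})


def stream_settings_payload(profile, *extra):
    """Payload of a 0x09 frame: the profile name, then profile-specific values."""
    return b''.join(cbor_encode(v) for v in (profile,) + extra)


def check_settings_flags(flags):
    """Reject frames that set both continuation (0x01) and end of data (0x02)."""
    if flags & FLAG_CONTINUATION and flags & FLAG_EOS:
        raise ValueError('0x01 and 0x02 flags are mutually exclusive')


def choose_encoding(peer_settings, supported):
    """Pick the peer's most preferred encoding that we also support.

    ``peer_settings`` is the already-decoded settings map. A missing
    ``contentencodings`` key defaults to ``[b'identity']`` per the spec text.
    Falling back to ``identity`` is an assumed policy, not mandated behavior.
    """
    advertised = peer_settings.get(b'contentencodings', [b'identity'])
    for name in advertised:
        if name in supported:
            return name
    return b'identity'
```

For example, a peer that prefers zstd-8mb but also accepts identity would advertise sender_protocol_settings_payload([b'zstd-8mb', b'identity']), and a sender that picks zlib for a stream would emit stream_settings_payload(b'zlib') in a stream settings frame before the encoded payloads.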