changeset 37060:0a6c5cc09a88

wireproto: define human output side channel frame

Currently, the SSH protocol delivers output tailored for people over the stderr file descriptor. The HTTP protocol doesn't have this file descriptor (because it only has an input and output pipe), so it encodes textual output intended for humans within the protocol responses. Some response types have a facility for capturing output to be printed to users. Some don't. And sometimes the implementation of how that output is conveyed is super hacky.

On top of that, bundle2 has an "output" part that is used to store output that should be printed when this part is encountered. bundle2 also has the concept of "interrupt" chunks, which can be used to signal that the regular bundle2 stream is to be preempted by an out-of-band part that should be processed immediately. This "interrupt" part can be an "output" part and can be used to print data on the receiver.

The status quo is inconsistent and insane. We can do better.

This commit introduces a dedicated frame type on the frame-based protocol for denoting textual data that should be printed on the receiver. This frame type effectively constitutes a side-channel by which textual data can be printed on the receiver without interfering with other in-progress transmissions, such as the transmission of command responses.

But wait - there's more! Previous implementations that transferred textual data basically instructed the client to "print these bytes." This suffered from a few problems.

First, the text data that was transmitted and eventually printed originated from a server with a specific i18n configuration. This meant that clients would see text using whatever the i18n settings were on the server. Someone in France could connect to a server in Japan and see illegible Japanese glyphs - or maybe even mojibake.

Second, the normalization of all text data originating on servers resulted in the loss of the ability to apply formatting to that data.
Local Mercurial clients can apply specific formatting settings to individual atoms of text. For example, a revision can be colored differently from a commit message. With data over the wire, the potential for this rich formatting was lost. The best you could do (without parsing the text to be printed) was apply a universal label to it and e.g. color it specially.

The new mechanism for instructing the peer to print data does not have these limitations. Frames instructing the peer to print text are composed of a formatting string plus arguments. This means receivers can plug the formatting string into the i18n database to see if a local translation is available. In addition, each atom to be printed has a series of "labels" associated with it. These labels can be mapped to the Mercurial UI's labels so locally configured coloring, styling, etc. settings can be applied.

What this all means is that textual messages originating on servers can be localized on the client and richly formatted, all while respecting the client's settings. This is slightly more complicated than "print these bytes." But it is vastly more user-friendly.

FWIW, I'm not aware of other protocols that attempt to encode i18n and textual styling in this manner. You could level the claim that this feature is over-engineered. However, if I were to put myself in the shoes of a non-English speaker learning how to use version control, I think I would *love* this feature because it would enable me to see richly formatted text in my chosen locale.

Anyway, we only implement support for encoding frames of this type and basic tests for that encoding. We'll still need to hook up the server and its ui instance to emit these frames.

I recognize this feature may be a bit more controversial than other aspects of the wire protocol because it is a bit "radical." So I figured I'd start small to test the waters and see if others feel this feature is worthwhile.

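This change only implements encoding; nothing on the receiving side exists yet. As a rough sketch under stated assumptions, a client could render each atom by first looking its formatting string up in a local translation catalog and then applying the substitution rules from the help text (``%s`` takes the next argument, ``%%`` yields ``%``, any other ``%X`` stays literal). The functions `renderatom` and `renderframe` are hypothetical, `translate` is only a stand-in for a catalog lookup such as `i18n._`, and atoms are assumed to be already decoded to `str`. Labels are passed through untouched so the caller can map them onto its own ui labels.

```python
def _identity(s):
    return s


def renderatom(formatting, args, translate=_identity):
    """Substitute args into a (possibly translated) formatting string.

    Supports only the sequences the help text defines: ``%s`` consumes the
    next argument, ``%%`` becomes ``%``, and any other ``%X`` is kept as the
    literal two characters. Extra arguments are ignored (an assumption).
    """
    formatting = translate(formatting)
    argiter = iter(args)
    out = []
    i = 0
    while i < len(formatting):
        c = formatting[i]
        nxt = formatting[i + 1] if i + 1 < len(formatting) else None
        if c == '%' and nxt == 's':
            try:
                out.append(next(argiter))
            except StopIteration:
                raise ValueError('too few arguments for formatting string')
            i += 2
        elif c == '%' and nxt == '%':
            out.append('%')
            i += 2
        elif c == '%' and nxt is not None:
            # Unknown sequence: a literal '%' followed by that character.
            out.append(c + nxt)
            i += 2
        else:
            # Ordinary character, or a trailing lone '%'.
            out.append(c)
            i += 1
    return ''.join(out)


def renderframe(atoms, translate=_identity, addnewline=True):
    """Render (formatting, args, labels) atoms to (text, labels) pairs.

    Each pair corresponds to one ui.write() on the receiver.
    """
    rendered = [(renderatom(fmt, args, translate), list(labels))
                for fmt, args, labels in atoms]
    # The last atom SHOULD end with a newline; clients MAY add one.
    if addnewline and rendered and not rendered[-1][0].endswith('\n'):
        text, labels = rendered[-1]
        rendered[-1] = (text + '\n', labels)
    return rendered
```

For example, `renderframe([('%s files updated (100%%)', ['3'], ['label'])])` returns `[('3 files updated (100%)\n', ['label'])]`. Keeping translation as a pluggable callable reflects the point above: the lookup happens on the client, against the client's locale, not on the server.
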
Differential Revision: https://phab.mercurial-scm.org/D2872
author Gregory Szorc <gregory.szorc@gmail.com>
date Wed, 14 Mar 2018 22:19:00 -0700
parents bbea991635d0
children 884a0c1604ad
files mercurial/help/internals/wireprotocol.txt mercurial/wireprotoframing.py tests/test-wireproto-serverreactor.py
diffstat 3 files changed, 232 insertions(+), 0 deletions(-)
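The payload layout that this change adds to wireprotocol.txt can be read back with a short decoder. The following is a minimal receiver-side sketch, not part of this changeset (which only implements encoding). The names `parsetextoutputpayload` and `_take`, and the choice to raise `ValueError` on truncated input, are assumptions. It mirrors `createtextoutputframe()`: a `<HBB` header, one uint8 length per label, one little endian uint16 length per argument, then the formatting string, labels and arguments.

```python
import struct

# Fixed per-atom header: formatting-string length (uint16 LE),
# label count (uint8), argument count (uint8).
ATOM_HEADER = struct.Struct('<HBB')


def _take(payload, offset, size):
    """Return (payload[offset:offset + size], new offset), or raise."""
    end = offset + size
    if end > len(payload):
        raise ValueError('truncated text-output payload')
    return payload[offset:end], end


def parsetextoutputpayload(payload):
    """Decode a text-output frame payload into (formatting, args, labels).

    Hypothetical receiver-side counterpart to createtextoutputframe().
    """
    atoms = []
    offset = 0
    while offset < len(payload):
        raw, offset = _take(payload, offset, ATOM_HEADER.size)
        fmtlen, nlabels, nargs = ATOM_HEADER.unpack(raw)

        # Length arrays: uint8 per label, uint16 LE per argument.
        raw, offset = _take(payload, offset, nlabels)
        labellens = struct.unpack('<%dB' % nlabels, raw)
        raw, offset = _take(payload, offset, 2 * nargs)
        arglens = struct.unpack('<%dH' % nargs, raw)

        # Variable-length data, in the order the help text specifies.
        raw, offset = _take(payload, offset, fmtlen)
        formatting = raw.decode('utf-8')
        labels = []
        for n in labellens:
            raw, offset = _take(payload, offset, n)
            labels.append(raw.decode('ascii'))
        args = []
        for n in arglens:
            raw, offset = _take(payload, offset, n)
            args.append(raw.decode('utf-8'))

        atoms.append((formatting, args, labels))
    return atoms
```

Fed the expected payload from `testargandlabel` in tests/test-wireproto-serverreactor.py, `b'\x06\x00\x01\x01\x05\x03\x00foo %slabelarg'`, it returns `[('foo %s', ['arg'], ['label'])]`. Checking every length before slicing means a truncated or lying header surfaces as a `ValueError` rather than silently short data.
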
--- a/mercurial/help/internals/wireprotocol.txt	Mon Mar 19 16:55:07 2018 -0700
+++ b/mercurial/help/internals/wireprotocol.txt	Wed Mar 14 22:19:00 2018 -0700
@@ -660,6 +660,64 @@
 0x02
    The error occurred at the application level. e.g. invalid command.
 
+Human Output Side-Channel (``0x06``)
+------------------------------------
+
+This frame contains a message that is intended to be displayed to
+people. Whereas most frames communicate machine readable data, this
+frame communicates textual data that is intended to be shown to
+humans.
+
+The frame consists of a series of *formatting requests*. Each formatting
+request consists of a formatting string, arguments for that formatting
+string, and labels to apply to that formatting string.
+
+A formatting string is a printf()-like string that allows variable
+substitution within the string. Labels allow the rendered text to be
+*decorated*. Assuming use of the canonical Mercurial code base, a
+formatting string can be the input to the ``i18n._`` function. This
+allows messages emitted from the server to be localized. So even if
+the server has different i18n settings, people could see messages in
+their *native* settings. Similarly, the use of labels allows
+decorations like coloring and underlining to be applied using the
+client's configured rendering settings.
+
+Formatting strings are similar to ``printf()`` strings or how
+Python's ``%`` operator works. The only supported formatting sequences
+are ``%s`` and ``%%``. ``%s`` will be replaced by whatever the string
+at that position resolves to. ``%%`` will be replaced by ``%``. All
+other 2-byte sequences beginning with ``%`` represent a literal
+``%`` followed by that character. However, future versions of the
+wire protocol reserve the right to allow clients to opt in to receiving
+formatting strings with additional formatters, hence why ``%%`` is
+required to represent the literal ``%``.
+
+The raw frame consists of a series of data structures representing
+textual atoms to print. Each atom begins with a struct defining the
+size of the data that follows:
+
+* A 16-bit little endian unsigned integer denoting the length of the
+  formatting string.
+* An 8-bit unsigned integer denoting the number of label strings
+  that follow.
+* An 8-bit unsigned integer denoting the number of formatting string
+  arguments that follow.
+* An array of 8-bit unsigned integers denoting the lengths of
+  *labels* data.
+* An array of 16-bit little endian unsigned integers denoting the
+  lengths of formatting string arguments.
+* The formatting string, encoded as UTF-8.
+* 0 or more ASCII strings defining labels to apply to this atom.
+* 0 or more UTF-8 strings that will be used as arguments to the
+  formatting string.
+
+All data to be printed MUST be encoded into a single frame: this frame
+does not support spanning data across multiple frames.
+
+All textual data encoded in these frames is assumed to be line delimited.
+The last atom in the frame SHOULD end with a newline (``\n``). If it
+doesn't, clients MAY add a newline to facilitate immediate printing.
+
 Issuing Commands
 ----------------
 
--- a/mercurial/wireprotoframing.py	Mon Mar 19 16:55:07 2018 -0700
+++ b/mercurial/wireprotoframing.py	Wed Mar 14 22:19:00 2018 -0700
@@ -27,6 +27,7 @@
 FRAME_TYPE_COMMAND_DATA = 0x03
 FRAME_TYPE_BYTES_RESPONSE = 0x04
 FRAME_TYPE_ERROR_RESPONSE = 0x05
+FRAME_TYPE_TEXT_OUTPUT = 0x06
 
 FRAME_TYPES = {
     b'command-name': FRAME_TYPE_COMMAND_NAME,
@@ -34,6 +35,7 @@
     b'command-data': FRAME_TYPE_COMMAND_DATA,
     b'bytes-response': FRAME_TYPE_BYTES_RESPONSE,
     b'error-response': FRAME_TYPE_ERROR_RESPONSE,
+    b'text-output': FRAME_TYPE_TEXT_OUTPUT,
 }
 
 FLAG_COMMAND_NAME_EOS = 0x01
@@ -85,6 +87,7 @@
     FRAME_TYPE_COMMAND_DATA: FLAGS_COMMAND_DATA,
     FRAME_TYPE_BYTES_RESPONSE: FLAGS_BYTES_RESPONSE,
     FRAME_TYPE_ERROR_RESPONSE: FLAGS_ERROR_RESPONSE,
+    FRAME_TYPE_TEXT_OUTPUT: {},
 }
 
 ARGUMENT_FRAME_HEADER = struct.Struct(r'<HH')
@@ -281,6 +284,74 @@
 
     yield makeframe(requestid, FRAME_TYPE_ERROR_RESPONSE, flags, msg)
 
+def createtextoutputframe(requestid, atoms):
+    """Create a text output frame to render text to people.
+
+    ``atoms`` is an iterable of 3-tuples of (formatting string, args, labels).
+
+    The formatting string contains ``%s`` tokens to be replaced by the
+    corresponding indexed entry in ``args``. ``labels`` is an iterable of
+    labels to be applied at rendering time. In terms of the ``ui``
+    class, each atom corresponds to a ``ui.write()``.
+    """
+    bytesleft = DEFAULT_MAX_FRAME_SIZE
+    atomchunks = []
+
+    for (formatting, args, labels) in atoms:
+        if len(args) > 255:
+            raise ValueError('cannot use more than 255 formatting arguments')
+        if len(labels) > 255:
+            raise ValueError('cannot use more than 255 labels')
+
+        # TODO look for localstr, other types here?
+
+        if not isinstance(formatting, bytes):
+            raise ValueError('must use bytes formatting strings')
+        for arg in args:
+            if not isinstance(arg, bytes):
+                raise ValueError('must use bytes for arguments')
+        for label in labels:
+            if not isinstance(label, bytes):
+                raise ValueError('must use bytes for labels')
+
+        # Formatting string must be UTF-8.
+        formatting = formatting.decode(r'utf-8', r'replace').encode(r'utf-8')
+
+        # Arguments must be UTF-8.
+        args = [a.decode(r'utf-8', r'replace').encode(r'utf-8') for a in args]
+
+        # Labels must be ASCII.
+        labels = [l.decode(r'ascii', r'strict').encode(r'ascii')
+                  for l in labels]
+
+        if len(formatting) > 65535:
+            raise ValueError('formatting string cannot be longer than 64k')
+
+        if any(len(a) > 65535 for a in args):
+            raise ValueError('argument string cannot be longer than 64k')
+
+        if any(len(l) > 255 for l in labels):
+            raise ValueError('label string cannot be longer than 255 bytes')
+
+        chunks = [
+            struct.pack(r'<H', len(formatting)),
+            struct.pack(r'<BB', len(labels), len(args)),
+            struct.pack(r'<' + r'B' * len(labels), *map(len, labels)),
+            struct.pack(r'<' + r'H' * len(args), *map(len, args)),
+        ]
+        chunks.append(formatting)
+        chunks.extend(labels)
+        chunks.extend(args)
+
+        atom = b''.join(chunks)
+        atomchunks.append(atom)
+        bytesleft -= len(atom)
+
+    if bytesleft < 0:
+        raise ValueError('cannot encode data in a single frame')
+
+    yield makeframe(requestid, FRAME_TYPE_TEXT_OUTPUT, 0, b''.join(atomchunks))
+
 class serverreactor(object):
     """Holds state of a server handling frame-based protocol requests.
 
--- a/tests/test-wireproto-serverreactor.py	Mon Mar 19 16:55:07 2018 -0700
+++ b/tests/test-wireproto-serverreactor.py	Wed Mar 14 22:19:00 2018 -0700
@@ -67,6 +67,109 @@
             ffs(b'1 command-data eos %s' % data.getvalue()),
         ])
 
+    def testtextoutputexcessiveargs(self):
+        """At most 255 formatting arguments are allowed."""
+        with self.assertRaisesRegexp(ValueError,
+                                     'cannot use more than 255 formatting'):
+            args = [b'x' for i in range(256)]
+            list(framing.createtextoutputframe(1, [(b'bleh', args, [])]))
+
+    def testtextoutputexcessivelabels(self):
+        """At most 255 labels are allowed."""
+        with self.assertRaisesRegexp(ValueError,
+                                     'cannot use more than 255 labels'):
+            labels = [b'l' for i in range(256)]
+            list(framing.createtextoutputframe(1, [(b'bleh', [], labels)]))
+
+    def testtextoutputformattingstringtype(self):
+        """Formatting string must be bytes."""
+        with self.assertRaisesRegexp(ValueError, 'must use bytes formatting '):
+            list(framing.createtextoutputframe(1, [
+                (b'foo'.decode('ascii'), [], [])]))
+
+    def testtextoutputargumentbytes(self):
+        with self.assertRaisesRegexp(ValueError, 'must use bytes for argument'):
+            list(framing.createtextoutputframe(1, [
+                (b'foo', [b'foo'.decode('ascii')], [])]))
+
+    def testtextoutputlabelbytes(self):
+        with self.assertRaisesRegexp(ValueError, 'must use bytes for labels'):
+            list(framing.createtextoutputframe(1, [
+                (b'foo', [], [b'foo'.decode('ascii')])]))
+
+    def testtextoutputtoolongformatstring(self):
+        with self.assertRaisesRegexp(ValueError,
+                                     'formatting string cannot be longer than'):
+            list(framing.createtextoutputframe(1, [
+                (b'x' * 65536, [], [])]))
+
+    def testtextoutputtoolongargumentstring(self):
+        with self.assertRaisesRegexp(ValueError,
+                                     'argument string cannot be longer than'):
+            list(framing.createtextoutputframe(1, [
+                (b'bleh', [b'x' * 65536], [])]))
+
+    def testtextoutputtoolonglabelstring(self):
+        with self.assertRaisesRegexp(ValueError,
+                                     'label string cannot be longer than'):
+            list(framing.createtextoutputframe(1, [
+                (b'bleh', [], [b'x' * 65536])]))
+
+    def testtextoutput1simpleatom(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo', [], [])]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x03\x00\x00\x00foo'),
+        ])
+
+    def testtextoutput2simpleatoms(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo', [], []),
+            (b'bar', [], []),
+        ]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x03\x00\x00\x00foo\x03\x00\x00\x00bar'),
+        ])
+
+    def testtextoutput1arg(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo %s', [b'val1'], []),
+        ]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x06\x00\x00\x01\x04\x00foo %sval1'),
+        ])
+
+    def testtextoutput2arg(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo %s %s', [b'val', b'value'], []),
+        ]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x09\x00\x00\x02\x03\x00\x05\x00'
+                br'foo %s %svalvalue'),
+        ])
+
+    def testtextoutput1label(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo', [], [b'label']),
+        ]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x03\x00\x01\x00\x05foolabel'),
+        ])
+
+    def testargandlabel(self):
+        val = list(framing.createtextoutputframe(1, [
+            (b'foo %s', [b'arg'], [b'label']),
+        ]))
+
+        self.assertEqual(val, [
+            ffs(br'1 text-output 0 \x06\x00\x01\x01\x05\x03\x00foo %slabelarg'),
+        ])
+
 class ServerReactorTests(unittest.TestCase):
     def _sendsingleframe(self, reactor, s):
         results = list(sendframes(reactor, [ffs(s)]))