internals: document CBOR utilization
authorGregory Szorc <gregory.szorc@gmail.com>
Tue, 28 Aug 2018 20:27:36 -0700
changeset 39436 2fe21c65777e
parent 39435 c7f213d7f4d2
child 39437 fcc6bd11444b
internals: document CBOR utilization I spoke with some people at Mozilla about CBOR and they advised me that we should be careful about the subset of CBOR we use in order to mitigate security, performance, and compatibility concerns. This commit establishes a document that attempts to formalize our use of CBOR. Its main limitations are on what types are allowed. It explicitly enumerates which types are supported. Notable missing features include: * Indefinite-length arrays and maps * Text strings (bytes all the way) * Floats * Date/time types * Big integers * Use of indefinite-length byte strings for map keys, values in containers. If we have a need for any of these, we can have a discussion about them when the time comes. Differential Revision: https://phab.mercurial-scm.org/D4412
contrib/wix/help.wxs
mercurial/help.py
mercurial/help/internals/cbor.txt
tests/test-help.t
--- a/contrib/wix/help.wxs	Mon Sep 03 13:56:53 2018 +0300
+++ b/contrib/wix/help.wxs	Tue Aug 28 20:27:36 2018 -0700
@@ -43,6 +43,7 @@
           <Component Id="help.internals" Guid="$(var.help.internals.guid)" Win64='$(var.IsX64)'>
             <File Id="internals.bundle2.txt"      Name="bundle2.txt" />
             <File Id="internals.bundles.txt"      Name="bundles.txt" KeyPath="yes" />
+            <File Id="internals.cbor.txt"         Name="cbor.txt" />
             <File Id="internals.censor.txt"       Name="censor.txt" />
             <File Id="internals.changegroups.txt" Name="changegroups.txt" />
             <File Id="internals.config.txt"       Name="config.txt" />
--- a/mercurial/help.py	Mon Sep 03 13:56:53 2018 +0300
+++ b/mercurial/help.py	Tue Aug 28 20:27:36 2018 -0700
@@ -205,6 +205,8 @@
      loaddoc('bundle2', subdir='internals')),
     (['bundles'], _('Bundles'),
      loaddoc('bundles', subdir='internals')),
+    (['cbor'], _('CBOR'),
+     loaddoc('cbor', subdir='internals')),
     (['censor'], _('Censor'),
      loaddoc('censor', subdir='internals')),
     (['changegroups'], _('Changegroups'),
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/mercurial/help/internals/cbor.txt	Tue Aug 28 20:27:36 2018 -0700
@@ -0,0 +1,130 @@
+Mercurial uses Concise Binary Object Representation (CBOR)
+(RFC 7049) for various data formats.
+
+This document describes the subset of CBOR that Mercurial uses and
+gives recommendations for appropriate use of CBOR within Mercurial.
+
+Type Limitations
+================
+
+Major types 0 and 1 (unsigned integers and negative integers) MUST be
+fully supported.
+
+Major type 2 (byte strings) MUST be fully supported. However, there
+are limitations around the use of indefinite-length byte strings.
+(See below.)
+
+Major type 3 (text strings) are NOT supported.
+
+Major type 4 (arrays) MUST be supported. However, values are limited
+to the set of types described in the "Container Types" section below.
+And indefinite-length arrays are NOT supported.
+
+Major type 5 (maps) MUST be supported. However, key values are limited
+to the set of types described in the "Container Types" section below.
+And indefinite-length maps are NOT supported.
+
+Major type 6 (semantic tagging of major types) can be used with the
+following semantic tag values:
+
+258
+   Mathematical finite set. Suitable for representing Python's
+   ``set`` type.
+
+All other semantic tag values are not allowed.
+
+Major type 7 (simple data types) can be used with the following
+type values:
+
+20
+   False
+21
+   True
+22
+   Null
+31
+   Break stop code (for indefinite-length items).
+
+All other simple data type values (including every value requiring the
+1 byte extension) are disallowed.
+
+Indefinite-Length Byte Strings
+==============================
+
+Indefinite-length byte strings (major type 2) are allowed. However,
+they MUST NOT occur inside a container type (such as an array or map).
+i.e. they can only occur as the "top-most" element in a stream of
+values.
+
+Encoders and decoders SHOULD *stream* indefinite-length byte strings.
+i.e. an encoder or decoder SHOULD NOT buffer the entirety of a long
+byte string value when indefinite-length byte strings are being used
+if it can be avoided. Mercurial MAY use extremely long indefinite-length
+byte strings and buffering the source or destination value COULD lead to
+memory exhaustion.
+
+Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
+bytes.
+
+Container Types
+===============
+
+Mercurial may use the array (major type 4), map (major type 5), and
+set (semantic tag 258 plus major type 4 array) container types.
+
+An array may contain any supported type as values.
+
+A map MUST only use the following types as keys:
+
+* unsigned integers (major type 0)
+* negative integers (major type 1)
+* byte strings (major type 2) (but not indefinite-length byte strings)
+* false (simple type 20)
+* true (simple type 21)
+* null (simple type 22)
+
+A map MUST only use the following types as values:
+
+* all types supported as map keys
+* arrays
+* maps
+* sets
+
+A set may only use the following types as values:
+
+* all types supported as map keys
+
+It is recommended that keys in maps and values in sets and arrays all
+be of a uniform type.
+
+Avoiding Large Byte Strings
+===========================
+
+The use of large byte strings is discouraged, especially in scenarios where
+the total size of the byte string may by unbound for some inputs (e.g. when
+representing the content of a tracked file). It is highly recommended to use
+indefinite-length byte strings for these purposes.
+
+Since indefinite-length byte strings cannot be nested within an outer
+container (such as an array or map), to associate a large byte string
+with another data structure, it is recommended to use an array or
+map followed immediately by an indefinite-length byte string. For example,
+instead of the following map::
+
+   {
+      "key1": "value1",
+      "key2": "value2",
+      "long_value": "some very large value...",
+   }
+
+Use a map followed by a byte string:
+
+   {
+      "key1": "value1",
+      "key2": "value2",
+      "value_follows": True,
+   }
+   <BEGIN INDEFINITE-LENGTH BYTE STRING>
+   "some very large value"
+   "..."
+   <END INDEFINITE-LENGTH BYTE STRING>
--- a/tests/test-help.t	Mon Sep 03 13:56:53 2018 +0300
+++ b/tests/test-help.t	Tue Aug 28 20:27:36 2018 -0700
@@ -1010,6 +1010,7 @@
   
        bundle2       Bundle2
        bundles       Bundles
+       cbor          CBOR
        censor        Censor
        changegroups  Changegroups
        config        Config Registrar
@@ -3294,6 +3295,13 @@
   Bundles
   </td></tr>
   <tr><td>
+  <a href="/help/internals.cbor">
+  cbor
+  </a>
+  </td><td>
+  CBOR
+  </td></tr>
+  <tr><td>
   <a href="/help/internals.censor">
   censor
   </a>