annotate mercurial/help/internals/cbor.txt @ 39409:2fe21c65777e

internals: document CBOR utilization

I spoke with some people at Mozilla about CBOR and they advised me that
we should be careful about the subset of CBOR we use in order to mitigate
security, performance, and compatibility concerns.

This commit establishes a document that attempts to formalize our use of
CBOR. Its main limitations are on what types are allowed. It explicitly
enumerates which types are supported. Notable missing features include:

* Indefinite-length arrays and maps
* Text strings (bytes all the way)
* Floats
* Date/time types
* Big integers
* Use of indefinite-length byte strings for map keys, values in containers.

If we have a need for any of these, we can have a discussion about them
when the time comes.

Differential Revision: https://phab.mercurial-scm.org/D4412
author Gregory Szorc <gregory.szorc@gmail.com>
date Tue, 28 Aug 2018 20:27:36 -0700
Mercurial uses Concise Binary Object Representation (CBOR)
(RFC 7049) for various data formats.

This document describes the subset of CBOR that Mercurial uses and
gives recommendations for appropriate use of CBOR within Mercurial.

Type Limitations
================

Major types 0 and 1 (unsigned integers and negative integers) MUST be
fully supported.

Major type 2 (byte strings) MUST be fully supported. However, there
are limitations around the use of indefinite-length byte strings.
(See below.)

Major type 3 (text strings) is NOT supported.

Major type 4 (arrays) MUST be supported. However, values are limited
to the set of types described in the "Container Types" section below.
Indefinite-length arrays are NOT supported.

Major type 5 (maps) MUST be supported. However, keys and values are
limited to the sets of types described in the "Container Types" section
below. Indefinite-length maps are NOT supported.

Major type 6 (semantic tagging of major types) can be used with the
following semantic tag values:

258
   Mathematical finite set. Suitable for representing Python's
   ``set`` type.

All other semantic tag values are not allowed.

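As an illustration of the tag 258 rule, the following minimal sketch
encodes a Python ``set`` of byte strings as semantic tag 258 wrapping a
definite-length array. It is not Mercurial's own encoder; the helper
names ``encode_head`` and ``encode_bytes_set`` are hypothetical, and
sorting the members is an assumption made only to get deterministic
output.

```python
import struct


def encode_head(major, value):
    """Encode a CBOR initial byte plus the shortest-form argument."""
    if value < 24:
        return struct.pack(">B", (major << 5) | value)
    for info, fmt, limit in (
        (24, ">BB", 1 << 8),
        (25, ">BH", 1 << 16),
        (26, ">BL", 1 << 32),
        (27, ">BQ", 1 << 64),
    ):
        if value < limit:
            return struct.pack(fmt, (major << 5) | info, value)
    raise ValueError("value does not fit in a CBOR head")


def encode_bytes_set(values):
    """Encode a set of byte strings as tag 258 + definite-length array."""
    items = sorted(values)  # assumption: sorted only for stable output
    out = [encode_head(6, 258), encode_head(4, len(items))]
    for item in items:
        out.append(encode_head(2, len(item)) + item)
    return b"".join(out)
```

For a two-member set, the output starts with the three-byte tag head
``d9 01 02`` followed by the array head ``82``.
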
Major type 7 (simple data types) can be used with the following
type values:

20
   False
21
   True
22
   Null
31
   Break stop code (for indefinite-length items).

All other simple data type values (including every value requiring the
1-byte extension) are disallowed.

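The type limitations above can be partly enforced by looking at the
initial byte of each data item. The following is a minimal sketch of
such a check, not Mercurial's decoder; ``check_initial_byte`` and
``check_tag`` are hypothetical names. The tag value is carried in the
bytes after the initial byte, so it is checked separately once decoded.

```python
ALLOWED_SIMPLE = {20, 21, 22, 31}  # false, true, null, break
ALLOWED_TAGS = {258}  # mathematical finite set
INDEFINITE = 31


def check_initial_byte(byte):
    """Return the major type if the byte is within the subset.

    Raises ValueError for anything the subset excludes.
    """
    major, info = byte >> 5, byte & 0x1F
    if 28 <= info <= 30:
        raise ValueError("reserved additional information %d" % info)
    if major in (0, 1, 6) and info == INDEFINITE:
        raise ValueError("major type %d has no indefinite form" % major)
    if major == 3:
        raise ValueError("text strings are not supported")
    if major in (4, 5) and info == INDEFINITE:
        raise ValueError("indefinite-length arrays/maps not supported")
    if major == 7 and info not in ALLOWED_SIMPLE:
        # Also rejects floats (25-27) and the 1-byte extension (24).
        raise ValueError("simple value %d is not allowed" % info)
    return major


def check_tag(tag):
    """Raise ValueError unless the decoded semantic tag is allowed."""
    if tag not in ALLOWED_TAGS:
        raise ValueError("semantic tag %d is not allowed" % tag)
```
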
Indefinite-Length Byte Strings
==============================

Indefinite-length byte strings (major type 2) are allowed. However,
they MUST NOT occur inside a container type (such as an array or map).
That is, they can only occur as the "top-most" element in a stream of
values.

Encoders and decoders SHOULD *stream* indefinite-length byte strings.
That is, an encoder or decoder SHOULD NOT buffer the entirety of a long
byte string value when indefinite-length byte strings are being used
if it can be avoided. Mercurial MAY use extremely long indefinite-length
byte strings, and buffering the source or destination value could lead
to memory exhaustion.

Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
bytes.

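A streaming encoder along these lines can be sketched as a generator
that reads from a file-like object and emits one CBOR chunk per read,
so that at most one chunk is held in memory. This is an illustration
under stated assumptions, not Mercurial's implementation: the names
``stream_indefinite_bytestring``, ``encode_head`` and ``MAX_CHUNK`` are
hypothetical, and the default chunk size is arbitrary.

```python
import io
import struct

MAX_CHUNK = 2 ** 20  # recommended upper bound on a single chunk


def encode_head(major, value):
    """Encode a CBOR initial byte plus the shortest-form argument."""
    if value < 24:
        return struct.pack(">B", (major << 5) | value)
    for info, fmt, limit in (
        (24, ">BB", 1 << 8),
        (25, ">BH", 1 << 16),
        (26, ">BL", 1 << 32),
        (27, ">BQ", 1 << 64),
    ):
        if value < limit:
            return struct.pack(fmt, (major << 5) | info, value)
    raise ValueError("value does not fit in a CBOR head")


def stream_indefinite_bytestring(source, chunk_size=65536):
    """Yield CBOR fragments for an indefinite-length byte string."""
    if not 0 < chunk_size <= MAX_CHUNK:
        raise ValueError("chunk_size must be in 1..2^20")
    yield b"\x5f"  # major type 2, indefinite length
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield encode_head(2, len(chunk))  # definite-length chunk head
        yield chunk
    yield b"\xff"  # break stop code


# Usage: fragments can be written to a socket or file as produced.
fragments = list(
    stream_indefinite_bytestring(io.BytesIO(b"hello world"), chunk_size=4)
)
```
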
Container Types
===============

Mercurial may use the array (major type 4), map (major type 5), and
set (semantic tag 258 plus major type 4 array) container types.

An array may contain any supported type as values, except
indefinite-length byte strings, which MUST NOT occur inside a container.

A map MUST only use the following types as keys:

* unsigned integers (major type 0)
* negative integers (major type 1)
* byte strings (major type 2) (but not indefinite-length byte strings)
* false (simple type 20)
* true (simple type 21)
* null (simple type 22)

A map MUST only use the following types as values:

* all types supported as map keys
* arrays
* maps
* sets

A set may only use the following types as values:

* all types supported as map keys

It is recommended that keys in maps and values in sets and arrays all
be of a uniform type.

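The container rules above can be checked on the Python side before a
value is handed to an encoder. The sketch below assumes a particular
mapping that this document does not spell out: ``list``/``tuple`` for
arrays, ``dict`` for maps, ``set``/``frozenset`` for sets, ``bytes``
for byte strings, and integers limited to the range of major types 0
and 1. The names ``is_scalar`` and ``check_value`` are hypothetical.

```python
def is_scalar(value):
    """True for values usable as map keys and set members."""
    if value is None or isinstance(value, (bool, bytes)):
        return True
    # Major types 0 and 1 together cover -2^64 .. 2^64 - 1.
    return isinstance(value, int) and -(1 << 64) <= value < (1 << 64)


def check_value(value):
    """Raise TypeError unless ``value`` fits the container rules."""
    if is_scalar(value):
        return
    if isinstance(value, (list, tuple)):
        for item in value:
            check_value(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            if not is_scalar(key):
                raise TypeError("bad map key: %s" % type(key).__name__)
            check_value(item)
    elif isinstance(value, (set, frozenset)):
        for item in value:
            if not is_scalar(item):
                raise TypeError("bad set member: %s" % type(item).__name__)
    else:
        raise TypeError("unsupported type: %s" % type(value).__name__)
```

The check does not enforce the uniform-type recommendation, since that
is advice rather than a requirement.
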
Avoiding Large Byte Strings
===========================

The use of large byte strings is discouraged, especially in scenarios where
the total size of the byte string may be unbounded for some inputs (e.g. when
representing the content of a tracked file). It is highly recommended to use
indefinite-length byte strings for these purposes.

Since indefinite-length byte strings cannot be nested within an outer
container (such as an array or map), to associate a large byte string
with another data structure, it is recommended to use an array or
map followed immediately by an indefinite-length byte string. For example,
instead of the following map::

   {
      "key1": "value1",
      "key2": "value2",
      "long_value": "some very large value...",
   }

Use a map followed by a byte string::

   {
      "key1": "value1",
      "key2": "value2",
      "value_follows": True,
   }
   <BEGIN INDEFINITE-LENGTH BYTE STRING>
      "some very large value"
      "..."
   <END INDEFINITE-LENGTH BYTE STRING>
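
The "map followed by a byte string" layout can be sketched as two
consecutive top-level CBOR items. This is an illustration only, not
Mercurial's encoder: ``head``, ``encode_small_map`` and
``encode_with_trailing_value`` are hypothetical names, and the map
encoder handles just byte string keys with byte string or boolean
values, which is all the example needs.

```python
import struct


def head(major, value):
    """Encode a CBOR initial byte plus the shortest-form argument."""
    if value < 24:
        return struct.pack(">B", (major << 5) | value)
    for info, fmt, limit in (
        (24, ">BB", 1 << 8),
        (25, ">BH", 1 << 16),
        (26, ">BL", 1 << 32),
        (27, ">BQ", 1 << 64),
    ):
        if value < limit:
            return struct.pack(fmt, (major << 5) | info, value)
    raise ValueError("value does not fit in a CBOR head")


def encode_small_map(mapping):
    """Encode a map with bytes keys and bytes or bool values."""
    out = [head(5, len(mapping))]
    for key, value in mapping.items():
        out.append(head(2, len(key)) + key)
        if value is True:
            out.append(b"\xf5")  # simple value 21
        elif value is False:
            out.append(b"\xf4")  # simple value 20
        else:
            out.append(head(2, len(value)) + value)
    return b"".join(out)


def encode_with_trailing_value(mapping, chunks):
    """Yield the map, then the large value as a separate top-level item."""
    yield encode_small_map(mapping)
    yield b"\x5f"  # begin indefinite-length byte string
    for chunk in chunks:
        if chunk:  # skip empty chunks; they carry no data
            yield head(2, len(chunk)) + chunk
    yield b"\xff"  # break stop code
```

A decoder reading this stream sees a complete map first, can act on
``value_follows``, and then consumes the byte string chunk by chunk.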