annotate mercurial/help/internals/cbor.txt @ 39409:2fe21c65777e
internals: document CBOR utilization

I spoke with some people at Mozilla about CBOR and they advised me
that we should be careful about the subset of CBOR we use in order
to mitigate security, performance, and compatibility concerns.

This commit establishes a document that attempts to formalize our
use of CBOR.

Its main limitations are on what types are allowed. It explicitly
enumerates which types are supported. Notable missing features
include:

* Indefinite-length arrays and maps
* Text strings (bytes all the way)
* Floats
* Date/time types
* Big integers
* Use of indefinite-length byte strings for map keys, values in
  containers.

If we have a need for any of these, we can have a discussion about
them when the time comes.

Differential Revision: https://phab.mercurial-scm.org/D4412

author: Gregory Szorc <gregory.szorc@gmail.com>
date: Tue, 28 Aug 2018 20:27:36 -0700

Mercurial uses Concise Binary Object Representation (CBOR)
(RFC 7049) for various data formats.

This document describes the subset of CBOR that Mercurial uses and
gives recommendations for appropriate use of CBOR within Mercurial.

Type Limitations
================

Major types 0 and 1 (unsigned integers and negative integers) MUST be
fully supported.
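
As an illustration of what full support for major types 0 and 1 involves,
the sketch below encodes a Python integer with the shortest CBOR head.
The name ``encode_int`` is hypothetical and is not taken from Mercurial's
code. The 64-bit range check is an assumption that reflects the exclusion
of big integers from the subset.

```python
import struct


def encode_int(value):
    """Encode an integer as CBOR major type 0 or 1 (hypothetical helper)."""
    if value >= 0:
        major, arg = 0, value
    else:
        # Major type 1 stores -1 - n, so -1 is encoded with argument 0.
        major, arg = 1, -1 - value
    if arg < 24:
        # Small arguments live directly in the initial byte.
        return struct.pack(">B", major << 5 | arg)
    if arg < 2 ** 8:
        return struct.pack(">BB", major << 5 | 24, arg)
    if arg < 2 ** 16:
        return struct.pack(">BH", major << 5 | 25, arg)
    if arg < 2 ** 32:
        return struct.pack(">BL", major << 5 | 26, arg)
    if arg < 2 ** 64:
        return struct.pack(">BQ", major << 5 | 27, arg)
    # Assumption: anything wider would need a big integer tag, which is
    # not part of the subset.
    raise ValueError("big integers are outside the supported subset")
```

For example, ``encode_int(1000)`` yields ``b"\x19\x03\xe8"`` and
``encode_int(-1000)`` yields ``b"\x39\x03\xe7"``.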

Major type 2 (byte strings) MUST be fully supported. However, there
are limitations around the use of indefinite-length byte strings.
(See below.)

Major type 3 (text strings) is NOT supported.

Major type 4 (arrays) MUST be supported. However, values are limited
to the set of types described in the "Container Types" section below.
Indefinite-length arrays are NOT supported.

Major type 5 (maps) MUST be supported. However, keys and values are
limited to the sets of types described in the "Container Types" section
below. Indefinite-length maps are NOT supported.

Major type 6 (semantic tagging of major types) can be used with the
following semantic tag values:

258
   Mathematical finite set. Suitable for representing Python's
   ``set`` type.

All other semantic tag values are not allowed.
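
To make the tag 258 representation concrete, here is a minimal sketch that
encodes a Python ``set`` of byte strings as the tag followed by a
definite-length array. The helper names ``head`` and ``encode_bytes_set``
are hypothetical. Sorting the members is an assumption made only so that
the output is reproducible; this document does not require any ordering.

```python
import struct


def head(major, arg):
    """Encode a major type and unsigned argument in the shortest form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


# Major type 6 with a 2-byte argument holding the tag value 258 (0x0102).
TAG_SET = head(6, 258)


def encode_bytes_set(values):
    """Encode a set of byte strings as tag 258 plus an array."""
    items = sorted(values)  # assumption: sorted for reproducible output
    out = [TAG_SET, head(4, len(items))]
    for item in items:
        out.append(head(2, len(item)) + item)
    return b"".join(out)
```

With these helpers, ``encode_bytes_set({b"a", b"b"})`` produces
``b"\xd9\x01\x02\x82\x41a\x41b"``.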

Major type 7 (simple data types) can be used with the following
type values:

20
   False
21
   True
22
   Null
31
   Break stop code (for indefinite-length items).

All other simple data type values (including every value requiring the
1 byte extension) are disallowed.
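
The type rules above can be summarized as a check on the head of a single
data item. The following is only a sketch: ``is_allowed_head`` and the
constants are hypothetical names, and the check does not cover rules that
depend on position, such as where an indefinite-length byte string or a
break stop code may appear.

```python
import struct

# Additional information 24..27: the argument follows in 1, 2, 4 or 8 bytes.
ARG_FORMATS = {24: ">B", 25: ">H", 26: ">L", 27: ">Q"}
ALLOWED_TAGS = {258}
ALLOWED_SIMPLE = {20, 21, 22, 31}


def is_allowed_head(data):
    """Return True if the item starting at data[0] is in the subset."""
    initial = data[0]
    major, info = initial >> 5, initial & 0x1F
    if 28 <= info <= 30:
        return False  # reserved by the CBOR specification
    if major in (0, 1):
        return info != 31  # integers have no indefinite-length form
    if major == 2:
        return True  # definite or indefinite-length byte string
    if major == 3:
        return False  # text strings are not supported
    if major in (4, 5):
        return info != 31  # no indefinite-length arrays or maps
    if major == 6:
        if info < 24:
            tag = info
        elif info in ARG_FORMATS:
            fmt = ARG_FORMATS[info]
            (tag,) = struct.unpack(fmt, data[1:1 + struct.calcsize(fmt)])
        else:
            return False
        return tag in ALLOWED_TAGS
    # Major type 7: only false, true, null and the break stop code.
    # Info 24 (1 byte extension) and 25..27 (floats) are rejected here.
    return info in ALLOWED_SIMPLE
```

For instance, ``is_allowed_head(b"\xd9\x01\x02")`` accepts tag 258, while
``is_allowed_head(b"\x61a")`` rejects a one-character text string.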

Indefinite-Length Byte Strings
==============================

Indefinite-length byte strings (major type 2) are allowed. However,
they MUST NOT occur inside a container type (such as an array or map).
That is, they can only occur as the "top-most" element in a stream of
values.

Encoders and decoders SHOULD *stream* indefinite-length byte strings.
That is, an encoder or decoder SHOULD NOT buffer the entirety of a long
byte string value when indefinite-length byte strings are being used
if it can be avoided. Mercurial MAY use extremely long indefinite-length
byte strings, and buffering the source or destination value could lead
to memory exhaustion.

Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
bytes (1 MiB).
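
A streaming encoder along these lines could be written as a generator that
never holds more than one chunk in memory. This is a sketch under stated
assumptions: ``stream_bytestring``, ``encode_head`` and ``MAX_CHUNK`` are
hypothetical names, and the source is assumed to be a binary file-like
object with a ``read(size)`` method.

```python
import io
import struct

MAX_CHUNK = 2 ** 20  # recommended upper bound for a single chunk


def encode_head(major, length):
    """Encode a major type and length in the shortest form."""
    if length < 24:
        return struct.pack(">B", major << 5 | length)
    if length < 2 ** 8:
        return struct.pack(">BB", major << 5 | 24, length)
    if length < 2 ** 16:
        return struct.pack(">BH", major << 5 | 25, length)
    if length < 2 ** 32:
        return struct.pack(">BL", major << 5 | 26, length)
    return struct.pack(">BQ", major << 5 | 27, length)


def stream_bytestring(fh, chunk_size=MAX_CHUNK):
    """Yield the pieces of an indefinite-length byte string read from fh."""
    yield b"\x5f"  # major type 2, additional information 31
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        # Each chunk is itself a definite-length byte string.
        yield encode_head(2, len(chunk))
        yield chunk
    yield b"\xff"  # break stop code


# Usage: a tiny chunk size makes the chunking visible.
encoded = b"".join(stream_bytestring(io.BytesIO(b"hello world"), chunk_size=5))
```

Here ``encoded`` is ``b"\x5f\x45hello\x45 worl\x41d\xff"``. A real caller
would write each yielded piece to its output as it arrives instead of
joining them.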

Container Types
===============

Mercurial may use the array (major type 4), map (major type 5), and
set (semantic tag 258 plus major type 4 array) container types.

An array may contain any supported type as values.

A map MUST only use the following types as keys:

* unsigned integers (major type 0)
* negative integers (major type 1)
* byte strings (major type 2) (but not indefinite-length byte strings)
* false (simple type 20)
* true (simple type 21)
* null (simple type 22)

A map MUST only use the following types as values:

* all types supported as map keys
* arrays
* maps
* sets

A set may only use the following types as values:

* all types supported as map keys

It is recommended that keys in maps and values in sets and arrays all
be of a uniform type.
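
The container rules can be checked on Python values before they are
encoded. The sketch below assumes a mapping of CBOR types onto Python's
``int``, ``bytes``, ``bool``, ``None``, ``list``, ``dict`` and ``set``;
the function names are hypothetical, and the 64-bit integer bound is an
assumption that follows from big integers being unsupported.

```python
def is_key_type(value):
    """True for the scalar types allowed as map keys and set members."""
    if value is None or isinstance(value, (bool, bytes)):
        return True
    if isinstance(value, int):
        # Major types 0 and 1 cover -2**64 through 2**64 - 1.
        return -2 ** 64 <= value < 2 ** 64
    return False


def is_supported(value):
    """True if value only uses types and nesting allowed by the subset."""
    if is_key_type(value):
        return True
    if isinstance(value, list):
        return all(is_supported(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return all(is_key_type(v) for v in value)
    if isinstance(value, dict):
        return all(
            is_key_type(k) and is_supported(v) for k, v in value.items()
        )
    # str (text strings), float and everything else are rejected.
    return False
```

For example, ``is_supported({b"a": [1, {b"x"}]})`` is ``True``, while
``is_supported({"a": 1})`` is ``False`` because the key is a text string.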

Avoiding Large Byte Strings
===========================

The use of large byte strings is discouraged, especially in scenarios where
the total size of the byte string may be unbounded for some inputs (e.g. when
representing the content of a tracked file). It is highly recommended to use
indefinite-length byte strings for these purposes.

Since indefinite-length byte strings cannot be nested within an outer
container (such as an array or map), to associate a large byte string
with another data structure, it is recommended to use an array or
map followed immediately by an indefinite-length byte string. For example,
instead of the following map::

   {
      "key1": "value1",
      "key2": "value2",
      "long_value": "some very large value...",
   }

Use a map followed by a byte string::

   {
      "key1": "value1",
      "key2": "value2",
      "value_follows": True,
   }
   <BEGIN INDEFINITE-LENGTH BYTE STRING>
   "some very large value"
   "..."
   <END INDEFINITE-LENGTH BYTE STRING>
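
The recommended layout can be sketched as a generator that emits the map
first and the large value afterwards as a top-level indefinite-length byte
string. All names here (``head``, ``encode_small_map``,
``encode_message``) are hypothetical, the map encoder handles only byte
string and boolean values, and the quoted strings of the example above are
treated as byte strings since text strings are not supported.

```python
import struct


def head(major, arg):
    """Encode a major type and unsigned argument in the shortest form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">L"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, arg)
    raise ValueError("argument does not fit in 64 bits")


def encode_small_map(mapping):
    """Encode a dict with bytes keys and bytes or bool values."""
    out = [head(5, len(mapping))]
    for key, value in mapping.items():
        out.append(head(2, len(key)) + key)
        if value is True:
            out.append(b"\xf5")
        elif value is False:
            out.append(b"\xf4")
        else:
            out.append(head(2, len(value)) + value)
    return b"".join(out)


def encode_message(meta, chunks):
    """Yield the map, then the large value as a streamed byte string."""
    yield encode_small_map(meta)
    yield b"\x5f"  # begin indefinite-length byte string
    for chunk in chunks:
        yield head(2, len(chunk)) + chunk
    yield b"\xff"  # break stop code


meta = {b"key1": b"value1", b"key2": b"value2", b"value_follows": True}
message = b"".join(encode_message(meta, [b"some very large value", b"..."]))
```

A decoder reading ``message`` would see two top-level values: a map whose
``value_follows`` entry is true, then the byte string that the entry
announces.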