author | Pierre-Yves David <pierre-yves.david@octobus.net> |
Tue, 07 Jan 2020 11:24:05 +0100 | |
changeset 44124 | d56a2d6f34f0 |
parent 43632 | 2e017696181f |
permissions | -rw-r--r-- |
39409
2fe21c65777e
internals: document CBOR utilization
Gregory Szorc <gregory.szorc@gmail.com>
parents:
diff
changeset
Mercurial uses Concise Binary Object Representation (CBOR)
(RFC 7049) for various data formats.

This document describes the subset of CBOR that Mercurial uses and
gives recommendations for appropriate use of CBOR within Mercurial.

Type Limitations
================

Major types 0 and 1 (unsigned integers and negative integers) MUST be
fully supported.

Major type 2 (byte strings) MUST be fully supported. However, there
are limitations around the use of indefinite-length byte strings.
(See below.)

Major type 3 (text strings) is NOT supported.

Major type 4 (arrays) MUST be supported. However, values are limited
to the set of types described in the "Container Types" section below.
In addition, indefinite-length arrays are NOT supported.

Major type 5 (maps) MUST be supported. However, key values are limited
to the set of types described in the "Container Types" section below.
In addition, indefinite-length maps are NOT supported.

Major type 6 (semantic tagging of major types) can be used with the
following semantic tag values:

258
   Mathematical finite set. Suitable for representing Python's
   ``set`` type.

All other semantic tag values are not allowed.
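As an illustration of how a tagged set looks on the wire, the following
minimal sketch encodes a Python ``set`` of non-negative integers as
semantic tag 258 followed by a definite-length array. The helper names
``encode_head`` and ``encode_uint_set`` are hypothetical and are not
Mercurial's own encoder; only the byte layout follows RFC 7049.

```python
import struct


def encode_head(major, value):
    """Return the initial byte and any extra argument bytes for a CBOR item."""
    if not 0 <= value < 2**64:
        raise ValueError("argument must fit in an unsigned 64-bit integer")
    if value < 24:
        # Small arguments are stored directly in the initial byte.
        return struct.pack(">B", major << 5 | value)
    if value < 2**8:
        return struct.pack(">BB", major << 5 | 24, value)
    if value < 2**16:
        return struct.pack(">BH", major << 5 | 25, value)
    if value < 2**32:
        return struct.pack(">BI", major << 5 | 26, value)
    return struct.pack(">BQ", major << 5 | 27, value)


def encode_uint_set(values):
    """Encode a set of non-negative integers as tag 258 plus an array."""
    items = sorted(values)  # sorted only to make the output deterministic
    # Major type 6 carries the tag number; major type 4 is the array.
    out = encode_head(6, 258) + encode_head(4, len(items))
    for item in items:
        out += encode_head(0, item)  # major type 0: unsigned integer
    return out


print(encode_uint_set({1, 2, 3}).hex())
```

For ``{1, 2, 3}`` this produces ``d9 01 02`` (tag 258), ``83`` (array of
three items), and then ``01 02 03``.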

Major type 7 (simple data types) can be used with the following
type values:

20
   False
21
   True
22
   Null
31
   Break stop code (for indefinite-length items).

All other simple data type values (including every value requiring the
1 byte extension) are disallowed.
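A decoder can enforce this restriction by inspecting the low 5 bits of
the initial byte. The sketch below is one possible way to do so; the
names ``decode_simple``, ``is_allowed_simple`` and ``BREAK`` are
hypothetical and chosen for illustration only.

```python
# Simple values permitted by this document, mapped to Python objects.
SIMPLE_VALUES = {20: False, 21: True, 22: None}

# Sentinel returned for the "break" stop code (additional information 31).
BREAK = object()


def decode_simple(initial_byte):
    """Decode a major type 7 initial byte, rejecting disallowed values."""
    major = initial_byte >> 5
    info = initial_byte & 0x1F
    if major != 7:
        raise ValueError("not a major type 7 item")
    if info == 31:
        return BREAK
    if info in SIMPLE_VALUES:
        return SIMPLE_VALUES[info]
    # Covers unassigned values, the 1 byte extension (24) and floats (25-27).
    raise ValueError("simple value %d is not allowed" % info)


def is_allowed_simple(initial_byte):
    """Return True if the initial byte is an allowed major type 7 item."""
    try:
        decode_simple(initial_byte)
    except ValueError:
        return False
    return True
```

With this sketch, ``0xf4``, ``0xf5`` and ``0xf6`` decode to ``False``,
``True`` and ``None``, ``0xff`` yields the ``BREAK`` sentinel, and bytes
such as ``0xf8`` (1 byte extension) are rejected.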

Indefinite-Length Byte Strings
==============================

Indefinite-length byte strings (major type 2) are allowed. However,
they MUST NOT occur inside a container type (such as an array or map).
That is, they can only occur as the "top-most" element in a stream of
values.

Encoders and decoders SHOULD *stream* indefinite-length byte strings.
That is, an encoder or decoder SHOULD NOT buffer the entirety of a long
byte string value when indefinite-length byte strings are being used
if it can be avoided. Mercurial MAY use extremely long indefinite-length
byte strings, and buffering the source or destination value could lead to
memory exhaustion.

Chunks in an indefinite-length byte string SHOULD NOT exceed 2^20
bytes.
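The streaming and chunk-size recommendations can be sketched as a
generator that reads from a file-like object and emits one bounded
chunk at a time. The function names and the 64 KiB default below are
assumptions made for this example, not part of Mercurial's API.

```python
import io
import struct

MAX_CHUNK_SIZE = 2**20  # upper bound recommended for a single chunk


def encode_bytes_head(length):
    """Return the header of a definite-length byte string (major type 2)."""
    if length < 24:
        return struct.pack(">B", 0x40 | length)
    if length < 2**8:
        return struct.pack(">BB", 0x58, length)
    if length < 2**16:
        return struct.pack(">BH", 0x59, length)
    if length < 2**32:
        return struct.pack(">BI", 0x5A, length)
    return struct.pack(">BQ", 0x5B, length)


def stream_indefinite_bytes(source, chunk_size=65536):
    """Yield the pieces of an indefinite-length byte string.

    ``source`` is any object with a ``read(size)`` method. Only one
    chunk is held in memory at a time.
    """
    if not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError("chunk_size must be between 1 and 2**20")
    yield b"\x5f"  # start of an indefinite-length byte string
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield encode_bytes_head(len(chunk))
        yield chunk
    yield b"\xff"  # "break" stop code


encoded = b"".join(stream_indefinite_bytes(io.BytesIO(b"abcdef"), chunk_size=4))
print(encoded.hex())
```

A real caller would write each yielded piece to its output as it
arrives; the pieces are joined here only to print the result.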

Container Types
===============

Mercurial may use the array (major type 4), map (major type 5), and
set (semantic tag 258 plus major type 4 array) container types.

An array may contain any supported type as values.

A map MUST only use the following types as keys:

* unsigned integers (major type 0)
* negative integers (major type 1)
* byte strings (major type 2) (but not indefinite-length byte strings)
* false (simple type 20)
* true (simple type 21)
* null (simple type 22)

A map MUST only use the following types as values:

* all types supported as map keys
* arrays
* maps
* sets

A set may only use the following types as values:

* all types supported as map keys

It is recommended that keys in maps and values in sets and arrays all
be of a uniform type.
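The rules above can be expressed as a small validator over Python
values. This is a sketch under an assumed mapping (``bytes`` for byte
strings, ``list`` or ``tuple`` for arrays, ``dict`` for maps, ``set`` or
``frozenset`` for sets); the function names are hypothetical. It checks
only the MUST-level rules, not the recommendation about uniform types.

```python
def is_valid_key(value):
    """Return True if value may be used as a map key or set member."""
    if value is None or isinstance(value, (bool, bytes)):
        return True
    if isinstance(value, int):
        # Major types 0 and 1 together cover -2**64 .. 2**64 - 1.
        return -2**64 <= value < 2**64
    return False


def is_valid_value(value):
    """Return True if value only uses types allowed by this document."""
    if is_valid_key(value):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_valid_value(item) for item in value)
    if isinstance(value, dict):
        return all(
            is_valid_key(key) and is_valid_value(item)
            for key, item in value.items()
        )
    if isinstance(value, (set, frozenset)):
        return all(is_valid_key(item) for item in value)
    # Text strings (major type 3), floats and other types are rejected.
    return False
```

For example, ``{b"key": [1, -2, {b"x"}]}`` passes, while ``{"key": 1}``
fails because its key is a text string rather than a byte string.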

Avoiding Large Byte Strings
===========================

The use of large byte strings is discouraged, especially in scenarios where
the total size of the byte string may be unbounded for some inputs (e.g. when
representing the content of a tracked file). It is highly recommended to use
indefinite-length byte strings for these purposes.

Since indefinite-length byte strings cannot be nested within an outer
container (such as an array or map), to associate a large byte string
with another data structure, it is recommended to use an array or
map followed immediately by an indefinite-length byte string. For example,
instead of the following map::

   {
      "key1": "value1",
      "key2": "value2",
      "long_value": "some very large value...",
   }

use a map followed by a byte string::

   {
      "key1": "value1",
      "key2": "value2",
      "value_follows": True,
   }
   <BEGIN INDEFINITE-LENGTH BYTE STRING>
      "some very large value"
      "..."
   <END INDEFINITE-LENGTH BYTE STRING>
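The following sketch shows one way the recommended layout could be
produced: a small definite-length map, followed immediately by an
indefinite-length byte string built from an iterable of chunks. Keys and
values are ``bytes`` because text strings (major type 3) are not
supported. The function names are hypothetical, and the sketch handles
only byte strings and booleans inside the map.

```python
import struct


def encode_head(major, value):
    """Return the initial byte and any extra argument bytes for a CBOR item."""
    if value < 24:
        return struct.pack(">B", major << 5 | value)
    if value < 2**8:
        return struct.pack(">BB", major << 5 | 24, value)
    if value < 2**16:
        return struct.pack(">BH", major << 5 | 25, value)
    if value < 2**32:
        return struct.pack(">BI", major << 5 | 26, value)
    return struct.pack(">BQ", major << 5 | 27, value)


def encode_item(value):
    """Encode a boolean or a definite-length byte string."""
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, bytes):
        return encode_head(2, len(value)) + value
    raise TypeError("unsupported type in this sketch: %r" % type(value))


def iter_map_then_bytes(metadata, chunks):
    """Yield a map, then an indefinite-length byte string, piece by piece."""
    yield encode_head(5, len(metadata))  # major type 5: map
    for key, value in metadata.items():
        yield encode_item(key)
        yield encode_item(value)
    yield b"\x5f"  # start of the indefinite-length byte string
    for chunk in chunks:
        if chunk:
            yield encode_head(2, len(chunk))
            yield chunk
    yield b"\xff"  # "break" stop code


metadata = {b"key1": b"value1", b"value_follows": True}
stream = b"".join(iter_map_then_bytes(metadata, [b"some very ", b"large value"]))
print(stream.hex())
```

Because the large value sits outside the map, a decoder can read the
small map in full and then process the byte string chunk by chunk.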