CONTRIBUTORS
author Gregory Szorc <gregory.szorc@gmail.com>
Tue, 28 Aug 2018 15:02:48 -0700
changeset 39438 aeb551a3bb8a
parent 5514 c29efd272395
permissions -rw-r--r--
cborutil: implement sans I/O decoder The vendored CBOR package decodes by calling read(n) on an object. There are a number of disadvantages to this: * Uses blocking I/O. If sufficient data is not available, the decoder will hang until it is. * No support for partial reads. If the read(n) returns less data than requested, the decoder raises an error. * Requires the use of a file like object. If the original data is in say a buffer, we need to "cast" it to e.g. a BytesIO to appease the decoder. In addition, the vendored CBOR decoder doesn't provide flexibility that we desire. Specifically: * It buffers indefinite length bytestrings instead of streaming them. * It doesn't allow limiting the set of types that can be decoded. This property is useful when implementing a "hardened" decoder that is less susceptible to abusive input. * It doesn't provide sufficient "hook points" and introspection to institute checks around behavior. These are useful for implementing a "hardened" decoder. This all adds up to a reasonable set of justifications for writing our own decoder. So, this commit implements our own CBOR decoder. At the heart of the decoder is a function that decodes a single "item" from a buffer. This item can be a complete simple value or a special value, such as "start of array." Using this function, we can build a decoder that effectively iterates over the stream of decoded items and builds up higher-level values, such as arrays, maps, sets, and indefinite length bytestrings. And we can do this without performing I/O in the decoder itself. The core of the sans I/O decoder will probably not be used directly. Instead, it is expected that we'll build utility functions for invoking the decoder given specific input types. This will allow extreme flexibility in how data is delivered to the decoder. I'm pretty happy with the state of the decoder modulo the TODO items to track wanted features to help with a "hardened" decoder. The one thing I could be convinced to change is the handling of semantic tags. Since we only support a single semantic tag (sets), I thought it would be easier to handle them inline in decodeitem(). This is simpler now. But if we add support for other semantic tags, it will likely be easier to move semantic tag handling outside of decodeitem(). But, properly supporting semantic tags opens up a whole can of worms, as many semantic tags imply new types. I'm optimistic we won't need these in Mercurial. But who knows. I'm also pretty happy with the test coverage. Writing comprehensive tests for partial decoding did flush out a handful of bugs. One general improvement to testing would be fuzz testing for partial decoding. I may implement that later. I also anticipate switching the wire protocol code to this new decoder will flush out any lingering bugs. Differential Revision: https://phab.mercurial-scm.org/D4414
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
5514
c29efd272395 Add note to CONTRIBUTORS file
Matt Mackall <mpm@selenic.com>
parents: 2947
diff changeset
     1
[This file is here for historical purposes, all recent contributors
c29efd272395 Add note to CONTRIBUTORS file
Matt Mackall <mpm@selenic.com>
parents: 2947
diff changeset
     2
should appear in the changelog directly]
c29efd272395 Add note to CONTRIBUTORS file
Matt Mackall <mpm@selenic.com>
parents: 2947
diff changeset
     3
c29efd272395 Add note to CONTRIBUTORS file
Matt Mackall <mpm@selenic.com>
parents: 2947
diff changeset
     4
Andrea Arcangeli <andrea at suse.de>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
     5
Thomas Arendsen Hein <thomas at intevation.de>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
     6
Goffredo Baroncelli <kreijack at libero.it>
756
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
     7
Muli Ben-Yehuda <mulix at mulix.org>
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
     8
Mikael Berthe <mikael at lilotux.net>
1450
199bb2b4ed4a Add Benoit to CONTRIBUTORS
Matt Mackall <mpm@selenic.com>
parents: 1310
diff changeset
     9
Benoit Boissinot <bboissin at gmail.com>
2947
2d865068f72e Add self to contributors
Brendan Cully <brendan@kublai.com>
parents: 2162
diff changeset
    10
Brendan Cully <brendan at kublai.com>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    11
Vincent Danjean <vdanjean.ml at free.fr>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    12
Jake Edge <jake at edge2.net>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    13
Michael Fetterman <michael.fetterman at intel.com>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    14
Edouard Gomez <ed.gomez at free.fr>
1231
effff847870f CONTRIBUTORS update
mpm@selenic.com
parents: 1080
diff changeset
    15
Eric Hopper <hopper at omnifarious.org>
756
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
    16
Alecs King <alecsk at gmail.com>
1310
7e8a55c9ee5c Updated CONTRIBUTORS.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 1231
diff changeset
    17
Volker Kleinfeld <Volker.Kleinfeld at gmx.de>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    18
Vadim Lebedev <vadim at mbdsys.com>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    19
Christopher Li <hg at chrisli.org>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    20
Chris Mason <mason at suse.com>
2162
dac432a521d8 Add self to CONTRIBUTORS
Colin McMillen <mcmillen@cs.cmu.edu>
parents: 2120
diff changeset
    21
Colin McMillen <mcmillen at cs.cmu.edu>
1080
253072f39205 Updated list of contributors.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 896
diff changeset
    22
Wojciech Milkowski <wmilkowski at interia.pl>
756
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
    23
Chad Netzer <chad.netzer at gmail.com>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    24
Bryan O'Sullivan <bos at serpentine.com>
756
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
    25
Vicent SeguĂ­ Pascual <vseguip at gmail.com>
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
    26
Sean Perry <shaleh at speakeasy.net>
594
0a2ffc5c906b Update CONTRIBUTORS
mpm@selenic.com
parents: 519
diff changeset
    27
Nguyen Anh Quynh <aquynh at gmail.com>
1310
7e8a55c9ee5c Updated CONTRIBUTORS.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 1231
diff changeset
    28
Ollivier Robert <roberto at keltia.freenix.fr>
2120
c0994047c5ff Added my name to the contributors list.
Alexander Schremmer <alex AT alexanderweb DOT de>
parents: 1450
diff changeset
    29
Alexander Schremmer <alex at alexanderweb.de>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    30
Arun Sharma <arun at sharma-home.net>
1231
effff847870f CONTRIBUTORS update
mpm@selenic.com
parents: 1080
diff changeset
    31
Josef "Jeff" Sipek <jeffpc at optonline.net>
1310
7e8a55c9ee5c Updated CONTRIBUTORS.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 1231
diff changeset
    32
Kevin Smith <yarcs at qualitycode.com>
1231
effff847870f CONTRIBUTORS update
mpm@selenic.com
parents: 1080
diff changeset
    33
TK Soh <teekaysoh at yahoo.com>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    34
Radoslaw Szkodzinski <astralstorm at gorzow.mm.pl>
851
73a432c8040a Added Samuel Tardieu to contributors list.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 760
diff changeset
    35
Samuel Tardieu <sam at rfc1149.net>
519
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    36
K Thananchayan <thananck at yahoo.com>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    37
Andrew Thompson <andrewkt at aktzero.com>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    38
Michael S. Tsirkin <mst at mellanox.co.il>
50768efaf6f2 Add a CONTRIBUTORS file
mpm@selenic.com
parents:
diff changeset
    39
Rafael Villar Burke <pachi at mmn-arquitectos.com>
855
a107c64c76be Added Tristan Wibberley to contributors.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 851
diff changeset
    40
Tristan Wibberley <tristan at wibberley.org>
756
5d79dfa5e98f Added new code contributors, fixed Vincent's name, added hint on encoding.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 594
diff changeset
    41
Mark Williamson <mark.williamson at cl.cam.ac.uk>