view mercurial/minifileset.py @ 37711:65a23cc8e75b
cborutil: implement support for streaming encoding, bytestring decoding
The vendored cbor2 package is... a bit disappointing.
On the encoding side, it insists that you pass it something with
a write() to send data to. That means if you want to emit data from
a generator, you have to construct e.g. an io.BytesIO(), write()
to it, then get the data back out. There can be non-trivial overhead
involved.
The encoder also doesn't support indefinite types - bytestrings, arrays,
and maps that don't have a known length. Again, this is really
unfortunate because it requires you to buffer the entire source and
destination in memory to encode large things.
On the decoding side, it supports reading indefinite length types.
But it buffers them completely before returning. More sadness.
This commit implements "streaming" encoders for various CBOR types.
Encoding emits a generator of hunks. So you can efficiently stream
encoded data elsewhere.
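
The generator-based approach described above can be sketched as follows. This is a minimal illustration, not the code added by this commit: the names `encode_length`, `stream_encode_bytestring` and `stream_encode_array` are hypothetical, and only bytestrings and arrays of bytestrings are handled. The point is that each encoder yields small chunks directly, so no io.BytesIO() is needed to collect the output.

```python
import struct

# CBOR major types (upper 3 bits of the initial byte).
MAJOR_TYPE_BYTESTRING = 2
MAJOR_TYPE_ARRAY = 4

def encode_length(major_type, length):
    """Return the CBOR head (initial byte plus length) for a definite item.

    Hypothetical helper for illustration only.
    """
    if length < 24:
        return struct.pack('>B', major_type << 5 | length)
    elif length < 256:
        return struct.pack('>BB', major_type << 5 | 24, length)
    elif length < 65536:
        return struct.pack('>BH', major_type << 5 | 25, length)
    elif length < 4294967296:
        return struct.pack('>BL', major_type << 5 | 26, length)
    else:
        return struct.pack('>BQ', major_type << 5 | 27, length)

def stream_encode_bytestring(data):
    """Yield the head and the payload as separate chunks (no copy)."""
    yield encode_length(MAJOR_TYPE_BYTESTRING, len(data))
    yield data

def stream_encode_array(items):
    """Yield chunks for a definite-length array of bytestrings."""
    yield encode_length(MAJOR_TYPE_ARRAY, len(items))
    for item in items:
        for chunk in stream_encode_bytestring(item):
            yield chunk
```

A consumer can forward each chunk to a socket or file as it arrives, or call `b''.join(...)` on the generator when a single buffer is wanted.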
It also implements support for emitting indefinite length bytestrings,
arrays, and maps.
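
For reference, an indefinite-length bytestring in CBOR is a start marker (0x5f), any number of definite-length bytestring chunks, and a break byte (0xff). A rough sketch of emitting one from an iterator of chunks is shown here; the function names are hypothetical and are not taken from this commit.

```python
import struct

BYTESTRING_INDEFINITE_START = b'\x5f'  # major type 2, additional info 31
BREAK = b'\xff'

def _bytestring_head(length):
    """Head for a definite-length bytestring chunk (hypothetical helper)."""
    if length < 24:
        return struct.pack('>B', 0x40 | length)
    elif length < 256:
        return struct.pack('>BB', 0x40 | 24, length)
    elif length < 65536:
        return struct.pack('>BH', 0x40 | 25, length)
    elif length < 4294967296:
        return struct.pack('>BL', 0x40 | 26, length)
    return struct.pack('>BQ', 0x40 | 27, length)

def stream_encode_indefinite_bytestring(chunks):
    """Encode an iterable of bytes chunks without knowing the total size."""
    yield BYTESTRING_INDEFINITE_START
    for chunk in chunks:
        # Each chunk is itself a definite-length bytestring.
        yield _bytestring_head(len(chunk))
        yield chunk
    yield BREAK
```

Because the total length is never needed, the source can itself be a generator (for example, reads from a file), so neither side has to be buffered in full.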
On the decoding side, we only implement support for decoding an
indefinite length bytestring from a file object. It will emit a
generator of raw chunks from the source.
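
The decoding side can be sketched in the same spirit. The function below is an assumption-laden illustration (the name `read_indefinite_bytestring` and its error handling are invented here, not taken from this commit): it reads from any object with a read() method and yields each chunk as soon as it has been read, instead of buffering the whole value.

```python
import io
import struct

def read_indefinite_bytestring(fh):
    """Yield raw chunks of an indefinite-length CBOR bytestring from fh."""
    if fh.read(1) != b'\x5f':
        raise ValueError('expected indefinite-length bytestring start')
    while True:
        head = fh.read(1)
        if not head:
            raise ValueError('unexpected end of input')
        if head == b'\xff':  # break: end of the bytestring
            return
        byte = ord(head)
        major, info = byte >> 5, byte & 0x1f
        if major != 2:
            raise ValueError('chunk is not a bytestring')
        if info < 24:
            length = info
        elif info == 24:
            length = struct.unpack('>B', fh.read(1))[0]
        elif info == 25:
            length = struct.unpack('>H', fh.read(2))[0]
        elif info == 26:
            length = struct.unpack('>L', fh.read(4))[0]
        elif info == 27:
            length = struct.unpack('>Q', fh.read(8))[0]
        else:
            raise ValueError('chunks must have a definite length')
        chunk = fh.read(length)
        if len(chunk) != length:
            raise ValueError('truncated chunk')
        yield chunk
```

For example, `list(read_indefinite_bytestring(io.BytesIO(data)))` collects the chunks, while a streaming consumer would simply iterate.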
I didn't want to reinvent so many wheels. But profiling the wire
protocol revealed that constructing io.BytesIO() instances to
temporarily hold results has a non-trivial overhead.
We're talking >15% of execution time for operations like
"transfer the fulltexts of all files in a revision." So I can
justify this effort.
Fortunately, CBOR is a relatively straightforward format. And we have
a reference implementation in the repo we can test against.
Differential Revision: https://phab.mercurial-scm.org/D3303
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sat, 14 Apr 2018 16:36:15 -0700 |
parents | d5288b966e2f |
children | 9c98cb30f4de |
# minifileset.py - a simple language to select files
#
# Copyright 2017 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

from .i18n import _
from . import (
    error,
    fileset,
)

def _compile(tree):
    if not tree:
        raise error.ParseError(_("missing argument"))
    op = tree[0]
    if op in {'symbol', 'string', 'kindpat'}:
        name = fileset.getpattern(tree, {'path'}, _('invalid file pattern'))
        if name.startswith('**'): # file extension test, ex. "**.tar.gz"
            ext = name[2:]
            for c in ext:
                if c in '*{}[]?/\\':
                    raise error.ParseError(_('reserved character: %s') % c)
            return lambda n, s: n.endswith(ext)
        elif name.startswith('path:'): # directory or full path test
            p = name[5:] # prefix
            pl = len(p)
            f = lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/')
            return f
        raise error.ParseError(_("unsupported file pattern: %s") % name,
                               hint=_('paths must be prefixed with "path:"'))
    elif op == 'or':
        func1 = _compile(tree[1])
        func2 = _compile(tree[2])
        return lambda n, s: func1(n, s) or func2(n, s)
    elif op == 'and':
        func1 = _compile(tree[1])
        func2 = _compile(tree[2])
        return lambda n, s: func1(n, s) and func2(n, s)
    elif op == 'not':
        return lambda n, s: not _compile(tree[1])(n, s)
    elif op == 'group':
        return _compile(tree[1])
    elif op == 'func':
        symbols = {
            'all': lambda n, s: True,
            'none': lambda n, s: False,
            'size': lambda n, s: fileset.sizematcher(tree[2])(s),
        }

        name = fileset.getsymbol(tree[1])
        if name in symbols:
            return symbols[name]

        raise error.UnknownIdentifier(name, symbols.keys())
    elif op == 'minus':     # equivalent to 'x and not y'
        func1 = _compile(tree[1])
        func2 = _compile(tree[2])
        return lambda n, s: func1(n, s) and not func2(n, s)
    elif op == 'negate':
        raise error.ParseError(_("can't use negate operator in this context"))
    elif op == 'list':
        raise error.ParseError(_("can't use a list in this context"),
                               hint=_('see hg help "filesets.x or y"'))
    raise error.ProgrammingError('illegal tree: %r' % (tree,))

def compile(text):
    """generate a function (path, size) -> bool from filter specification.

    "text" could contain the operators defined by the fileset language for
    common logic operations, and parenthesis for grouping.  The supported path
    tests are '**.extname' for file extension test, and '"path:dir/subdir"'
    for prefix test.  The ``size()`` predicate is borrowed from filesets to test
    file size.  The predicates ``all()`` and ``none()`` are also supported.

    '(**.php & size(">10MB")) | **.zip | (path:bin & !path:bin/README)' for
    example, will catch all php files whose size is greater than 10 MB, all
    files whose name ends with ".zip", and all files under "bin" in the repo
    root except for "bin/README".
    """
    tree = fileset.parse(text)
    return _compile(tree)
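
To make the closure composition in `_compile` easier to follow, here is a standalone sketch that builds by hand the matcher for `**.zip | (path:bin & !path:bin/README)`. It deliberately avoids Mercurial imports, so the parser and the `size()` predicate are left out; the helper names (`ext_matcher`, `path_matcher`, `and_`, `or_`, `not_`) are hypothetical and only mirror the lambdas above.

```python
def ext_matcher(ext):
    """Mirror of the '**.ext' branch: match on filename suffix."""
    return lambda n, s: n.endswith(ext)

def path_matcher(p):
    """Mirror of the 'path:' branch: match p itself or anything under p/."""
    pl = len(p)
    return lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/')

def and_(f1, f2):
    return lambda n, s: f1(n, s) and f2(n, s)

def or_(f1, f2):
    return lambda n, s: f1(n, s) or f2(n, s)

def not_(f):
    return lambda n, s: not f(n, s)

# Hand-built equivalent of: **.zip | (path:bin & !path:bin/README)
matcher = or_(
    ext_matcher('.zip'),
    and_(path_matcher('bin'), not_(path_matcher('bin/README'))),
)

# Each matcher takes (path, size); size is unused by these path tests.
for name in ('a.zip', 'bin/tool', 'bin/README', 'binary/x'):
    print(name, matcher(name, 0))
```

Note that the `n[pl] == '/'` check is what keeps `path:bin` from matching `binary/x`: a bare prefix test would accept it.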