Mercurial > hg
annotate mercurial/minifileset.py @ 37711:65a23cc8e75b
cborutil: implement support for streaming encoding, bytestring decoding
The vendored cbor2 package is... a bit disappointing.
On the encoding side, it insists that you pass it something with
a write() to send data to. That means if you want to emit data to
a generator, you have to construct an e.g. io.BytesIO(), write()
to it, then get the data back out. There can be non-trivial overhead
involved.
The encoder also doesn't support indefinite types - bytestrings, arrays,
and maps that don't have a known length. Again, this is really
unfortunate because it requires you to buffer the entire source and
destination in memory to encode large things.
On the decoding side, it supports reading indefinite length types.
But it buffers them completely before returning. More sadness.
This commit implements "streaming" encoders for various CBOR types.
Encoding emits a generator of hunks. So you can efficiently stream
encoded data elsewhere.
It also implements support for emitting indefinite length bytestrings,
arrays, and maps.
On the decoding side, we only implement support for decoding an
indefinite length bytestring from a file object. It will emit a
generator of raw chunks from the source.
I didn't want to reinvent so many wheels. But profiling the wire
protocol revealed that the overhead of constructing io.BytesIO()
instances to temporarily hold results has a non-trivial overhead.
We're talking >15% of execution time for operations like
"transfer the fulltexts of all files in a revision." So I can
justify this effort.
Fortunately, CBOR is a relatively straightforward format. And we have
a reference implementation in the repo we can test against.
Differential Revision: https://phab.mercurial-scm.org/D3303
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Sat, 14 Apr 2018 16:36:15 -0700 |
parents | d5288b966e2f |
children | 9c98cb30f4de |
rev | line source |
---|---|
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
1 # minifileset.py - a simple language to select files |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
2 # |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
3 # Copyright 2017 Facebook, Inc. |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
4 # |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
5 # This software may be used and distributed according to the terms of the |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
6 # GNU General Public License version 2 or any later version. |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
7 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
8 from __future__ import absolute_import |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
9 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
10 from .i18n import _ |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
11 from . import ( |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
12 error, |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
13 fileset, |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
14 ) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
15 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
16 def _compile(tree): |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
17 if not tree: |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
18 raise error.ParseError(_("missing argument")) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
19 op = tree[0] |
35741
73432eee0ac4
fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents:
35740
diff
changeset
|
20 if op in {'symbol', 'string', 'kindpat'}: |
73432eee0ac4
fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents:
35740
diff
changeset
|
21 name = fileset.getpattern(tree, {'path'}, _('invalid file pattern')) |
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
22 if name.startswith('**'): # file extension test, ex. "**.tar.gz" |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
23 ext = name[2:] |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
24 for c in ext: |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
25 if c in '*{}[]?/\\': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
26 raise error.ParseError(_('reserved character: %s') % c) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
27 return lambda n, s: n.endswith(ext) |
35740
06a757b9e334
minifileset: unify handling of symbol and string patterns
Yuya Nishihara <yuya@tcha.org>
parents:
35691
diff
changeset
|
28 elif name.startswith('path:'): # directory or full path test |
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
29 p = name[5:] # prefix |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
30 pl = len(p) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
31 f = lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/') |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
32 return f |
35800
d5288b966e2f
minifileset: note the unsupported file pattern when raising a parse error
Matt Harbison <matt_harbison@yahoo.com>
parents:
35741
diff
changeset
|
33 raise error.ParseError(_("unsupported file pattern: %s") % name, |
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
34 hint=_('paths must be prefixed with "path:"')) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
35 elif op == 'or': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
36 func1 = _compile(tree[1]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
37 func2 = _compile(tree[2]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
38 return lambda n, s: func1(n, s) or func2(n, s) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
39 elif op == 'and': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
40 func1 = _compile(tree[1]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
41 func2 = _compile(tree[2]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
42 return lambda n, s: func1(n, s) and func2(n, s) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
43 elif op == 'not': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
44 return lambda n, s: not _compile(tree[1])(n, s) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
45 elif op == 'group': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
46 return _compile(tree[1]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
47 elif op == 'func': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
48 symbols = { |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
49 'all': lambda n, s: True, |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
50 'none': lambda n, s: False, |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
51 'size': lambda n, s: fileset.sizematcher(tree[2])(s), |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
52 } |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
53 |
35691
735f47b41521
fileset: make it robust for bad function calls
Yuya Nishihara <yuya@tcha.org>
parents:
35616
diff
changeset
|
54 name = fileset.getsymbol(tree[1]) |
735f47b41521
fileset: make it robust for bad function calls
Yuya Nishihara <yuya@tcha.org>
parents:
35616
diff
changeset
|
55 if name in symbols: |
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
56 return symbols[name] |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
57 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
58 raise error.UnknownIdentifier(name, symbols.keys()) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
59 elif op == 'minus': # equivalent to 'x and not y' |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
60 func1 = _compile(tree[1]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
61 func2 = _compile(tree[2]) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
62 return lambda n, s: func1(n, s) and not func2(n, s) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
63 elif op == 'negate': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
64 raise error.ParseError(_("can't use negate operator in this context")) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
65 elif op == 'list': |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
66 raise error.ParseError(_("can't use a list in this context"), |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
67 hint=_('see hg help "filesets.x or y"')) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
68 raise error.ProgrammingError('illegal tree: %r' % (tree,)) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
69 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
70 def compile(text): |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
71 """generate a function (path, size) -> bool from filter specification. |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
72 |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
73 "text" could contain the operators defined by the fileset language for |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
74 common logic operations, and parenthesis for grouping. The supported path |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
75 tests are '**.extname' for file extension test, and '"path:dir/subdir"' |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
76 for prefix test. The ``size()`` predicate is borrowed from filesets to test |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
77 file size. The predicates ``all()`` and ``none()`` are also supported. |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
78 |
35741
73432eee0ac4
fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents:
35740
diff
changeset
|
79 '(**.php & size(">10MB")) | **.zip | (path:bin & !path:bin/README)' for |
35616
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
80 example, will catch all php files whose size is greater than 10 MB, all |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
81 files whose name ends with ".zip", and all files under "bin" in the repo |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
82 root except for "bin/README". |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
83 """ |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
84 tree = fileset.parse(text) |
706aa203b396
fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff
changeset
|
85 return _compile(tree) |