annotate mercurial/minifileset.py @ 37711:65a23cc8e75b

cborutil: implement support for streaming encoding, bytestring decoding The vendored cbor2 package is... a bit disappointing. On the encoding side, it insists that you pass it something with a write() to send data to. That means if you want to emit data to a generator, you have to construct an e.g. io.BytesIO(), write() to it, then get the data back out. There can be non-trivial overhead involved. The encoder also doesn't support indefinite types - bytestrings, arrays, and maps that don't have a known length. Again, this is really unfortunate because it requires you to buffer the entire source and destination in memory to encode large things. On the decoding side, it supports reading indefinite length types. But it buffers them completely before returning. More sadness. This commit implements "streaming" encoders for various CBOR types. Encoding emits a generator of hunks. So you can efficiently stream encoded data elsewhere. It also implements support for emitting indefinite length bytestrings, arrays, and maps. On the decoding side, we only implement support for decoding an indefinite length bytestring from a file object. It will emit a generator of raw chunks from the source. I didn't want to reinvent so many wheels. But profiling the wire protocol revealed that the overhead of constructing io.BytesIO() instances to temporarily hold results has a non-trivial overhead. We're talking >15% of execution time for operations like "transfer the fulltexts of all files in a revision." So I can justify this effort. Fortunately, CBOR is a relatively straightforward format. And we have a reference implementation in the repo we can test against. Differential Revision: https://phab.mercurial-scm.org/D3303
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 14 Apr 2018 16:36:15 -0700
parents d5288b966e2f
children 9c98cb30f4de
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
1 # minifileset.py - a simple language to select files
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
2 #
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
3 # Copyright 2017 Facebook, Inc.
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
4 #
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
6 # GNU General Public License version 2 or any later version.
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
7
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
8 from __future__ import absolute_import
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
9
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
10 from .i18n import _
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
11 from . import (
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
12 error,
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
13 fileset,
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
14 )
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
15
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
16 def _compile(tree):
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
17 if not tree:
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
18 raise error.ParseError(_("missing argument"))
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
19 op = tree[0]
35741
73432eee0ac4 fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents: 35740
diff changeset
20 if op in {'symbol', 'string', 'kindpat'}:
73432eee0ac4 fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents: 35740
diff changeset
21 name = fileset.getpattern(tree, {'path'}, _('invalid file pattern'))
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
22 if name.startswith('**'): # file extension test, ex. "**.tar.gz"
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
23 ext = name[2:]
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
24 for c in ext:
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
25 if c in '*{}[]?/\\':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
26 raise error.ParseError(_('reserved character: %s') % c)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
27 return lambda n, s: n.endswith(ext)
35740
06a757b9e334 minifileset: unify handling of symbol and string patterns
Yuya Nishihara <yuya@tcha.org>
parents: 35691
diff changeset
28 elif name.startswith('path:'): # directory or full path test
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
29 p = name[5:] # prefix
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
30 pl = len(p)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
31 f = lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/')
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
32 return f
35800
d5288b966e2f minifileset: note the unsupported file pattern when raising a parse error
Matt Harbison <matt_harbison@yahoo.com>
parents: 35741
diff changeset
33 raise error.ParseError(_("unsupported file pattern: %s") % name,
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
34 hint=_('paths must be prefixed with "path:"'))
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
35 elif op == 'or':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
36 func1 = _compile(tree[1])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
37 func2 = _compile(tree[2])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
38 return lambda n, s: func1(n, s) or func2(n, s)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
39 elif op == 'and':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
40 func1 = _compile(tree[1])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
41 func2 = _compile(tree[2])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
42 return lambda n, s: func1(n, s) and func2(n, s)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
43 elif op == 'not':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
44 return lambda n, s: not _compile(tree[1])(n, s)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
45 elif op == 'group':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
46 return _compile(tree[1])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
47 elif op == 'func':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
48 symbols = {
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
49 'all': lambda n, s: True,
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
50 'none': lambda n, s: False,
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
51 'size': lambda n, s: fileset.sizematcher(tree[2])(s),
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
52 }
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
53
35691
735f47b41521 fileset: make it robust for bad function calls
Yuya Nishihara <yuya@tcha.org>
parents: 35616
diff changeset
54 name = fileset.getsymbol(tree[1])
735f47b41521 fileset: make it robust for bad function calls
Yuya Nishihara <yuya@tcha.org>
parents: 35616
diff changeset
55 if name in symbols:
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
56 return symbols[name]
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
57
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
58 raise error.UnknownIdentifier(name, symbols.keys())
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
59 elif op == 'minus': # equivalent to 'x and not y'
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
60 func1 = _compile(tree[1])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
61 func2 = _compile(tree[2])
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
62 return lambda n, s: func1(n, s) and not func2(n, s)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
63 elif op == 'negate':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
64 raise error.ParseError(_("can't use negate operator in this context"))
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
65 elif op == 'list':
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
66 raise error.ParseError(_("can't use a list in this context"),
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
67 hint=_('see hg help "filesets.x or y"'))
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
68 raise error.ProgrammingError('illegal tree: %r' % (tree,))
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
69
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
70 def compile(text):
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
71 """generate a function (path, size) -> bool from filter specification.
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
72
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
73 "text" could contain the operators defined by the fileset language for
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
74 common logic operations, and parenthesis for grouping. The supported path
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
75 tests are '**.extname' for file extension test, and '"path:dir/subdir"'
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
76 for prefix test. The ``size()`` predicate is borrowed from filesets to test
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
77 file size. The predicates ``all()`` and ``none()`` are also supported.
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
78
35741
73432eee0ac4 fileset: add kind:pat operator
Yuya Nishihara <yuya@tcha.org>
parents: 35740
diff changeset
79 '(**.php & size(">10MB")) | **.zip | (path:bin & !path:bin/README)' for
35616
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
80 example, will catch all php files whose size is greater than 10 MB, all
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
81 files whose name ends with ".zip", and all files under "bin" in the repo
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
82 root except for "bin/README".
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
83 """
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
84 tree = fileset.parse(text)
706aa203b396 fileset: add a lightweight file filtering language
Matt Harbison <matt_harbison@yahoo.com>
parents:
diff changeset
85 return _compile(tree)