mercurial/parser.py
author Sune Foldager <cryo@cyanite.org>
Wed, 18 May 2011 23:26:26 +0200
changeset 14365 a8e3931e3fb5
parent 13665 e798e430c5e5
child 14701 4b93bd041772
permissions -rw-r--r--
revlog: linearize created changegroups in generaldelta revlogs This greatly improves the speed of the bundling process, and often reduces the bundle size considerably. (Although if the repository is already ordered, this has little effect on both time and bundle size.) For non-generaldelta clients, the reduced bundle size translates to a reduced repository size, similar to shrinking the revlogs (which uses the exact same algorithm). For generaldelta clients the difference is minor. When the new bundle format comes, reordering will not be necessary since we can then store the deltaparent relationsships directly. The eventual default behavior for clients and servers is presented in the table below, where "new" implies support for GD as well as the new bundle format: old client new client old server old bundle, no reorder old bundle, no reorder new server, non-GD old bundle, no reorder[1] old bundle, no reorder[2] new server, GD old bundle, reorder[3] new bundle, no reorder[4] [1] reordering is expensive on the server in this case, skip it [2] client can choose to do its own redelta here [3] reordering is needed because otherwise the pull does a lot of extra work on the server [4] reordering isn't needed because client can get deltabase in bundle format Currently, the default is to reorder on GD-servers, and not otherwise. A new setting, bundle.reorder, has been added to override the default reordering behavior. It can be set to either 'auto' (the default), or any true or false value as a standard boolean setting, to either force the reordering on or off regardless of generaldelta. Some timing data from a relatively branch test repository follows. All bundling is done with --all --type none options. Non-generaldelta, non-shrunk repo: ----------------------------------- Size: 276M Without reorder (default): Bundle time: 14.4 seconds Bundle size: 939M With reorder: Bundle time: 1 minute, 29.3 seconds Bundle size: 381M Generaldelta, non-shrunk repo: ----------------------------------- Size: 87M Without reorder: Bundle time: 2 minutes, 1.4 seconds Bundle size: 939M With reorder (default): Bundle time: 25.5 seconds Bundle size: 381M
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     1
# parser.py - simple top-down operator precedence parser for mercurial
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     2
#
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     3
# Copyright 2010 Matt Mackall <mpm@selenic.com>
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     4
#
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     5
# This software may be used and distributed according to the terms of the
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     6
# GNU General Public License version 2 or any later version.
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     7
11449
05af334bac05 parser: fix URL to effbot
Julian Cowley <julian@lava.net>
parents: 11412
diff changeset
     8
# see http://effbot.org/zone/simple-top-down-parsing.htm and
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     9
# http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    10
# for background
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    11
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    12
# takes a tokenizer and elements
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    13
# tokenizer is an iterator that returns type, value pairs
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    14
# elements is a mapping of types to binding strength, prefix and infix actions
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    15
# an action is a tree node name, a tree label, and an optional match
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    16
# __call__(program) parses program into a labelled tree
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    17
11289
4215ce511134 revset: raise ParseError exceptions
Matt Mackall <mpm@selenic.com>
parents: 11278
diff changeset
    18
import error
4215ce511134 revset: raise ParseError exceptions
Matt Mackall <mpm@selenic.com>
parents: 11278
diff changeset
    19
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    20
class parser(object):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    21
    def __init__(self, tokenizer, elements, methods=None):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    22
        self._tokenizer = tokenizer
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    23
        self._elements = elements
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    24
        self._methods = methods
13176
895f54a79c6e templater: use the parser.py parser to extend the templater syntax
Matt Mackall <mpm@selenic.com>
parents: 11449
diff changeset
    25
        self.current = None
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    26
    def _advance(self):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    27
        'advance the tokenizer'
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    28
        t = self.current
11278
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    29
        try:
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    30
            self.current = self._iter.next()
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    31
        except StopIteration:
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    32
            pass
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    33
        return t
11319
9d1cf337a78d parser: fix missing param in _match
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 11305
diff changeset
    34
    def _match(self, m, pos):
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    35
        'make sure the tokenizer matches an end condition'
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    36
        if self.current[0] != m:
11305
d4cafcb63f77 cleanups: undefined variables
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11289
diff changeset
    37
            raise error.ParseError("unexpected token: %s" % self.current[0],
d4cafcb63f77 cleanups: undefined variables
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 11289
diff changeset
    38
                                   self.current[2])
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    39
        self._advance()
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    40
    def _parse(self, bind=0):
11289
4215ce511134 revset: raise ParseError exceptions
Matt Mackall <mpm@selenic.com>
parents: 11278
diff changeset
    41
        token, value, pos = self._advance()
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    42
        # handle prefix rules on current token
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    43
        prefix = self._elements[token][1]
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    44
        if not prefix:
11289
4215ce511134 revset: raise ParseError exceptions
Matt Mackall <mpm@selenic.com>
parents: 11278
diff changeset
    45
            raise error.ParseError("not a prefix: %s" % token, pos)
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    46
        if len(prefix) == 1:
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    47
            expr = (prefix[0], value)
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    48
        else:
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    49
            if len(prefix) > 2 and prefix[2] == self.current[0]:
11319
9d1cf337a78d parser: fix missing param in _match
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 11305
diff changeset
    50
                self._match(prefix[2], pos)
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    51
                expr = (prefix[0], None)
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    52
            else:
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    53
                expr = (prefix[0], self._parse(prefix[1]))
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    54
                if len(prefix) > 2:
11319
9d1cf337a78d parser: fix missing param in _match
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 11305
diff changeset
    55
                    self._match(prefix[2], pos)
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    56
        # gather tokens until we meet a lower binding strength
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    57
        while bind < self._elements[self.current[0]][0]:
11289
4215ce511134 revset: raise ParseError exceptions
Matt Mackall <mpm@selenic.com>
parents: 11278
diff changeset
    58
            token, value, pos = self._advance()
11278
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    59
            e = self._elements[token]
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    60
            # check for suffix - next token isn't a valid prefix
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    61
            if len(e) == 4 and not self._elements[self.current[0]][1]:
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    62
                suffix = e[3]
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    63
                expr = (suffix[0], expr)
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    64
            else:
11278
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    65
                # handle infix rules
11412
51ceb1571805 parser: improve infix error checking
Matt Mackall <mpm@selenic.com>
parents: 11319
diff changeset
    66
                if len(e) < 3 or not e[2]:
51ceb1571805 parser: improve infix error checking
Matt Mackall <mpm@selenic.com>
parents: 11319
diff changeset
    67
                    raise error.ParseError("not an infix: %s" % token, pos)
51ceb1571805 parser: improve infix error checking
Matt Mackall <mpm@selenic.com>
parents: 11319
diff changeset
    68
                infix = e[2]
11278
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    69
                if len(infix) == 3 and infix[2] == self.current[0]:
11319
9d1cf337a78d parser: fix missing param in _match
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 11305
diff changeset
    70
                    self._match(infix[2], pos)
11278
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    71
                    expr = (infix[0], expr, (None))
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    72
                else:
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    73
                    expr = (infix[0], expr, self._parse(infix[1]))
7df88cdf47fd revset: add support for prefix and suffix versions of : and ::
Matt Mackall <mpm@selenic.com>
parents: 11274
diff changeset
    74
                    if len(infix) == 3:
11319
9d1cf337a78d parser: fix missing param in _match
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 11305
diff changeset
    75
                        self._match(infix[2], pos)
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    76
        return expr
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    77
    def parse(self, message):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    78
        'generate a parse tree from a message'
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    79
        self._iter = self._tokenizer(message)
13176
895f54a79c6e templater: use the parser.py parser to extend the templater syntax
Matt Mackall <mpm@selenic.com>
parents: 11449
diff changeset
    80
        self._advance()
13665
e798e430c5e5 revset: report a parse error if a revset is not parsed completely (issue2654)
Bernhard Leiner <bleiner@gmail.com>
parents: 13176
diff changeset
    81
        res = self._parse()
e798e430c5e5 revset: report a parse error if a revset is not parsed completely (issue2654)
Bernhard Leiner <bleiner@gmail.com>
parents: 13176
diff changeset
    82
        token, value, pos = self.current
e798e430c5e5 revset: report a parse error if a revset is not parsed completely (issue2654)
Bernhard Leiner <bleiner@gmail.com>
parents: 13176
diff changeset
    83
        return res, pos
11274
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    84
    def eval(self, tree):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    85
        'recursively evaluate a parse tree using node methods'
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    86
        if not isinstance(tree, tuple):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    87
            return tree
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    88
        return self._methods[tree[0]](*[self.eval(t) for t in tree[1:]])
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    89
    def __call__(self, message):
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    90
        'parse a message into a parse tree and evaluate if methods given'
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    91
        t = self.parse(message)
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    92
        if self._methods:
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    93
            return self.eval(t)
77272d28b53f revset: introduce basic parser
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    94
        return t