annotate mercurial/pure/parsers.py @ 21809:e250b8300e6e

parsers: inline fields of dirstate values in C version Previously, while unpacking the dirstate we'd create 3-4 new CPython objects for most dirstate values: - the state is a single character string, which is pooled by CPython - the mode is a new object if it isn't 0 due to being in the lookup set - the size is a new object if it is greater than 255 - the mtime is a new object if it isn't -1 due to being in the lookup set - the tuple to contain them all In some cases such as regular hg status, we actually look at all the objects. In other cases like hg add, hg status for a subdirectory, or hg status with the third-party hgwatchman enabled, we look at almost none of the objects. This patch eliminates most object creation in these cases by defining a custom C struct that is exposed to Python with an interface similar to a tuple. Only when tuple elements are actually requested are the respective objects created. The gains, where they're expected, are significant. The following tests are run against a working copy with over 270,000 files. parse_dirstate becomes significantly faster: $ hg perfdirstate before: wall 0.186437 comb 0.180000 user 0.160000 sys 0.020000 (best of 35) after: wall 0.093158 comb 0.100000 user 0.090000 sys 0.010000 (best of 95) and as a result, several commands benefit: $ time hg status # with hgwatchman enabled before: 0.42s user 0.14s system 99% cpu 0.563 total after: 0.34s user 0.12s system 99% cpu 0.471 total $ time hg add new-file before: 0.85s user 0.18s system 99% cpu 1.033 total after: 0.76s user 0.17s system 99% cpu 0.931 total There is a slight regression in regular status performance, but this is fixed in an upcoming patch.
author Siddharth Agarwal <sid0@fb.com>
date Tue, 27 May 2014 14:27:41 -0700
parents 187bf2dde7c1
children feddc5284724
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
1 # parsers.py - Python implementation of parsers.c
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
2 #
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
3 # Copyright 2009 Matt Mackall <mpm@selenic.com> and others
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
4 #
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7945
diff changeset
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8225
diff changeset
6 # GNU General Public License version 2 or any later version.
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
7
14064
e4bfb9c337f3 remove unused imports and variables
Alexander Solovyov <alexander@solovyov.net>
parents: 13435
diff changeset
8 from mercurial.node import bin, nullid
7945
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
9 from mercurial import util
18567
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
10 import struct, zlib, cStringIO
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
11
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
12 _pack = struct.pack
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
13 _unpack = struct.unpack
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
14 _compress = zlib.compress
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
15 _decompress = zlib.decompress
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
16 _sha = util.sha1
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
17
21809
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
18 # Some code below makes tuples directly because it's more convenient. However,
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
19 # code outside this module should always use dirstatetuple.
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
20 def dirstatetuple(*x):
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
21 # x is a tuple
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
22 return x
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
23
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
24 def parse_manifest(mfdict, fdict, lines):
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
25 for l in lines.splitlines():
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
26 f, n = l.split('\0')
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
27 if len(n) > 40:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
28 fdict[f] = n[40:]
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
29 mfdict[f] = bin(n[:40])
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
30 else:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
31 mfdict[f] = bin(n)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
32
13261
20a54bdf2328 pure: update index parsing
Matt Mackall <mpm@selenic.com>
parents: 13253
diff changeset
33 def parse_index2(data, inline):
7945
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
34 def gettype(q):
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
35 return int(q & 0xFFFF)
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
36
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
37 def offset_type(offset, type):
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
38 return long(long(offset) << 16 | type)
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
39
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
40 indexformatng = ">Qiiiiii20s12x"
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
41
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
42 s = struct.calcsize(indexformatng)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
43 index = []
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
44 cache = None
14995
8d928799dab5 parsers: remove redundant 'n' variable in parsers.parse_index2() (issue2935)
py4fun
parents: 14421
diff changeset
45 off = 0
13253
61c9bc3da402 revlog: remove lazy index
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
46
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
47 l = len(data) - s
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
48 append = index.append
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
49 if inline:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
50 cache = (0, data)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
51 while off <= l:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
52 e = _unpack(indexformatng, data[off:off + s])
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
53 append(e)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
54 if e[1] < 0:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
55 break
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
56 off += e[1] + s
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
57 else:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
58 while off <= l:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
59 e = _unpack(indexformatng, data[off:off + s])
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
60 append(e)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
61 off += s
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
62
14421
639f26cab2f5 pure parsers: properly detect corrupt index files
Augie Fackler <durin42@gmail.com>
parents: 14064
diff changeset
63 if off != len(data):
639f26cab2f5 pure parsers: properly detect corrupt index files
Augie Fackler <durin42@gmail.com>
parents: 14064
diff changeset
64 raise ValueError('corrupt index file')
639f26cab2f5 pure parsers: properly detect corrupt index files
Augie Fackler <durin42@gmail.com>
parents: 14064
diff changeset
65
13435
90d7ce986565 pure: fix index parsing on empty repositories
Wagner Bruna <wbruna@softwareexpress.com.br>
parents: 13261
diff changeset
66 if index:
90d7ce986565 pure: fix index parsing on empty repositories
Wagner Bruna <wbruna@softwareexpress.com.br>
parents: 13261
diff changeset
67 e = list(index[0])
90d7ce986565 pure: fix index parsing on empty repositories
Wagner Bruna <wbruna@softwareexpress.com.br>
parents: 13261
diff changeset
68 type = gettype(e[0])
90d7ce986565 pure: fix index parsing on empty repositories
Wagner Bruna <wbruna@softwareexpress.com.br>
parents: 13261
diff changeset
69 e[0] = offset_type(0, type)
90d7ce986565 pure: fix index parsing on empty repositories
Wagner Bruna <wbruna@softwareexpress.com.br>
parents: 13261
diff changeset
70 index[0] = tuple(e)
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
71
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
72 # add the magic null revision at -1
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
73 index.append((0, 0, 0, -1, -1, -1, -1, nullid))
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
74
13261
20a54bdf2328 pure: update index parsing
Matt Mackall <mpm@selenic.com>
parents: 13253
diff changeset
75 return index, cache
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
76
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
77 def parse_dirstate(dmap, copymap, st):
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
78 parents = [st[:20], st[20: 40]]
17425
e95ec38f86b0 fix wording and not-completely-trivial spelling errors and bad docstrings
Mads Kiilerich <mads@kiilerich.com>
parents: 14995
diff changeset
79 # dereference fields so they will be local in loop
7945
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
80 format = ">cllll"
94d7e14cfa42 pure/parsers: fix circular imports, import mercurial modules properly
Matt Mackall <mpm@selenic.com>
parents: 7873
diff changeset
81 e_size = struct.calcsize(format)
7700
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
82 pos1 = 40
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
83 l = len(st)
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
84
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
85 # the inner loop
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
86 while pos1 < l:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
87 pos2 = pos1 + e_size
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
88 e = _unpack(">cllll", st[pos1:pos2]) # a literal here is faster
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
89 pos1 = pos2 + e[4]
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
90 f = st[pos2:pos1]
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
91 if '\0' in f:
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
92 f, c = f.split('\0')
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
93 copymap[f] = c
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
94 dmap[f] = e[:4]
f410c552fa15 pure Python implementation of parsers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
95 return parents
18567
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
96
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
97 def pack_dirstate(dmap, copymap, pl, now):
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
98 now = int(now)
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
99 cs = cStringIO.StringIO()
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
100 write = cs.write
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
101 write("".join(pl))
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
102 for f, e in dmap.iteritems():
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
103 if e[0] == 'n' and e[3] == now:
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
104 # The file was last modified "simultaneously" with the current
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
105 # write to dirstate (i.e. within the same second for file-
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
106 # systems with a granularity of 1 sec). This commonly happens
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
107 # for at least a couple of files on 'update'.
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
108 # The user could change the file without changing its size
19652
187bf2dde7c1 pack_dirstate: only invalidate mtime for files written in the last second
Siddharth Agarwal <sid0@fb.com>
parents: 18567
diff changeset
109 # within the same second. Invalidate the file's mtime in
18567
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
110 # dirstate, forcing future 'status' calls to compare the
19652
187bf2dde7c1 pack_dirstate: only invalidate mtime for files written in the last second
Siddharth Agarwal <sid0@fb.com>
parents: 18567
diff changeset
111 # contents of the file if the size is the same. This prevents
187bf2dde7c1 pack_dirstate: only invalidate mtime for files written in the last second
Siddharth Agarwal <sid0@fb.com>
parents: 18567
diff changeset
112 # mistakenly treating such files as clean.
21809
e250b8300e6e parsers: inline fields of dirstate values in C version
Siddharth Agarwal <sid0@fb.com>
parents: 19652
diff changeset
113 e = dirstatetuple(e[0], e[1], e[2], -1)
18567
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
114 dmap[f] = e
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
115
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
116 if f in copymap:
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
117 f = "%s\0%s" % (f, copymap[f])
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
118 e = _pack(">cllll", e[0], e[1], e[2], e[3], len(f))
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
119 write(e)
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
120 write(f)
194e63c1ccb9 dirstate: move pure python dirstate packing to pure/parsers.py
Siddharth Agarwal <sid0@fb.com>
parents: 17425
diff changeset
121 return cs.getvalue()