annotate mercurial/pure/mpatch.py @ 48178:f12a19d03d2c

fix: reduce number of tool executions By grouping together (path, ctx) pairs according to the inputs they would provide to fixer tools, we can deduplicate executions of fixer tools to significantly reduce the amount of time spent running slow tools. This change does not handle clean files in the working copy, which could still be deduplicated against the files in the checked out commit. It's a little harder to do that because the filerev is not available in the workingfilectx (and it doesn't exist for added files). Anecdotally, this change makes some real uses cases at Google 10x faster. I think we were originally hesitant to do this because the benefits weren't obvious, and implementing it efficiently is kind of tricky. If we simply memoized the formatter execution function, we would be keeping tons of file content in memory. Also included is a regression test for a corner case that I broke with my first attempt at optimizing this code. Differential Revision: https://phab.mercurial-scm.org/D11280
author Danny Hooper <hooper@google.com>
date Thu, 02 Sep 2021 14:08:45 -0700
parents d4ba4d51f85f
children 5aafc3c5bdec
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
1 # mpatch.py - Python implementation of mpatch.c
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
2 #
46819
d4ba4d51f85f contributor: change mentions of mpm to olivia
Raphaël Gomès <rgomes@octobus.net>
parents: 45942
diff changeset
3 # Copyright 2009 Olivia Mackall <olivia@selenic.com> and others
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
4 #
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7775
diff changeset
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8225
diff changeset
6 # GNU General Public License version 2 or any later version.
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
7
27337
9a17576103a4 mpatch: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
8 from __future__ import absolute_import
9a17576103a4 mpatch: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
9
7775
5280c39778b6 pure/mpatch: use StringIO instead of mmap (issue1493)
Martin Geisler <mg@daimi.au.dk>
parents: 7699
diff changeset
10 import struct
27337
9a17576103a4 mpatch: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
11
32512
0e8b0b9a7acc cffi: split modules from pure
Yuya Nishihara <yuya@tcha.org>
parents: 32506
diff changeset
12 from .. import pycompat
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
13
36958
644a02f6b34f util: prefer "bytesio" to "stringio"
Gregory Szorc <gregory.szorc@gmail.com>
parents: 34435
diff changeset
14 stringio = pycompat.bytesio
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
15
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
16
28782
f736f98e16ca mpatch: unify mpatchError (issue5182)
timeless <timeless@mozdev.org>
parents: 28589
diff changeset
17 class mpatchError(Exception):
45942
89a2afe31e82 formating: upgrade to black 20.8b1
Augie Fackler <raf@durin42.com>
parents: 43077
diff changeset
18 """error raised when a delta cannot be decoded"""
28782
f736f98e16ca mpatch: unify mpatchError (issue5182)
timeless <timeless@mozdev.org>
parents: 28589
diff changeset
19
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
20
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
21 # This attempts to apply a series of patches in time proportional to
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
22 # the total size of the patches, rather than patches * len(text). This
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
23 # means rather than shuffling strings around, we shuffle around
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
24 # pointers to fragments with fragment lists.
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
25 #
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
26 # When the fragment lists get too long, we collapse them. To do this
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
27 # efficiently, we do all our operations inside a buffer created by
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
28 # mmap and simply use memmove. This avoids creating a bunch of large
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
29 # temporary string buffers.
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
30
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
31
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
32 def _pull(dst, src, l): # pull l bytes from src
28587
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
33 while l:
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
34 f = src.pop()
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
35 if f[0] > l: # do we need to split?
28587
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
36 src.append((f[0] - l, f[1] + l))
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
37 dst.append((l, f[1]))
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
38 return
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
39 dst.append(f)
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
40 l -= f[0]
76d7cab13f04 mpatch: move pull() method to top level
Augie Fackler <augie@google.com>
parents: 27337
diff changeset
41
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
42
28588
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
43 def _move(m, dest, src, count):
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
44 """move count bytes from src to dest
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
45
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
46 The file pointer is left at the end of dest.
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
47 """
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
48 m.seek(src)
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
49 buf = m.read(count)
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
50 m.seek(dest)
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
51 m.write(buf)
6546afde350e mpatch: un-nest the move() method
Augie Fackler <augie@google.com>
parents: 28587
diff changeset
52
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
53
28589
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
54 def _collect(m, buf, list):
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
55 start = buf
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
56 for l, p in reversed(list):
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
57 _move(m, buf, p, l)
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
58 buf += l
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
59 return (buf - start, start)
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
60
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
61
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
62 def patches(a, bins):
10282
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
63 if not bins:
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
64 return a
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
65
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
66 plens = [len(x) for x in bins]
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
67 pl = sum(plens)
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
68 bl = len(a) + pl
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
69 tl = bl + bl + pl # enough for the patches and two working texts
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
70 b1, b2 = 0, bl
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
71
10282
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
72 if not tl:
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
73 return a
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
74
28861
86db5cb55d46 pycompat: switch to util.stringio for py3 compat
timeless <timeless@mozdev.org>
parents: 28782
diff changeset
75 m = stringio()
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
76
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
77 # load our original text
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
78 m.write(a)
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
79 frags = [(len(a), b1)]
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
80
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
81 # copy all the patches into our segment so we can memmove from them
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
82 pos = b2 + bl
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
83 m.seek(pos)
34435
5326e4ef1dab style: never put multiple statements on one line
Alex Gaynor <agaynor@mozilla.com>
parents: 32512
diff changeset
84 for p in bins:
5326e4ef1dab style: never put multiple statements on one line
Alex Gaynor <agaynor@mozilla.com>
parents: 32512
diff changeset
85 m.write(p)
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
86
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
87 for plen in plens:
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
88 # if our list gets too long, execute it
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
89 if len(frags) > 128:
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
90 b2, b1 = b1, b2
28589
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
91 frags = [_collect(m, b1, frags)]
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
92
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
93 new = []
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
94 end = pos + plen
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
95 last = 0
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
96 while pos < end:
7775
5280c39778b6 pure/mpatch: use StringIO instead of mmap (issue1493)
Martin Geisler <mg@daimi.au.dk>
parents: 7699
diff changeset
97 m.seek(pos)
28782
f736f98e16ca mpatch: unify mpatchError (issue5182)
timeless <timeless@mozdev.org>
parents: 28589
diff changeset
98 try:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
99 p1, p2, l = struct.unpack(b">lll", m.read(12))
28782
f736f98e16ca mpatch: unify mpatchError (issue5182)
timeless <timeless@mozdev.org>
parents: 28589
diff changeset
100 except struct.error:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
101 raise mpatchError(b"patch cannot be decoded")
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
102 _pull(new, frags, p1 - last) # what didn't change
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
103 _pull([], frags, p2 - p1) # what got deleted
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
104 new.append((l, pos + 12)) # what got added
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
105 pos += l + 12
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
106 last = p2
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
107 frags.extend(reversed(new)) # what was left at the end
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
108
28589
c4c7be9f0554 mpatch: move collect() to module level
Augie Fackler <augie@google.com>
parents: 28588
diff changeset
109 t = _collect(m, b2, frags)
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
110
7775
5280c39778b6 pure/mpatch: use StringIO instead of mmap (issue1493)
Martin Geisler <mg@daimi.au.dk>
parents: 7699
diff changeset
111 m.seek(t[1])
5280c39778b6 pure/mpatch: use StringIO instead of mmap (issue1493)
Martin Geisler <mg@daimi.au.dk>
parents: 7699
diff changeset
112 return m.read(t[0])
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
113
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
114
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
115 def patchedsize(orig, delta):
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
116 outlen, last, bin = 0, 0, 0
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
117 binend = len(delta)
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
118 data = 12
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
119
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
120 while data <= binend:
43076
2372284d9457 formatting: blacken the codebase
Augie Fackler <augie@google.com>
parents: 36958
diff changeset
121 decode = delta[bin : bin + 12]
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
122 start, end, length = struct.unpack(b">lll", decode)
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
123 if start > end:
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
124 break
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
125 bin = data + length
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
126 data = bin + 12
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
127 outlen += start - last
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
128 last = end
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
129 outlen += length
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
130
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
131 if bin != binend:
43077
687b865b95ad formatting: byteify all mercurial/ and hgext/ string literals
Augie Fackler <augie@google.com>
parents: 43076
diff changeset
132 raise mpatchError(b"patch cannot be decoded")
7699
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
133
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
134 outlen += orig - last
fac054f84600 pure Python implementation of mpatch.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
135 return outlen