annotate mercurial/pure/diffhelpers.py @ 35793:4fb2bb61597c

bundle2: increase payload part chunk size to 32kb Bundle2 payload parts are framed chunks. Esentially, we obtain data in equal size chunks of size `preferedchunksize` and emit those to a generator. That generator is fed into a compressor (which can be the no-op compressor, which just re-emits the generator). And the output from the compressor likely goes to a file descriptor or socket. What this means is that small chunk sizes create more Python objects and Python function calls than larger chunk sizes. And as we know, Python object and function call overhead in performance sensitive code matters (at least with CPython). This commit increases the bundle2 part payload chunk size from 4k to 32k. Practically speaking, this means that the chunks we feed into a compressor (implemented in C code) or feed directly into a file handle or socket write() are larger. It's possible the chunks might be larger than what the receiver can handle in one logical operation. But at that point, we're in C code, which is much more efficient at dealing with splitting up the chunk and making multiple function calls than Python is. A downside to larger chunks is that the receiver has to wait for that much data to arrive (either raw or from a decompressor) before it can process the chunk. But 32kb still feels like a small buffer to have to wait for. And in many cases, the client will convert from 8 read(4096) to 1 read(32768). That's happening in Python land. So we cut down on the number of Python objects and function calls, making the client faster as well. I don't think there are any significant concerns to increasing the payload chunk size to 32kb. The impact of this change on performance significant. Using `curl` to obtain a stream clone bundle2 payload from a server on localhost serving the mozilla-unified repository: before: 20.78 user; 7.71 system; 80.5 MB/s after: 13.90 user; 3.51 system; 132 MB/s legacy: 9.72 user; 8.16 system; 132 MB/s bundle2 stream clone generation is still more resource intensive than legacy stream clone (that's likely because of the use of a util.chunkbuffer). But the throughput is the same. We might be in territory we're this is effectively a benchmark of the networking stack or Python's syscall throughput. From the client perspective, `hg clone -U --stream`: before: 33.50 user; 7.95 system; 53.3 MB/s after: 22.82 user; 7.33 system; 72.7 MB/s legacy: 29.96 user; 7.94 system; 58.0 MB/s And for `hg clone --stream` with a working directory update of ~230k files: after: 119.55 user; 26.47 system; 0:57.08 wall legacy: 126.98 user; 26.94 system; 1:05.56 wall So, it appears that bundle2's stream clone is now definitively faster than legacy stream clone! Differential Revision: https://phab.mercurial-scm.org/D1932
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 20 Jan 2018 22:55:42 -0800
parents 80214358ac88
children f53b55b162f4
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
7702
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
1 # diffhelpers.py - pure Python implementation of diffhelpers.c
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
2 #
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
3 # Copyright 2009 Matt Mackall <mpm@selenic.com> and others
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
4 #
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7702
diff changeset
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8225
diff changeset
6 # GNU General Public License version 2 or any later version.
7702
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
7
27336
80214358ac88 diffhelpers: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 12387
diff changeset
8 from __future__ import absolute_import
80214358ac88 diffhelpers: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 12387
diff changeset
9
7702
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
10 def addlines(fp, hunk, lena, lenb, a, b):
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
11 while True:
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
12 todoa = lena - len(a)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
13 todob = lenb - len(b)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
14 num = max(todoa, todob)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
15 if num == 0:
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
16 break
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
17 for i in xrange(num):
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
18 s = fp.readline()
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
19 c = s[0]
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
20 if s == "\\ No newline at end of file\n":
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
21 fix_newline(hunk, a, b)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
22 continue
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
23 if c == "\n":
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
24 # Some patches may be missing the control char
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
25 # on empty lines. Supply a leading space.
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
26 s = " \n"
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
27 hunk.append(s)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
28 if c == "+":
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
29 b.append(s[1:])
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
30 elif c == "-":
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
31 a.append(s)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
32 else:
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
33 b.append(s[1:])
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
34 a.append(s)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
35 return 0
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
36
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
37 def fix_newline(hunk, a, b):
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
38 l = hunk[-1]
10551
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
39 # tolerate CRLF in last line
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
40 if l.endswith('\r\n'):
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
41 hline = l[:-2]
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
42 else:
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
43 hline = l[:-1]
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
44 c = hline[0]
7702
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
45
12387
4f8067c94729 cleanup: use x in (a, b) instead of x == a or x == b
Brodie Rao <brodie@bitheap.org>
parents: 10551
diff changeset
46 if c in " +":
10551
f61dced1367a fix test-mq-eol under --pure (mimic diffhelper.c behaviour)
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10263
diff changeset
47 b[-1] = hline[1:]
12387
4f8067c94729 cleanup: use x in (a, b) instead of x == a or x == b
Brodie Rao <brodie@bitheap.org>
parents: 10551
diff changeset
48 if c in " -":
7702
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
49 a[-1] = hline
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
50 hunk[-1] = hline
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
51 return 0
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
52
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
53
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
54 def testhunk(a, b, bstart):
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
55 alen = len(a)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
56 blen = len(b)
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
57 if alen > blen - bstart:
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
58 return -1
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
59 for i in xrange(alen):
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
60 if a[i][1:] != b[i + bstart]:
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
61 return -1
f6bb40554e34 pure Python implementation of diffhelpers.c
Martin Geisler <mg@daimi.au.dk>
parents:
diff changeset
62 return 0