view mercurial/diffhelper.py @ 45390:7d24201b6447
worker: don't expose readinto() on _blockingreader since pickle is picky
The `pickle` module expects the input to be buffered and a whole
object to be available when `pickle.load()` is called, which is not
necessarily true when we send data from workers back to the parent
process (i.e., it seems like a bad assumption for the `pickle` module
to make). We added a workaround for that in
https://phab.mercurial-scm.org/D8076, which made `read()` continue
until all the requested bytes have been read.
As we found out at work after a lot of investigation (I've spent the
last two days on this), the native version of `pickle.load()` has
been calling `readinto()` on the input since Python 3.8. That call
was introduced in
https://github.com/python/cpython/commit/91f4380cedbae32b49adbea2518014a5624c6523
(and only in the C version of `pickle.load()`). Before that, only
`read()` and `readline()` were called. The problem was that
`readinto()` on our `_blockingreader` simply delegated
to the underlying, *unbuffered* object. The symptom we saw was that
`hg fix` started failing sometimes on Python 3.8 on Mac. It failed
very reliably in some cases. I still haven't figured out under what
circumstances it fails and I've been unable to reproduce it in test
cases (I've tried writing larger amounts of data, using different
numbers of workers, and making the formatters sleep). I have, however,
been able to reproduce it 3-4 times on Linux, but then it stopped
reproducing on the following few hundred attempts.
To fix the problem, we can simply remove the implementation of
`readinto()`, since the unpickler will then fall back to calling
`read()`. The fallback was added a bit later, in
https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38. However,
that commit also added a check that `read()` returns `bytes`, so we
also need to convert the `bytearray` we use to `bytes`. I was at
least able to add a test for that failure.
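
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual Mercurial code: `BlockingReader` and `ShortReadRaw` are hypothetical names, and `ShortReadRaw` is only a stand-in for an unbuffered pipe that may return fewer bytes than requested. The wrapper exposes `read()` and `readline()` but no `readinto()`, so the unpickler falls back to `read()`, which loops until the requested size is filled and returns `bytes`.

```python
import io
import pickle


class ShortReadRaw(io.RawIOBase):
    """Hypothetical unbuffered stream: returns at most `chunk` bytes per call."""

    def __init__(self, data, chunk):
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Copy a short slice into the caller's buffer, as a pipe might.
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


class BlockingReader:
    """Sketch of a wrapper that exposes only read() and readline()."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    # Deliberately no readinto(): delegating it to the unbuffered object
    # would let short reads reach the unpickler.

    def readline(self):
        return self._wrapped.readline()

    def read(self, size=-1):
        if size < 0:
            return self._wrapped.readall()
        buf = bytearray(size)
        view = memoryview(buf)
        pos = 0
        # Issue repeated reads until `size` bytes arrive or the stream ends.
        while pos < size:
            n = self._wrapped.readinto(view[pos:])
            if not n:
                break
            pos += n
        # The unpickler checks that read() returns bytes, not bytearray.
        return bytes(view[:pos])


payload = {"files": list(range(1000))}
raw = ShortReadRaw(pickle.dumps(payload), chunk=7)
restored = pickle.load(BlockingReader(raw))
```

Here `restored` should equal `payload` even though the underlying stream never hands back more than seven bytes at a time.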
Differential Revision: https://phab.mercurial-scm.org/D8928
author | Martin von Zweigbergk <martinvonz@google.com> |
---|---|
date | Fri, 14 Aug 2020 20:45:49 -0700 |
parents | 10f48720ef95 |
children | d4ba4d51f85f |
# diffhelper.py - helper routines for patch
#
# Copyright 2009 Matt Mackall <mpm@selenic.com> and others
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

from .i18n import _

from . import (
    error,
    pycompat,
)

MISSING_NEWLINE_MARKER = b'\\ No newline at end of file\n'


def addlines(fp, hunk, lena, lenb, a, b):
    """Read lines from fp into the hunk

    The hunk is parsed into two arrays, a and b. a gets the old state of
    the text, b gets the new state. The control char from the hunk is saved
    when inserting into a, but not b (for performance while deleting files.)
    """
    while True:
        todoa = lena - len(a)
        todob = lenb - len(b)
        num = max(todoa, todob)
        if num == 0:
            break
        for i in pycompat.xrange(num):
            s = fp.readline()
            if not s:
                raise error.ParseError(_(b'incomplete hunk'))
            if s == MISSING_NEWLINE_MARKER:
                fixnewline(hunk, a, b)
                continue
            if s == b'\n' or s == b'\r\n':
                # Some patches may be missing the control char
                # on empty lines. Supply a leading space.
                s = b' ' + s
            hunk.append(s)
            if s.startswith(b'+'):
                b.append(s[1:])
            elif s.startswith(b'-'):
                a.append(s)
            else:
                b.append(s[1:])
                a.append(s)


def fixnewline(hunk, a, b):
    """Fix up the last lines of a and b when the patch has no newline at EOF"""
    l = hunk[-1]
    # tolerate CRLF in last line
    if l.endswith(b'\r\n'):
        hline = l[:-2]
    else:
        hline = l[:-1]

    if hline.startswith((b' ', b'+')):
        b[-1] = hline[1:]
    if hline.startswith((b' ', b'-')):
        a[-1] = hline
    hunk[-1] = hline


def testhunk(a, b, bstart):
    """Compare the lines in a with the lines in b

    a is assumed to have a control char at the start of each line, this char
    is ignored in the compare.
    """
    alen = len(a)
    blen = len(b)
    if alen > blen - bstart or bstart < 0:
        return False
    for i in pycompat.xrange(alen):
        if a[i][1:] != b[i + bstart]:
            return False
    return True
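
To illustrate how the two arrays relate, here is a simplified standalone sketch that does not import Mercurial. `split_hunk` and `matches_at` are hypothetical names that mirror the core logic of `addlines()` and `testhunk()`. The sketch reads one line at a time instead of in batches, and it omits the missing-newline marker and empty-line handling.

```python
import io


def split_hunk(fp, lena, lenb):
    """Split hunk body lines into a (old, control char kept) and b (new)."""
    hunk, a, b = [], [], []
    while len(a) < lena or len(b) < lenb:
        s = fp.readline()
        if not s:
            raise ValueError("incomplete hunk")
        hunk.append(s)
        if s.startswith(b'+'):
            b.append(s[1:])  # added line: new side only
        elif s.startswith(b'-'):
            a.append(s)  # removed line: old side only
        else:
            b.append(s[1:])  # context line: both sides
            a.append(s)
    return hunk, a, b


def matches_at(a, b, bstart):
    """Check whether a (ignoring control chars) matches b from bstart."""
    if len(a) > len(b) - bstart or bstart < 0:
        return False
    return all(a[i][1:] == b[i + bstart] for i in range(len(a)))


body = io.BytesIO(b" one\n-two\n+2\n three\n")
hunk, old, new = split_hunk(body, 3, 3)
target = [b"zero\n", b"one\n", b"two\n", b"three\n"]
applies_at_1 = matches_at(old, target, 1)
applies_at_0 = matches_at(old, target, 0)
```

In this example the old side of the hunk matches `target` starting at index 1 but not at index 0, which is the kind of check a patcher can use to locate where a hunk applies.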