revlog: automatically read from opened file handles

The revlog reading code commonly opens a new file handle for
reading on demand. revlog.revision() accepts an existing file
handle, but that argument is marked as internal.

When revlogs are written, we write() data as it becomes available,
but we don't flush() it until all revisions are written.

Putting these two traits together, an in-process revlog reader that
runs during active writes can trigger the opening of a new file
handle on a file with unflushed writes. The reader won't have
access to all "available" revlog data (as it hasn't been flushed).
And with the introduction of the previous patch, this can lead to
the revlog raising an error due to a partial read.
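
The effect can be reproduced outside of Mercurial. The following is a
minimal sketch, not revlog code: the file name and payload are made up,
and it assumes CPython's default buffered I/O, where a small write()
stays in the writer's buffer until flush().

```python
import os
import tempfile

# Hypothetical scratch file standing in for a revlog data file.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

writer = open(path, 'wb')
writer.write(b'revision-1')  # small write: held in the writer's buffer

# A second, independent handle cannot see the buffered bytes.
with open(path, 'rb') as reader:
    before_flush = reader.read()

writer.flush()

# Once flushed, a new handle sees the data.
with open(path, 'rb') as reader:
    after_flush = reader.read()

writer.close()

print(repr(before_flush))
print(repr(after_flush))
```

A reader that expects the full record at this point gets fewer bytes
than it asked for, which is the partial read described above.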

I witnessed this behavior when applying changegroup data (via
`hg pull`) before issue6006 was fixed via different means. Having
this and the previous patch in play would have helped surface errors
earlier rather than having them manifest as hash verification failures.

While this has been a long-standing issue, I believe the relatively
new delta computation code has made it more common, because it
computes deltas in more scenarios, which can lead to revlog reads.
While the delta computation code is probably supposed to reuse file
handles, it appears it isn't doing so in all circumstances.

But the issue runs deeper than that. Theoretically, any code can
access revision data during revlog writes. It appears we were just
getting lucky that none did. (The "add revision callback" passed to
addgroup() provides an avenue to do this.)

When I changed the revlog's behavior to not cache the full revision
text or to clear caches after revision insertion during addgroup(),
I was able to produce crashes 100% of the time when writing changelog
revisions. This is because changelog's add revision callback attempts
to resolve the revision data to access the changed files list.
Without the revision's fulltext being cached, this performed a revlog
read, which required opening a new file handle. That handle attempted
to read unflushed data, leading to a partial read and a crash.

This commit teaches the revlog to store the file handles used for
writing multiple revisions during addgroup(). It also teaches the
code that resolves a file handle for reading to use these handles,
if available. This ensures that *any* reads (regardless of their
source) use the active writing file handles when they exist. These
file handles have access to the unflushed data because they wrote it,
so reads complete without issue.
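
The idea can be sketched with a toy append-only log. This is an
illustration under stated assumptions, not the revlog implementation:
TinyLog, _writehandle, _readhandle() and the callback signature are all
hypothetical names invented for this example.

```python
import contextlib
import os
import tempfile


class TinyLog:
    """Toy append-only log (hypothetical; not Mercurial's revlog)."""

    def __init__(self, path):
        self._path = path
        self._writehandle = None  # set only while addgroup() is running
        self._offsets = []        # (start, length) for each record

    @contextlib.contextmanager
    def _readhandle(self):
        """Yield a handle for reading, preferring the active writer."""
        if self._writehandle is not None:
            # The writing handle can see the data it has not flushed yet.
            yield self._writehandle
        else:
            with open(self._path, 'rb') as fh:
                yield fh

    def read(self, index):
        start, length = self._offsets[index]
        with self._readhandle() as fh:
            fh.seek(start)
            data = fh.read(length)
        if len(data) != length:
            raise IOError('partial read: wanted %d bytes, got %d'
                          % (length, len(data)))
        return data

    def addgroup(self, records, callback=None):
        """Append records, invoking callback(log, index) after each one."""
        with open(self._path, 'a+b') as fh:
            self._writehandle = fh
            try:
                for record in records:
                    fh.seek(0, os.SEEK_END)  # a callback may have moved us
                    start = fh.tell()
                    fh.write(record)
                    self._offsets.append((start, len(record)))
                    if callback is not None:
                        callback(self, len(self._offsets) - 1)
            finally:
                self._writehandle = None


path = os.path.join(tempfile.mkdtemp(), 'toy.log')
log = TinyLog(path)
seen = []
# The callback reads each record back while the write is still in progress.
log.addgroup([b'first', b'second'],
             callback=lambda l, index: seen.append(l.read(index)))
print(seen)
```

Because every read goes through _readhandle(), the callback never opens
a second handle while the batch write is active, so it cannot hit a
partial read.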

Differential Revision: https://phab.mercurial-scm.org/D5267
hg debuginstall
$ hg debuginstall
checking encoding (ascii)...
checking Python executable (*) (glob)
checking Python version (2.*) (glob)
checking Python lib (*lib*)... (glob)
checking Python security support (*) (glob)
TLS 1.2 not supported by Python install; network connections lack modern security (?)
SNI not supported by Python install; may have connectivity issues with some servers (?)
checking Mercurial version (*) (glob)
checking Mercurial custom build (*) (glob)
checking module policy (*) (glob)
checking installed modules (*mercurial)... (glob)
checking registered compression engines (*zlib*) (glob)
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (*) (glob)
checking username (test)
no problems detected
hg debuginstall JSON
$ hg debuginstall -Tjson | sed 's|\\\\|\\|g'
[
{
"compengines": ["bz2", "bz2truncated", "none", "zlib"*], (glob)
"compenginesavail": ["bz2", "bz2truncated", "none", "zlib"*], (glob)
"compenginesserver": [*"zlib"*], (glob)
"defaulttemplate": "*mercurial?templates?map-cmdline.default", (glob)
"defaulttemplateerror": null,
"defaulttemplatenotfound": "default",
"editor": "*", (glob)
"editornotfound": false,
"encoding": "ascii",
"encodingerror": null,
"extensionserror": null, (no-pure !)
"hgmodulepolicy": "*", (glob)
"hgmodules": "*mercurial", (glob)
"hgver": "*", (glob)
"hgverextra": "*", (glob)
"problems": 0,
"pythonexe": "*", (glob)
"pythonlib": "*", (glob)
"pythonsecurity": [*], (glob)
"pythonver": "*.*.*", (glob)
"re2": (true|false), (re)
"templatedirs": "*mercurial?templates", (glob)
"username": "test",
"usernameerror": null,
"vinotfound": false
}
]
hg debuginstall with no username
$ HGUSER= hg debuginstall
checking encoding (ascii)...
checking Python executable (*) (glob)
checking Python version (2.*) (glob)
checking Python lib (*lib*)... (glob)
checking Python security support (*) (glob)
TLS 1.2 not supported by Python install; network connections lack modern security (?)
SNI not supported by Python install; may have connectivity issues with some servers (?)
checking Mercurial version (*) (glob)
checking Mercurial custom build (*) (glob)
checking module policy (*) (glob)
checking installed modules (*mercurial)... (glob)
checking registered compression engines (*zlib*) (glob)
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (*) (glob)
checking username...
no username supplied
(specify a username in your configuration file)
1 problems detected, please check your install!
[1]
hg debuginstall with invalid encoding
$ HGENCODING=invalidenc hg debuginstall | grep encoding
checking encoding (invalidenc)...
unknown encoding: invalidenc
exception message in JSON
$ HGENCODING=invalidenc HGUSER= hg debuginstall -Tjson | grep error
"defaulttemplateerror": null,
"encodingerror": "unknown encoding: invalidenc",
"extensionserror": null, (no-pure !)
"usernameerror": "no username supplied",
path variables are expanded (~ is the same as $TESTTMP)
$ mkdir tools
$ touch tools/testeditor.exe
#if execbit
$ chmod 755 tools/testeditor.exe
#endif
$ HGEDITOR="~/tools/testeditor.exe" hg debuginstall
checking encoding (ascii)...
checking Python executable (*) (glob)
checking Python version (*) (glob)
checking Python lib (*lib*)... (glob)
checking Python security support (*) (glob)
TLS 1.2 not supported by Python install; network connections lack modern security (?)
SNI not supported by Python install; may have connectivity issues with some servers (?)
checking Mercurial version (*) (glob)
checking Mercurial custom build (*) (glob)
checking module policy (*) (glob)
checking installed modules (*mercurial)... (glob)
checking registered compression engines (*zlib*) (glob)
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... ($TESTTMP/tools/testeditor.exe)
checking username (test)
no problems detected
Print the binary post-shlexsplit in the error message when the commit editor is
not found (this intentionally uses backslashes to mimic a Windows use case).
$ HGEDITOR="c:\foo\bar\baz.exe -y -z" hg debuginstall
checking encoding (ascii)...
checking Python executable (*) (glob)
checking Python version (*) (glob)
checking Python lib (*lib*)... (glob)
checking Python security support (*) (glob)
TLS 1.2 not supported by Python install; network connections lack modern security (?)
SNI not supported by Python install; may have connectivity issues with some servers (?)
checking Mercurial version (*) (glob)
checking Mercurial custom build (*) (glob)
checking module policy (*) (glob)
checking installed modules (*mercurial)... (glob)
checking registered compression engines (*zlib*) (glob)
checking available compression engines (*zlib*) (glob)
checking available compression engines for wire protocol (*zlib*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
checking templates (*mercurial?templates)... (glob)
checking default template (*mercurial?templates?map-cmdline.default) (glob)
checking commit editor... (c:\foo\bar\baz.exe) (windows !)
Can't find editor 'c:\foo\bar\baz.exe' in PATH (windows !)
checking commit editor... (c:foobarbaz.exe) (no-windows !)
Can't find editor 'c:foobarbaz.exe' in PATH (no-windows !)
(specify a commit editor in your configuration file)
checking username (test)
1 problems detected, please check your install!
[1]
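The difference between the two expected editor names above can be
illustrated with the standard library's shlex.split(), used here only as
a stand-in for Mercurial's own splitting helper (an assumption; the
variable names are made up). In POSIX mode an unquoted backslash escapes
the next character and is dropped; with posix=False it is kept.

```python
import shlex
editor = r"c:\foo\bar\baz.exe -y -z"
posix_words = shlex.split(editor)
windows_like_words = shlex.split(editor, posix=False)
print(posix_words[0])
print(windows_like_words[0])
```

The first word in POSIX mode is the backslash-free name reported on
non-Windows platforms, and the arguments -y and -z are split off in both
modes.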
#if test-repo
$ . "$TESTDIR/helpers-testrepo.sh"
$ cat >> wixxml.py << EOF
> import os
> import subprocess
> import sys
> import xml.etree.ElementTree as ET
>
> # MSYS mangles the path if it expands $TESTDIR
> testdir = os.environ['TESTDIR']
> ns = {'wix' : 'http://schemas.microsoft.com/wix/2006/wi'}
>
> def directory(node, relpath):
>     '''generator of files in the xml node, rooted at relpath'''
>     dirs = node.findall('./{%(wix)s}Directory' % ns)
>
>     for d in dirs:
>         for subfile in directory(d, relpath + d.attrib['Name'] + '/'):
>             yield subfile
>
>     files = node.findall('./{%(wix)s}Component/{%(wix)s}File' % ns)
>
>     for f in files:
>         yield relpath + f.attrib['Name']
>
> def hgdirectory(relpath):
>     '''generator of tracked files, rooted at relpath'''
>     hgdir = "%s/../mercurial" % (testdir)
>     args = ['hg', '--cwd', hgdir, 'files', relpath]
>     proc = subprocess.Popen(args, stdout=subprocess.PIPE,
>                             stderr=subprocess.PIPE)
>     output = proc.communicate()[0]
>
>     slash = '/'
>     for line in output.splitlines():
>         if os.name == 'nt':
>             yield line.replace(os.sep, slash)
>         else:
>             yield line
>
> tracked = [f for f in hgdirectory(sys.argv[1])]
>
> xml = ET.parse("%s/../contrib/wix/%s.wxs" % (testdir, sys.argv[1]))
> root = xml.getroot()
> dir = root.find('.//{%(wix)s}DirectoryRef' % ns)
>
> installed = [f for f in directory(dir, '')]
>
> print('Not installed:')
> for f in sorted(set(tracked) - set(installed)):
>     print(' %s' % f)
>
> print('Not tracked:')
> for f in sorted(set(installed) - set(tracked)):
>     print(' %s' % f)
> EOF
$ ( testrepohgenv; "$PYTHON" wixxml.py help )
Not installed:
help/common.txt
help/hg-ssh.8.txt
help/hg.1.txt
help/hgignore.5.txt
help/hgrc.5.txt
Not tracked:
$ ( testrepohgenv; "$PYTHON" wixxml.py templates )
Not installed:
Not tracked:
#endif
#if virtualenv
Verify that Mercurial is installable with pip. Note that this MUST be
the last test in this file, because we do some nasty things to the
shell environment in order to make the virtualenv work reliably.
$ cd $TESTTMP
Note: --no-site-packages is deprecated, but some systems still have an
ancient virtualenv from their Linux distro or similar, for which it is not
yet the default.
$ unset PYTHONPATH
$ "$PYTHON" -m virtualenv --no-site-packages --never-download installenv >> pip.log
Note: we use this weird path to run pip and hg to avoid platform differences,
since it's bin on most platforms but Scripts on Windows.
$ ./installenv/*/pip install --no-index $TESTDIR/.. >> pip.log
$ ./installenv/*/hg debuginstall || cat pip.log
checking encoding (ascii)...
checking Python executable (*) (glob)
checking Python version (2.*) (glob)
checking Python lib (*)... (glob)
checking Python security support (*) (glob)
TLS 1.2 not supported by Python install; network connections lack modern security (?)
SNI not supported by Python install; may have connectivity issues with some servers (?)
checking Mercurial version (*) (glob)
checking Mercurial custom build (*) (glob)
checking module policy (*) (glob)
checking installed modules (*/mercurial)... (glob)
checking registered compression engines (*) (glob)
checking available compression engines (*) (glob)
checking available compression engines for wire protocol (*) (glob)
checking "re2" regexp engine \((available|missing)\) (re)
checking templates ($TESTTMP/installenv/*/site-packages/mercurial/templates)... (glob)
checking default template ($TESTTMP/installenv/*/site-packages/mercurial/templates/map-cmdline.default) (glob)
checking commit editor... (*) (glob)
checking username (test)
no problems detected
#endif