view tests/test-narrow-expanddirstate.t @ 38732:be4984261611
merge: mark file gets as not thread safe (issue5933)

In default installs, this has the effect of disabling the thread-based
worker on Windows when manifesting files in the working directory. My
measurements have shown that with revlog-based repositories, Mercurial
spends a lot of CPU time in revlog code resolving file data. This ends
up incurring a lot of context switching across threads and slows down
`hg update` operations when going from an empty working directory to
the tip of the repo.

On mozilla-unified (246,351 files) on an i7-6700K (4+4 CPUs):

before: 487s wall
after:  360s wall (equivalent to worker.enabled=false)
cpus=2: 379s wall

Even with only 2 threads, the thread pool is still slower.
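
To put the three wall times above in relative terms, the arithmetic can be worked through as below. This is only a calculation on the quoted figures; the variable and function names are illustrative and do not come from the patch.

```python
# Wall-clock times quoted above, in seconds.
before = 487       # thread-based worker in use (default before this change)
after = 360        # file gets marked as not thread safe
two_threads = 379  # thread pool limited to cpus=2


def pct_change(old, new):
    """Percentage change going from old to new (positive means slower)."""
    return (new - old) / old * 100.0


saving = -pct_change(before, after)       # share of wall time removed
penalty = pct_change(after, two_threads)  # cost of keeping two threads

print(f"disabling the thread pool saves {saving:.1f}% of wall time")
print(f"two threads are {penalty:.1f}% slower than no thread pool")
```

That is a reduction of roughly 26% from dropping the thread pool, and a cost of roughly 5% for keeping even a two-thread pool.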

The introduction of the thread-based worker (02b36e860e0b) states that
it resulted in a "~50%" speedup for `hg sparse --enable-profile` and
`hg sparse --disable-profile`. This disagrees with my measurement
above. I theorize a few reasons for this:

1) Removal of files from the working directory is I/O - not CPU - bound
   and should benefit from a thread pool (unless I/O is insanely fast
   and the GIL release is near instantaneous). So tests like `hg sparse
   --enable-profile` may exercise deletion throughput and aren't good
   benchmarks for worker tasks that are CPU heavy.

2) The patch was authored by someone at Facebook. The results were
   likely measured against a repository using remotefilelog. And I
   believe that revision retrieval during working directory updates with
   remotefilelog will often use a remote store, thus being I/O and not
   CPU bound. This probably resulted in an overstated performance gain.
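
The distinction drawn in both points, between CPU-bound and I/O-bound work under a thread pool, can be tried out with a small sketch like the one below. It is not Mercurial code: `cpu_task`, `io_task`, `make_files` and `timed` are hypothetical helpers, and the sketch assumes a standard CPython build with the GIL. It only collects timings; what they show will depend on the machine and filesystem.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_task(seed):
    """Pure-Python loop: it holds the GIL, so threads cannot overlap it."""
    total = 0
    for i in range(200_000):
        total = (total * 31 + i + seed) % 1_000_003
    return total


def io_task(path):
    """File removal: a system call during which CPython releases the GIL."""
    os.remove(path)
    return path


def make_files(directory, count):
    """Create `count` small files in `directory` and return their paths."""
    paths = []
    for n in range(count):
        path = os.path.join(directory, f"f{n}")
        with open(path, "w") as fh:
            fh.write("x")
        paths.append(path)
    return paths


def timed(fn, items, threads):
    """Run fn over items serially (threads=0) or in a thread pool.

    Returns (results in input order, elapsed seconds).
    """
    start = time.perf_counter()
    if threads:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            results = list(pool.map(fn, items))
    else:
        results = [fn(item) for item in items]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    seeds = list(range(16))
    _, t_serial = timed(cpu_task, seeds, threads=0)
    _, t_pooled = timed(cpu_task, seeds, threads=4)
    print(f"cpu-bound: serial {t_serial:.3f}s, 4 threads {t_pooled:.3f}s")

    with tempfile.TemporaryDirectory() as tmp:
        _, t_io = timed(io_task, make_files(tmp, 200), threads=4)
        print(f"io-bound removal with 4 threads: {t_io:.3f}s")
```

On a GIL-enabled interpreter the CPU-bound case is not expected to speed up with threads, which is the situation described for revlog-heavy updates; the removal case is where a pool has room to help.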

Since there appears to be a need to enable the thread-based worker with
some stores, I've made the flagging of file gets as thread safe
configurable. I've made it experimental because I don't want to formalize
a boolean flag for this option and because this attribute is best
captured against the store implementation. But we don't have a proper
store API for this yet. I'd rather cross this bridge later.
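
A minimal sketch of how a boolean "thread safe" flag read from configuration could gate the thread-based worker is shown below. It is a simplified model, not Mercurial's implementation: the function names and the config key `example-thread-safe` are hypothetical (this message does not name the real option), and the sketch assumes that the thread-based worker is the one used on Windows while other platforms use a process-based worker.

```python
import configparser


def threadsafe_from_config(text, section="experimental",
                           key="example-thread-safe"):
    """Read a boolean flag from INI-style text; default to False.

    The section and key are placeholders, not a real Mercurial option.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean(section, key, fallback=False)


def choose_worker(platform, worker_enabled, threadsafe):
    """Pick a strategy for a batch of file gets.

    Assumption: only Windows uses threads, so a task that is not marked
    thread safe falls back to serial execution there.
    """
    if not worker_enabled:
        return "serial"
    if platform == "windows":
        return "threads" if threadsafe else "serial"
    return "processes"


default_flag = threadsafe_from_config("")
opted_in = threadsafe_from_config(
    "[experimental]\nexample-thread-safe = yes\n")

print(choose_worker("windows", True, default_flag))  # serial
print(choose_worker("windows", True, opted_in))      # threads
print(choose_worker("linux", True, default_flag))    # processes
```

Defaulting the flag to false mirrors the behaviour described above: file gets run without the thread pool unless a store that benefits from it opts back in.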

It is possible there are revlog-based repositories that do benefit from
a thread-based worker. I didn't do very comprehensive testing. If there
are, we may want to devise a more proper algorithm for whether to use
the thread-based worker, including possibly config options to limit the
number of threads to use. But until I see evidence that justifies
complexity, simplicity wins.

Differential Revision: https://phab.mercurial-scm.org/D3963
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Wed, 18 Jul 2018 09:49:34 -0700 |
parents | 1cba497491be |
children | fa64a229f24b |
  $ . "$TESTDIR/narrow-library.sh"

  $ hg init master
  $ cd master

  $ mkdir inside
  $ echo inside > inside/f1
  $ mkdir outside
  $ echo outside > outside/f2
  $ mkdir patchdir
  $ echo patch_this > patchdir/f3
  $ hg ci -Aqm 'initial'

  $ cd ..

  $ hg clone --narrow ssh://user@dummy/master narrow --include inside
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  new changesets dff6a2a6d433
  updating to branch default
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

  $ cd narrow

  $ mkdir outside
  $ echo other_contents > outside/f2
  $ grep outside .hg/narrowspec
  [1]
  $ grep outside .hg/dirstate
  [1]
  $ hg status

`hg status` did not add outside.
  $ grep outside .hg/narrowspec
  [1]
  $ grep outside .hg/dirstate
  [1]

Unfortunately this is not really a candidate for adding to narrowhg proper,
since it depends on some other source for providing the manifests (when using
treemanifests) and file contents. Something like a virtual filesystem and/or
remotefilelog. We want to be useful when not using those systems, so we do not
have this method available in narrowhg proper at the moment.
  $ cat > "$TESTTMP/expand_extension.py" <<EOF
  > import os
  > import sys
  >
  > from mercurial import encoding
  > from mercurial import extensions
  > from mercurial import localrepo
  > from mercurial import match as matchmod
  > from mercurial import narrowspec
  > from mercurial import patch
  > from mercurial import util as hgutil
  >
  > def expandnarrowspec(ui, repo, newincludes=None):
  >   if not newincludes:
  >     return
  >   import sys
  >   newincludes = set([newincludes])
  >   includes, excludes = repo.narrowpats
  >   currentmatcher = narrowspec.match(repo.root, includes, excludes)
  >   includes = includes | newincludes
  >   if not repo.currenttransaction():
  >     ui.develwarn(b'expandnarrowspec called outside of transaction!')
  >   repo.setnarrowpats(includes, excludes)
  >   newmatcher = narrowspec.match(repo.root, includes, excludes)
  >   added = matchmod.differencematcher(newmatcher, currentmatcher)
  >   for f in repo[b'.'].manifest().walk(added):
  >     repo.dirstate.normallookup(f)
  >
  > def wrapds(ui, repo, ds):
  >   class expandingdirstate(ds.__class__):
  >     @hgutil.propertycache
  >     def _map(self):
  >       ret = super(expandingdirstate, self)._map
  >       with repo.wlock(), repo.lock(), repo.transaction(
  >           b'expandnarrowspec'):
  >         expandnarrowspec(ui, repo,
  >                          encoding.environ.get(b'DIRSTATEINCLUDES'))
  >       return ret
  >   ds.__class__ = expandingdirstate
  >   return ds
  >
  > def reposetup(ui, repo):
  >   class expandingrepo(repo.__class__):
  >     def _makedirstate(self):
  >       dirstate = super(expandingrepo, self)._makedirstate()
  >       return wrapds(ui, repo, dirstate)
  >   repo.__class__ = expandingrepo
  >
  > def extsetup(unused_ui):
  >   def overridepatch(orig, ui, repo, *args, **kwargs):
  >     with repo.wlock():
  >       expandnarrowspec(ui, repo, encoding.environ.get(b'PATCHINCLUDES'))
  >       return orig(ui, repo, *args, **kwargs)
  >
  >   extensions.wrapfunction(patch, b'patch', overridepatch)
  > EOF
  $ cat >> ".hg/hgrc" <<EOF
  > [extensions]
  > expand_extension = $TESTTMP/expand_extension.py
  > EOF

Since we do not have the ability to rely on a virtual filesystem or
remotefilelog in the test, we just fake it by copying the data from the 'master'
repo.
  $ cp -a ../master/.hg/store/data/* .hg/store/data
Do that for patchdir as well.
  $ cp -a ../master/patchdir .

`hg status` will now add outside, but not patchdir.
  $ DIRSTATEINCLUDES=path:outside hg status
  M outside/f2
  $ grep outside .hg/narrowspec
  path:outside
  $ grep outside .hg/dirstate > /dev/null
  $ grep patchdir .hg/narrowspec
  [1]
  $ grep patchdir .hg/dirstate
  [1]

Get rid of the modification to outside/f2.
  $ hg update -C .
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

This patch will not apply cleanly at the moment, so `hg import` will break
  $ cat > "$TESTTMP/foo.patch" <<EOF
  > --- patchdir/f3
  > +++ patchdir/f3
  > @@ -1,1 +1,1 @@
  > -this should be "patch_this", but its not, so patch fails
  > +this text is irrelevant
  > EOF
  $ PATCHINCLUDES=path:patchdir hg import -p0 -e "$TESTTMP/foo.patch" -m ignored
  applying $TESTTMP/foo.patch
  patching file patchdir/f3
  Hunk #1 FAILED at 0
  1 out of 1 hunks FAILED -- saving rejects to file patchdir/f3.rej
  abort: patch failed to apply
  [255]
  $ grep patchdir .hg/narrowspec
  [1]
  $ grep patchdir .hg/dirstate > /dev/null
  [1]

Let's make it apply cleanly and see that it *did* expand properly
  $ cat > "$TESTTMP/foo.patch" <<EOF
  > --- patchdir/f3
  > +++ patchdir/f3
  > @@ -1,1 +1,1 @@
  > -patch_this
  > +patched_this
  > EOF
  $ PATCHINCLUDES=path:patchdir hg import -p0 -e "$TESTTMP/foo.patch" -m message
  applying $TESTTMP/foo.patch
  $ cat patchdir/f3
  patched_this
  $ grep patchdir .hg/narrowspec
  path:patchdir
  $ grep patchdir .hg/dirstate > /dev/null
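
The core of `expandnarrowspec` in the test extension is a set union on the include patterns followed by a "new matcher minus old matcher" walk over the manifest. A simplified model of that step, using plain directory prefixes instead of Mercurial's matcher objects, might look like the sketch below. The helpers `matches` and `expand` are hypothetical stand-ins, and the `path:` prefix used in the real narrowspec is dropped for brevity.

```python
def matches(path, includes, excludes=frozenset()):
    """True if path lies under an include prefix and under no exclude prefix."""
    def under(prefix):
        return path == prefix or path.startswith(prefix + "/")

    return (any(under(p) for p in includes)
            and not any(under(p) for p in excludes))


def expand(manifest, includes, new_include, excludes=frozenset()):
    """Return (expanded includes, files newly covered by the expansion).

    The second value plays the role of walking the manifest with a
    difference matcher: matched by the new spec but not by the old one.
    """
    expanded = set(includes) | {new_include}
    added = sorted(
        f for f in manifest
        if matches(f, expanded, excludes)
        and not matches(f, includes, excludes)
    )
    return expanded, added


# Same layout as the 'master' repository created in the test.
manifest = ["inside/f1", "outside/f2", "patchdir/f3"]
includes, added = expand(manifest, {"inside"}, "outside")
print(sorted(includes), added)  # ['inside', 'outside'] ['outside/f2']
```

In the extension, each newly covered file is then passed to `repo.dirstate.normallookup`, which is why `hg status` reports `outside/f2` as modified once `DIRSTATEINCLUDES` is set.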