commands: don't violate storage abstractions in `manifest --all`

Previously, we asked the store to emit its data files. For modern
repos, this would use fncache to resolve the set of files and then
stat() each file. For my copy of the mozilla-unified repository, this
took 3.3-10s, depending on the state of my filesystem cache, to render
449,790 items.

The previous behavior was a massive layering violation because it
assumed tracked files would have specific filenames in specific
directories. Alternate storage backends would violate this assumption.

The new behavior scans the changelog entries for the set of files
changed by each commit. It aggregates them into a set and then
sorts and prints the result. This reliably takes ~16.3s on my
machine. ~80% of the time is spent in zlib decompression.
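
The aggregation described above can be sketched roughly as follows. This
is a minimal illustration, not the code in this change: the function name
`all_tracked_files` and the `history` data are hypothetical, and each
changelog entry is modeled as a plain list of paths rather than read
through any real Mercurial API.

```python
from typing import Iterable, List


def all_tracked_files(changesets: Iterable[Iterable[str]]) -> List[str]:
    """Return the sorted union of file paths touched by any changeset."""
    seen = set()
    for files in changesets:
        # Each entry contributes the files changed by one commit.
        seen.update(files)
    return sorted(seen)


# Hypothetical history: each inner list is the set of files one commit changed.
history = [
    ["foo"],
    ["foo", "dir/bar"],
    ["baz"],
]

for path in all_tracked_files(history):
    print(path)
```

Because the result is a union over the whole history, a path touched by
any commit stays in the output even if a later commit removes it. The
cost grows with the number of changelog entries read, not with the
number of files on disk.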

The performance regression is unfortunate. If we want to claw it
back, we can create a proper storage API to query for the set of
tracked files. I'm not opposed to doing that. But I'm in no hurry
because I suspect ~0 people care about the performance of
`hg manifest --all`.

.. perf::

   `hg manifest --all` is likely slower due to changing its
   implementation to respect storage interface boundaries. If you
   are impacted by this regression in a meaningful way, please make
   noise on the development mailing list and it can be dealt with.

Differential Revision: https://phab.mercurial-scm.org/D3119
Corrupt an hg repo with two pulls.
create one repo with a long history

  $ hg init source1
  $ cd source1
  $ touch foo
  $ hg add foo
  $ for i in 1 2 3 4 5 6 7 8 9 10; do
  >     echo $i >> foo
  >     hg ci -m $i
  > done
  $ cd ..

create one repo with a shorter history

  $ hg clone -r 0 source1 source2
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  new changesets 495a0ec48aaf
  updating to branch default
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ cd source2
  $ echo a >> foo
  $ hg ci -m a
  $ cd ..

create a third repo to pull both other repos into it

  $ hg init corrupted
  $ cd corrupted

use a hook to make the second pull start while the first one is still running

  $ echo '[hooks]' >> .hg/hgrc
  $ echo 'prechangegroup = sleep 5' >> .hg/hgrc

start a pull...

  $ hg pull ../source1 > pull.out 2>&1 &

... and start another pull before the first one has finished

  $ sleep 1
  $ hg pull ../source2 2>/dev/null
  pulling from ../source2
  searching for changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files (+1 heads)
  new changesets ca3c05af513e
  (run 'hg heads' to see heads, 'hg merge' to merge)
  $ cat pull.out
  pulling from ../source1
  requesting all changes
  adding changesets
  adding manifests
  adding file changes
  added 10 changesets with 10 changes to 1 files
  new changesets 495a0ec48aaf:1e7b6c812ca8
  (run 'hg update' to get a working copy)

see the result

  $ wait
  $ hg verify
  checking changesets
  checking manifests
  crosschecking files in changesets and manifests
  checking files
  1 files, 11 changesets, 11 total revisions

  $ cd ..