changeset 37438:7b7ca9ba2de5
commands: don't violate storage abstractions in `manifest --all`
Previously, we asked the store to emit its data files. For modern
repos, this would use fncache to resolve the set of files then would
stat() each file. For my copy of the mozilla-unified repository, this
took 3.3-10s depending on the state of my filesystem cache to render
449,790 items.
The previous behavior was a massive layering violation because it
assumed tracked files would have specific filenames in specific
directories. Alternate storage backends would violate this assumption.
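To make the assumption concrete, here is a minimal sketch of the kind of filter the removed code applied. It is not Mercurial's code: `tracked_from_store_paths` and both sample stores are hypothetical, and the `objects/...` layout simply stands in for any backend that does not name its files `data/<path>.i`.

```python
def tracked_from_store_paths(datafiles):
    """Keep non-empty entries named 'data/<path>.i' and return '<path>'."""
    prefix, suffix = "data/", ".i"
    return [
        name[len(prefix):-len(suffix)]
        for name, size in datafiles
        if size != 0 and name.startswith(prefix) and name.endswith(suffix)
    ]


# Hypothetical revlog-style store: one index file (.i) per tracked path.
revlog_store = [
    ("data/readme.txt.i", 120),
    ("data/src/main.py.i", 64),
    ("data/src/main.py.d", 900),  # data file, skipped by the suffix check
]

# Hypothetical alternate backend with a different naming scheme.
other_store = [("objects/ab/cdef", 120), ("objects/12/3456", 64)]

revlog_paths = tracked_from_store_paths(revlog_store)
other_paths = tracked_from_store_paths(other_store)
print(revlog_paths)
print(other_paths)
```

Under these assumptions the filter recovers the tracked paths from the revlog-style store but finds nothing in the alternate layout, even though that backend may track the same files.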
The new behavior scans the changelog entries for the set of files
changed by each commit. It aggregates them into a set and then
sorts and prints the result. This reliably takes ~16.3s on my
machine. ~80% of the time is spent in zlib decompression.
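The aggregation described above can be sketched without a repository. This is an illustration only: `all_tracked_files` and the sample `history` are hypothetical, with each inner list standing in for the files one commit reports as changed.

```python
def all_tracked_files(changesets):
    """Union the per-commit file lists, then sort for stable output."""
    res = set()
    for files in changesets:
        res |= set(files)
    return sorted(res)


# Hypothetical history: each entry is the list of files touched by one commit.
history = [
    ["a.txt", "b.txt"],
    ["b.txt", "dir/c.txt"],
    ["a.txt"],
]

listing = all_tracked_files(history)
for path in listing:
    print(path)
```

Because every commit's file list has to be read, the cost grows with the length of history rather than with the number of files in the store, which is consistent with the timings quoted above.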
The performance regression is unfortunate. If we want to claw it
back, we can create a proper storage API to query for the set of
tracked files. I'm not opposed to doing that. But I'm in no hurry
because I suspect ~0 people care about the performance of
`hg manifest --all`.
.. perf::
`hg manifest --all` is likely slower due to changing its
implementation to respect storage interface boundaries. If you
are impacted by this regression in a meaningful way, please make
noise on the development mailing list and it can be dealt with.
Differential Revision: https://phab.mercurial-scm.org/D3119
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Wed, 04 Apr 2018 21:27:02 -0700 |
parents | 814e080a1215 |
children | 556984ae0005 |
files | mercurial/commands.py tests/test-convert-git.t tests/test-manifest.t |
diffstat | 3 files changed, 10 insertions(+), 16 deletions(-) |
--- a/mercurial/commands.py Wed Apr 04 21:09:47 2018 -0700
+++ b/mercurial/commands.py Wed Apr 04 21:27:02 2018 -0700
@@ -3491,19 +3491,13 @@
         if rev or node:
             raise error.Abort(_("can't specify a revision with --all"))
 
-        res = []
-        # TODO this is a massive layering violation. It assumes the repo is
-        # backed by revlogs with a well-defined naming scheme.
-        prefix = "data/"
-        suffix = ".i"
-        plen = len(prefix)
-        slen = len(suffix)
-        with repo.lock():
-            for fn, b, size in repo.store.datafiles():
-                if size != 0 and fn[-slen:] == suffix and fn[:plen] == prefix:
-                    res.append(fn[plen:-slen])
+        res = set()
+        for rev in repo:
+            ctx = repo[rev]
+            res |= set(ctx.files())
+
         ui.pager('manifest')
-        for f in res:
+        for f in sorted(res):
             fm.startitem()
             fm.write("path", '%s\n', f)
         fm.end()
--- a/tests/test-convert-git.t Wed Apr 04 21:09:47 2018 -0700
+++ b/tests/test-convert-git.t Wed Apr 04 21:27:02 2018 -0700
@@ -878,7 +878,7 @@
 
   $ hg convert -q git-repo6 no-submodules --config convert.git.skipsubmodules=True
   $ hg -R no-submodules manifest --all
-  .gitmodules-renamed (no-reposimplestore !)
+  .gitmodules-renamed
 
 convert using a different remote prefix
   $ git init git-repo7