tests/test-contrib-perf.t @ 51681:522b4d729e89
mmap: populate the mapping by default
Without pre-population, accessing all data through a mmap can result in many
page faults, reducing performance significantly. If the mmap is pre-populated,
performance can no longer be worse than that of a full read.
(See benchmark numbers below)
In some cases where very little data is read, pre-populating can be overkill and
slower than populating on access (through page faults). So that behavior can be
controlled when the caller can determine the best behavior in advance.
(See benchmark numbers below)
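Below is a minimal sketch of the trade-off described above. It assumes a POSIX system and Python 3.10+, where the standard `mmap` module exposes `MAP_POPULATE` (Linux only). `mmap_read` and its `pre_populate` parameter are hypothetical names for illustration. They are not a description of Mercurial's own `util` API.

```python
import mmap
import os

# MAP_POPULATE is Linux-specific and only exposed by Python >= 3.10;
# fall back to 0 (i.e. no pre-population) where it is missing.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


def mmap_read(path, pre_populate=True):
    """Map `path` read-only, optionally asking the kernel to pre-fault all pages.

    Hypothetical helper for illustration only (POSIX-only: uses flags/prot).
    """
    with open(path, "rb") as fp:
        size = os.fstat(fp.fileno()).st_size
        if size == 0:
            # mmap refuses to map an empty file; return an empty buffer instead.
            return b""
        flags = mmap.MAP_SHARED
        if pre_populate:
            # Fault every page in up front instead of one page fault per
            # first access, so reading everything costs at most a full read.
            flags |= MAP_POPULATE
        # mmap duplicates the file descriptor, so the mapping outlives `fp`.
        return mmap.mmap(fp.fileno(), size, flags=flags, prot=mmap.PROT_READ)
```

A caller that knows it will touch only a few pages (a small `hg log`, say) could pass `pre_populate=False`. Everything else keeps the populating default.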
In addition, testing population in a secondary thread yielded great results,
combining the best of each approach. This might be implemented in later
changesets.
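The secondary-thread idea could look roughly like this. The sketch uses only the standard `mmap` and `threading` modules. `populate_in_background` is a hypothetical helper, not Mercurial code, and it is not necessarily what a later changeset implements. It touches one byte per page from a daemon thread, so the kernel faults pages in while the caller keeps working.

```python
import mmap
import threading


def populate_in_background(mapping):
    """Fault in every page of `mapping` from a daemon thread.

    Hypothetical helper: reading one byte per page makes the kernel load that
    page, so later reads by the main thread are less likely to page-fault.
    The caller must keep `mapping` open until the thread has finished
    (join it before calling mapping.close()).
    """

    def _touch_pages():
        for offset in range(0, len(mapping), mmap.PAGESIZE):
            _ = mapping[offset]  # indexing reads a single byte

    thread = threading.Thread(target=_touch_pages, daemon=True)
    thread.start()
    return thread
```

The main thread can start reading right away. Any page the helper has already touched is served without a fault, and a page it has not reached yet simply faults on demand as before.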
In all cases, using mmap significantly reduces memory usage when many processes
run in parallel on the same machine.
### Benchmarks
# What did I run
A couple of months back, I ran a large benchmark campaign to assess the impact
of various approaches for using mmap with the revlog (and other files). It
highlighted a few benchmarks that capture the impact of the changes well, so to
validate this change I checked the following:
- log command displaying various revisions
(read the changelog index)
- log command displaying the patch of listed revisions
(read the changelog index, the manifest index and a few file indexes)
- unbundling a few revisions
(read and write the changelog, manifest and a few file indexes, and walk the
graph to update some caches)
- pushing a few revisions
(read and write the changelog, manifest and a few file indexes, walk the graph
to update some caches, and perform various local and remote accesses during
discovery)
Benchmarks were run using the default module policy (c+py) and the rust one. No
significant differences were found between the two implementations, so we
present results using the default policy (unless otherwise specified).
I ran them on a few repositories:
- mercurial: a "public changeset only" copy of mercurial from 2018-08-01 using
zstd compression and sparse-revlog
- pypy: a copy of pypy from 2018-08-01 using zstd compression and sparse-revlog
- netbeans: a copy of netbeans from 2018-08-01 using zstd compression and
sparse-revlog
- mozilla-try: a copy of mozilla-try from 2019-02-18 using zstd compression and
sparse-revlog
- mozilla-try persistent-nodemap: same as above, but with a persistent
nodemap; used for the `log --patch` benchmark only
# Results
For the smaller repositories (mercurial, pypy), the impact of mmap is almost
imperceptible, with other costs dominating the operation. The impact of
pre-populating is indiscernible in the benchmarks we ran.
For larger repositories, the benchmarks support the explanation given above:
On netbeans, the log can be about 1% faster without pre-population (a
difference of less than 100ms), but unbundle becomes a bit slower, even for
small unbundles.
### data-env-vars.name = netbeans-2018-08-01-zstd-sparse-revlog
# benchmark.name = hg.command.unbundle
# benchmark.variants.issue6528 = disabled
# benchmark.variants.reuse-external-delta-parent = yes
# benchmark.variants.revs = any-1-extra-rev
# benchmark.variants.source = unbundle
# benchmark.variants.verbosity = quiet
with-populate: 0.240157
no-populate: 0.265087 (+10.38%, +0.02)
# benchmark.variants.revs = any-100-extra-rev
with-populate: 1.459518
no-populate: 1.481290 (+1.49%, +0.02)
## benchmark.name = hg.command.push
# benchmark.variants.explicit-rev = none
# benchmark.variants.issue6528 = disabled
# benchmark.variants.protocol = ssh
# benchmark.variants.reuse-external-delta-parent = yes
# benchmark.variants.revs = any-1-extra-rev
with-populate: 0.771919
no-populate: 0.792025 (+2.60%, +0.02)
# benchmark.variants.revs = any-100-extra-rev
with-populate: 1.459518
no-populate: 1.481290 (+1.49%, +0.02)
For mozilla-try, the slowdown from pre-populating is more visible for a small
`hg log`, but still small in absolute time (Rust values are used here so that
the persistent nodemap is relevant).
### data-env-vars.name = mozilla-try-2019-02-18-ds2-pnm
# benchmark.name = hg.command.log
# bin-env-vars.hg.flavor = rust
# benchmark.variants.patch = yes
# benchmark.variants.limit-rev = 1
with-populate: 0.237813
no-populate: 0.229452 (-3.52%, -0.01)
# benchmark.variants.limit-rev = 10
# benchmark.variants.patch = yes
with-populate: 1.213578
no-populate: 1.205189 (-0.69%, -0.01)
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.variants.limit-rev = 1000
# benchmark.variants.patch = no
# benchmark.variants.rev = tip
with-populate: 0.198607
no-populate: 0.195038 (-1.80%, -0.00)
However, pre-populating provides a significant boost on more complex operations
like unbundle or push:
### data-env-vars.name = mozilla-try-2019-02-18-zstd-sparse-revlog
# benchmark.name = hg.command.push
# benchmark.variants.explicit-rev = none
# benchmark.variants.issue6528 = disabled
# benchmark.variants.protocol = ssh
# benchmark.variants.reuse-external-delta-parent = yes
# benchmark.variants.revs = any-1-extra-rev
with-populate: 4.798632
no-populate: 4.953295 (+3.22%, +0.15)
# benchmark.variants.revs = any-100-extra-rev
with-populate: 4.903618
no-populate: 5.014963 (+2.27%, +0.11)
## benchmark.name = hg.command.unbundle
# benchmark.variants.revs = any-1-extra-rev
with-populate: 1.423411
no-populate: 1.585365 (+11.38%, +0.16)
# benchmark.variants.revs = any-100-extra-rev
with-populate: 1.537909
no-populate: 1.688489 (+9.79%, +0.15)
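Each `no-populate` line is annotated with its difference from the matching `with-populate` time: first as a percentage, then in absolute seconds. Here is a minimal sketch of that arithmetic. It assumes both figures are rounded to two decimals, which is inferred from the numbers above rather than stated in the message.

```python
def format_delta(with_populate, no_populate):
    """Return the '(+x.xx%, +y.yy)' annotation for a pair of timings.

    Assumption: the percentage is relative to the with-populate time and the
    absolute difference is in seconds, both rounded to two decimals.
    """
    diff = no_populate - with_populate
    pct = diff / with_populate * 100
    return f"({pct:+.2f}%, {diff:+.2f})"
```

For example, `format_delta(0.240157, 0.265087)` returns `"(+10.38%, +0.02)"`, matching the first netbeans unbundle row. Python's `+` format keeps the sign of tiny negative differences, which is why a row can show `-0.00`.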
author: Pierre-Yves David <pierre-yves.david@octobus.net>
date: Thu, 11 Apr 2024 00:02:07 +0200
parents: 90ef3e042e10
children: 7346f93be7a4
#require test-repo

Set vars:

  $ . "$TESTDIR/helpers-testrepo.sh"
  $ CONTRIBDIR="$TESTDIR/../contrib"

Prepare repo:

  $ hg init
  $ echo this is file a > a
  $ hg add a
  $ hg commit -m first
  $ echo adding to file a >> a
  $ hg commit -m second
  $ echo adding more to file a >> a
  $ hg commit -m third
  $ hg up -r 0
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ echo merge-this >> a
  $ hg commit -m merge-able
  created new head
  $ hg up -r 2
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

perfstatus

  $ cat >> $HGRCPATH << EOF
  > [extensions]
  > perf=$CONTRIBDIR/perf.py
  > [perf]
  > presleep=0
  > stub=on
  > parentscount=1
  > EOF
  $ hg help -e perf
  perf extension - helper extension to measure performance
  Configurations
  ==============
  "perf"
  ------
  "all-timing"
      When set, additional statistics will be reported for each benchmark: best, worst, median average. If not set only the best timing is reported (default: off).
  "presleep"
      number of second to wait before any group of runs (default: 1)
  "pre-run"
      number of run to perform before starting measurement.
  "profile-benchmark"
      Enable profiling for the benchmarked section. (by default, the first iteration is benchmarked)
  "profiled-runs"
      list of iteration to profile (starting from 0)
  "run-limits"
      Control the number of runs each benchmark will perform. The option value should be a list of '<time>-<numberofrun>' pairs. After each run the conditions are considered in order with the following logic:
          If benchmark has been running for <time> seconds, and we have performed <numberofrun> iterations, stop the benchmark,
      The default value is: '3.0-100, 10.0-3'
  "stub"
      When set, benchmarks will only be run once, useful for testing (default: off)
  list of commands:
   perf::addremove  (no help text available)
   perf::ancestors  (no help text available)
   perf::ancestorset  (no help text available)
   perf::annotate  (no help text available)
   perf::bdiff  benchmark a bdiff between revisions
   perf::bookmarks  benchmark parsing bookmarks from disk to memory
   perf::branchmap  benchmark the update of a branchmap
   perf::branchmapload  benchmark reading the branchmap
   perf::branchmapupdate  benchmark branchmap update from for <base> revs to <target> revs
   perf::bundle  benchmark the creation of a bundle from a repository
   perf::bundleread  Benchmark reading of bundle files.
   perf::cca  (no help text available)
   perf::changegroupchangelog  Benchmark producing a changelog group for a changegroup.
   perf::changeset  (no help text available)
   perf::ctxfiles  (no help text available)
   perf::delta-find  benchmark the process of finding a valid delta for a revlog revision
   perf::diffwd  Profile diff of working directory changes
   perf::dirfoldmap  benchmap a 'dirstate._map.dirfoldmap.get()' request
   perf::dirs  (no help text available)
   perf::dirstate  benchmap the time of various distate operations
   perf::dirstatedirs  benchmap a 'dirstate.hasdir' call from an empty 'dirs' cache
   perf::dirstatefoldmap  benchmap a 'dirstate._map.filefoldmap.get()' request
   perf::dirstatewrite  benchmap the time it take to write a dirstate on disk
   perf::discovery  benchmark discovery between local repo and the peer at given path
   perf::fncacheencode  (no help text available)
   perf::fncacheload  (no help text available)
   perf::fncachewrite  (no help text available)
   perf::heads  benchmark the computation of a changelog heads
   perf::helper-mergecopies  find statistics about potential parameters for 'perfmergecopies'
   perf::helper-pathcopies  find statistic about potential parameters for the 'perftracecopies'
   perf::ignore  benchmark operation related to computing ignore
   perf::index  benchmark index creation time followed by a lookup
   perf::linelogedits  (no help text available)
   perf::loadmarkers  benchmark the time to parse the on-disk markers for a repo
   perf::log  (no help text available)
   perf::lookup  (no help text available)
   perf::lrucachedict  (no help text available)
   perf::manifest  benchmark the time to read a manifest from disk and return a usable
   perf::mergecalculate  (no help text available)
   perf::mergecopies  measure runtime of 'copies.mergecopies'
   perf::moonwalk  benchmark walking the changelog backwards
   perf::nodelookup  (no help text available)
   perf::nodemap  benchmark the time necessary to look up revision from a cold nodemap
   perf::parents  benchmark the time necessary to fetch one changeset's parents.
   perf::pathcopies  benchmark the copy tracing logic
   perf::phases  benchmark phasesets computation
   perf::phasesremote  benchmark time needed to analyse phases of the remote server
   perf::progress  printing of progress bars
   perf::rawfiles  (no help text available)
   perf::revlogchunks  Benchmark operations on revlog chunks.
   perf::revlogindex  Benchmark operations against a revlog index.
   perf::revlogrevision  Benchmark obtaining a revlog revision.
   perf::revlogrevisions  Benchmark reading a series of revisions from a revlog.
   perf::revlogwrite  Benchmark writing a series of revisions to a revlog.
   perf::revrange  (no help text available)
   perf::revset  benchmark the execution time of a revset
   perf::startup  (no help text available)
   perf::status  benchmark the performance of a single status call
   perf::stream-consume  benchmark the full application of a stream clone
   perf::stream-generate  benchmark the full generation of a stream clone
   perf::stream-locked-section  benchmark the initial, repo-locked, section of a stream-clone
   perf::tags  Benchmark tags retrieval in various situation
   perf::templating  test the rendering time of a given template
   perf::unbundle  benchmark application of a bundle in a repository.
   perf::unidiff  benchmark a unified diff between revisions
   perf::volatilesets  benchmark the computation of various volatile set
   perf::walk  (no help text available)
   perf::write  microbenchmark ui.write (and others)
  (use 'hg help -v perf' to show built-in aliases and global options)

  $ hg help perfaddremove
  hg perf::addremove
  aliases: perfaddremove
  (no help text available)
  options:
   -T --template TEMPLATE display with template
  (some details hidden, use --verbose to show complete help)

  $ hg perfaddremove
  $ hg perfancestors
  $ hg perfancestorset 2
  $ hg perfannotate a
  $ hg perfbdiff -c 1
  $ hg perfbdiff --alldata 1
  $ hg perfunidiff -c 1
  $ hg perfunidiff --alldata 1
  $ hg perfbookmarks
  $ hg perfbranchmap
  $ hg perfbranchmapload
  $ hg perfbranchmapupdate --base "not tip" --target "tip"
  benchmark of branchmap with 3 revisions with 1 new ones
  $ hg perfcca
  $ hg perfchangegroupchangelog
  $ hg perfchangegroupchangelog --cgversion 01
  $ hg perfchangeset 2
  $ hg perfctxfiles 2
  $ hg perfdiffwd
  $ hg perfdirfoldmap
  $ hg perfdirs
  $ hg perfdirstate
  $ hg perfdirstate --contains
  $ hg perfdirstate --iteration
  $ hg perfdirstatedirs
  $ hg perfdirstatefoldmap
  $ hg perfdirstatewrite
#if repofncache
  $ hg perffncacheencode
  $ hg perffncacheload
  $ hg debugrebuildfncache
  fncache already up to date
  $ hg perffncachewrite
  $ hg debugrebuildfncache
  fncache already up to date
#endif
  $ hg perfheads
  $ hg perfignore
  $ hg perfindex
  $ hg perflinelogedits -n 1
  $ hg perfloadmarkers
  $ hg perflog
  $ hg perflookup 2
  $ hg perflrucache
  $ hg perfmanifest 2
  $ hg perfmanifest -m 44fe2c8352bb3a478ffd7d8350bbc721920134d1
  $ hg perfmanifest -m 44fe2c8352bb
  abort: manifest revision must be integer or full node
  [255]
  $ hg perfmergecalculate -r 3
  $ hg perfmoonwalk
  $ hg perfnodelookup 2
  $ hg perfpathcopies 1 2
  $ hg perfprogress --total 1000
  $ hg perfrawfiles 2
  $ hg perfrevlogindex -c
#if reporevlogstore
  $ hg perfrevlogrevisions .hg/store/data/a.i
#endif
  $ hg perfrevlogrevision -m 0
  $ hg perfrevlogchunks -c
  $ hg perfrevrange
  $ hg perfrevset 'all()'
  $ hg perfstartup
  $ hg perfstatus
  $ hg perfstatus --dirstate
  $ hg perftags
  $ hg perftemplating
  $ hg perfvolatilesets
  $ hg perfwalk
  $ hg perfparents
  $ hg perfdiscovery -q .
  $ hg perf::phases

Test run control
----------------

Simple single entry

  $ hg perfparents --config perf.stub=no --config perf.run-limits='0.000000001-15'
  ! wall * comb * user * sys * (best of 15) (glob)
  ! wall * comb * user * sys * (max of 15) (glob)
  ! wall * comb * user * sys * (avg of 15) (glob)
  ! wall * comb * user * sys * (median of 15) (glob)

Multiple entries

  $ hg perfparents --config perf.stub=no --config perf.run-limits='500000-1, 0.000000001-50'
  ! wall * comb * user * sys * (best of 50) (glob)
  ! wall * comb * user * sys * (max of 50) (glob)
  ! wall * comb * user * sys * (avg of 50) (glob)
  ! wall * comb * user * sys * (median of 50) (glob)

error case are ignored

  $ hg perfparents --config perf.stub=no --config perf.run-limits='500, 0.000000001-50'
  malformatted run limit entry, missing "-": 500
  ! wall * comb * user * sys * (best of 50) (glob)
  ! wall * comb * user * sys * (max of 50) (glob)
  ! wall * comb * user * sys * (avg of 50) (glob)
  ! wall * comb * user * sys * (median of 50) (glob)
  $ hg perfparents --config perf.stub=no --config perf.run-limits='aaa-120, 0.000000001-50'
  malformatted run limit entry, could not convert string to float: 'aaa': aaa-120
  ! wall * comb * user * sys * (best of 50) (glob)
  ! wall * comb * user * sys * (max of 50) (glob)
  ! wall * comb * user * sys * (avg of 50) (glob)
  ! wall * comb * user * sys * (median of 50) (glob)
  $ hg perfparents --config perf.stub=no --config perf.run-limits='120-aaaaaa, 0.000000001-50'
  malformatted run limit entry, invalid literal for int() with base 10: 'aaaaaa': 120-aaaaaa
  ! wall * comb * user * sys * (best of 50) (glob)
  ! wall * comb * user * sys * (max of 50) (glob)
  ! wall * comb * user * sys * (avg of 50) (glob)
  ! wall * comb * user * sys * (median of 50) (glob)

test actual output
------------------

normal output:

  $ hg perfheads --config perf.stub=no
  ! wall * comb * user * sys * (best of *) (glob)
  ! wall * comb * user * sys * (max of *) (glob)
  ! wall * comb * user * sys * (avg of *) (glob)
  ! wall * comb * user * sys * (median of *) (glob)

detailed output:

  $ hg perfheads --config perf.all-timing=yes --config perf.stub=no
  ! wall * comb * user * sys * (best of *) (glob)
  ! wall * comb * user * sys * (max of *) (glob)
  ! wall * comb * user * sys * (avg of *) (glob)
  ! wall * comb * user * sys * (median of *) (glob)

test json output
----------------

normal output:

  $ hg perfheads --template json --config perf.stub=no
  [
   {
    "avg.comb": *, (glob)
    "avg.count": *, (glob)
    "avg.sys": *, (glob)
    "avg.user": *, (glob)
    "avg.wall": *, (glob)
    "comb": *, (glob)
    "count": *, (glob)
    "max.comb": *, (glob)
    "max.count": *, (glob)
    "max.sys": *, (glob)
    "max.user": *, (glob)
    "max.wall": *, (glob)
    "median.comb": *, (glob)
    "median.count": *, (glob)
    "median.sys": *, (glob)
    "median.user": *, (glob)
    "median.wall": *, (glob)
    "sys": *, (glob)
    "user": *, (glob)
    "wall": * (glob)
   }
  ]

detailed output:

  $ hg perfheads --template json --config perf.all-timing=yes --config perf.stub=no
  [
   {
    "avg.comb": *, (glob)
    "avg.count": *, (glob)
    "avg.sys": *, (glob)
    "avg.user": *, (glob)
    "avg.wall": *, (glob)
    "comb": *, (glob)
    "count": *, (glob)
    "max.comb": *, (glob)
    "max.count": *, (glob)
    "max.sys": *, (glob)
    "max.user": *, (glob)
    "max.wall": *, (glob)
    "median.comb": *, (glob)
    "median.count": *, (glob)
    "median.sys": *, (glob)
    "median.user": *, (glob)
    "median.wall": *, (glob)
    "sys": *, (glob)
    "user": *, (glob)
    "wall": * (glob)
   }
  ]

Test pre-run feature
--------------------

(perf discovery has some spurious output)

  $ hg perfdiscovery . --config perf.stub=no --config perf.run-limits='0.000000001-1' --config perf.pre-run=0
  ! wall * comb * user * sys * (best of 1) (glob)
  ! wall * comb * user * sys * (max of 1) (glob)
  ! wall * comb * user * sys * (avg of 1) (glob)
  ! wall * comb * user * sys * (median of 1) (glob)
  searching for changes
  $ hg perfdiscovery . --config perf.stub=no --config perf.run-limits='0.000000001-1' --config perf.pre-run=1
  ! wall * comb * user * sys * (best of 1) (glob)
  ! wall * comb * user * sys * (max of 1) (glob)
  ! wall * comb * user * sys * (avg of 1) (glob)
  ! wall * comb * user * sys * (median of 1) (glob)
  searching for changes
  searching for changes
  $ hg perfdiscovery . --config perf.stub=no --config perf.run-limits='0.000000001-1' --config perf.pre-run=3
  ! wall * comb * user * sys * (best of 1) (glob)
  ! wall * comb * user * sys * (max of 1) (glob)
  ! wall * comb * user * sys * (avg of 1) (glob)
  ! wall * comb * user * sys * (median of 1) (glob)
  searching for changes
  searching for changes
  searching for changes
  searching for changes
  $ hg perf::bundle 'last(all(), 5)'
  $ hg bundle --exact --rev 'last(all(), 5)' last-5.hg
  4 changesets found
  $ hg perf::unbundle last-5.hg

test profile-benchmark option
------------------------------

Function to check that statprof ran

  $ statprofran () {
  >   grep -E 'Sample count:|No samples recorded' > /dev/null
  > }
  $ hg perfdiscovery . --config perf.stub=no --config perf.run-limits='0.000000001-1' --config perf.profile-benchmark=yes 2>&1 | statprofran

Check perf.py for historical portability
----------------------------------------

  $ cd "$TESTDIR/.."
  $ (testrepohg files -r 1.2 glob:mercurial/*.c glob:mercurial/*.py;
  >  testrepohg files -r tip glob:mercurial/*.c glob:mercurial/*.py) |
  >   "$TESTDIR"/check-perf-code.py contrib/perf.py
  contrib/perf.py:\d+: (re)
   > from mercurial import (
   import newer module separately in try clause for early Mercurial
  contrib/perf.py:\d+: (re)
   > from mercurial import (
   import newer module separately in try clause for early Mercurial
  contrib/perf.py:\d+: (re)
   > origindexpath = orig.opener.join(indexfile)
   use getvfs()/getsvfs() for early Mercurial
  contrib/perf.py:\d+: (re)
   > origdatapath = orig.opener.join(datafile)
   use getvfs()/getsvfs() for early Mercurial
  contrib/perf.py:\d+: (re)
   > vfs = vfsmod.vfs(tmpdir)
   use getvfs()/getsvfs() for early Mercurial
  contrib/perf.py:\d+: (re)
   > vfs.options = getattr(orig.opener, 'options', None)
   use getvfs()/getsvfs() for early Mercurial
  [1]
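The test file exercises two things this sketch approximates. One is the `perf.run-limits` option documented in the `hg help -e perf` output. The other is the error cases checked under "Test run control". `parse_run_limits` and `should_stop` are hypothetical names. The real logic lives in `contrib/perf.py` and may differ in detail, for instance in how warnings are reported.

```python
def parse_run_limits(value, warn=print):
    """Parse a comma-separated list of '<time>-<numberofrun>' pairs.

    Malformed entries are reported through `warn` and skipped, mirroring the
    "error case are ignored" behavior checked by the test.
    """
    limits = []
    for entry in value.split(","):
        entry = entry.strip()
        parts = entry.split("-", 1)
        if len(parts) < 2:
            warn('malformatted run limit entry, missing "-": %s' % entry)
            continue
        try:
            time_limit = float(parts[0])
            run_limit = int(parts[1])
        except ValueError as exc:
            warn("malformatted run limit entry, %s: %s" % (exc, entry))
            continue
        limits.append((time_limit, run_limit))
    return limits


def should_stop(elapsed, runs, limits):
    """Stop once any (time, count) condition holds, checking them in order."""
    for time_limit, run_limit in limits:
        if elapsed >= time_limit and runs >= run_limit:
            return True
    return False
```

With the default `'3.0-100, 10.0-3'`, a cheap benchmark stops after 100 runs once 3 seconds have passed. A slow one stops after as few as 3 runs once 10 seconds have passed.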