bundle2: increase payload part chunk size to 32kb
Bundle2 payload parts are framed chunks. Esentially, we obtain
data in equal size chunks of size `preferedchunksize` and emit those
to a generator. That generator is fed into a compressor (which can
be the no-op compressor, which just re-emits the generator). And
the output from the compressor likely goes to a file descriptor
or socket.
What this means is that small chunk sizes create more Python objects
and Python function calls than larger chunk sizes. And as we know,
Python object and function call overhead in performance sensitive
code matters (at least with CPython).
This commit increases the bundle2 part payload chunk size from 4k
to 32k. Practically speaking, this means that the chunks we feed
into a compressor (implemented in C code) or feed directly into a
file handle or socket write() are larger. It's possible the chunks
might be larger than what the receiver can handle in one logical
operation. But at that point, we're in C code, which is much more
efficient at dealing with splitting up the chunk and making multiple
function calls than Python is.
A downside to larger chunks is that the receiver has to wait for that
much data to arrive (either raw or from a decompressor) before it
can process the chunk. But 32kb still feels like a small buffer to
have to wait for. And in many cases, the client will convert from
8 read(4096) to 1 read(32768). That's happening in Python land. So
we cut down on the number of Python objects and function calls,
making the client faster as well. I don't think there are any
significant concerns to increasing the payload chunk size to 32kb.
The impact of this change on performance significant. Using `curl`
to obtain a stream clone bundle2 payload from a server on localhost
serving the mozilla-unified repository:
before: 20.78 user; 7.71 system; 80.5 MB/s
after: 13.90 user; 3.51 system; 132 MB/s
legacy: 9.72 user; 8.16 system; 132 MB/s
bundle2 stream clone generation is still more resource intensive than
legacy stream clone (that's likely because of the use of a
util.chunkbuffer). But the throughput is the same. We might
be in territory we're this is effectively a benchmark of the
networking stack or Python's syscall throughput.
From the client perspective, `hg clone -U --stream`:
before: 33.50 user; 7.95 system; 53.3 MB/s
after: 22.82 user; 7.33 system; 72.7 MB/s
legacy: 29.96 user; 7.94 system; 58.0 MB/s
And for `hg clone --stream` with a working directory update of
~230k files:
after: 119.55 user; 26.47 system; 0:57.08 wall
legacy: 126.98 user; 26.94 system; 1:05.56 wall
So, it appears that bundle2's stream clone is now definitively faster
than legacy stream clone!
Differential Revision: https://phab.mercurial-scm.org/D1932
This test tries to exercise the ssh functionality with a dummy script
(enable general delta early)
$ cat << EOF >> $HGRCPATH
> [format]
> usegeneraldelta=yes
> EOF
$ checknewrepo()
> {
> name=$1
> if [ -d "$name"/.hg/store ]; then
> echo store created
> fi
> if [ -f "$name"/.hg/00changelog.i ]; then
> echo 00changelog.i created
> fi
> cat "$name"/.hg/requires
> }
creating 'local'
$ hg init local
$ checknewrepo local
store created
00changelog.i created
dotencode
fncache
generaldelta
revlogv1
store
$ echo this > local/foo
$ hg ci --cwd local -A -m "init"
adding foo
test custom revlog chunk cache sizes
$ hg --config format.chunkcachesize=0 log -R local -pv
abort: revlog chunk cache size 0 is not greater than 0!
[255]
$ hg --config format.chunkcachesize=1023 log -R local -pv
abort: revlog chunk cache size 1023 is not a power of 2!
[255]
$ hg --config format.chunkcachesize=1024 log -R local -pv
changeset: 0:08b9e9f63b32
tag: tip
user: test
date: Thu Jan 01 00:00:00 1970 +0000
files: foo
description:
init
diff -r 000000000000 -r 08b9e9f63b32 foo
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/foo Thu Jan 01 00:00:00 1970 +0000
@@ -0,0 +1,1 @@
+this
creating repo with format.usestore=false
$ hg --config format.usestore=false init old
$ checknewrepo old
generaldelta
revlogv1
creating repo with format.usefncache=false
$ hg --config format.usefncache=false init old2
$ checknewrepo old2
store created
00changelog.i created
generaldelta
revlogv1
store
creating repo with format.dotencode=false
$ hg --config format.dotencode=false init old3
$ checknewrepo old3
store created
00changelog.i created
fncache
generaldelta
revlogv1
store
creating repo with format.dotencode=false
$ hg --config format.generaldelta=false --config format.usegeneraldelta=false init old4
$ checknewrepo old4
store created
00changelog.i created
dotencode
fncache
revlogv1
store
test failure
$ hg init local
abort: repository local already exists!
[255]
init+push to remote2
$ hg init -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" ssh://user@dummy/remote2
$ hg incoming -R remote2 local
comparing with local
changeset: 0:08b9e9f63b32
tag: tip
user: test
date: Thu Jan 01 00:00:00 1970 +0000
summary: init
$ hg push -R local -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" ssh://user@dummy/remote2
pushing to ssh://user@dummy/remote2
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
clone to remote1
$ hg clone -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remote1
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
The largefiles extension doesn't crash
$ hg clone -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remotelf --config extensions.largefiles=
The fsmonitor extension is incompatible with the largefiles extension and has been disabled. (fsmonitor !)
The fsmonitor extension is incompatible with the largefiles extension and has been disabled. (fsmonitor !)
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
init to existing repo
$ hg init -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" ssh://user@dummy/remote1
abort: repository remote1 already exists!
abort: could not create remote repo!
[255]
clone to existing repo
$ hg clone -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remote1
abort: repository remote1 already exists!
abort: could not create remote repo!
[255]
output of dummyssh
$ cat dummylog
Got arguments 1:user@dummy 2:hg init remote2
Got arguments 1:user@dummy 2:hg -R remote2 serve --stdio
Got arguments 1:user@dummy 2:hg -R remote2 serve --stdio
Got arguments 1:user@dummy 2:hg init remote1
Got arguments 1:user@dummy 2:hg -R remote1 serve --stdio
Got arguments 1:user@dummy 2:hg init remotelf
Got arguments 1:user@dummy 2:hg -R remotelf serve --stdio
Got arguments 1:user@dummy 2:hg init remote1
Got arguments 1:user@dummy 2:hg init remote1
comparing repositories
$ hg tip -q -R local
0:08b9e9f63b32
$ hg tip -q -R remote1
0:08b9e9f63b32
$ hg tip -q -R remote2
0:08b9e9f63b32
check names for repositories (clashes with URL schemes, special chars)
$ for i in bundle file hg http https old-http ssh static-http "with space"; do
> printf "hg init \"$i\"... "
> hg init "$i"
> test -d "$i" -a -d "$i/.hg" && echo "ok" || echo "failed"
> done
hg init "bundle"... ok
hg init "file"... ok
hg init "hg"... ok
hg init "http"... ok
hg init "https"... ok
hg init "old-http"... ok
hg init "ssh"... ok
hg init "static-http"... ok
hg init "with space"... ok
#if eol-in-paths
/* " " is not a valid name for a directory on Windows */
$ hg init " "
$ test -d " "
$ test -d " /.hg"
#endif
creating 'local/sub/repo'
$ hg init local/sub/repo
$ checknewrepo local/sub/repo
store created
00changelog.i created
dotencode
fncache
generaldelta
revlogv1
store
prepare test of init of url configured from paths
$ echo '[paths]' >> $HGRCPATH
$ echo "somewhere = `pwd`/url from paths" >> $HGRCPATH
$ echo "elsewhere = `pwd`/another paths url" >> $HGRCPATH
init should (for consistency with clone) expand the url
$ hg init somewhere
$ checknewrepo "url from paths"
store created
00changelog.i created
dotencode
fncache
generaldelta
revlogv1
store
verify that clone also expand urls
$ hg clone somewhere elsewhere
updating to branch default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ checknewrepo "another paths url"
store created
00changelog.i created
dotencode
fncache
generaldelta
revlogv1
store
clone bookmarks
$ hg -R local bookmark test
$ hg -R local bookmarks
* test 0:08b9e9f63b32
$ hg clone -e "\"$PYTHON\" \"$TESTDIR/dummyssh\"" local ssh://user@dummy/remote-bookmarks
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
exporting bookmark test
$ hg -R remote-bookmarks bookmarks
test 0:08b9e9f63b32