wireproto: compress data from a generator

Currently, the "getbundle" wire protocol command obtains a generator of
data, converts it to a util.chunkbuffer, then converts it back to a
generator via the protocol's groupchunks() implementation. For the SSH
protocol, groupchunks() simply reads 4 KB chunks and write()s the
data to a file descriptor. For the HTTP protocol, groupchunks() reads
32 KB chunks, feeds them into a zlib compressor, and emits compressed
data as it becomes available. That data is sent to the WSGI layer,
where it is likely either turned into HTTP chunked transfer chunks as
is or buffered further and turned into a larger chunk.
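
As a rough illustration of the two behaviours described above, the
sketch below reads fixed-size chunks from a file-like object, passing
them through unchanged in the SSH case and through a zlib compressor in
the HTTP case. The function names and the sample payload are
hypothetical stand-ins for illustration, not Mercurial's actual code.

```python
import io
import zlib


def groupchunks_ssh(fh, size=4096):
    """Yield fixed-size reads from a file-like object, uncompressed."""
    while True:
        chunk = fh.read(size)
        if not chunk:
            break
        yield chunk


def groupchunks_http(fh, size=32768):
    """Yield zlib-compressed data built from fixed-size reads."""
    z = zlib.compressobj()
    while True:
        chunk = fh.read(size)
        if not chunk:
            break
        data = z.compress(chunk)
        # The compressor buffers internally, so it may return nothing yet.
        if data:
            yield data
    # Emit whatever the compressor is still holding.
    yield z.flush()


# Hypothetical payload standing in for changegroup data.
payload = b"changegroup data " * 10000
ssh_out = b"".join(groupchunks_ssh(io.BytesIO(payload)))
http_out = b"".join(groupchunks_http(io.BytesIO(payload)))
```

Both functions need something with a .read(), which is why a generator
of data has to be wrapped in a buffer before it can be fed to them.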

For both the SSH and HTTP protocols, there is inefficiency from using
util.chunkbuffer.

For SSH, emitting consistent 4 KB chunks sounds nice. However, the file
descriptor it is writing to is almost certainly buffered. That means
a Python .write() call probably doesn't map one-to-one onto what is
actually written to the I/O layer.

For HTTP, we're going through an intermediate layer to zlib compress
data. So all util.chunkbuffer is doing is ensuring that the chunks we
feed into the zlib compressor are of uniform size. The trade-off is
more CPU time spent in Python buffering and emitting chunks in
util.chunkbuffer in exchange for fewer function calls into zlib.
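
The relationship between chunk size and the number of zlib calls can
be sketched as follows. This is only an illustration: the helper
compress_in_pieces() and the synthetic payload are assumptions made
for the example, and it measures call counts, not CPU time.

```python
import zlib


def compress_in_pieces(data, size):
    """Compress data in slices of `size` bytes; return (output, calls)."""
    z = zlib.compressobj()
    calls = 0
    out = []
    for start in range(0, len(data), size):
        out.append(z.compress(data[start:start + size]))
        calls += 1
    out.append(z.flush())
    return b"".join(out), calls


# 256 KiB of synthetic data (hypothetical, not real changegroup data).
payload = bytes(range(256)) * 1024

small, small_calls = compress_in_pieces(payload, 1024)
large, large_calls = compress_in_pieces(payload, 32768)
```

Either way the stream decompresses to the same bytes. Only the number
of compress() calls differs (256 versus 8 for this payload).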

This patch introduces and implements a new wire protocol abstract
method: compresschunks(). It is like groupchunks() except it operates
on a generator instead of something with a .read(). The SSH
implementation simply proxies chunks. The HTTP implementation uses
zlib compression.

To avoid duplicate code, the HTTP groupchunks() has been reimplemented
in terms of compresschunks().
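
A minimal sketch of the idea, assuming simplified standalone functions
rather than the real protocol classes: compresschunks() consumes any
iterable of byte chunks, and groupchunks() becomes a thin adapter that
turns a .read()-style object into such an iterable. The names and
signatures here are illustrative and may not match the patch exactly.

```python
import io
import zlib


def compresschunks_ssh(chunks):
    """SSH-style: proxy chunks through unchanged."""
    for chunk in chunks:
        yield chunk


def compresschunks_http(chunks):
    """HTTP-style: feed each chunk to zlib and emit output as available."""
    z = zlib.compressobj()
    for chunk in chunks:
        data = z.compress(chunk)
        if data:
            yield data
    yield z.flush()


def groupchunks_http(fh, size=32768):
    """Adapter: read fixed-size chunks from fh, then reuse compresschunks."""
    # iter(callable, sentinel) keeps calling fh.read(size) until it
    # returns b"", which signals end of input.
    return compresschunks_http(iter(lambda: fh.read(size), b""))


# Unevenly sized chunks, as a generator might produce them (hypothetical).
payload_chunks = [b"a" * 10, b"b" * 50000, b"c" * 3]
expected = b"".join(payload_chunks)

ssh_out = b"".join(compresschunks_ssh(iter(payload_chunks)))
http_out = b"".join(compresschunks_http(iter(payload_chunks)))
grouped = b"".join(groupchunks_http(io.BytesIO(expected)))
```

Because the compressor does its own internal buffering, the chunks fed
to it do not need to be of uniform size for the stream to be valid.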

To prove this all works, the "getbundle" wire protocol command has been
switched to compresschunks(). This removes the util.chunkbuffer from
that command. Now, data essentially streams straight from the
changegroup emitter to the wire, possibly through a zlib compressor.
Generators all the way, baby.

There were slim to no performance changes on the server as measured
with the mozilla-central repository. This is likely because CPU
time is dominated by reading revlogs, producing the changegroup, and
zlib compressing the output stream. Still, this brings us a little
closer to our ideal of using generators everywhere.

$ hg init
$ echo a > a
$ hg commit -A -ma
adding a
$ echo b >> a
$ hg commit -mb
$ echo c >> a
$ hg commit -mc
$ hg up 1
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ echo d >> a
$ hg commit -md
created new head
$ hg up 1
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ echo e >> a
$ hg commit -me
created new head
$ hg up 1
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Should fail because not at a head:
$ hg merge
abort: working directory not at a head revision
(use 'hg update' or merge with an explicit revision)
[255]
$ hg up
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
2 other heads for branch "default"
Should fail because > 2 heads:
$ HGMERGE=internal:other; export HGMERGE
$ hg merge
abort: branch 'default' has 3 heads - please merge with an explicit rev
(run 'hg heads .' to see heads)
[255]
Should succeed:
$ hg merge 2
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -mm1
Should succeed - 2 heads:
$ hg merge -P
changeset: 3:ea9ff125ff88
parent: 1:1846eede8b68
user: test
date: Thu Jan 01 00:00:00 1970 +0000
summary: d
$ hg merge
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -mm2
Should fail because at tip:
$ hg merge
abort: nothing to merge
[255]
$ hg up 0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Should fail because there is only one head:
$ hg merge
abort: nothing to merge
(use 'hg update' instead)
[255]
$ hg up 3
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ echo f >> a
$ hg branch foobranch
marked working directory as branch foobranch
(branches are permanent and global, did you want a bookmark?)
$ hg commit -mf
Should fail because merge with other branch:
$ hg merge
abort: branch 'foobranch' has one head - please merge with an explicit rev
(run 'hg heads' to see all heads)
[255]
Test for issue2043: ensure that 'merge -P' shows ancestors of 6 that
are not ancestors of 7, regardless of where their common ancestors are.
Merge preview not affected by common ancestor:
$ hg up -q 7
$ hg merge -q -P 6
2:2d95304fed5d
4:f25cbe84d8b3
5:a431fabd6039
6:e88e33f3bf62
Test experimental destination revset
$ hg log -r '_destmerge()'
abort: branch 'foobranch' has one head - please merge with an explicit rev
(run 'hg heads' to see all heads)
[255]
(on a branch with two heads)
$ hg up 5
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ echo f >> a
$ hg commit -mf
created new head
$ hg log -r '_destmerge()'
changeset: 6:e88e33f3bf62
parent: 5:a431fabd6039
parent: 3:ea9ff125ff88
user: test
date: Thu Jan 01 00:00:00 1970 +0000
summary: m2
(from the other head)
$ hg log -r '_destmerge(e88e33f3bf62)'
changeset: 8:b613918999e2
tag: tip
parent: 5:a431fabd6039
user: test
date: Thu Jan 01 00:00:00 1970 +0000
summary: f
(from unrelated branch)
$ hg log -r '_destmerge(foobranch)'
abort: branch 'foobranch' has one head - please merge with an explicit rev
(run 'hg heads' to see all heads)
[255]