Gregory Szorc <gregory.szorc@gmail.com> [Mon, 13 Nov 2017 21:54:46 -0800] rev 35119
bundle2: inline struct operations
Before, we were calling struct.unpack() (via an alias) on every
loop iteration. I'm not sure what Python does under the hood, but
it would have to look at the struct format and determine what to
do.
This commit establishes a struct.Struct instance and reuses it for
struct reading.
We can see the impact from running `hg perfbundleread` on a Firefox
bundle:
! read(8k)
! wall 0.679730 comb 0.680000 user 0.140000 sys 0.540000 (best of 15)
! read(16k)
! wall 0.577228 comb 0.570000 user 0.080000 sys 0.490000 (best of 17)
! read(32k)
! wall 0.516060 comb 0.520000 user 0.040000 sys 0.480000 (best of 20)
! read(128k)
! wall 0.496378 comb 0.490000 user 0.010000 sys 0.480000 (best of 20)
! bundle2 iterparts()
! wall 3.056811 comb 3.050000 user 2.340000 sys 0.710000 (best of 4)
! wall 2.992605 comb 2.990000 user 2.260000 sys 0.730000 (best of 4)
! bundle2 iterparts() seekable
! wall 4.007676 comb 4.000000 user 3.170000 sys 0.830000 (best of 3)
! wall 3.863810 comb 3.860000 user 3.000000 sys 0.860000 (best of 3)
! bundle2 part seek()
! wall 6.267110 comb 6.250000 user 3.480000 sys 2.770000 (best of 3)
! wall 6.213387 comb 6.200000 user 3.350000 sys 2.850000 (best of 3)
! bundle2 part read(8k)
! wall 3.404164 comb 3.400000 user 2.650000 sys 0.750000 (best of 3)
! wall 3.241099 comb 3.250000 user 2.560000 sys 0.690000 (best of 3)
! bundle2 part read(16k)
! wall 3.197972 comb 3.200000 user 2.490000 sys 0.710000 (best of 4)
! wall 3.003930 comb 3.000000 user 2.270000 sys 0.730000 (best of 4)
! bundle2 part read(32k)
! wall 3.060557 comb 3.060000 user 2.340000 sys 0.720000 (best of 4)
! wall 2.904695 comb 2.900000 user 2.160000 sys 0.740000 (best of 4)
! bundle2 part read(128k)
! wall 2.952209 comb 2.950000 user 2.230000 sys 0.720000 (best of 4)
! wall 2.776140 comb 2.780000 user 2.070000 sys 0.710000 (best of 4)
Profiling now says most remaining time is spent in util.chunkbuffer.
I already heavily optimized that data structure several releases ago.
So we'll likely get little more performance out of bundle2 reading
while still retaining util.chunkbuffer().
Differential Revision: https://phab.mercurial-scm.org/D1393
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 13 Nov 2017 21:48:35 -0800] rev 35118
bundle2: inline changegroup.readexactly()
Profiling reveals this loop is pretty tight. Literally any
function call elimination can make a big difference.
This commit inlines the relatively trivial changegroup.readexactly()
method inside the loop.
The results with `hg perfbundleread` on a bundle of the Firefox repo
speak for themselves:
! read(8k)
! wall 0.679730 comb 0.680000 user 0.140000 sys 0.540000 (best of 15)
! read(16k)
! wall 0.577228 comb 0.570000 user 0.080000 sys 0.490000 (best of 17)
! read(32k)
! wall 0.516060 comb 0.520000 user 0.040000 sys 0.480000 (best of 20)
! read(128k)
! wall 0.496378 comb 0.490000 user 0.010000 sys 0.480000 (best of 20)
! bundle2 iterparts()
! wall 3.460903 comb 3.460000 user 2.760000 sys 0.700000 (best of 3)
! wall 3.056811 comb 3.050000 user 2.340000 sys 0.710000 (best of 4)
! bundle2 iterparts() seekable
! wall 4.312722 comb 4.310000 user 3.480000 sys 0.830000 (best of 3)
! wall 4.007676 comb 4.000000 user 3.170000 sys 0.830000 (best of 3)
! bundle2 part seek()
! wall 6.754764 comb 6.740000 user 3.970000 sys 2.770000 (best of 3)
! wall 6.267110 comb 6.250000 user 3.480000 sys 2.770000 (best of 3)
! bundle2 part read(8k)
! wall 3.668004 comb 3.660000 user 2.960000 sys 0.700000 (best of 3)
! wall 3.404164 comb 3.400000 user 2.650000 sys 0.750000 (best of 3)
! bundle2 part read(16k)
! wall 3.489196 comb 3.480000 user 2.750000 sys 0.730000 (best of 3)
! wall 3.197972 comb 3.200000 user 2.490000 sys 0.710000 (best of 4)
! bundle2 part read(32k)
! wall 3.388569 comb 3.380000 user 2.640000 sys 0.740000 (best of 3)
! wall 3.060557 comb 3.060000 user 2.340000 sys 0.720000 (best of 4)
! bundle2 part read(128k)
! wall 3.276415 comb 3.270000 user 2.560000 sys 0.710000 (best of 4)
! wall 2.952209 comb 2.950000 user 2.230000 sys 0.720000 (best of 4)
Differential Revision: https://phab.mercurial-scm.org/D1392