bdiff: give slight preference to appending lines
[This change could be folded into the previous changeset to minimize the repo
churn ...]
The general preference for matches in the middle of bdiff ranges helps to get
balanced recursion and efficient computation. But, as previous changes have
shown, it might also give diffs that seem "obviously wrong".
To mitigate that: If the best match on the A side starts at the beginning of
the bdiff range, don't aim for the middle-most B side match but for the
earliest.
This will make the matches balanced (by both sides being "early") even though
the bisection will be less balanced. Still, this case only applies if the *best*
and middle-most match was fully unbalanced on the A side. Each recursion will
thus, even in this worst case, reduce the problem significantly, and we are not
re-introducing the problem that was fixed in f1ca249696ed.
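As an illustration only (this is not the bdiff C code), the rule can be
sketched in Python. The function names, the descending scan order and the
bhalf formula are assumptions made for the example. It shows why preferring
the earliest B match turns "one line inserted before and one after" into a
plain append when the A match sits at the start of the range.

```python
def pick_b_start_early(candidates, b1, b2, a_match_at_start):
    """Choose a B start among equally long matches for one A position.

    candidates: B indices in descending order (assumed scan order).
    a_match_at_start: True if the A-side match begins at the range start.
    """
    bhalf = (b1 + b2 - 1) // 2  # assumed definition of the B-side middle
    mj = None
    for j in candidates:
        # Middle preference: keep moving down while still in the upper half.
        # Early preference: if A matched at its start, always move down.
        if mj is None or mj > bhalf or a_match_at_start:
            mj = j
    return mj


def inserted_ranges(j, length, b1, b2):
    """Half-open B ranges left unmatched before and after the match."""
    return (b1, j), (j + length, b2)


# A has one line; B has the same line three times (B range is [0, 3)).
middle = pick_b_start_early([2, 1, 0], 0, 3, a_match_at_start=False)
early = pick_b_start_early([2, 1, 0], 0, 3, a_match_at_start=True)
print(middle, inserted_ranges(middle, 1, 0, 3))  # 1 ((0, 1), (2, 3))
print(early, inserted_ranges(early, 1, 0, 3))    # 0 ((0, 0), (1, 3))
```

With the early preference the unmatched B lines form a single trailing range,
which reads as two appended lines.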
The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22806817 to 22807275 bytes - a 0.002% increase.
This makes the recent test-bdiff.py changes give prettier output ... but
they no longer show that the recursion is around middle matches (because in
these cases it isn't).
bdiff: give slight preference to longest matches in the middle of the B side
We already have a slight preference for matches close to the middle on the A
side. Now, do the same on the B side.
j is iterating the b range backwards and we thus accept a new j if the previous
match was in the upper half.
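A minimal Python sketch of that acceptance rule, for illustration only: the
function name and the bhalf formula are assumptions, and the real code works
on the C match structures rather than on a list of indices.

```python
def pick_b_middle(candidates, b1, b2):
    """Choose a B start among equally long matches for one A position.

    candidates: B indices in descending order, mirroring the backwards
    iteration over the b range described above.
    """
    bhalf = (b1 + b2 - 1) // 2  # assumed definition of the B-side middle
    mj = None
    for j in candidates:
        # Accept the new (lower) j only while the previous match is
        # still in the upper half of the B range.
        if mj is None or mj > bhalf:
            mj = j
    return mj


print(pick_b_middle([8, 5, 4, 1], 0, 10))  # 4: first candidate at or below the middle
print(pick_b_middle([9, 7], 0, 10))        # 7: all in the upper half, lowest wins
print(pick_b_middle([3, 1], 0, 10))        # 3: already in the lower half, kept
```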
This makes the test-bhalf diff "correct". It obviously also gives more
preference to balanced recursion than to appending to sequences. That is kind
of correct, but will also unfortunately make some bundles bigger. No doubt, we
can also create examples where it will make them smaller ...
The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22803824 to 22806817 bytes - a 0.01% increase.
bdiff: rearrange the "better longest match" code
This is primarily to make the code more manageable and prepare for later
changes.
More specific assignments might also be slightly faster, even though they
might also generate a bit more code.
bdiff: adjust criteria for getting optimal longest match in the A side middle
We prefer matches closer to the middle to balance recursion, as introduced in
f1ca249696ed.
For ranges with uneven length, matches starting exactly in the middle should
have preference. That will be optimal for matches of length 1. We will thus
accept equality in the half check.
For ranges with even length, half was ceil'ed when calculated but we got the
preference for low matches from the 'less than half' check. To get the same
result as before when we also accept equality, floor it. Without that,
test-annotate.t would show some different (still correct but less optimal)
results.
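The arithmetic can be checked with a small Python sketch. The two half
formulas below are assumptions chosen to match the description (a ceil'ed
middle with a strict check versus a floor'ed middle with equality accepted);
they cover only the "half check", not the rest of the match selection.

```python
def old_accepts(i, a1, a2):
    half = (a1 + a2) // 2      # assumed old form: middle index rounded up
    return i < half            # strict "less than half" check


def new_accepts(i, a1, a2):
    half = (a1 + a2 - 1) // 2  # assumed new form: middle index rounded down
    return i <= half           # equality is now accepted


def accepted(rule, a1, a2):
    """All start positions in [a1, a2) that pass the half check."""
    return [i for i in range(a1, a2) if rule(i, a1, a2)]


# Even length: both variants accept the same positions.
print(accepted(old_accepts, 3, 7), accepted(new_accepts, 3, 7))  # [3, 4] [3, 4]
# Uneven length: the new variant also accepts the exact middle (5).
print(accepted(old_accepts, 3, 8), accepted(new_accepts, 3, 8))  # [3, 4] [3, 4, 5]
```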
This will change the heuristics. Tests show a slightly different output - and
sometimes slightly smaller bundles.
The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22804885 to 22803824 bytes - a 0.005% reduction.
tests: make test-bdiff.py easier to maintain
Add more stdout logging to help navigate the .out file.
perf: unbust perfbdiff --alldata
This broke in f84fc6a92817 due to a refactored manifest API.
The fix is a bit hacky - perfbdiff doesn't yet support tree manifests
for example. But it gets the job done.
A test has been added for --alldata so this doesn't happen again.
worker: discard waited pid by anyone who noticed it first
This makes sure all waited pids are removed before calling killworkers()
even if the waitpid()-pids.discard() sequence is interrupted by another SIGCHLD.
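A hedged Python sketch of the idea, not the worker.py code itself: it assumes
that "noticing" an already-waited pid means waitpid() failing with ECHILD
(ChildProcessError in Python 3). The reap_finished name and the injectable
waitpid parameter are hypothetical and exist only to keep the example
self-contained (POSIX only, because of os.WNOHANG).

```python
import os


def reap_finished(pids, waitpid=os.waitpid):
    """Non-blocking reap of finished children.

    pids: a set of child pids, updated in place.
    Returns the (pid, status) pairs reaped by this call.
    """
    reaped = []
    for pid in list(pids):  # iterate over a snapshot; pids may shrink
        try:
            p, status = waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Someone else (for example a nested SIGCHLD handler) already
            # waited for this child but was interrupted before discarding
            # it, so whoever notices first removes the pid.
            pids.discard(pid)
            continue
        if p == 0:
            continue  # child is still running
        pids.discard(p)
        reaped.append((p, status))
    return reaped
```

After such a pass, every pid left in the set belongs to a child that has not
been waited for yet, which is what a later kill loop relies on.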
worker: kill workers after all zombie processes are reaped
Since we now wait for child processes in a non-blocking way (changed by
7bc25549e084 and e8fb03cfbbde), we don't have to kill them in the middle of
the waitpid() loop. This change will help solve a possible race between the
waitpid()-pids.discard() sequence and another SIGCHLD.
waitforworkers() is called by cleanup(), in which case we do killworkers()
beforehand, so we can remove killworkers() from waitforworkers().
worker: make sure killworkers() is never interrupted by another SIGCHLD
killworkers() iterates over pids, which can be updated by the SIGCHLD handler.
So we should either copy pids or prevent killworkers() from being interrupted
by SIGCHLD. I chose the latter as it is simpler and can make pids handling
more consistent.
This fixes a possible "set changed size during iteration" error at
killworkers() before cleanup().
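The failure mode can be reproduced deterministically in Python without real
signals. In this sketch the fake_kill callback stands in for a SIGCHLD handler
that discards a pid while the loop is running; all names are hypothetical. The
second function shows the "copy pids" alternative mentioned above, which is
not the option chosen in this change.

```python
def kill_all(pids, kill):
    """Iterate the live set: unsafe if pids changes during the loop."""
    for p in pids:
        kill(p)


def kill_all_snapshot(pids, kill):
    """Iterate a copy: tolerates pids shrinking during the loop."""
    for p in list(pids):
        kill(p)


def demo():
    pids = {101, 102, 103}

    def fake_kill(pid):
        pids.discard(pid)  # simulates the handler reaping the child

    try:
        kill_all(pids, fake_kill)
        error = None
    except RuntimeError as exc:
        error = str(exc)

    pids2 = {101, 102, 103}
    kill_all_snapshot(pids2, lambda pid: pids2.discard(pid))
    return error, pids2


print(demo())  # ('Set changed size during iteration', set())
```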