Matt Harbison <matt_harbison@yahoo.com> [Sat, 10 Jan 2015 12:00:03 -0500] rev 23819
branchmap: add seek() to end of file before calling tell() on append open()
This is similar to 48c232873a54, which was subsequently modified in 19f5dec2d61f
for 2.4. Unexpected test changes on Windows occurred without this.
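A minimal Python sketch of the pattern named in the summary line, not the actual branchmap code: after opening a file in append mode, seek to the end explicitly before trusting tell(). The helper name 'append_offset' is hypothetical.

```python
import os


def append_offset(path):
    """Return the offset at which appended data will start (hypothetical helper)."""
    with open(path, "ab") as fh:
        # Right after an append-mode open, tell() may not report the end of
        # the file on every platform, so move there explicitly first.
        fh.seek(0, os.SEEK_END)
        return fh.tell()
```

With the explicit seek, the reported offset does not depend on where the platform places the initial file position for append-mode handles.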
Matt Harbison <matt_harbison@yahoo.com> [Fri, 09 Jan 2015 22:14:01 -0500] rev 23818
tests: fix test-casefolding.t output for branchcache
This belongs with 7d63398fbfd1. I assume that the failure to read is OK,
because there is similar output in test-convert-svn-encoding.t.
Pierre-Yves David <pierre-yves.david@fb.com> [Tue, 06 Jan 2015 17:19:21 -0800] rev 23817
setdiscovery: remove '_setupsample' function
It is now unused.
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 20:44:20 -0800] rev 23816
setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com> [Tue, 06 Jan 2015 17:07:44 -0800] rev 23815
setdiscovery: drop '_setupsample' usage in '_takequicksample'
For '_takequicksample' we can just retrieve the list of heads directly and
ignore the rest of the complex return values. This was the last call to the
infamous '_setupsample' function.
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 10:32:17 -0800] rev 23814
setdiscovery: drop the 'always' argument to '_updatesample'
This argument existed because of the complex code flow in '_takequicksample'. It
first got the list of heads and then called '_updatesample' on an empty initial
sample with a size limit matching the difference between the number of heads and
the target sample size. Finally the heads and the sample from '_updatesample'
were added together. To ensure the result had the exact target length, the code
had to ensure no elements from the heads were added to the '_updatesample'
content, and therefore passed this "always included" set of heads.
Instead we can just update the initial heads sample directly and use the final
target size as target size for the update.
This removes the need for the 'always' parameter to the '_updatesample' function.
The tests are affected because a different set-building order results in
different random sampling.
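The two code flows described above can be sketched roughly as follows. This is an illustration under assumptions, not the setdiscovery code: 'fill_sample' is a hypothetical, deterministic stand-in for '_updatesample' that simply adds candidates until the size limit is reached, so it does not reproduce the ordering-dependent random sampling mentioned above.

```python
def fill_sample(sample, candidates, size):
    """Hypothetical stand-in for '_updatesample': add candidates up to 'size'."""
    for node in candidates:
        if len(sample) >= size:
            break
        sample.add(node)
    return sample


def quick_sample_old(heads, candidates, size):
    # Old flow: fill an empty sample up to the room left after the heads,
    # skipping the heads themselves (the 'always' set), then add the heads.
    always = set(heads)
    room = size - len(always)
    extra = fill_sample(set(), (n for n in candidates if n not in always), room)
    return always | extra


def quick_sample_new(heads, candidates, size):
    # New flow: start from the heads and fill up to the final target size.
    return fill_sample(set(heads), candidates, size)
```

In this simplified model both flows produce a sample of the same size, which is why the extra 'always' bookkeeping is not needed.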
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 17:28:51 -0800] rev 23813
setdiscovery: always add exponential sample to the heads
As explained in a previous changeset, prioritizing heads too much behaves
pathologically when there are more heads than the sample size. To counter this,
we always inject exponential samples before reducing to the sample size limit.
This already shows some benefit in the tests themselves, but on a real-world example
this moves my discovery for push to a pathologically headed repo from 45 rounds to
17 of them.
We should maybe ensure that at least 25% of the result sample is heads, but I
think the random sampling will be fine in practice.
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 17:23:21 -0800] rev 23812
setdiscovery: directly run '_updatesample'
The heads and exponential sample are going to end up in the same set
before any extra processing happens. We simplify the code by directly
updating a set with heads.
Changes in the order the set is built lead to small changes in the random
sampling output. But after double checking, I can confirm the input data to
the random sampling is consistent.
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 17:17:56 -0800] rev 23811
setdiscovery: stop using '_setupsample' in '_takefullsample'
Very few of the return values of '_setupsample' remain in use, so we
directly retrieve the value we care about and drop the '_setupsample'
call.
Pierre-Yves David <pierre-yves.david@fb.com> [Wed, 07 Jan 2015 12:09:51 -0800] rev 23810
setdiscovery: randomly pick between heads and sample when taking full sample
Before this changeset, the discovery protocol was too heads-centric. Heads of the
undiscovered set were always sent for discovery and any room remaining in the
sample was filled with exponential samples (and random ones if any room
remained).
This behaved extremely poorly when the number of heads exceeded the sample size,
because we just kept asking about the existence of heads, then their direct
parents, and so on. As a result, the 'O(log(len(repo)))' discovery turns into a
'O(len(repo))' one. As a solution we take a random sample of the heads plus
exponential samples. This way we ensure some exponential sampling is achieved,
bringing back some logarithmic convergence of the discovery.
This patch only applies this principle in one place. More places will be updated
in future patches.
One test is impacted because the random sample happens to be different. By
chance, it helps a bit in this case.
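A rough Python sketch of the idea, under stated assumptions: the helper names, the 'parent_of' callback and the single-parent history model are all hypothetical and are not taken from setdiscovery. It pools the heads with ancestors at exponentially growing distances, then picks randomly from the pool when it exceeds the sample size.

```python
import random


def exponential_sample(heads, parent_of):
    """Collect ancestors at distances 1, 2, 4, 8, ... from each head."""
    sample = set()
    for head in heads:
        node, dist, next_pick = head, 0, 1
        while node is not None:
            if dist == next_pick:
                sample.add(node)
                next_pick *= 2
            node = parent_of(node)  # assumed to return None past the root
            dist += 1
    return sample


def take_full_sample(heads, parent_of, size, rng=random):
    # Pool heads and exponential samples so that heads alone cannot
    # crowd out the deeper ancestors.
    candidates = set(heads) | exponential_sample(heads, parent_of)
    if len(candidates) > size:
        return set(rng.sample(sorted(candidates), size))
    return candidates
```

Because the random pick is made over the pooled set, some deep ancestors are likely to be queried even when there are more heads than the sample size.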
Pierre-Yves David <pierre-yves.david@fb.com> [Tue, 06 Jan 2015 17:02:32 -0800] rev 23809
setdiscovery: document the '_updatesample' function
This function is central in the sample building process; having it documented
helps code readability a lot.
Pierre-Yves David <pierre-yves.david@fb.com> [Tue, 06 Jan 2015 16:40:33 -0800] rev 23808
setdiscovery: avoid calling any sample building if the undecided set is small
If the length of undecided is smaller than the sample size, we can just request
information for all of them.
This conditional was previously handled by '_setupsample'. But '_setupsample' is
in my opinion a problematic function with blurry semantics. Making this
conditional explicit earlier makes the code clearer and moves us closer
to removing this '_setupsample' function.
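The early check described above might look roughly like this. It is a sketch only: 'choose_sample' and the 'build_sample' callback are hypothetical names and do not come from the setdiscovery module.

```python
def choose_sample(undecided, samplesize, build_sample):
    """Return the nodes to ask about in this discovery round."""
    if len(undecided) < samplesize:
        # Few enough nodes remain: ask about every one of them and
        # skip the sample-building machinery entirely.
        return list(undecided)
    return build_sample(undecided, samplesize)
```

Putting the check in the caller keeps the sample builders free of this special case.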