annotate mercurial/setdiscovery.py @ 26610:f9f82c444ff7

filemerge: only copy to backup during premerge step The premerge might leave the original file in an unclean state. Therefore it's important to only copy the file in the beginning.
author Siddharth Agarwal <sid0@fb.com>
date Sun, 11 Oct 2015 20:04:40 -0700
parents 56b2bcea2529
children c3eacee01c7e
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
1 # setdiscovery.py - improved discovery of common nodeset for mercurial
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
2 #
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
3 # Copyright 2010 Benoit Boissinot <bboissin@gmail.com>
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
4 # and Peter Arrenbrecht <peter@arrenbrecht.ch>
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
5 #
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
6 # This software may be used and distributed according to the terms of the
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
7 # GNU General Public License version 2 or any later version.
20656
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
8 """
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
9 Algorithm works in the following way. You have two repository: local and
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
10 remote. They both contains a DAG of changelists.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
11
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
12 The goal of the discovery protocol is to find one set of node *common*,
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
13 the set of nodes shared by local and remote.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
14
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
15 One of the issue with the original protocol was latency, it could
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
16 potentially require lots of roundtrips to discover that the local repo was a
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
17 subset of remote (which is a very common case, you usually have few changes
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
18 compared to upstream, while upstream probably had lots of development).
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
19
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
20 The new protocol only requires one interface for the remote repo: `known()`,
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
21 which given a set of changelists tells you if they are present in the DAG.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
22
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
23 The algorithm then works as follow:
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
24
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
25 - We will be using three sets, `common`, `missing`, `unknown`. Originally
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
26 all nodes are in `unknown`.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
27 - Take a sample from `unknown`, call `remote.known(sample)`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
28 - For each node that remote knows, move it and all its ancestors to `common`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
29 - For each node that remote doesn't know, move it and all its descendants
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
30 to `missing`
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
31 - Iterate until `unknown` is empty
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
32
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
33 There are a couple optimizations, first is instead of starting with a random
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
34 sample of missing, start by sending all heads, in the case where the local
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
35 repo is a subset, you computed the answer in one round trip.
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
36
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
37 Then you can do something similar to the bisecting strategy used when
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
38 finding faulty changesets. Instead of random samples, you can try picking
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
39 nodes that will maximize the number of nodes that will be
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
40 classified with it (since all ancestors or descendants will be marked as well).
cdecbc5ab504 setdiscovery: document algorithms used
Olle Lundberg <geek@nerd.sh>
parents: 20034
diff changeset
41 """
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
42
25973
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
43 from __future__ import absolute_import
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
44
25113
0ca8410ea345 util: drop alias for collections.deque
Martin von Zweigbergk <martinvonz@google.com>
parents: 23817
diff changeset
45 import collections
20034
1e5b38a919dd cleanup: move stdlib imports to their own import statement
Augie Fackler <raf@durin42.com>
parents: 17426
diff changeset
46 import random
25973
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
47
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
48 from .i18n import _
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
49 from .node import (
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
50 nullid,
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
51 nullrev,
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
52 )
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
53 from . import (
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
54 dagutil,
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25973
diff changeset
55 error,
25973
fb5664eb8414 setdiscovery: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25914
diff changeset
56 )
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
57
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
58 def _updatesample(dag, nodes, sample, quicksamplesize=0):
23809
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
59 """update an existing sample to match the expected size
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
60
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
61 The sample is updated with nodes exponentially distant from each head of the
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
62 <nodes> set. (H~1, H~2, H~4, H~8, etc).
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
63
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
64 If a target size is specified, the sampling will stop once this size is
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
65 reached. Otherwise sampling will happen until roots of the <nodes> set are
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
66 reached.
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
67
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
68 :dag: a dag object from dagutil
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
69 :nodes: set of nodes we want to discover (if None, assume the whole dag)
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
70 :sample: a sample to update
9ca2eb881b53 setdiscovery: document the '_updatesample' function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23808
diff changeset
71 :quicksamplesize: optional target size of the sample"""
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
72 # if nodes is empty we scan the entire graph
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
73 if nodes:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
74 heads = dag.headsetofconnecteds(nodes)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
75 else:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
76 heads = dag.heads()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
77 dist = {}
25113
0ca8410ea345 util: drop alias for collections.deque
Martin von Zweigbergk <martinvonz@google.com>
parents: 23817
diff changeset
78 visit = collections.deque(heads)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
79 seen = set()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
80 factor = 1
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
81 while visit:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
82 curr = visit.popleft()
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
83 if curr in seen:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
84 continue
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
85 d = dist.setdefault(curr, 1)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
86 if d > factor:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
87 factor *= 2
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
88 if d == factor:
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
89 sample.add(curr)
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
90 if quicksamplesize and (len(sample) >= quicksamplesize):
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
91 return
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
92 seen.add(curr)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
93 for p in dag.parents(curr):
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
94 if not nodes or p in nodes:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
95 dist.setdefault(p, d + 1)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
96 visit.append(p)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
97
23806
d6cbbe3baef0 setdiscovery: drop unused 'initial' argument for '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23747
diff changeset
98 def _takequicksample(dag, nodes, size):
23816
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
99 """takes a quick sample of size <size>
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
100
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
101 It is meant for initial sampling and focuses on querying heads and close
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
102 ancestors of heads.
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
103
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
104 :dag: a dag object
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
105 :nodes: set of nodes to discover
34d4b58580d1 setdiscovery: document '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23815
diff changeset
106 :size: the maximum size of the sample"""
23815
31e75a362d44 setdiscovery: drop '_setupsample' usage in '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23814
diff changeset
107 sample = dag.headsetofconnecteds(nodes)
31e75a362d44 setdiscovery: drop '_setupsample' usage in '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23814
diff changeset
108 if size <= len(sample):
31e75a362d44 setdiscovery: drop '_setupsample' usage in '_takequicksample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23814
diff changeset
109 return _limitsample(sample, size)
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
110 _updatesample(dag, None, sample, quicksamplesize=size)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
111 return sample
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
112
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
113 def _takefullsample(dag, nodes, size):
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
114 sample = dag.headsetofconnecteds(nodes)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
115 # update from heads
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
116 _updatesample(dag, nodes, sample)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
117 # update from roots
23814
6a5877a73141 setdiscovery: drop the 'always' argument to '_updatesample'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23813
diff changeset
118 _updatesample(dag.inverse(), nodes, sample)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
119 assert sample
23810
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
120 sample = _limitsample(sample, size)
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
121 if len(sample) < size:
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
122 more = size - len(sample)
b681d3a2bf04 setdiscovery: randomly pick between heads and sample when taking full sample
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23809
diff changeset
123 sample.update(random.sample(list(nodes - sample), more))
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
124 return sample
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
125
23083
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
126 def _limitsample(sample, desiredlen):
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
127 """return a random subset of sample of at most desiredlen item"""
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
128 if len(sample) > desiredlen:
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
129 sample = set(random.sample(sample, desiredlen))
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
130 return sample
ee45f5c2ffcc setdiscovery: extract sample limitation in a `_limitsample` function
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 20656
diff changeset
131
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
132 def findcommonheads(ui, local, remote,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
133 initialsamplesize=100,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
134 fullsamplesize=200,
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
135 abortwhenunrelated=True):
14206
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
136 '''Return a tuple (common, anyincoming, remoteheads) used to identify
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
137 missing nodes from or in remote.
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
138 '''
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
139 roundtrips = 0
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
140 cl = local.changelog
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
141 dag = dagutil.revlogdag(cl)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
142
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
143 # early exit if we know all the specified remote heads already
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
144 ui.debug("query 1; heads\n")
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
145 roundtrips += 1
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
146 ownheads = dag.heads()
23084
3ef893520a85 setdiscovery: limit the size of the initial sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23083
diff changeset
147 sample = _limitsample(ownheads, initialsamplesize)
23192
73cfaa348650 discovery: indices between sample and yesno must match (issue4438)
Mads Kiilerich <madski@unity3d.com>
parents: 23191
diff changeset
148 # indices between sample and externalized version must match
73cfaa348650 discovery: indices between sample and yesno must match (issue4438)
Mads Kiilerich <madski@unity3d.com>
parents: 23191
diff changeset
149 sample = list(sample)
25914
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
150 batch = remote.batch()
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
151 srvheadhashesref = batch.heads()
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
152 yesnoref = batch.known(dag.externalizeall(sample))
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
153 batch.submit()
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
154 srvheadhashes = srvheadhashesref.value
4d77e89652ad discovery: always use batching now that all peers support batching
Augie Fackler <augie@google.com>
parents: 25151
diff changeset
155 yesno = yesnoref.value
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
156
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
157 if cl.tip() == nullid:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
158 if srvheadhashes != [nullid]:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
159 return [nullid], True, srvheadhashes
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
160 return [nullid], False, []
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
161
14206
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
162 # start actual discovery (we note this before the next "if" for
2bf60f158ecb setdiscovery: limit lines to 80 characters
Steven Brown <StevenGBrown@gmail.com>
parents: 14164
diff changeset
163 # compatibility reasons)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
164 ui.status(_("searching for changes\n"))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
165
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
166 srvheads = dag.internalizeall(srvheadhashes, filterunknown=True)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
167 if len(srvheads) == len(srvheadhashes):
14833
308e1b5acc87 discovery: quiet note about heads
Matt Mackall <mpm@selenic.com>
parents: 14624
diff changeset
168 ui.debug("all remote heads known locally\n")
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
169 return (srvheadhashes, False, srvheadhashes,)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
170
25151
6eb4bdad198f cleanup: use __builtins__.all instead of util.all
Augie Fackler <augie@google.com>
parents: 25113
diff changeset
171 if sample and len(ownheads) <= initialsamplesize and all(yesno):
15497
9bea3aed6ee1 add missing localization markup
Mads Kiilerich <mads@kiilerich.com>
parents: 15063
diff changeset
172 ui.note(_("all local heads known remotely\n"))
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
173 ownheadhashes = dag.externalizeall(ownheads)
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
174 return (ownheadhashes, True, srvheadhashes,)
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
175
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
176 # full blown discovery
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
177
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
178 # own nodes I know we both know
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
179 # treat remote heads (and maybe own heads) as a first implicit sample
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
180 # response
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
181 common = cl.incrementalmissingrevs(srvheads)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
182 commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
183 common.addbases(commoninsample)
23746
4ef2f2fa8b8b setdiscovery: drop shadowed 'undecided' assignment
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23343
diff changeset
184 # own nodes where I don't know if remote knows them
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
185 undecided = set(common.missingancestors(ownheads))
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
186 # own nodes I know remote lacks
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
187 missing = set()
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 15713
diff changeset
188
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
189 full = False
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
190 while undecided:
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
191
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
192 if sample:
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
193 missinginsample = [n for i, n in enumerate(sample) if not yesno[i]]
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
194 missing.update(dag.descendantset(missinginsample, missing))
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
195
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
196 undecided.difference_update(missing)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
197
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
198 if not undecided:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
199 break
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
200
23747
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
201 if full or common.hasbases():
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
202 if full:
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
203 ui.note(_("sampling from both directions\n"))
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
204 else:
f82173a90c2c setdiscovery: factorize similar sampling code
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23746
diff changeset
205 ui.debug("taking initial sample\n")
23807
e97e363a7000 setdiscovery: delay sample building calls to gather them in a single place
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23806
diff changeset
206 samplefunc = _takefullsample
23130
ced632394371 setdiscovery: limit the size of all sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23084
diff changeset
207 targetsize = fullsamplesize
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
208 else:
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
209 # use even cheaper initial sample
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
210 ui.debug("taking quick initial sample\n")
23807
e97e363a7000 setdiscovery: delay sample building calls to gather them in a single place
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23806
diff changeset
211 samplefunc = _takequicksample
23130
ced632394371 setdiscovery: limit the size of all sample (issue4411)
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23084
diff changeset
212 targetsize = initialsamplesize
23808
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
213 if len(undecided) < targetsize:
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
214 sample = list(undecided)
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
215 else:
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
216 sample = samplefunc(dag, undecided, targetsize)
07d0f59e0ba7 setdiscovery: avoid calling any sample building if the undecided set is small
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 23807
diff changeset
217 sample = _limitsample(sample, targetsize)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
218
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
219 roundtrips += 1
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
220 ui.progress(_('searching'), roundtrips, unit=_('queries'))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
221 ui.debug("query %i; still undecided: %i, sample size is: %i\n"
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
222 % (roundtrips, len(undecided), len(sample)))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
223 # indices between sample and externalized version must match
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
224 sample = list(sample)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
225 yesno = remote.known(dag.externalizeall(sample))
14624
f03c82d1f50a setdiscovery: batch heads and known(ownheads)
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents: 14206
diff changeset
226 full = True
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
227
23343
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
228 if sample:
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
229 commoninsample = set(n for i, n in enumerate(sample) if yesno[i])
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
230 common.addbases(commoninsample)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
231 common.removeancestorsfrom(undecided)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
232
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
233 # heads(common) == heads(common.bases) since common represents common.bases
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
234 # and all its ancestors
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
235 result = dag.headsetofconnecteds(common.bases)
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
236 # common.bases can include nullrev, but our contract requires us to not
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
237 # return any heads in that case, so discard that
f8a2647fe020 setdiscovery: avoid a full changelog graph traversal
Siddharth Agarwal <sid0@fb.com>
parents: 23192
diff changeset
238 result.discard(nullrev)
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
239 ui.progress(_('searching'), None)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
240 ui.debug("%d total queries\n" % roundtrips)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
241
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
242 if not result and srvheadhashes != [nullid]:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
243 if abortwhenunrelated:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25973
diff changeset
244 raise error.Abort(_("repository is unrelated"))
14164
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
245 else:
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
246 ui.warn(_("warning: repository is unrelated\n"))
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
247 return (set([nullid]), True, srvheadhashes,)
cb98fed52495 discovery: add new set-based discovery
Peter Arrenbrecht <peter.arrenbrecht@gmail.com>
parents:
diff changeset
248
14981
192e02680d09 setdiscovery: return anyincoming=False when remote's only head is nullid
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14833
diff changeset
249 anyincoming = (srvheadhashes != [nullid])
192e02680d09 setdiscovery: return anyincoming=False when remote's only head is nullid
Andrew Pritchard <andrewp@fogcreek.com>
parents: 14833
diff changeset
250 return dag.externalizeall(result), anyincoming, srvheadhashes