comparison mercurial/copies.py @ 24412:9372180b8c9b stable

mergecopies: reuse ancestry context when traversing file history (issue4537) Merge copies is traversing file history in search for copies and renames. Since 3.3 we are doing "linkrev adjustment" to ensure duplicated filelog entry does not confuse the traversal. This "linkrev adjustment" involved ancestry testing and walking in the changeset graph. If we do such walk in the changesets graph for each file, we end up with a 'O(<changesets>x<files>)' complexity that create massive issue. For examples, grafting a changeset in Mozilla's repo moved from 6 seconds to more than 3 minutes. There is a mechanism to reuse such ancestors computation between all files. But it has to be manually set up in situation were it make sense to take such shortcut. This changesets set this mechanism up and bring back the graph time from 3 minutes to 8 seconds. To do so, we need a bigger control on the way 'filectx' are instantiated during each 'checkcopies' calls that 'mergecopies' is doing. We add a new 'setupctx' that configure and return a 'filectx' factory. The function make sure the ancestry context is properly created and the factory make sure it is properly installed on returned 'filectx'.
author Pierre-Yves David <pierre-yves.david@fb.com>
date Fri, 20 Mar 2015 00:30:35 -0700
parents 751d1138ce35
children 1cfded2fa1a9
comparison
equal deleted inserted replaced
24411:5a12ef618c03 24412:9372180b8c9b
244 return {}, {}, {}, {} 244 return {}, {}, {}, {}
245 m1 = c1.manifest() 245 m1 = c1.manifest()
246 m2 = c2.manifest() 246 m2 = c2.manifest()
247 ma = ca.manifest() 247 ma = ca.manifest()
248 248
249 def makectx(f, n): 249
250 if len(n) != 20: # in a working context? 250 def setupctx(ctx):
251 if c1.rev() is None: 251 """return a 'makectx' function suitable for checkcopies usage from ctx
252 return c1.filectx(f) 252
253 return c2.filectx(f) 253 We have to re-setup the function building 'filectx' for each
254 return repo.filectx(f, fileid=n) 254 'checkcopies' to ensure the linkrev adjustement is properly setup for
255 255 each. Linkrev adjustment is important to avoid bug in rename
256 ctx = util.lrucachefunc(makectx) 256 detection. Moreover, having a proper '_ancestrycontext' setup ensures
257 the performance impact of this adjustment is kept limited. Without it,
258 each file could do a full dag traversal making the time complexity of
259 the operation explode (see issue4537).
260
261 This function exists here mostly to limit the impact on stable. Feel
262 free to refactor on default.
263 """
264 rev = ctx.rev()
265 ac = getattr(ctx, '_ancestrycontext', None)
266 if ac is None:
267 revs = [rev]
268 if rev is None:
269 revs = [p.rev() for p in ctx.parents()]
270 ac = ctx._repo.changelog.ancestors(revs, inclusive=True)
271 ctx._ancestrycontext = ac
272 def makectx(f, n):
273 if len(n) != 20: # in a working context?
274 if c1.rev() is None:
275 return c1.filectx(f)
276 return c2.filectx(f)
277 fctx = repo.filectx(f, fileid=n)
278 # setup only needed for filectx not create from a changectx
279 fctx._ancestrycontext = ac
280 fctx._descendantrev = rev
281 return fctx
282 return util.lrucachefunc(makectx)
283
257 copy = {} 284 copy = {}
258 movewithdir = {} 285 movewithdir = {}
259 fullcopy = {} 286 fullcopy = {}
260 diverge = {} 287 diverge = {}
261 288
270 if u2: 297 if u2:
271 repo.ui.debug(" unmatched files in other:\n %s\n" 298 repo.ui.debug(" unmatched files in other:\n %s\n"
272 % "\n ".join(u2)) 299 % "\n ".join(u2))
273 300
274 for f in u1: 301 for f in u1:
302 ctx = setupctx(c1)
275 checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy) 303 checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy)
276 304
277 for f in u2: 305 for f in u2:
306 ctx = setupctx(c2)
278 checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy) 307 checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy)
279 308
280 renamedelete = {} 309 renamedelete = {}
281 renamedelete2 = set() 310 renamedelete2 = set()
282 diverge2 = set() 311 diverge2 = set()
295 if bothnew: 324 if bothnew:
296 repo.ui.debug(" unmatched files new in both:\n %s\n" 325 repo.ui.debug(" unmatched files new in both:\n %s\n"
297 % "\n ".join(bothnew)) 326 % "\n ".join(bothnew))
298 bothdiverge, _copy, _fullcopy = {}, {}, {} 327 bothdiverge, _copy, _fullcopy = {}, {}, {}
299 for f in bothnew: 328 for f in bothnew:
329 ctx = setupctx(c1)
300 checkcopies(ctx, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy) 330 checkcopies(ctx, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
331 ctx = setupctx(c2)
301 checkcopies(ctx, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy) 332 checkcopies(ctx, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
302 for of, fl in bothdiverge.items(): 333 for of, fl in bothdiverge.items():
303 if len(fl) == 2 and fl[0] == fl[1]: 334 if len(fl) == 2 and fl[0] == fl[1]:
304 copy[fl[0]] = of # not actually divergent, just matching renames 335 copy[fl[0]] = of # not actually divergent, just matching renames
305 336