mergecopies: reuse ancestry context when traversing file history (issue4537) stable
authorPierre-Yves David <pierre-yves.david@fb.com>
Fri, 20 Mar 2015 00:30:35 -0700
branchstable
changeset 24412 9372180b8c9b
parent 24411 5a12ef618c03
child 24415 1cfded2fa1a9
child 24418 a2285e2fc949
mergecopies: reuse ancestry context when traversing file history (issue4537) Merge copies is traversing file history in search for copies and renames. Since 3.3 we are doing "linkrev adjustment" to ensure duplicated filelog entry does not confuse the traversal. This "linkrev adjustment" involved ancestry testing and walking in the changeset graph. If we do such walk in the changesets graph for each file, we end up with a 'O(<changesets>x<files>)' complexity that create massive issue. For examples, grafting a changeset in Mozilla's repo moved from 6 seconds to more than 3 minutes. There is a mechanism to reuse such ancestors computation between all files. But it has to be manually set up in situation were it make sense to take such shortcut. This changesets set this mechanism up and bring back the graph time from 3 minutes to 8 seconds. To do so, we need a bigger control on the way 'filectx' are instantiated during each 'checkcopies' calls that 'mergecopies' is doing. We add a new 'setupctx' that configure and return a 'filectx' factory. The function make sure the ancestry context is properly created and the factory make sure it is properly installed on returned 'filectx'.
mercurial/copies.py
--- a/mercurial/copies.py	Thu Mar 19 23:57:34 2015 -0700
+++ b/mercurial/copies.py	Fri Mar 20 00:30:35 2015 -0700
@@ -246,14 +246,41 @@
     m2 = c2.manifest()
     ma = ca.manifest()
 
-    def makectx(f, n):
-        if len(n) != 20: # in a working context?
-            if c1.rev() is None:
-                return c1.filectx(f)
-            return c2.filectx(f)
-        return repo.filectx(f, fileid=n)
+
+    def setupctx(ctx):
+        """return a 'makectx' function suitable for checkcopies usage from ctx
+
+        We have to re-setup the function building 'filectx' for each
+        'checkcopies' to ensure the linkrev adjustement is properly setup for
+        each. Linkrev adjustment is important to avoid bug in rename
+        detection. Moreover, having a proper '_ancestrycontext' setup ensures
+        the performance impact of this adjustment is kept limited. Without it,
+        each file could do a full dag traversal making the time complexity of
+        the operation explode (see issue4537).
 
-    ctx = util.lrucachefunc(makectx)
+        This function exists here mostly to limit the impact on stable. Feel
+        free to refactor on default.
+        """
+        rev = ctx.rev()
+        ac = getattr(ctx, '_ancestrycontext', None)
+        if ac is None:
+            revs = [rev]
+            if rev is None:
+                revs = [p.rev() for p in ctx.parents()]
+            ac = ctx._repo.changelog.ancestors(revs, inclusive=True)
+            ctx._ancestrycontext = ac
+        def makectx(f, n):
+            if len(n) != 20:  # in a working context?
+                if c1.rev() is None:
+                    return c1.filectx(f)
+                return c2.filectx(f)
+            fctx = repo.filectx(f, fileid=n)
+            # setup only needed for filectx not create from a changectx
+            fctx._ancestrycontext = ac
+            fctx._descendantrev = rev
+            return fctx
+        return util.lrucachefunc(makectx)
+
     copy = {}
     movewithdir = {}
     fullcopy = {}
@@ -272,9 +299,11 @@
                       % "\n   ".join(u2))
 
     for f in u1:
+        ctx = setupctx(c1)
         checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy)
 
     for f in u2:
+        ctx = setupctx(c2)
         checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy)
 
     renamedelete = {}
@@ -297,7 +326,9 @@
                       % "\n   ".join(bothnew))
     bothdiverge, _copy, _fullcopy = {}, {}, {}
     for f in bothnew:
+        ctx = setupctx(c1)
         checkcopies(ctx, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
+        ctx = setupctx(c2)
         checkcopies(ctx, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
     for of, fl in bothdiverge.items():
         if len(fl) == 2 and fl[0] == fl[1]: