copies: optimize forward copy detection logic for rebases
authorDurham Goode <durham@fb.com>
Fri, 05 Feb 2016 13:23:24 -0800
changeset 28000 d4247c306d82
parent 27999 b30b40804a3f
child 28001 ade12bf2bf0e
copies: optimize forward copy detection logic for rebases Forward copy detection (i.e. detecting what files have been moved/copied in commit X since ancestor Y) previously required diff'ing the manifests of both X and Y. This was expensive since it required reading both entire manifests and doing a set difference (they weren't already in a set because of the lazymanifest work). This cost almost 1 second on very large repositories, and happens N times for a rebase of N commits. This patch optimizes it for the case of rebase. In a rebase, we are comparing a commit against it's immediate parent, and therefore we can know what files changed by looking at ctx.files(). This lets us drastically decrease the size of the set comparison, and makes it O(# of changes) instead of O(size of manifest). This makes it take 1ms instead of 1000ms.
mercurial/copies.py
--- a/mercurial/copies.py	Tue Jan 26 23:05:19 2016 +0900
+++ b/mercurial/copies.py	Fri Feb 05 13:23:24 2016 -0800
@@ -10,7 +10,9 @@
 import heapq
 
 from . import (
+    node,
     pathutil,
+    scmutil,
     util,
 )
 
@@ -175,7 +177,18 @@
     # we currently don't try to find where old files went, too expensive
     # this means we can miss a case like 'hg rm b; hg cp a b'
     cm = {}
-    missing = _computeforwardmissing(a, b, match=match)
+
+    # Computing the forward missing is quite expensive on large manifests, since
+    # it compares the entire manifests. We can optimize it in the common use
+    # case of computing what copies are in a commit versus its parent (like
+    # during a rebase or histedit). Note, we exclude merge commits from this
+    # optimization, since the ctx.files() for a merge commit is not correct for
+    # this comparison.
+    forwardmissingmatch = match
+    if not match and b.p1() == a and b.p2().node() == node.nullid:
+        forwardmissingmatch = scmutil.matchfiles(a._repo, b.files())
+    missing = _computeforwardmissing(a, b, match=forwardmissingmatch)
+
     ancestrycontext = a._repo.changelog.ancestors([b.rev()], inclusive=True)
     for f in missing:
         fctx = b[f]