dirstate: speed up repeated missing directory checks
In a mozilla repo with tip at
bb3ff09f52fe,
hg update tip~1000 && time hg revert -nq -r tip .
displays ~4:20 minutes. With tip~100, it runs in ~11 s. With revision
100000, it did not finish in 12 minutes.
Revert calls dirstate.status() with a matcher that matches each file
in the target revision. The main problem [1] lies in
dirstate._walkexplicit(), which looks for matching deleted directories
by checking whether each path is prefix of any path in the
dirstate. With m files in the dirstate and n files in the target
revision that are not in the dirstate, this is clearly O(m*n). Let's
improve by keeping a lazily initialized set of all the directories in
the dirstate, so the time becomes O(m+n).
After this patch, the 4:20 minutes become 5.5 s, while for a single
missing path, it slows down from 1.092 s to 1.150 s (best of 4). The
>12 min case becomes 5.8 s.
[1] A narrower optimization would be to make revert take the fast
path for '.' and '--all'.
--- a/mercurial/dirstate.py Wed Nov 19 17:07:27 2014 -0800
+++ b/mercurial/dirstate.py Wed Nov 19 23:15:07 2014 -0800
@@ -629,6 +629,7 @@
results = dict.fromkeys(subrepos)
results['.hg'] = None
+ alldirs = None
for ff in files:
if normalize:
nf = normalize(normpath(ff), False, True)
@@ -657,13 +658,12 @@
if nf in dmap: # does it exactly match a missing file?
results[nf] = None
else: # does it match a missing directory?
- prefix = nf + "/"
- for fn in dmap:
- if fn.startswith(prefix):
- if matchedir:
- matchedir(nf)
- notfoundadd(nf)
- break
+ if alldirs is None:
+ alldirs = scmutil.dirs(dmap)
+ if nf in alldirs:
+ if matchedir:
+ matchedir(nf)
+ notfoundadd(nf)
else:
badfn(ff, inst.strerror)