similar: get rid of quadratic addedfiles.remove()
Instead, build a set of files to be removed and recreate addedfiles
only if necessary.
Benchmark with 50k added/removed files, on tmpfs:
$ hg addremove --dry-run --time -q
original: real 16.550 secs (user 15.000+0.000 sys 1.540+0.000)
previous: real 16.730 secs (user 15.280+0.000 sys 1.440+0.000)
this patch: real 16.070 secs (user 14.470+0.000 sys 1.580+0.000)
--- a/mercurial/similar.py Sun Mar 15 18:58:56 2015 +0900
+++ b/mercurial/similar.py Thu Mar 23 20:50:33 2017 +0900
@@ -107,12 +107,14 @@
if fp in parentctx and parentctx[fp].size() > 0]
# Find exact matches.
- for (a, b) in _findexactmatches(repo, addedfiles[:], removedfiles):
- addedfiles.remove(b)
+ matchedfiles = set()
+ for (a, b) in _findexactmatches(repo, addedfiles, removedfiles):
+ matchedfiles.add(b)
yield (a.path(), b.path(), 1.0)
# If the user requested similar files to be matched, search for them also.
if threshold < 1.0:
+ addedfiles = [x for x in addedfiles if x not in matchedfiles]
for (a, b, score) in _findsimilarmatches(repo, addedfiles,
removedfiles, threshold):
yield (a.path(), b.path(), score)