revset: improve performance of _generatorset.__contains__ (issue 4201)
_generatorset.__contains__ and __contains__ from child classes were
calling into __iter__ to look for values. Since all
previously-encountered values from the generator were cached and checked
in __contains__ before this iteration, __contains__ was effectively
performing iteration busy work which could lead to an explosion of
redundant work.
This patch changes __contains__ to be more intelligent. Instead of
looking at all values via __iter__, __contains__ will instead go
straight to "new" values from the underlying generator.
On a clone of the Firefox repository with around 200,000 changesets,
this patch decreases the execution time of the revset '::(200067)::'
from ~100s to ~4s on the author's machine. Rebase operations (which use
the aforementioned revset), speed up accordingly.
--- a/mercurial/revset.py Sat Mar 22 23:39:51 2014 +0900
+++ b/mercurial/revset.py Mon Mar 24 20:00:18 2014 -0700
@@ -2630,8 +2630,8 @@
if x in self._cache:
return self._cache[x]
- # Use __iter__ which caches values and stores them into self._genlist
- for l in self:
+ # Use new values only, as existing values would be cached.
+ for l in self._consumegen():
if l == x:
return True
@@ -2645,17 +2645,18 @@
# started over the generatorset.
for l in self._genlist:
yield l
- else:
- # Starting iteration over the generatorset.
- self._iterated = True
+
+ for item in self._consumegen():
+ yield item
+
+ def _consumegen(self):
+ self._iterated = True
for item in self._gen:
self._cache[item] = True
self._genlist.append(item)
yield item
- # Iteration over the generator has finished. Whole value list should be
- # cached in self._genlist
self._finished = True
def set(self):
@@ -2680,7 +2681,8 @@
if x in self._cache:
return self._cache[x]
- for l in self:
+ # Use new values only, as existing values would be cached.
+ for l in self._consumegen():
if l == x:
return True
if l > x:
@@ -2702,7 +2704,8 @@
if x in self._cache:
return self._cache[x]
- for l in self:
+ # Use new values only, as existing values would be cached.
+ for l in self._consumegen():
if l == x:
return True
if l < x: