revset: improve performance of _generatorset.__contains__ (issue 4201)
authorGregory Szorc <gregory.szorc@gmail.com>
Mon, 24 Mar 2014 20:00:18 -0700
changeset 20828 3210b7930899
parent 20827 ca5dd216cb62
child 20829 9a09a625bc93
revset: improve performance of _generatorset.__contains__ (issue 4201) _generatorset.__contains__ and __contains__ from child classes were calling into __iter__ to look for values. Since all previously-encountered values from the generator were cached and checked in __contains__ before this iteration, __contains__ was effectively performing iteration busy work which could lead to an explosion of redundant work. This patch changes __contains__ to be more intelligent. Instead of looking at all values via __iter__, __contains__ will instead go straight to "new" values from the underlying generator. On a clone of the Firefox repository with around 200,000 changesets, this patch decreases the execution time of the revset '::(200067)::' from ~100s to ~4s on the author's machine. Rebase operations (which use the aforementioned revset), speed up accordingly.
mercurial/revset.py
--- a/mercurial/revset.py	Sat Mar 22 23:39:51 2014 +0900
+++ b/mercurial/revset.py	Mon Mar 24 20:00:18 2014 -0700
@@ -2630,8 +2630,8 @@
         if x in self._cache:
             return self._cache[x]
 
-        # Use __iter__ which caches values and stores them into self._genlist
-        for l in self:
+        # Use new values only, as existing values would be cached.
+        for l in self._consumegen():
             if l == x:
                 return True
 
@@ -2645,17 +2645,18 @@
             # started over the generatorset.
             for l in self._genlist:
                 yield l
-        else:
-            # Starting iteration over the generatorset.
-            self._iterated = True
+
+        for item in self._consumegen():
+            yield item
+
+    def _consumegen(self):
+        self._iterated = True
 
         for item in self._gen:
             self._cache[item] = True
             self._genlist.append(item)
             yield item
 
-        # Iteration over the generator has finished. Whole value list should be
-        # cached in self._genlist
         self._finished = True
 
     def set(self):
@@ -2680,7 +2681,8 @@
         if x in self._cache:
             return self._cache[x]
 
-        for l in self:
+        # Use new values only, as existing values would be cached.
+        for l in self._consumegen():
             if l == x:
                 return True
             if l > x:
@@ -2702,7 +2704,8 @@
         if x in self._cache:
             return self._cache[x]
 
-        for l in self:
+        # Use new values only, as existing values would be cached.
+        for l in self._consumegen():
             if l == x:
                 return True
             if l < x: