localrepo: cache types for filtered repos (issue5043)

Python introduces a reference cycle on dynamically created types
via __mro__, making them very easy to leak. See
https://bugs.python.org/issue17950.
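
A minimal CPython demonstration of that cycle, using only the standard library (`make_type` is a hypothetical helper, not Mercurial code). A class built with `type()` is the first entry of its own `__mro__`, so deleting the last name bound to it does not free it. Only the cyclic garbage collector can reclaim it.

```python
import gc
import weakref


def make_type(name):
    # Hypothetical helper: build a class at runtime, like the old filtered().
    return type(name, (object,), {})


gc.disable()  # keep the cyclic collector from running mid-demo
try:
    cls = make_type('dynamic')
    # The class is the first entry of its own __mro__: cls -> tuple -> cls.
    in_own_mro = cls.__mro__[0] is cls
    ref = weakref.ref(cls)
    del cls
    # Reference counting alone cannot free an object that sits in a cycle.
    alive_before_collect = ref() is not None
    gc.collect()  # an explicit full collection breaks the cycle
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(in_own_mro, alive_before_collect, alive_after_collect)  # True True False
```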

Previously, repo.filtered() created a new type on every invocation.
Long-running processes (like `hg convert`) could call this
function thousands of times, leading to a steady memory leak.
Since the leak stems from a bug in Python itself, we can't
eliminate it. The next best thing is to contain it.
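
The accumulation can be reproduced with a sketch of the pre-patch pattern. `RepoView`, `LocalRepo`, and `filtered_uncached` are hypothetical stand-ins for `repoview.repoview`, the repository class, and the old `filtered()` body. A `weakref.WeakSet` counts how many generated classes survive after their instances are gone.

```python
import gc
import weakref


class RepoView(object):
    """Hypothetical stand-in for the repoview mixin."""

    def __init__(self, repo, filtername):
        self._unfilteredrepo = repo
        self.filtername = filtername


class LocalRepo(object):
    """Hypothetical stand-in for the repository class."""


def filtered_uncached(repo, name):
    # Pre-patch pattern: a brand-new class is created on every call.
    class filteredrepo(RepoView, repo.__class__):
        pass
    return filteredrepo(repo, name)


repo = LocalRepo()
created = weakref.WeakSet()  # tracks generated classes without keeping them alive

gc.disable()
try:
    for _ in range(1000):
        view = filtered_uncached(repo, 'visible')
        created.add(type(view))
        del view  # the instance itself is freed by reference counting
    # Every generated class is stuck in a __mro__ cycle, so all are still alive.
    alive_without_gc = len(created)
    gc.collect()
    alive_after_gc = len(created)
finally:
    gc.enable()
```

With the cyclic collector disabled, every generated class outlives its instance and is reclaimed only by `gc.collect()`, so code that creates a type per call depends entirely on the collector to get that memory back.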

This patch adds a cache of the dynamically generated repoview/filter
types on the localrepo object. Since each type is generated only once
per repository object (keyed on its unfiltered class), the amount of
memory that can leak is capped at a small, fixed size instead of
growing with every call.
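
A minimal sketch of the caching approach, assuming `RepoView` and `LocalRepo` are hypothetical, heavily simplified stand-ins for `repoview.repoview` and the localrepo class. As in the patch, the cache is a `weakref.WeakKeyDictionary` keyed on the unfiltered repository class, so repeated calls reuse one generated type.

```python
import weakref


class RepoView(object):
    """Hypothetical stand-in for repoview.repoview."""

    def __init__(self, repo, filtername):
        self._unfilteredrepo = repo
        self.filtername = filtername

    def unfiltered(self):
        return self._unfilteredrepo

    def __getattr__(self, attr):
        # Delegate anything not found on the view to the unfiltered repo.
        return getattr(self._unfilteredrepo, attr)


class LocalRepo(object):
    """Hypothetical stand-in for the localrepo class."""

    def __init__(self):
        # Cache of types representing filtered repos, keyed by base class.
        self._filteredrepotypes = weakref.WeakKeyDictionary()

    def unfiltered(self):
        return self

    def filtered(self, name):
        key = self.unfiltered().__class__
        if key not in self._filteredrepotypes:
            # Build the mixin type once per base class; later calls reuse it.
            bases = (RepoView, key)
            cls = type('%sfilteredrepo' % name, bases, {})
            self._filteredrepotypes[key] = cls
        return self._filteredrepotypes[key](self, name)


repo = LocalRepo()
views = [repo.filtered('visible') for _ in range(1000)]
served = repo.filtered('served')
```

A thousand calls produce a single type. Because the key is only the base class, the generated type's `__name__` reflects whichever filter was requested first (here `visiblefilteredrepo`, even for the `served` view). The filter that actually applies comes from the `name` argument passed to the constructor. Each cached type also lists its key in `__bases__`, so the weak keys alone do not let an entry disappear while the cache itself is alive.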

After this change, `hg convert` no longer leaks memory on every
revision. Memory usage may still grow over time for legitimate
reasons (e.g. larger manifests), but that growth is not a leak.

--- a/mercurial/localrepo.py Tue Jul 11 02:10:04 2017 +0900
+++ b/mercurial/localrepo.py Sat Jul 01 20:51:19 2017 -0700
@@ -430,6 +430,9 @@
         # post-dirstate-status hooks
         self._postdsstatus = []
 
+        # Cache of types representing filtered repos.
+        self._filteredrepotypes = weakref.WeakKeyDictionary()
+
         # generic mapping between names and nodes
         self.names = namespaces.namespaces()
 
@@ -539,11 +542,21 @@
 
     def filtered(self, name):
         """Return a filtered version of a repository"""
-        # build a new class with the mixin and the current class
-        # (possibly subclass of the repo)
-        class filteredrepo(repoview.repoview, self.unfiltered().__class__):
-            pass
-        return filteredrepo(self, name)
+        # Python <3.4 easily leaks types via __mro__. See
+        # https://bugs.python.org/issue17950. We cache dynamically
+        # created types so this method doesn't leak on every
+        # invocation.
+
+        key = self.unfiltered().__class__
+        if key not in self._filteredrepotypes:
+            # Build a new type with the repoview mixin and the base
+            # class of this repo. Give it a name containing the
+            # filter name to aid debugging.
+            bases = (repoview.repoview, key)
+            cls = type('%sfilteredrepo' % name, bases, {})
+            self._filteredrepotypes[key] = cls
+
+        return self._filteredrepotypes[key](self, name)
 
     @repofilecache('bookmarks', 'bookmarks.current')
     def _bookmarks(self):
--- a/mercurial/statichttprepo.py Tue Jul 11 02:10:04 2017 +0900
+++ b/mercurial/statichttprepo.py Sat Jul 01 20:51:19 2017 -0700
@@ -165,6 +165,8 @@
         self.encodepats = None
         self.decodepats = None
         self._transref = None
+        # Cache of types representing filtered repos.
+        self._filteredrepotypes = {}
 
     def _restrictcapabilities(self, caps):
         caps = super(statichttprepository, self)._restrictcapabilities(caps)