hgwebdir: avoid systematic full garbage collection
author Georges Racinet <georges.racinet@octobus.net>
Tue, 20 Jul 2021 17:20:19 +0200
changeset 47802 de2e04fe4897
parent 47795 b1e1559f5a45
child 47859 155a2ec8a9dc
hgwebdir: avoid systematic full garbage collection

Forcing a systematic full garbage collection upon each request can seriously harm performance. This is reported as https://bz.mercurial-scm.org/show_bug.cgi?id=6075

With this change we're performing the full collection according to a new setting, `experimental.web.full-garbage-collection-rate`. The default value is 1, which doesn't change the behavior and will allow us to test on real use cases. If the value is 0, no full garbage collection occurs.

Regardless of the value of the setting, a partial garbage collection still occurs upon each request (not attempting to collect objects from the oldest generation). This should be enough to take care of the reference cycles created by the last request (assessing this requires changing the setting to a value other than 1).

In my experience chasing memory leaks in Mercurial servers, the full collection never reclaimed any memory, but this is with Python 3 and biased towards small repositories. On the other hand, as explained in the Python developer docs [1], frequent full collections are very harmful in terms of performance if lots of objects survive the collection, and hence stay in the oldest generation. Note that `gc.collect()` does indeed try to collect the oldest generation [2]. This usually happens in two cases:

- unwanted lingering objects (i.e., an actual memory leak that the GC cannot do anything about). Sadly, we have lots of those these days.

- desirable long-term objects, typically in caches (not the inner caches carried by repositories, which should be collected with them). This is a subject of interest for the Heptapod project.

In short, the flat rate that this change still permits is probably a bad idea in most cases, and the default value can be tweaked later on (or even be set to 0) according to experiments in the wild.

The test is inspired by test-hgwebdir-paths.py

[1] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
[2] https://docs.python.org/3/library/gc.html#gc.collect

Differential Revision: https://phab.mercurial-scm.org/D11204
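As an illustration of the new knob, an operator could set the rate in the hgweb configuration file along these lines (the section and key names come from the patch below; the value 10 is purely an example, not a recommendation):

    [experimental]
    # 1 (the default) keeps today's behavior (a full collection on every
    # request), 0 disables full collections entirely, and N > 1 performs
    # a full collection every N requests, with a partial one otherwise.
    web.full-garbage-collection-rate = 10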
mercurial/configitems.py
mercurial/hgweb/hgwebdir_mod.py
tests/test-hgwebdir-gc.py
--- a/mercurial/configitems.py	Wed Jul 28 13:45:07 2021 +0300
+++ b/mercurial/configitems.py	Tue Jul 20 17:20:19 2021 +0200
@@ -1266,6 +1266,11 @@
 )
 coreconfigitem(
     b'experimental',
+    b'web.full-garbage-collection-rate',
+    default=1,  # still forcing a full collection on each request
+)
+coreconfigitem(
+    b'experimental',
     b'worker.wdir-get-thread-safe',
     default=False,
 )
--- a/mercurial/hgweb/hgwebdir_mod.py	Wed Jul 28 13:45:07 2021 +0300
+++ b/mercurial/hgweb/hgwebdir_mod.py	Tue Jul 20 17:20:19 2021 +0200
@@ -285,6 +285,7 @@
         self.lastrefresh = 0
         self.motd = None
         self.refresh()
+        self.requests_count = 0
         if not baseui:
             # set up environment for new ui
             extensions.loadall(self.ui)
@@ -341,6 +342,10 @@
 
         self.repos = repos
         self.ui = u
+        self.gc_full_collect_rate = self.ui.configint(
+            b'experimental', b'web.full-garbage-collection-rate'
+        )
+        self.gc_full_collections_done = 0
         encoding.encoding = self.ui.config(b'web', b'encoding')
         self.style = self.ui.config(b'web', b'style')
         self.templatepath = self.ui.config(
@@ -383,12 +388,27 @@
             finally:
                 # There are known cycles in localrepository that prevent
                 # those objects (and tons of held references) from being
-                # collected through normal refcounting. We mitigate those
-                # leaks by performing an explicit GC on every request.
-                # TODO remove this once leaks are fixed.
-                # TODO only run this on requests that create localrepository
-                # instances instead of every request.
-                gc.collect()
+                # collected through normal refcounting.
+                # In some cases, the resulting memory consumption can
+                # be tamed by performing explicit garbage collections.
+                # In presence of actual leaks or big long-lived caches, the
+                # impact on performance of such collections can become a
+                # problem, hence the rate shouldn't be set too low.
+                # See "Collecting the oldest generation" in
+                # https://devguide.python.org/garbage_collector
+                # for more about such trade-offs.
+                rate = self.gc_full_collect_rate
+
+                # this is not thread safe, but the consequence (skipping
+                # a garbage collection) is arguably better than risking
+                # to have several threads perform a collection in parallel
+                # (long useless wait on all threads).
+                self.requests_count += 1
+                if rate > 0 and self.requests_count % rate == 0:
+                    gc.collect()
+                    self.gc_full_collections_done += 1
+                else:
+                    gc.collect(generation=1)
 
     def _runwsgi(self, req, res):
         try:
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/test-hgwebdir-gc.py	Tue Jul 20 17:20:19 2021 +0200
@@ -0,0 +1,49 @@
+from __future__ import absolute_import
+
+import os
+from mercurial.hgweb import hgwebdir_mod
+
+hgwebdir = hgwebdir_mod.hgwebdir
+
+os.mkdir(b'webdir')
+os.chdir(b'webdir')
+
+webdir = os.path.realpath(b'.')
+
+
+def trivial_response(req, res):
+    return []
+
+
+def make_hgwebdir(gc_rate=None):
+    config = os.path.join(webdir, b'hgwebdir.conf')
+    with open(config, 'wb') as configfile:
+        configfile.write(b'[experimental]\n')
+        if gc_rate is not None:
+            configfile.write(b'web.full-garbage-collection-rate=%d\n' % gc_rate)
+    hg_wd = hgwebdir(config)
+    hg_wd._runwsgi = trivial_response
+    return hg_wd
+
+
+def process_requests(webdir_instance, number):
+    # we don't care for now about passing realistic arguments
+    for _ in range(number):
+        for chunk in webdir_instance.run_wsgi(None, None):
+            pass
+
+
+without_gc = make_hgwebdir(gc_rate=0)
+process_requests(without_gc, 5)
+assert without_gc.requests_count == 5
+assert without_gc.gc_full_collections_done == 0
+
+with_gc = make_hgwebdir(gc_rate=2)
+process_requests(with_gc, 5)
+assert with_gc.requests_count == 5
+assert with_gc.gc_full_collections_done == 2
+
+with_systematic_gc = make_hgwebdir()  # default value of the setting
+process_requests(with_systematic_gc, 3)
+assert with_systematic_gc.requests_count == 3
+assert with_systematic_gc.gc_full_collections_done == 3
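For reference, the new test can presumably be exercised with Mercurial's own test runner from a source checkout (the exact invocation depends on the local setup):

    $ cd tests
    $ python run-tests.py test-hgwebdir-gc.py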