comparison mercurial/hgweb/hgwebdir_mod.py @ 47802:de2e04fe4897

hgwebdir: avoid systematic full garbage collection

Forcing a systematic full garbage collection upon each request can
seriously harm performance. This is reported as
https://bz.mercurial-scm.org/show_bug.cgi?id=6075

With this change we're performing the full collection according to a
new setting, `experimental.web.full-garbage-collection-rate`. The
default value is 1, which doesn't change the behavior and will allow us
to test on real use cases. If the value is 0, no full garbage
collection occurs.

Regardless of the value of the setting, a partial garbage collection
still occurs upon each request (not attempting to collect objects from
the oldest generation). This should be enough to take care of reference
cycles that have been created by the last request (assessing this
requires setting the rate to a value other than 1).

In my experience chasing memory leaks in Mercurial servers, the full
collection never reclaimed any memory, but this is with Python 3 and
biased towards small repositories.

On the other hand, as explained in the Python developer docs [1],
frequent full collections are very harmful in terms of performance if
lots of objects survive the collection, and hence stay in the oldest
generation. Note that `gc.collect()` is indeed trying to collect the
oldest generation [2]. Many objects usually survive in two cases:

- unwanted lingering objects (i.e., an actual memory leak that the GC
  cannot do anything about). Sadly, we have lots of those these days.
- desirable long-term objects, typically in caches (not inner caches
  carried by repositories, which should be collected with them). This
  is a subject of interest for the Heptapod project.

In short, the flat rate that this change still permits is probably a
bad idea in most cases, and the default value can be tweaked later on
(or even be set to 0) according to experiments in the wild.

The test is inspired by test-hgwebdir-paths.py

[1] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
[2] https://docs.python.org/3/library/gc.html#gc.collect

Differential Revision: https://phab.mercurial-scm.org/D11204
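To make the meaning of the setting's values concrete, here is a minimal sketch. It assumes, as the diff in this changeset does, that a rate of N means one full collection every N requests. The helper names `is_full_collection` and `full_collection_requests` are hypothetical and exist only for this illustration.

```python
def is_full_collection(request_number, rate):
    """Tell whether the request with this 1-based number gets a full GC.

    Hypothetical helper: a rate of 0 disables full collections, a rate
    of N triggers one on every Nth request.
    """
    return rate > 0 and request_number % rate == 0


def full_collection_requests(total_requests, rate):
    """List the request numbers that would trigger a full collection."""
    return [
        n
        for n in range(1, total_requests + 1)
        if is_full_collection(n, rate)
    ]
```

With the default rate of 1 every request qualifies, which matches the previous behavior. With a rate of 0 the list is always empty.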
author Georges Racinet <georges.racinet@octobus.net>
date Tue, 20 Jul 2021 17:20:19 +0200
parents aceede7c4929
children 6000f5b25c9b
--- a/mercurial/hgweb/hgwebdir_mod.py	(47795:b1e1559f5a45)
+++ b/mercurial/hgweb/hgwebdir_mod.py	(47802:de2e04fe4897)
@@ -283,10 +283,11 @@
         self.baseui = baseui
         self.ui = None
         self.lastrefresh = 0
         self.motd = None
         self.refresh()
+        self.requests_count = 0
         if not baseui:
             # set up environment for new ui
             extensions.loadall(self.ui)
             extensions.populateui(self.ui)
 
@@ -339,10 +340,14 @@
                     name = name[len(prefix) :]
                 repos.append((name.lstrip(b'/'), repo))
 
         self.repos = repos
         self.ui = u
+        self.gc_full_collect_rate = self.ui.configint(
+            b'experimental', b'web.full-garbage-collection-rate'
+        )
+        self.gc_full_collections_done = 0
         encoding.encoding = self.ui.config(b'web', b'encoding')
         self.style = self.ui.config(b'web', b'style')
         self.templatepath = self.ui.config(
             b'web', b'templates', untrusted=False
         )
@@ -381,16 +386,31 @@
                 for r in self._runwsgi(req, res):
                     yield r
             finally:
                 # There are known cycles in localrepository that prevent
                 # those objects (and tons of held references) from being
-                # collected through normal refcounting. We mitigate those
-                # leaks by performing an explicit GC on every request.
-                # TODO remove this once leaks are fixed.
-                # TODO only run this on requests that create localrepository
-                # instances instead of every request.
-                gc.collect()
+                # collected through normal refcounting.
+                # In some cases, the resulting memory consumption can
+                # be tamed by performing explicit garbage collections.
+                # In presence of actual leaks or big long-lived caches, the
+                # impact on performance of such collections can become a
+                # problem, hence the rate shouldn't be set too low.
+                # See "Collecting the oldest generation" in
+                # https://devguide.python.org/garbage_collector
+                # for more about such trade-offs.
+                rate = self.gc_full_collect_rate
+
+                # this is not thread safe, but the consequence (skipping
+                # a garbage collection) is arguably better than risking
+                # to have several threads perform a collection in parallel
+                # (long useless wait on all threads).
+                self.requests_count += 1
+                if rate > 0 and self.requests_count % rate == 0:
+                    gc.collect()
+                    self.gc_full_collections_done += 1
+                else:
+                    gc.collect(generation=1)
 
     def _runwsgi(self, req, res):
         try:
             self.refresh()
 
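The bookkeeping added around the `finally` block can be tried outside Mercurial. The following is only a sketch under stated assumptions: `RequestRunner` and `serve` are hypothetical stand-ins for `hgwebdir` and a WSGI server, and only the attribute names `gc_full_collect_rate`, `requests_count` and `gc_full_collections_done` come from the changeset. The comment about skipping the oldest generation reflects the changeset's description; the exact effect of `gc.collect(generation=1)` may vary across CPython versions.

```python
import gc


class RequestRunner:
    """Hypothetical stand-in for hgwebdir, keeping only the GC bookkeeping."""

    def __init__(self, gc_full_collect_rate=1):
        self.gc_full_collect_rate = gc_full_collect_rate
        self.requests_count = 0
        self.gc_full_collections_done = 0

    def run(self, handler):
        """Yield the chunks produced by handler(), then collect garbage."""
        try:
            for chunk in handler():
                yield chunk
        finally:
            rate = self.gc_full_collect_rate
            # Like the original, this counter is not protected by a lock.
            self.requests_count += 1
            if rate > 0 and self.requests_count % rate == 0:
                gc.collect()  # full collection, all generations
                self.gc_full_collections_done += 1
            else:
                # partial collection, meant to skip the oldest generation
                gc.collect(generation=1)


def serve(runner, count):
    """Run `count` fake requests and return the number of full collections."""
    for _ in range(count):
        body = b''.join(runner.run(lambda: [b'hello', b' world']))
        assert body == b'hello world'
    return runner.gc_full_collections_done
```

Because the collection sits in a `finally` clause of a generator, it runs once the response body has been fully consumed, or when the generator is closed early.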