view tests/test-transaction-rollback-on-revlog-split.t @ 47802:de2e04fe4897

hgwebdir: avoid systematic full garbage collection

Forcing a systematic full garbage collection upon each request can seriously harm performance. This is reported as https://bz.mercurial-scm.org/show_bug.cgi?id=6075

With this change we're performing the full collection according to a new setting, `experimental.web.full-garbage-collection-rate`. The default value is 1, which doesn't change the behavior and will allow us to test on real use cases. If the value is 0, no full garbage collection occurs.

Regardless of the value of the setting, a partial garbage collection still occurs upon each request (not attempting to collect objects from the oldest generation). This should be enough to take care of reference cycles that have been created by the last request (assessing this requires setting the value to something other than 1).

In my experience chasing memory leaks in Mercurial servers, the full collection never reclaimed any memory, but this is with Python 3 and biased towards small repositories.

On the other hand, as explained in the Python developer docs [1], frequent full collections are very harmful in terms of performance if lots of objects survive the collection, and hence stay in the oldest generation. Note that `gc.collect()` is indeed trying to collect the oldest generation [2]. This usually happens in two cases:

- unwanted lingering objects (i.e., an actual memory leak that the GC cannot do anything about). Sadly, we have lots of those these days.
- desirable long-term objects, typically in caches (not inner caches carried by repositories, which should be collected with them). This is a subject of interest for the Heptapod project.

In short, the flat rate that this change still permits is probably a bad idea in most cases, and the default value can be tweaked later on (or even be set to 0) according to experiments in the wild.

The test is inspired by test-hgwebdir-paths.py

[1] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation
[2] https://docs.python.org/3/library/gc.html#gc.collect

Differential Revision: https://phab.mercurial-scm.org/D11204
author Georges Racinet <georges.racinet@octobus.net>
date Tue, 20 Jul 2021 17:20:19 +0200
parents 8e9295912573
children 308e843f24b1

Test correctness of revlog inline -> non-inline transition
----------------------------------------------------------

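Helper extension to intercept renames: it wraps `util.atomictempfile.close` and exits the process with status 80 just before the temporary copy of `.hg/store/data/file.i` would be renamed into place.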

  $ cat > $TESTTMP/intercept_rename.py << EOF
  > import os
  > import sys
  > from mercurial import extensions, util
  > 
  > def extsetup(ui):
  >     def close(orig, *args, **kwargs):
  >         path = util.normpath(args[0]._atomictempfile__name)
  >         if path.endswith(b'/.hg/store/data/file.i'):
  >             os._exit(80)
  >         return orig(*args, **kwargs)
  >     extensions.wrapfunction(util.atomictempfile, 'close', close)
  > EOF
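The extension relies on `extensions.wrapfunction`, which replaces an attribute with a wrapper that receives the original callable as its first argument. Below is a minimal sketch of that pattern in plain Python. `wrap_function`, `AtomicFile` and `Intercepted` are hypothetical names, not Mercurial APIs, and the sketch is not Mercurial's implementation. It shows how the wrapper can bail out before the original `close()` performs the rename.

```python
import functools


def wrap_function(container, name, wrapper):
    """Hypothetical, simplified stand-in for extensions.wrapfunction.

    Replaces container.<name> with a function that calls
    wrapper(orig, *args, **kwargs), where orig is the previous attribute.
    """
    orig = getattr(container, name)

    @functools.wraps(orig)
    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    setattr(container, name, wrapped)
    return orig


class AtomicFile:
    """Toy file object: close() stands for 'rename temp file into place'."""

    def __init__(self, name):
        self.name = name
        self.renamed = False

    def close(self):
        self.renamed = True


class Intercepted(Exception):
    """Raised instead of os._exit() so that the sketch stays testable."""


def close(orig, self):
    # Mirror the test extension: stop before the original close() runs,
    # so the index file is never renamed into place.
    if self.name.endswith('/.hg/store/data/file.i'):
        raise Intercepted(self.name)
    return orig(self)


wrap_function(AtomicFile, 'close', close)
```

The real extension calls `os._exit(80)` rather than raising. `os._exit` skips exception handlers and cleanup, so no in-process transaction abort can run, which is what makes the run look like a crash.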

Test that the offset computation correctly factors in the index entries themselves.
Also test that the new data file has the correct size if the transaction is aborted
after the index has been replaced.
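In an inline revlog, index entries and revision data are interleaved in the `.i` file. A revision's data offset therefore has to skip every index entry up to and including its own. Splitting moves the entries into a pure index and the data into a `.d` file.

The sketch below assumes fixed 64-byte index entries (the revlog v1 entry size). `inline_data_offset` and `split_sizes` are hypothetical helpers.

```python
INDEX_ENTRY_SIZE = 64  # size of one revlog v1 index entry, in bytes


def inline_data_offset(rev, data_start):
    """Offset in an inline .i file of the data chunk of `rev`.

    data_start is the logical data offset of `rev` (the summed data
    lengths of all earlier revisions); the index entries of revisions
    0..rev all precede the chunk in the file.
    """
    return data_start + (rev + 1) * INDEX_ENTRY_SIZE


def split_sizes(inline_size, revcount):
    """Sizes of the .i and .d files after splitting an inline revlog."""
    index_size = revcount * INDEX_ENTRY_SIZE
    return index_size, inline_size - index_size
```

With the reference size measured below (1174 bytes for two revisions), `split_sizes(1174, 2)` gives `(128, 1046)`. These are the sizes later recorded in the journal and found on disk after recovery.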

The test repo has one small, one moderate and one big change. The clone has
the small and moderate changes and will transition to non-inline storage when
adding the big change.
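Whether a revlog stays inline depends on how much revision data it holds. This sketch makes several assumptions:

- a 128 KiB threshold. Mercurial's revlog module has used `_maxinline = 131072`, but treat both the constant and the `>=` comparison as assumptions.
- the content sizes produced by the commands that follow.
- no compression or chunk headers (the test disables compression).

`needs_split` is a hypothetical helper.

```python
MAX_INLINE = 128 * 1024  # assumed inline threshold, in bytes


def needs_split(data_sizes, max_inline=MAX_INLINE):
    """Return True once the total revision data reaches the threshold."""
    return sum(data_sizes) >= max_inline


# Content sizes produced by the commands below (illustrative only).
small = len(b'%20d' % 1)       # printf '%20d' '1'    -> 20 bytes
moderate = len(b'%1024d' % 1)  # printf '%1024d' '1'  -> 1024 bytes
big = len(bytes(128 * 1024))   # dd bs=1k count=128   -> 131072 zero bytes
```

The clone with only the small and moderate changes stays well under the threshold. Adding the 128 KiB change alone reaches it, so the pull forces the split.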

  $ hg init troffset-computation --config format.revlog-compression=none
  $ cd troffset-computation
  $ printf '%20d' '1' > file
  $ hg commit -Aqm_
  $ printf '%1024d' '1' > file
  $ hg commit -Aqm_
  $ dd if=/dev/zero of=file bs=1k count=128 > /dev/null 2>&1
  $ hg commit -Aqm_
  $ cd ..

  $ hg clone -r 1 troffset-computation troffset-computation-copy --config format.revlog-compression=none -q
  $ cd troffset-computation-copy

Reference size:

  $ f -s .hg/store/data/file*
  .hg/store/data/file.i: size=1174

  $ cat > .hg/hgrc <<EOF
  > [hooks]
  > pretxnchangegroup = python:$TESTDIR/helper-killhook.py:killme
  > EOF
#if chg
  $ hg pull ../troffset-computation
  pulling from ../troffset-computation
  [255]
#else
  $ hg pull ../troffset-computation
  pulling from ../troffset-computation
  [80]
#endif
  $ cat .hg/store/journal | tr -s '\000' ' ' | grep data/file | tail -1
  data/file.i 128

The first file.i entry should match the reference size above.
The first file.d entry is the temporary record made during the split, and
the second is the one made after the split happened. The sum of the second
file.d entry and the second file.i entry should match the first file.i entry.
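The `tr -s '\000' ' '` pipeline in these commands turns NUL separators into spaces. That suggests each journal record is a path, a NUL byte and an offset, one record per line.

The sketch below assumes that format. It also assumes that the last record for a path is the one applied on rollback, which is consistent with the sizes `hg recover` reports below. `parse_journal` and `truncation_targets` are hypothetical helpers, and the sample holds only the `data/file` records listed next.

```python
def parse_journal(raw):
    """Parse b'path\\0offset\\n' records into (path, offset) pairs."""
    entries = []
    for line in raw.splitlines():
        if not line:
            continue
        path, _, offset = line.partition(b'\0')
        entries.append((path.decode(), int(offset)))
    return entries


def truncation_targets(entries):
    """Map each path to the size it is truncated to on rollback.

    Later records overwrite earlier ones, so the last record wins
    (an assumption matching the sizes observed after recovery).
    """
    return dict(entries)


# The data/file records of the journal, in the order shown below.
journal = (
    b'data/file.i\x001174\n'
    b'data/file.d\x000\n'
    b'data/file.d\x001046\n'
    b'data/file.i\x00128\n'
)
entries = parse_journal(journal)
```

Under these assumptions, the rollback targets are 128 bytes for `file.i` and 1046 bytes for `file.d`. Their sum equals the original 1174-byte inline size.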

  $ cat .hg/store/journal | tr -s '\000' ' ' | grep data/file
  data/file.i 1174
  data/file.d 0
  data/file.d 1046
  data/file.i 128
  $ hg recover
  rolling back interrupted transaction
  (verify step skipped, run `hg verify` to check your repository content)
  $ f -s .hg/store/data/file*
  .hg/store/data/file.d: size=1046
  .hg/store/data/file.i: size=128
  $ hg tip
  changeset:   1:3ce491143aec
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     _
  
  $ hg verify
  checking changesets
  checking manifests
  crosschecking files in changesets and manifests
  checking files
   warning: revlog 'data/file.d' not in fncache!
  checked 2 changesets with 2 changes to 1 files
  1 warnings encountered!
  hint: run "hg debugrebuildfncache" to recover from corrupt fncache
  $ cd ..


Now retry the procedure, but intercept the rename of the index and check that
the journal does not contain the new index size. This demonstrates the edge case
where the data file is left behind as garbage.
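Apply the same last-record-wins assumption to the journal checked below. The index is never replaced, so `file.i` keeps its original inline size, while `file.d` keeps the data written during the split. An inline revlog stores its revision data inside the `.i` file, so nothing refers to that `.d` file any more.

The sketch below assumes the revlog was inline before the transaction. `rollback_sizes` and `leftover_data_files` are hypothetical helpers.

```python
def rollback_sizes(entries):
    """Last journal record per path wins (assumed rollback behaviour)."""
    return dict(entries)


def leftover_data_files(sizes, inline_sizes):
    """Return .d files whose index was restored to its inline size.

    inline_sizes maps each .i path to its (inline) size before the
    transaction; if rollback restores exactly that size, the revlog is
    still inline and the matching .d file is unused.
    """
    garbage = []
    for index, size in sizes.items():
        if not index.endswith('.i'):
            continue
        data = index[:-2] + '.d'
        if size == inline_sizes.get(index) and data in sizes:
            garbage.append(data)
    return garbage


# Journal records from the intercepted run (data/file entries only).
entries = [
    ('data/file.i', 1174),
    ('data/file.d', 0),
    ('data/file.d', 1046),
]
sizes = rollback_sizes(entries)
```

In the first run, the journal ended with a `data/file.i 128` record. There the index is restored as a split index, and the `.d` file is live data rather than garbage.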

  $ hg clone -r 1 troffset-computation troffset-computation-copy2 --config format.revlog-compression=none -q
  $ cd troffset-computation-copy2
  $ cat > .hg/hgrc <<EOF
  > [extensions]
  > intercept_rename = $TESTTMP/intercept_rename.py
  > [hooks]
  > pretxnchangegroup = python:$TESTDIR/helper-killhook.py:killme
  > EOF
#if chg
  $ hg pull ../troffset-computation
  pulling from ../troffset-computation
  [255]
#else
  $ hg pull ../troffset-computation
  pulling from ../troffset-computation
  [80]
#endif
  $ cat .hg/store/journal | tr -s '\000' ' ' | grep data/file
  data/file.i 1174
  data/file.d 0
  data/file.d 1046

  $ hg recover
  rolling back interrupted transaction
  (verify step skipped, run `hg verify` to check your repository content)
  $ f -s .hg/store/data/file*
  .hg/store/data/file.d: size=1046
  .hg/store/data/file.i: size=1174
  $ hg tip
  changeset:   1:3ce491143aec
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     _
  
  $ hg verify
  checking changesets
  checking manifests
  crosschecking files in changesets and manifests
  checking files
  checked 2 changesets with 2 changes to 1 files
  $ cd ..


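Repeat the original test, but let hg roll back the transaction itself (the `pretxnchangegroup` hook now simply fails) instead of killing the process from the hook.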
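Here the failing hook makes hg abort the transaction in-process, so the journal is replayed without `hg recover`. The idea can be illustrated with a toy journaling transaction: record each file's size before the first write, then truncate back to it on abort.

`ToyTransaction` and `append` are hypothetical and far simpler than Mercurial's `transaction` class. The toy models only plain add records, where the first recorded size is the pre-transaction size. It does not model backups, unlinking of new files, or the extra records written by the split.

```python
import os
import tempfile


class ToyTransaction:
    """Journal pre-write file sizes and truncate back to them on abort."""

    def __init__(self):
        self.offsets = {}  # path -> size before the transaction touched it

    def add(self, path, offset):
        # Keep only the first record: it is the pre-transaction size.
        self.offsets.setdefault(path, offset)

    def abort(self):
        for path, offset in self.offsets.items():
            with open(path, 'r+b') as fp:
                fp.truncate(offset)


def append(tr, path, data):
    """Journal the current size of `path`, then append `data` to it."""
    tr.add(path, os.path.getsize(path))
    with open(path, 'ab') as fp:
        fp.write(data)


with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, 'file.i')
    with open(index, 'wb') as fp:
        fp.write(b'x' * 1174)  # stand-in for the original inline revlog
    tr = ToyTransaction()
    append(tr, index, b'y' * 4096)
    append(tr, index, b'z' * 4096)
    tr.abort()
    size_after_abort = os.path.getsize(index)
```

After the abort, the toy index is back to its original 1174 bytes, whatever was appended in between.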

  $ hg clone -r 1 troffset-computation troffset-computation-copy-rb --config format.revlog-compression=none -q
  $ cd troffset-computation-copy-rb
  $ cat > .hg/hgrc <<EOF
  > [hooks]
  > pretxnchangegroup = false
  > EOF
  $ hg pull ../troffset-computation
  pulling from ../troffset-computation
  searching for changes
  adding changesets
  adding manifests
  adding file changes
  transaction abort!
  rollback completed
  abort: pretxnchangegroup hook exited with status 1
  [40]
  $ f -s .hg/store/data/file*
  .hg/store/data/file.d: size=1046
  .hg/store/data/file.i: size=128
  $ hg tip
  changeset:   1:3ce491143aec
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     _
  
  $ hg verify
  checking changesets
  checking manifests
  crosschecking files in changesets and manifests
  checking files
   warning: revlog 'data/file.d' not in fncache!
  checked 2 changesets with 2 changes to 1 files
  1 warnings encountered!
  hint: run "hg debugrebuildfncache" to recover from corrupt fncache
  $ cd ..