view tests/test-blackbox.t @ 40326:fed697fa1734

sqlitestore: file storage backend using SQLite This commit provides an extension which uses SQLite to store file data (as opposed to revlogs). As the inline documentation describes, there are still several aspects to the extension that are incomplete. But it's a start. The extension does support basic clone, checkout, and commit workflows, which makes it suitable for simple use cases. One notable missing feature is support for "bundlerepos." This is probably responsible for the most test failures when the extension is activated as part of the test suite. All revision data is stored in SQLite. Data is stored as zstd compressed chunks (default if zstd is available), zlib compressed chunks (default if zstd is not available), or raw chunks (if configured or if a compressed delta is not smaller than the raw delta). This makes things very similar to revlogs. Unlike revlogs, the extension doesn't yet enforce a limit on delta chain length. This is an obvious limitation and should be addressed. This is somewhat mitigated by the use of zstd, which is much faster than zlib to decompress. There is a dedicated table for storing deltas. Deltas are stored by the SHA-1 hash of their uncompressed content. The "fileindex" table has columns that reference the delta for each revision and the base delta that delta should be applied against. A recursive SQL query is used to resolve the delta chain along with the delta data. By storing deltas by hash, we are able to de-duplicate delta storage! With revlogs, the same deltas in different revlogs would result in duplicate storage of that delta. In this scheme, inserting the duplicate delta is a no-op and delta chains simply reference the existing delta. When initially implementing this extension, I did not have content-indexed deltas and deltas could be duplicated across files (just like revlogs). When I implemented content-indexed deltas, the size of the SQLite database for a full clone of mozilla-unified dropped: before: 2,554,261,504 bytes after: 2,488,754,176 bytes Surprisingly, this is still larger than the bytes size of revlog files: revlog files: 2,104,861,230 bytes du -b: 2,254,381,614 I would have expected storage to be smaller since we're not limiting delta chain length and since we're using zstd instead of zlib. I suspect the SQLite indexes and per-column overhead account for the bulk of the differences. (Keep in mind that revlog uses a 64-byte packed struct for revision index data and deltas are stored without padding. Aside from the 12 unused bytes in the 32 byte node field, revlogs are pretty efficient.) Another source of overhead is file name storage. With revlogs, file names are stored in the filesystem. But with SQLite, we need to store file names in the database. This is roughly equivalent to the size of the fncache file, which for the mozilla-unified repository is ~34MB. Since the SQLite database isn't append-only and since delta chains can reference any delta, this opens some interesting possibilities. For example, we could store deltas in reverse, such that fulltexts are stored for newer revisions and deltas are applied to reconstruct older revisions. This is likely a more optimal storage strategy for version control, as new data tends to be more frequently accessed than old data. We would obviously need wire protocol support for transferring revision data from newest to oldest. And we would probably need some kind of mechanism for "re-encoding" stores. But it should be doable. This extension is very much experimental quality. There are a handful of features that don't work. It probably isn't suitable for day-to-day use. But it could be used in limited cases (e.g. read-only checkouts like in CI). And it is also a good proving ground for alternate storage backends. As we continue to define interfaces for all things storage, it will be useful to have a viable alternate storage backend to see how things shake out in practice. test-storage.py passes on Python 2 and introduces no new test failures on Python 3. Having the storage-level unit tests has proved to be insanely useful when developing this extension. Those tests caught numerous bugs during development and I'm convinced this style of testing is the way forward for ensuring alternate storage backends work as intended. Of course, test coverage isn't close to what it needs to be. But it is a start. And what coverage we have gives me confidence that basic store functionality is implemented properly. Differential Revision: https://phab.mercurial-scm.org/D4928
author Gregory Szorc <gregory.szorc@gmail.com>
date Tue, 09 Oct 2018 08:50:13 -0700
parents 5abc47d4ca6b
children ef6cab7930b3
line wrap: on
line source

setup

  $ cat > myextension.py <<EOF
  > from mercurial import error, registrar
  > cmdtable = {}
  > command = registrar.command(cmdtable)
  > @command(b'crash', [], b'hg crash')
  > def crash(ui, *args, **kwargs):
  >     raise Exception("oops")
  > @command(b'abort', [], b'hg abort')
  > def abort(ui, *args, **kwargs):
  >     raise error.Abort(b"oops")
  > EOF
  $ abspath=`pwd`/myextension.py

  $ cat >> $HGRCPATH <<EOF
  > [extensions]
  > blackbox=
  > mock=$TESTDIR/mockblackbox.py
  > mq=
  > myextension=$TESTTMP/myextension.py
  > [alias]
  > confuse = log --limit 3
  > so-confusing = confuse --style compact
  > EOF

  $ hg init blackboxtest
  $ cd blackboxtest

command, exit codes, and duration

  $ echo a > a
  $ hg add a
  $ hg blackbox --config blackbox.dirty=True
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> init blackboxtest exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> add a
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> add a exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000+ (5000)> blackbox --config *blackbox.dirty=True* (glob)

failure exit code
  $ rm ./.hg/blackbox.log
  $ hg add non-existent
  non-existent: $ENOENT$
  [1]
  $ hg blackbox
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> add non-existent
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> add non-existent exited 1 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox

abort exit code
  $ rm ./.hg/blackbox.log
  $ hg abort 2> /dev/null
  [255]
  $ hg blackbox -l 2
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> abort exited 255 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox -l 2

unhandled exception
  $ rm ./.hg/blackbox.log
  $ hg crash 2> /dev/null
  [1]
  $ hg blackbox -l 2
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> crash exited 1 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox -l 2

alias expansion is logged
  $ rm ./.hg/blackbox.log
  $ hg confuse
  $ hg blackbox
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> confuse
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> alias 'confuse' expands to 'log --limit 3'
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> confuse exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox

recursive aliases work correctly
  $ rm ./.hg/blackbox.log
  $ hg so-confusing
  $ hg blackbox
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> so-confusing
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> alias 'so-confusing' expands to 'confuse --style compact'
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> alias 'confuse' expands to 'log --limit 3'
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> so-confusing exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox

incoming change tracking

create two heads to verify that we only see one change in the log later
  $ hg commit -ma
  $ hg up null
  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
  $ echo b > b
  $ hg commit -Amb
  adding b
  created new head

clone, commit, pull
  $ hg clone . ../blackboxtest2
  updating to branch default
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ echo c > c
  $ hg commit -Amc
  adding c
  $ cd ../blackboxtest2
  $ hg pull
  pulling from $TESTTMP/blackboxtest
  searching for changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  new changesets d02f48003e62
  (run 'hg update' to get a working copy)
  $ hg blackbox -l 6
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated served branch cache in * seconds (glob)
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote served branch cache with 1 labels and 2 nodes
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> 1 incoming changes - new heads: d02f48003e62
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pull exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> blackbox -l 6

we must not cause a failure if we cannot write to the log

  $ hg rollback
  repository tip rolled back to revision 1 (undo pull)

  $ mv .hg/blackbox.log .hg/blackbox.log-
  $ mkdir .hg/blackbox.log
  $ hg --debug incoming
  warning: cannot write to blackbox.log: * (glob)
  comparing with $TESTTMP/blackboxtest
  query 1; heads
  searching for changes
  all local heads known remotely
  changeset:   2:d02f48003e62c24e2659d97d30f2a83abe5d5d51
  tag:         tip
  phase:       draft
  parent:      1:6563da9dcf87b1949716e38ff3e3dfaa3198eb06
  parent:      -1:0000000000000000000000000000000000000000
  manifest:    2:ab9d46b053ebf45b7996f2922b9893ff4b63d892
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  files+:      c
  extra:       branch=default
  description:
  c
  
  
  $ hg pull
  pulling from $TESTTMP/blackboxtest
  searching for changes
  adding changesets
  adding manifests
  adding file changes
  added 1 changesets with 1 changes to 1 files
  new changesets d02f48003e62
  (run 'hg update' to get a working copy)

a failure reading from the log is fatal

  $ hg blackbox -l 3
  abort: *$TESTTMP/blackboxtest2/.hg/blackbox.log* (glob)
  [255]

  $ rmdir .hg/blackbox.log
  $ mv .hg/blackbox.log- .hg/blackbox.log

backup bundles get logged

  $ touch d
  $ hg commit -Amd
  adding d
  created new head
  $ hg strip tip
  0 files updated, 0 files merged, 1 files removed, 0 files unresolved
  saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/*-backup.hg (glob)
  $ hg blackbox -l 6
  1970/01/01 00:00:00 bob @73f6ee326b27d820b0472f1a825e3a50f3dc489b (5000)> strip tip
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> saved backup bundle to $TESTTMP/blackboxtest2/.hg/strip-backup/73f6ee326b27-7612e004-backup.hg
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> updated base branch cache in * seconds (glob)
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> wrote base branch cache with 1 labels and 2 nodes
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> strip tip exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> blackbox -l 6

extension and python hooks - use the eol extension for a pythonhook

  $ echo '[extensions]' >> .hg/hgrc
  $ echo 'eol=' >> .hg/hgrc
  $ echo '[hooks]' >> .hg/hgrc
  $ echo 'update = echo hooked' >> .hg/hgrc
  $ hg update
  The fsmonitor extension is incompatible with the eol extension and has been disabled. (fsmonitor !)
  hooked
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  updated to "d02f48003e62: c"
  1 other heads for branch "default"
  $ cat >> .hg/hgrc <<EOF
  > [extensions]
  > # disable eol, because it is not needed for subsequent tests
  > # (in addition, keeping it requires extra care for fsmonitor)
  > eol=!
  > EOF
  $ hg blackbox -l 5
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> update (no-chg !)
  1970/01/01 00:00:00 bob @6563da9dcf87b1949716e38ff3e3dfaa3198eb06 (5000)> pythonhook-preupdate: hgext.eol.preupdate finished in * seconds (glob)
  1970/01/01 00:00:00 bob @d02f48003e62c24e2659d97d30f2a83abe5d5d51 (5000)> exthook-update: echo hooked finished in * seconds (glob)
  1970/01/01 00:00:00 bob @d02f48003e62c24e2659d97d30f2a83abe5d5d51 (5000)> update exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @d02f48003e62c24e2659d97d30f2a83abe5d5d51 (5000)> serve --cmdserver chgunix --address $TESTTMP.chgsock/server.* --daemon-postexec 'chdir:/' (glob) (chg !)
  1970/01/01 00:00:00 bob @d02f48003e62c24e2659d97d30f2a83abe5d5d51 (5000)> blackbox -l 5

log rotation

  $ echo '[blackbox]' >> .hg/hgrc
  $ echo 'maxsize = 20 b' >> .hg/hgrc
  $ echo 'maxfiles = 3' >> .hg/hgrc
  $ hg status
  $ hg status
  $ hg status
  $ hg tip -q
  2:d02f48003e62
  $ ls .hg/blackbox.log*
  .hg/blackbox.log
  .hg/blackbox.log.1
  .hg/blackbox.log.2
  $ cd ..

  $ hg init blackboxtest3
  $ cd blackboxtest3
  $ hg blackbox
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> init blackboxtest3 exited 0 after * seconds (glob)
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> blackbox
  $ mv .hg/blackbox.log .hg/blackbox.log-
  $ mkdir .hg/blackbox.log
  $ sed -e 's/\(.*test1.*\)/#\1/; s#\(.*commit2.*\)#os.rmdir(".hg/blackbox.log")\
  > os.rename(".hg/blackbox.log-", ".hg/blackbox.log")\
  > \1#' $TESTDIR/test-dispatch.py > ../test-dispatch.py
  $ "$PYTHON" $TESTDIR/blackbox-readonly-dispatch.py
  running: --debug add foo
  warning: cannot write to blackbox.log: Is a directory (no-windows !)
  warning: cannot write to blackbox.log: $TESTTMP/blackboxtest3/.hg/blackbox.log: Access is denied (windows !)
  adding foo
  result: 0
  running: --debug commit -m commit1 -d 2000-01-01 foo
  warning: cannot write to blackbox.log: Is a directory (no-windows !)
  warning: cannot write to blackbox.log: $TESTTMP/blackboxtest3/.hg/blackbox.log: Access is denied (windows !)
  committing files:
  foo
  committing manifest
  committing changelog
  updating the branch cache
  committed changeset 0:0e46349438790c460c5c9f7546bfcd39b267bbd2
  result: 0
  running: --debug commit -m commit2 -d 2000-01-02 foo
  committing files:
  foo
  committing manifest
  committing changelog
  updating the branch cache
  committed changeset 1:45589e459b2edfbf3dbde7e01f611d2c1e7453d7
  result: 0
  running: --debug log -r 0
  changeset:   0:0e46349438790c460c5c9f7546bfcd39b267bbd2
  phase:       draft
  parent:      -1:0000000000000000000000000000000000000000
  parent:      -1:0000000000000000000000000000000000000000
  manifest:    0:9091aa5df980aea60860a2e39c95182e68d1ddec
  user:        test
  date:        Sat Jan 01 00:00:00 2000 +0000
  files+:      foo
  extra:       branch=default
  description:
  commit1
  
  
  result: 0
  running: --debug log -r tip
  changeset:   1:45589e459b2edfbf3dbde7e01f611d2c1e7453d7
  tag:         tip
  phase:       draft
  parent:      0:0e46349438790c460c5c9f7546bfcd39b267bbd2
  parent:      -1:0000000000000000000000000000000000000000
  manifest:    1:895aa9b7886f89dd017a6d62524e1f9180b04df9
  user:        test
  date:        Sun Jan 02 00:00:00 2000 +0000
  files:       foo
  extra:       branch=default
  description:
  commit2
  
  
  result: 0
  $ hg blackbox
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updating the branch cache
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> updated served branch cache in * seconds (glob)
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> wrote served branch cache with 1 labels and 1 nodes
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> --debug commit -m commit2 -d 2000-01-02 foo exited 0 after *.?? seconds (glob)
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> --debug log -r 0
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> writing .hg/cache/tags2-visible with 0 tags
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> --debug log -r 0 exited 0 after *.?? seconds (glob)
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> --debug log -r tip
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> --debug log -r tip exited 0 after *.?? seconds (glob)
  1970/01/01 00:00:00 bob @45589e459b2edfbf3dbde7e01f611d2c1e7453d7 (5000)> blackbox

Test log recursion from dirty status check

  $ cat > ../r.py <<EOF
  > from mercurial import context, error, extensions
  > x=[False]
  > def status(orig, *args, **opts):
  >     args[0].repo().ui.log(b"broken", b"recursion?")
  >     return orig(*args, **opts)
  > def reposetup(ui, repo):
  >     extensions.wrapfunction(context.basectx, 'status', status)
  > EOF
  $ hg id --config extensions.x=../r.py --config blackbox.dirty=True
  45589e459b2e tip

cleanup
  $ cd ..

#if chg

when using chg, blackbox.log should get rotated correctly

  $ cat > $TESTTMP/noop.py << EOF
  > from __future__ import absolute_import
  > import time
  > from mercurial import registrar, scmutil
  > cmdtable = {}
  > command = registrar.command(cmdtable)
  > @command('noop')
  > def noop(ui, repo):
  >     pass
  > EOF

  $ hg init blackbox-chg
  $ cd blackbox-chg

  $ cat > .hg/hgrc << EOF
  > [blackbox]
  > maxsize = 500B
  > [extensions]
  > # extension change forces chg to restart
  > noop=$TESTTMP/noop.py
  > EOF

  $ "$PYTHON" -c 'print("a" * 400)' > .hg/blackbox.log
  $ chg noop
  $ chg noop
  $ chg noop
  $ chg noop
  $ chg noop

  $ cat > showsize.py << 'EOF'
  > import os, sys
  > limit = 500
  > for p in sys.argv[1:]:
  >     size = os.stat(p).st_size
  >     if size >= limit:
  >         desc = '>='
  >     else:
  >         desc = '<'
  >     print('%s: %s %d' % (p, desc, limit))
  > EOF

  $ "$PYTHON" showsize.py .hg/blackbox*
  .hg/blackbox.log: < 500
  .hg/blackbox.log.1: >= 500
  .hg/blackbox.log.2: >= 500

  $ cd ..

With chg, blackbox should not create the log file if the repo is gone

  $ hg init repo1
  $ hg --config extensions.a=! -R repo1 log
  $ rm -rf $TESTTMP/repo1
  $ hg --config extensions.a=! init repo1

#endif

blackbox should work if repo.ui.log is not called (issue5518)

  $ cat > $TESTTMP/raise.py << EOF
  > from __future__ import absolute_import
  > from mercurial import registrar, scmutil
  > cmdtable = {}
  > command = registrar.command(cmdtable)
  > @command(b'raise')
  > def raisecmd(*args):
  >     raise RuntimeError('raise')
  > EOF

  $ cat >> $HGRCPATH << EOF
  > [blackbox]
  > track = commandexception
  > [extensions]
  > raise=$TESTTMP/raise.py
  > EOF

  $ hg init $TESTTMP/blackbox-exception-only
  $ cd $TESTTMP/blackbox-exception-only

#if chg
 (chg exits 255 because it fails to receive an exit code)
  $ hg raise 2>/dev/null
  [255]
#else
 (hg exits 1 because Python default exit code for uncaught exception is 1)
  $ hg raise 2>/dev/null
  [1]
#endif

  $ head -1 .hg/blackbox.log
  1970/01/01 00:00:00 bob @0000000000000000000000000000000000000000 (5000)> ** Unknown exception encountered with possibly-broken third-party extension mock
  $ tail -2 .hg/blackbox.log
  RuntimeError: raise