Mercurial > hg
view tests/test-debugcommands.t @ 40326:fed697fa1734
sqlitestore: file storage backend using SQLite
This commit provides an extension which uses SQLite to store file
data (as opposed to revlogs).
As the inline documentation describes, there are still several
aspects to the extension that are incomplete. But it's a start.
The extension does support basic clone, checkout, and commit
workflows, which makes it suitable for simple use cases.
One notable missing feature is support for "bundlerepos." This is
probably responsible for the most test failures when the extension
is activated as part of the test suite.
All revision data is stored in SQLite. Data is stored as zstd
compressed chunks (default if zstd is available), zlib compressed
chunks (default if zstd is not available), or raw chunks (if
configured or if a compressed delta is not smaller than the raw
delta). This makes things very similar to revlogs.
Unlike revlogs, the extension doesn't yet enforce a limit on delta
chain length. This is an obvious limitation and should be addressed.
This is somewhat mitigated by the use of zstd, which is much faster
than zlib to decompress.
There is a dedicated table for storing deltas. Deltas are stored
by the SHA-1 hash of their uncompressed content. The "fileindex" table
has columns that reference the delta for each revision and the base
delta that delta should be applied against. A recursive SQL query
is used to resolve the delta chain along with the delta data.
By storing deltas by hash, we are able to de-duplicate delta storage!
With revlogs, the same deltas in different revlogs would result in
duplicate storage of that delta. In this scheme, inserting the
duplicate delta is a no-op and delta chains simply reference the
existing delta.
When initially implementing this extension, I did not have
content-indexed deltas and deltas could be duplicated across files
(just like revlogs). When I implemented content-indexed deltas, the
size of the SQLite database for a full clone of mozilla-unified
dropped:
before: 2,554,261,504 bytes
after: 2,488,754,176 bytes
Surprisingly, this is still larger than the bytes size of revlog
files:
revlog files: 2,104,861,230 bytes
du -b: 2,254,381,614
I would have expected storage to be smaller since we're not limiting
delta chain length and since we're using zstd instead of zlib. I
suspect the SQLite indexes and per-column overhead account for the
bulk of the differences. (Keep in mind that revlog uses a 64-byte
packed struct for revision index data and deltas are stored without
padding. Aside from the 12 unused bytes in the 32 byte node field,
revlogs are pretty efficient.) Another source of overhead is file
name storage. With revlogs, file names are stored in the filesystem.
But with SQLite, we need to store file names in the database. This is
roughly equivalent to the size of the fncache file, which for the
mozilla-unified repository is ~34MB.
Since the SQLite database isn't append-only and since delta chains
can reference any delta, this opens some interesting possibilities.
For example, we could store deltas in reverse, such that fulltexts
are stored for newer revisions and deltas are applied to reconstruct
older revisions. This is likely a more optimal storage strategy for
version control, as new data tends to be more frequently accessed
than old data. We would obviously need wire protocol support for
transferring revision data from newest to oldest. And we would
probably need some kind of mechanism for "re-encoding" stores. But
it should be doable.
This extension is very much experimental quality. There are a handful
of features that don't work. It probably isn't suitable for day-to-day
use. But it could be used in limited cases (e.g. read-only checkouts
like in CI). And it is also a good proving ground for alternate
storage backends. As we continue to define interfaces for all things
storage, it will be useful to have a viable alternate storage backend
to see how things shake out in practice.
test-storage.py passes on Python 2 and introduces no new test failures on
Python 3. Having the storage-level unit tests has proved to be insanely
useful when developing this extension. Those tests caught numerous bugs
during development and I'm convinced this style of testing is the way
forward for ensuring alternate storage backends work as intended. Of
course, test coverage isn't close to what it needs to be. But it is
a start. And what coverage we have gives me confidence that basic store
functionality is implemented properly.
Differential Revision: https://phab.mercurial-scm.org/D4928
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Tue, 09 Oct 2018 08:50:13 -0700 |
parents | 8cf459d8b111 |
children | 4f37af86d5d5 |
line wrap: on
line source
$ cat << EOF >> $HGRCPATH > [ui] > interactive=yes > EOF $ hg init debugrevlog $ cd debugrevlog $ echo a > a $ hg ci -Am adda adding a $ hg rm . removing a $ hg ci -Am make-it-empty $ hg revert --all -r 0 adding a $ hg ci -Am make-it-full #if reporevlogstore $ hg debugrevlog -c format : 1 flags : inline revisions : 3 merges : 0 ( 0.00%) normal : 3 (100.00%) revisions : 3 empty : 0 ( 0.00%) text : 0 (100.00%) delta : 0 (100.00%) snapshot : 3 (100.00%) lvl-0 : 3 (100.00%) deltas : 0 ( 0.00%) revision size : 191 snapshot : 191 (100.00%) lvl-0 : 191 (100.00%) deltas : 0 ( 0.00%) chunks : 3 0x75 (u) : 3 (100.00%) chunks size : 191 0x75 (u) : 191 (100.00%) avg chain length : 0 max chain length : 0 max chain reach : 67 compression ratio : 0 uncompressed data size (min/max/avg) : 57 / 66 / 62 full revision size (min/max/avg) : 58 / 67 / 63 inter-snapshot size (min/max/avg) : 0 / 0 / 0 delta size (min/max/avg) : 0 / 0 / 0 $ hg debugrevlog -m format : 1 flags : inline, generaldelta revisions : 3 merges : 0 ( 0.00%) normal : 3 (100.00%) revisions : 3 empty : 1 (33.33%) text : 1 (100.00%) delta : 0 ( 0.00%) snapshot : 2 (66.67%) lvl-0 : 2 (66.67%) deltas : 0 ( 0.00%) revision size : 88 snapshot : 88 (100.00%) lvl-0 : 88 (100.00%) deltas : 0 ( 0.00%) chunks : 3 empty : 1 (33.33%) 0x75 (u) : 2 (66.67%) chunks size : 88 empty : 0 ( 0.00%) 0x75 (u) : 88 (100.00%) avg chain length : 0 max chain length : 0 max chain reach : 44 compression ratio : 0 uncompressed data size (min/max/avg) : 0 / 43 / 28 full revision size (min/max/avg) : 44 / 44 / 44 inter-snapshot size (min/max/avg) : 0 / 0 / 0 delta size (min/max/avg) : 0 / 0 / 0 $ hg debugrevlog a format : 1 flags : inline, generaldelta revisions : 1 merges : 0 ( 0.00%) normal : 1 (100.00%) revisions : 1 empty : 0 ( 0.00%) text : 0 (100.00%) delta : 0 (100.00%) snapshot : 1 (100.00%) lvl-0 : 1 (100.00%) deltas : 0 ( 0.00%) revision size : 3 snapshot : 3 (100.00%) lvl-0 : 3 (100.00%) deltas : 0 ( 0.00%) chunks : 1 0x75 (u) : 1 (100.00%) chunks size : 3 0x75 (u) : 3 (100.00%) avg chain length : 0 max chain length : 0 max chain reach : 3 compression ratio : 0 uncompressed data size (min/max/avg) : 2 / 2 / 2 full revision size (min/max/avg) : 3 / 3 / 3 inter-snapshot size (min/max/avg) : 0 / 0 / 0 delta size (min/max/avg) : 0 / 0 / 0 #endif Test debugindex, with and without the --verbose/--debug flag $ hg debugrevlogindex a rev linkrev nodeid p1 p2 0 0 b789fdd96dc2 000000000000 000000000000 #if no-reposimplestore $ hg --verbose debugrevlogindex a rev offset length linkrev nodeid p1 p2 0 0 3 0 b789fdd96dc2 000000000000 000000000000 $ hg --debug debugrevlogindex a rev offset length linkrev nodeid p1 p2 0 0 3 0 b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 #endif $ hg debugrevlogindex -f 1 a rev flag size link p1 p2 nodeid 0 0000 2 0 -1 -1 b789fdd96dc2 #if no-reposimplestore $ hg --verbose debugrevlogindex -f 1 a rev flag offset length size link p1 p2 nodeid 0 0000 0 3 2 0 -1 -1 b789fdd96dc2 $ hg --debug debugrevlogindex -f 1 a rev flag offset length size link p1 p2 nodeid 0 0000 0 3 2 0 -1 -1 b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3 #endif $ hg debugindex -c rev linkrev nodeid p1 p2 0 0 07f494440405 000000000000 000000000000 1 1 8cccb4b5fec2 07f494440405 000000000000 2 2 b1e228c512c5 8cccb4b5fec2 000000000000 $ hg debugindex -c --debug rev linkrev nodeid p1 p2 0 0 07f4944404050f47db2e5c5071e0e84e7a27bba9 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 1 1 8cccb4b5fec20cafeb99dd01c26d4dee8ea4388a 07f4944404050f47db2e5c5071e0e84e7a27bba9 0000000000000000000000000000000000000000 2 2 b1e228c512c5d7066d70562ed839c3323a62d6d2 8cccb4b5fec20cafeb99dd01c26d4dee8ea4388a 0000000000000000000000000000000000000000 $ hg debugindex -m rev linkrev nodeid p1 p2 0 0 a0c8bcbbb45c 000000000000 000000000000 1 1 57faf8a737ae a0c8bcbbb45c 000000000000 2 2 a35b10320954 57faf8a737ae 000000000000 $ hg debugindex -m --debug rev linkrev nodeid p1 p2 0 0 a0c8bcbbb45c63b90b70ad007bf38961f64f2af0 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 1 1 57faf8a737ae7faf490582941a82319ba6529dca a0c8bcbbb45c63b90b70ad007bf38961f64f2af0 0000000000000000000000000000000000000000 2 2 a35b103209548032201c16c7688cb2657f037a38 57faf8a737ae7faf490582941a82319ba6529dca 0000000000000000000000000000000000000000 $ hg debugindex a rev linkrev nodeid p1 p2 0 0 b789fdd96dc2 000000000000 000000000000 $ hg debugindex --debug a rev linkrev nodeid p1 p2 0 0 b789fdd96dc2f3bd229c1dd8eedf0fc60e2b68e3 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000 debugdelta chain basic output #if reporevlogstore $ hg debugindexstats node trie capacity: 4 node trie count: 2 node trie depth: 1 node trie last rev scanned: -1 node trie lookups: 4 node trie misses: 1 node trie splits: 1 revs in memory: 3 $ hg debugdeltachain -m rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio 0 1 1 -1 base 44 43 44 1.02326 44 0 0.00000 1 2 1 -1 base 0 0 0 0.00000 0 0 0.00000 2 3 1 -1 base 44 43 44 1.02326 44 0 0.00000 $ hg debugdeltachain -m -T '{rev} {chainid} {chainlen}\n' 0 1 1 1 2 1 2 3 1 $ hg debugdeltachain -m -Tjson [ { "chainid": 1, "chainlen": 1, "chainratio": 1.02325581395, (no-py3 !) "chainratio": 1.0232558139534884, (py3 !) "chainsize": 44, "compsize": 44, "deltatype": "base", "extradist": 0, "extraratio": 0.0, "lindist": 44, "prevrev": -1, "rev": 0, "uncompsize": 43 }, { "chainid": 2, "chainlen": 1, "chainratio": 0, "chainsize": 0, "compsize": 0, "deltatype": "base", "extradist": 0, "extraratio": 0, "lindist": 0, "prevrev": -1, "rev": 1, "uncompsize": 0 }, { "chainid": 3, "chainlen": 1, "chainratio": 1.02325581395, (no-py3 !) "chainratio": 1.0232558139534884, (py3 !) "chainsize": 44, "compsize": 44, "deltatype": "base", "extradist": 0, "extraratio": 0.0, "lindist": 44, "prevrev": -1, "rev": 2, "uncompsize": 43 } ] debugdelta chain with sparse read enabled $ cat >> $HGRCPATH <<EOF > [experimental] > sparse-read = True > EOF $ hg debugdeltachain -m rev chain# chainlen prev delta size rawsize chainsize ratio lindist extradist extraratio readsize largestblk rddensity srchunks 0 1 1 -1 base 44 43 44 1.02326 44 0 0.00000 44 44 1.00000 1 1 2 1 -1 base 0 0 0 0.00000 0 0 0.00000 0 0 1.00000 1 2 3 1 -1 base 44 43 44 1.02326 44 0 0.00000 44 44 1.00000 1 $ hg debugdeltachain -m -T '{rev} {chainid} {chainlen} {readsize} {largestblock} {readdensity}\n' 0 1 1 44 44 1.0 1 2 1 0 0 1 2 3 1 44 44 1.0 $ hg debugdeltachain -m -Tjson [ { "chainid": 1, "chainlen": 1, "chainratio": 1.02325581395, (no-py3 !) "chainratio": 1.0232558139534884, (py3 !) "chainsize": 44, "compsize": 44, "deltatype": "base", "extradist": 0, "extraratio": 0.0, "largestblock": 44, "lindist": 44, "prevrev": -1, "readdensity": 1.0, "readsize": 44, "rev": 0, "srchunks": 1, "uncompsize": 43 }, { "chainid": 2, "chainlen": 1, "chainratio": 0, "chainsize": 0, "compsize": 0, "deltatype": "base", "extradist": 0, "extraratio": 0, "largestblock": 0, "lindist": 0, "prevrev": -1, "readdensity": 1, "readsize": 0, "rev": 1, "srchunks": 1, "uncompsize": 0 }, { "chainid": 3, "chainlen": 1, "chainratio": 1.02325581395, (no-py3 !) "chainratio": 1.0232558139534884, (py3 !) "chainsize": 44, "compsize": 44, "deltatype": "base", "extradist": 0, "extraratio": 0.0, "largestblock": 44, "lindist": 44, "prevrev": -1, "readdensity": 1.0, "readsize": 44, "rev": 2, "srchunks": 1, "uncompsize": 43 } ] $ printf "This test checks things.\n" >> a $ hg ci -m a $ hg branch other marked working directory as branch other (branches are permanent and global, did you want a bookmark?) $ for i in `$TESTDIR/seq.py 5`; do > printf "shorter ${i}" >> a > hg ci -m "a other:$i" > hg up -q default > printf "for the branch default we want longer chains: ${i}" >> a > hg ci -m "a default:$i" > hg up -q other > done $ hg debugdeltachain a -T '{rev} {srchunks}\n' \ > --config experimental.sparse-read.density-threshold=0.50 \ > --config experimental.sparse-read.min-gap-size=0 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1 10 2 11 1 $ hg --config extensions.strip= strip --no-backup -r 1 1 files updated, 0 files merged, 0 files removed, 0 files unresolved Test max chain len $ cat >> $HGRCPATH << EOF > [format] > maxchainlen=4 > EOF $ printf "This test checks if maxchainlen config value is respected also it can serve as basic test for debugrevlog -d <file>.\n" >> a $ hg ci -m a $ printf "b\n" >> a $ hg ci -m a $ printf "c\n" >> a $ hg ci -m a $ printf "d\n" >> a $ hg ci -m a $ printf "e\n" >> a $ hg ci -m a $ printf "f\n" >> a $ hg ci -m a $ printf 'g\n' >> a $ hg ci -m a $ printf 'h\n' >> a $ hg ci -m a $ hg debugrevlog -d a # rev p1rev p2rev start end deltastart base p1 p2 rawsize totalsize compression heads chainlen 0 -1 -1 0 ??? 0 0 0 0 ??? ???? ? 1 0 (glob) 1 0 -1 ??? ??? 0 0 0 0 ??? ???? ? 1 1 (glob) 2 1 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 2 (glob) 3 2 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 3 (glob) 4 3 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 4 (glob) 5 4 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 0 (glob) 6 5 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 1 (glob) 7 6 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 2 (glob) 8 7 -1 ??? ??? ??? ??? ??? 0 ??? ???? ? 1 3 (glob) #endif Test debuglocks command: $ hg debuglocks lock: free wlock: free * Test setting the lock waitlock <file> will wait for file to be created. If it isn't in a reasonable amount of time, displays error message and returns 1 $ waitlock() { > start=`date +%s` > timeout=5 > while [ \( ! -f $1 \) -a \( ! -L $1 \) ]; do > now=`date +%s` > if [ "`expr $now - $start`" -gt $timeout ]; then > echo "timeout: $1 was not created in $timeout seconds" > return 1 > fi > sleep 0.1 > done > } $ dolock() { > { > waitlock .hg/unlock > rm -f .hg/unlock > echo y > } | hg debuglocks "$@" > /dev/null > } $ dolock -s & $ waitlock .hg/store/lock $ hg debuglocks lock: user *, process * (*s) (glob) wlock: free [1] $ touch .hg/unlock $ wait $ [ -f .hg/store/lock ] || echo "There is no lock" There is no lock * Test setting the wlock $ dolock -S & $ waitlock .hg/wlock $ hg debuglocks lock: free wlock: user *, process * (*s) (glob) [1] $ touch .hg/unlock $ wait $ [ -f .hg/wlock ] || echo "There is no wlock" There is no wlock * Test setting both locks $ dolock -Ss & $ waitlock .hg/wlock && waitlock .hg/store/lock $ hg debuglocks lock: user *, process * (*s) (glob) wlock: user *, process * (*s) (glob) [2] * Test failing to set a lock $ hg debuglocks -s abort: lock is already held [255] $ hg debuglocks -S abort: wlock is already held [255] $ touch .hg/unlock $ wait $ hg debuglocks lock: free wlock: free * Test forcing the lock $ dolock -s & $ waitlock .hg/store/lock $ hg debuglocks lock: user *, process * (*s) (glob) wlock: free [1] $ hg debuglocks -L $ hg debuglocks lock: free wlock: free $ touch .hg/unlock $ wait * Test forcing the wlock $ dolock -S & $ waitlock .hg/wlock $ hg debuglocks lock: free wlock: user *, process * (*s) (glob) [1] $ hg debuglocks -W $ hg debuglocks lock: free wlock: free $ touch .hg/unlock $ wait Test WdirUnsupported exception $ hg debugdata -c ffffffffffffffffffffffffffffffffffffffff abort: working directory revision cannot be specified [255] Test cache warming command $ rm -rf .hg/cache/ $ hg debugupdatecaches --debug updating the branch cache $ ls -r .hg/cache/* .hg/cache/rbc-revs-v1 .hg/cache/rbc-names-v1 .hg/cache/manifestfulltextcache (reporevlogstore !) .hg/cache/branch2-served Test debugcolor #if no-windows $ hg debugcolor --style --color always | egrep 'mode|style|log\.' color mode: 'ansi' available style: \x1b[0;33mlog.changeset\x1b[0m: \x1b[0;33myellow\x1b[0m (esc) #endif $ hg debugcolor --style --color never color mode: None available style: $ cd .. Test internal debugstacktrace command $ cat > debugstacktrace.py << EOF > from __future__ import absolute_import > from mercurial import ( > pycompat, > util, > ) > def f(): > util.debugstacktrace(f=pycompat.stdout) > g() > def g(): > util.dst(b'hello from g\\n', skip=1) > h() > def h(): > util.dst(b'hi ...\\nfrom h hidden in g', 1, depth=2) > f() > EOF $ "$PYTHON" debugstacktrace.py stacktrace at: debugstacktrace.py:14 in * (glob) debugstacktrace.py:7 in f hello from g at: debugstacktrace.py:14 in * (glob) debugstacktrace.py:8 in f hi ... from h hidden in g at: debugstacktrace.py:8 in f debugstacktrace.py:11 in g Test debugcapabilities command: $ hg debugcapabilities ./debugrevlog/ Main capabilities: branchmap $USUAL_BUNDLE2_CAPS$ getbundle known lookup pushkey unbundle Bundle2 capabilities: HG20 bookmarks changegroup 01 02 digests md5 sha1 sha512 error abort unsupportedcontent pushraced pushkey hgtagsfnodes listkeys phases heads pushkey remote-changegroup http https rev-branch-cache stream v2 Test debugpeer $ hg --config ui.ssh="\"$PYTHON\" \"$TESTDIR/dummyssh\"" debugpeer ssh://user@dummy/debugrevlog url: ssh://user@dummy/debugrevlog local: no pushable: yes $ hg --config ui.ssh="\"$PYTHON\" \"$TESTDIR/dummyssh\"" --debug debugpeer ssh://user@dummy/debugrevlog running "*" "*/tests/dummyssh" 'user@dummy' 'hg -R debugrevlog serve --stdio' (glob) (no-windows !) running "*" "*\tests/dummyssh" "user@dummy" "hg -R debugrevlog serve --stdio" (glob) (windows !) devel-peer-request: hello+between devel-peer-request: pairs: 81 bytes sending hello command sending between command remote: 427 remote: capabilities: batch branchmap $USUAL_BUNDLE2_CAPS$ changegroupsubset getbundle known lookup protocaps pushkey streamreqs=generaldelta,revlogv1 unbundle=HG10GZ,HG10BZ,HG10UN unbundlehash remote: 1 devel-peer-request: protocaps devel-peer-request: caps: * bytes (glob) sending protocaps command url: ssh://user@dummy/debugrevlog local: no pushable: yes