annotate tests/test-clone-pull-corruption.t @ 40326:fed697fa1734

sqlitestore: file storage backend using SQLite

This commit provides an extension which uses SQLite to store file data (as opposed to revlogs).

As the inline documentation describes, there are still several aspects to the extension that are incomplete. But it's a start. The extension does support basic clone, checkout, and commit workflows, which makes it suitable for simple use cases.

One notable missing feature is support for "bundlerepos." This is probably responsible for the most test failures when the extension is activated as part of the test suite.

All revision data is stored in SQLite. Data is stored as zstd compressed chunks (default if zstd is available), zlib compressed chunks (default if zstd is not available), or raw chunks (if configured or if a compressed delta is not smaller than the raw delta). This makes things very similar to revlogs.

Unlike revlogs, the extension doesn't yet enforce a limit on delta chain length. This is an obvious limitation and should be addressed. This is somewhat mitigated by the use of zstd, which is much faster than zlib to decompress.

There is a dedicated table for storing deltas. Deltas are stored by the SHA-1 hash of their uncompressed content. The "fileindex" table has columns that reference the delta for each revision and the base delta that delta should be applied against. A recursive SQL query is used to resolve the delta chain along with the delta data.

By storing deltas by hash, we are able to de-duplicate delta storage! With revlogs, the same deltas in different revlogs would result in duplicate storage of that delta. In this scheme, inserting the duplicate delta is a no-op and delta chains simply reference the existing delta.

When initially implementing this extension, I did not have content-indexed deltas and deltas could be duplicated across files (just like revlogs). When I implemented content-indexed deltas, the size of the SQLite database for a full clone of mozilla-unified dropped:

  before: 2,554,261,504 bytes
  after:  2,488,754,176 bytes

Surprisingly, this is still larger than the bytes size of revlog files:

  revlog files: 2,104,861,230 bytes
  du -b:        2,254,381,614

I would have expected storage to be smaller since we're not limiting delta chain length and since we're using zstd instead of zlib. I suspect the SQLite indexes and per-column overhead account for the bulk of the differences. (Keep in mind that revlog uses a 64-byte packed struct for revision index data and deltas are stored without padding. Aside from the 12 unused bytes in the 32 byte node field, revlogs are pretty efficient.)

Another source of overhead is file name storage. With revlogs, file names are stored in the filesystem. But with SQLite, we need to store file names in the database. This is roughly equivalent to the size of the fncache file, which for the mozilla-unified repository is ~34MB.

Since the SQLite database isn't append-only and since delta chains can reference any delta, this opens some interesting possibilities. For example, we could store deltas in reverse, such that fulltexts are stored for newer revisions and deltas are applied to reconstruct older revisions. This is likely a more optimal storage strategy for version control, as new data tends to be more frequently accessed than old data. We would obviously need wire protocol support for transferring revision data from newest to oldest. And we would probably need some kind of mechanism for "re-encoding" stores. But it should be doable.

This extension is very much experimental quality. There are a handful of features that don't work. It probably isn't suitable for day-to-day use. But it could be used in limited cases (e.g. read-only checkouts like in CI). And it is also a good proving ground for alternate storage backends. As we continue to define interfaces for all things storage, it will be useful to have a viable alternate storage backend to see how things shake out in practice.

test-storage.py passes on Python 2 and introduces no new test failures on Python 3.

Having the storage-level unit tests has proved to be insanely useful when developing this extension. Those tests caught numerous bugs during development and I'm convinced this style of testing is the way forward for ensuring alternate storage backends work as intended. Of course, test coverage isn't close to what it needs to be. But it is a start. And what coverage we have gives me confidence that basic store functionality is implemented properly.

Differential Revision: https://phab.mercurial-scm.org/D4928
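
The content-indexed delta storage and the recursive delta chain lookup described in the commit message can be illustrated with a small Python sketch. This is not the extension's code. The table layout, the column names, the helper functions (`store_delta`, `add_revision`, `resolve`), and the toy delta format (a delta is simply bytes appended to its base) are all assumptions made for this example. In particular, the sketch links each revision to a base revision (`baserev`) rather than to a base delta as the commit message describes, which keeps the chain unambiguous when the same delta is shared.

```python
import hashlib
import sqlite3

# Hypothetical schema for illustration only; not the extension's real tables.
SCHEMA = """
CREATE TABLE delta (
    id INTEGER PRIMARY KEY,
    hash BLOB NOT NULL UNIQUE,   -- SHA-1 of the uncompressed delta
    data BLOB NOT NULL
);
CREATE TABLE fileindex (
    path TEXT NOT NULL,
    rev INTEGER NOT NULL,
    deltaid INTEGER NOT NULL REFERENCES delta(id),
    baserev INTEGER,             -- NULL marks the start of a chain
    PRIMARY KEY (path, rev)
);
"""

# Walk from the requested revision back to the chain base, then return
# the delta data ordered base-first.
CHAIN_QUERY = """
WITH RECURSIVE chain(baserev, data, depth) AS (
    SELECT f.baserev, d.data, 0
      FROM fileindex AS f JOIN delta AS d ON d.id = f.deltaid
     WHERE f.path = :path AND f.rev = :rev
    UNION ALL
    SELECT f.baserev, d.data, chain.depth + 1
      FROM chain
      JOIN fileindex AS f ON f.path = :path AND f.rev = chain.baserev
      JOIN delta AS d ON d.id = f.deltaid
)
SELECT data FROM chain ORDER BY depth DESC
"""


def store_delta(db, data):
    """Store a delta keyed by its SHA-1; a duplicate insert is a no-op."""
    digest = hashlib.sha1(data).digest()
    db.execute(
        "INSERT OR IGNORE INTO delta (hash, data) VALUES (?, ?)",
        (digest, data),
    )
    row = db.execute("SELECT id FROM delta WHERE hash = ?", (digest,)).fetchone()
    return row[0]


def add_revision(db, path, rev, data, baserev=None):
    """Record a revision whose (toy) delta is `data` applied on `baserev`."""
    deltaid = store_delta(db, data)
    db.execute(
        "INSERT INTO fileindex (path, rev, deltaid, baserev) VALUES (?, ?, ?, ?)",
        (path, rev, deltaid, baserev),
    )


def resolve(db, path, rev):
    """Rebuild a fulltext; with the toy format this is plain concatenation."""
    rows = db.execute(CHAIN_QUERY, {"path": path, "rev": rev}).fetchall()
    return b"".join(row[0] for row in rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Two files with identical history share the same two delta rows.
    for path in ("a.txt", "b.txt"):
        add_revision(db, path, 0, b"hello\n")
        add_revision(db, path, 1, b"world\n", baserev=0)
    print(resolve(db, "a.txt", 1))
    print(db.execute("SELECT COUNT(*) FROM delta").fetchone()[0])
```

Because the `hash` column is unique and inserts use `INSERT OR IGNORE`, storing the same delta for a second file adds an index row but no new delta data, which is the de-duplication effect the commit message reports.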
author Gregory Szorc <gregory.szorc@gmail.com>
date Tue, 09 Oct 2018 08:50:13 -0700
parents f1186c292d03
children 2f2682f40ea0
rev   line source
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
1 Corrupt an hg repo with a pull started during an aborted commit
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
2 Create two repos, so that one of them can pull from the other one.
1785
81ca1a9bd061 Added test cases for repo corruption fixed in 2e0a288ca93e (issue132)
Thomas Arendsen Hein <thomas@intevation.de>
parents:
diff changeset
3
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
4 $ hg init source
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
5 $ cd source
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
6 $ touch foo
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
7 $ hg add foo
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
8 $ hg ci -m 'add foo'
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
9 $ hg clone . ../corrupted
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
10 updating to branch default
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
11 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
12 $ echo >> foo
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
13 $ hg ci -m 'change foo'
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
14
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
15 Add a hook to wait 5 seconds and then abort the commit
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
16
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
17 $ cd ../corrupted
16962
d2fe9aaedcaf test-clone-pull-corruption: adapt for Windows
Adrian Buehlmann <adrian@cadifra.com>
parents: 16913
diff changeset
18 $ echo "[hooks]" >> .hg/hgrc
24838
b2c1ff96c1e1 tests: use double quote to quote arguments in hook for portability
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 16962
diff changeset
19 $ echo 'pretxncommit = sh -c "sleep 5; exit 1"' >> .hg/hgrc
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
20
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
21 start a commit...
1785
81ca1a9bd061 Added test cases for repo corruption fixed in 2e0a288ca93e (issue132)
Thomas Arendsen Hein <thomas@intevation.de>
parents:
diff changeset
22
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
23 $ touch bar
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
24 $ hg add bar
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
25 $ hg ci -m 'add bar' &
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
26
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
27 ... and start a pull while the commit is still running
1785
81ca1a9bd061 Added test cases for repo corruption fixed in 2e0a288ca93e (issue132)
Thomas Arendsen Hein <thomas@intevation.de>
parents:
diff changeset
28
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
29 $ sleep 1
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
30 $ hg pull ../source 2>/dev/null
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
31 pulling from ../source
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
32 transaction abort!
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
33 rollback completed
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
34 abort: pretxncommit hook exited with status 1
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
35 searching for changes
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
36 adding changesets
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
37 adding manifests
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
38 adding file changes
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
39 added 1 changesets with 1 changes to 1 files
34661
eb586ed5d8ce transaction-summary: show the range of new revisions upon pull/unbundle (BC)
Denis Laxalde <denis.laxalde@logilab.fr>
parents: 24838
diff changeset
40 new changesets 52998019f625
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
41 (run 'hg update' to get a working copy)
1785
81ca1a9bd061 Added test cases for repo corruption fixed in 2e0a288ca93e (issue132)
Thomas Arendsen Hein <thomas@intevation.de>
parents:
diff changeset
42
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
43 see what happened
1785
81ca1a9bd061 Added test cases for repo corruption fixed in 2e0a288ca93e (issue132)
Thomas Arendsen Hein <thomas@intevation.de>
parents:
diff changeset
44
12412
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
45 $ wait
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
46 $ hg verify
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
47 checking changesets
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
48 checking manifests
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
49 crosschecking files in changesets and manifests
2dbb9e5e3454 tests: unify test-clone-pull-corruption
Matt Mackall <mpm@selenic.com>
parents: 1785
diff changeset
50 checking files
39489
f1186c292d03 verify: make output less confusing (issue5924)
Meirambek Omyrzak <meirambek77@gmail.com>
parents: 34661
diff changeset
51 checked 2 changesets with 2 changes to 1 files
16913
f2719b387380 tests: add missing trailing 'cd ..'
Mads Kiilerich <mads@kiilerich.com>
parents: 15445
diff changeset
52
f2719b387380 tests: add missing trailing 'cd ..'
Mads Kiilerich <mads@kiilerich.com>
parents: 15445
diff changeset
53 $ cd ..