revlog: automatically read from opened file handles
The revlog reading code commonly opens a new file handle for
reading on demand. There is support for passing a file handle
to revlog.revision(), but that argument is marked as internal.
When revlogs are written, we write() data as it becomes available,
but we don't flush() it until all revisions are written.
Putting these two traits together, an in-process revlog reader
running during active writes can trigger the opening of a new file
handle on a file with unflushed writes. That reader won't have
access to all "available" revlog data (because it hasn't been
flushed). With the previous patch applied, this can lead to the
revlog raising an error due to a partial read.
I witnessed this behavior when applying changegroup data (via
`hg pull`) before issue6006 was fixed by different means. Had this
and the previous patch been in place, the problem would have
surfaced as an error earlier instead of manifesting as hash
verification failures.
While this has been a long-standing issue, I believe the relatively
new delta computation code has made it more common, because it
computes deltas in more scenarios and computing a delta can require
reading from the revlog. The delta computation code is probably
supposed to reuse file handles, but it appears not to do so in all
circumstances.
But the issue runs deeper than that. Theoretically, any code can
access revision data during revlog writes. It appears we were just
lucky that nothing did. (The "add revision callback" passed to
addgroup() provides an avenue for this.)
If I changed the revlog's behavior to not cache the full revision
text or to clear caches after revision insertion during addgroup(),
I was able to produce crashes 100% of the time when writing changelog
revisions. This is because changelog's add revision callback attempts
to resolve the revision data to access the changed files list. And
without the revision's fulltext being cached, we performed a revlog
read, which required opening a new file handle. This attempted to read
unflushed data, leading to a partial read and a crash.
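To make the failure mode concrete, here is a minimal Python sketch. It uses a plain temporary file as a stand-in for a revlog data file (an assumption for illustration; it does not touch Mercurial). With a large userspace buffer, a small write() stays buffered, so a freshly opened handle sees nothing, while the writing handle can read its own data back.

```python
import os
import tempfile

# Stand-in for a revlog data file (illustrative only).
path = os.path.join(tempfile.mkdtemp(), "data.d")

# "w+b" gives a readable and writable buffered handle. A 64 KiB buffer
# keeps this small write in userspace, like "write() now, flush() later".
writer = open(path, "w+b", buffering=65536)
writer.write(b"revision-1 fulltext")

# A reader that opens a *new* handle only sees bytes that reached the OS.
with open(path, "rb") as fresh:
    seen_by_new_handle = fresh.read()

# The writing handle can read back its own buffered data.
writer.seek(0)
seen_by_writer = writer.read()
writer.close()

print(seen_by_new_handle)  # b''
print(seen_by_writer)      # b'revision-1 fulltext'
```

The new handle's read comes back short (here, empty), which is the "partial read" described above.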
This commit teaches the revlog to store the file handles used for
writing multiple revisions during addgroup(). It also teaches the
code for resolving a file handle when reading to use these handles,
if available. This ensures that *any* reads (regardless of their
source) use the active writing file handles, if available. These
file handles have access to the unflushed data because they wrote it.
This allows reads to complete without issue.
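The approach can be sketched with a hypothetical, heavily simplified `ToyRevlog` class. Its `_writinghandle` attribute, `_datareadfp()` helper, `addgroup()` signature and `PartialReadError` are illustrative choices for this sketch and should not be read as Mercurial's actual revlog API. While `addgroup()` runs, every read resolves to the writing handle instead of opening a new one, so an add revision callback can read revisions that are still unflushed.

```python
import contextlib
import os
import tempfile


class PartialReadError(Exception):
    """Hypothetical error for a short read of revision data."""


class ToyRevlog:
    """Hypothetical, heavily simplified revlog: one data file plus an
    in-memory index of (offset, length) entries per revision."""

    def __init__(self, path):
        self.path = path
        self.index = []             # (offset, length) for each revision
        self._writinghandle = None  # only set while addgroup() runs
        open(path, "ab").close()    # make sure the data file exists

    @contextlib.contextmanager
    def _datareadfp(self):
        """Yield a file handle for reading revision data."""
        if self._writinghandle is not None:
            # Reuse the active writer: it can read back its own
            # unflushed writes.
            yield self._writinghandle
        else:
            with open(self.path, "rb") as fp:
                yield fp

    def revision(self, rev):
        offset, length = self.index[rev]
        with self._datareadfp() as fp:
            fp.seek(offset)
            data = fp.read(length)
        if len(data) != length:
            raise PartialReadError("partial read of revision %d" % rev)
        return data

    def addgroup(self, chunks, addrevisioncb=None):
        # Small writes may sit in this userspace buffer, invisible to
        # other handles, until the buffer is flushed.
        fh = open(self.path, "a+b", buffering=65536)
        self._writinghandle = fh
        try:
            for chunk in chunks:
                offset = fh.seek(0, os.SEEK_END)
                fh.write(chunk)
                self.index.append((offset, len(chunk)))
                if addrevisioncb is not None:
                    # Any read triggered here resolves to fh.
                    addrevisioncb(self, len(self.index) - 1)
        finally:
            self._writinghandle = None
            fh.close()


path = os.path.join(tempfile.mkdtemp(), "toy.d")
rl = ToyRevlog(path)
seen = []
rl.addgroup([b"aaa", b"bbbb"], lambda r, rev: seen.append(r.revision(rev)))
print(seen)  # [b'aaa', b'bbbb']
```

If `_datareadfp()` is changed to always open a fresh handle, the callback in this example raises `PartialReadError` on the first revision, because its bytes are still in the writer's buffer.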
Differential Revision: https://phab.mercurial-scm.org/D5267
$ cat >> $HGRCPATH <<EOF
> [experimental]
> bundle-phases=yes
> [extensions]
> strip=
> drawdag=$TESTDIR/drawdag.py
> EOF
Set up repo with linear history
$ hg init linear
$ cd linear
$ hg debugdrawdag <<'EOF'
> E
> |
> D
> |
> C
> |
> B
> |
> A
> EOF
$ hg phase --public A
$ hg phase --force --secret D
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C draft
|
o B draft
|
o A public
Phases are restored when unbundling
$ hg bundle --base B -r E bundle
3 changesets found
$ hg debugbundle bundle
Stream params: {Compression: BZ}
changegroup -- {nbchanges: 3, targetphase: 2, version: 02} (mandatory: True)
26805aba1e600a82e93661149f2313866a221a7b
f585351a92f85104bff7c284233c338b10eb1df7
9bc730a19041f9ec7cb33c626e811aa233efb18c
cache:rev-branch-cache -- {} (mandatory: False)
phase-heads -- {} (mandatory: True)
26805aba1e600a82e93661149f2313866a221a7b draft
$ hg strip --no-backup C
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C draft
|
o B draft
|
o A public
Root revision's phase is preserved
$ hg bundle -a bundle
5 changesets found
$ hg strip --no-backup A
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C draft
|
o B draft
|
o A public
Completely public history can be restored
$ hg phase --public E
$ hg bundle -a bundle
5 changesets found
$ hg strip --no-backup A
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E public
|
o D public
|
o C public
|
o B public
|
o A public
Direct transition from public to secret can be restored
$ hg phase --secret --force D
$ hg bundle -a bundle
5 changesets found
$ hg strip --no-backup A
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C public
|
o B public
|
o A public
Revisions within a bundle preserve their phase even if their parent changes phase
$ hg phase --draft --force B
$ hg bundle --base B -r E bundle
3 changesets found
$ hg strip --no-backup C
$ hg phase --public B
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C draft
|
o B public
|
o A public
Phases of ancestors of the stripped node get advanced to accommodate the child
$ hg bundle --base B -r E bundle
3 changesets found
$ hg strip --no-backup C
$ hg phase --force --secret B
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E secret
|
o D secret
|
o C draft
|
o B draft
|
o A public
Unbundling advances phases of changesets even if they were already in the repo.
To test that, create a bundle of everything in draft phase and then unbundle
to see that secret becomes draft, but public remains public.
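The rule can be modeled with a toy Python sketch, assuming Mercurial's phase numbering (public = 0, draft = 1, secret = 2) and a plain dict of parent lists; `apply_phase_heads` is a hypothetical helper, not Mercurial's `phases` module. Each (head, phase) pair lets the head and all of its ancestors move toward public but never away from it, so the resulting phase is the minimum of the current and target phases. The first call models the state just before the `hg unbundle` below, treating the bundle as carrying a single draft phase head at E.

```python
# Mercurial's phase numbering: lower means "more public".
PUBLIC, DRAFT, SECRET = 0, 1, 2


def ancestors(parents, node):
    """Return node together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen


def apply_phase_heads(phases, parents, phaseheads):
    """Hypothetical helper: for each (head, phase) pair, move the head and
    its ancestors toward public, never back toward secret."""
    for head, target in phaseheads:
        for n in ancestors(parents, head):
            phases[n] = min(phases[n], target)
    return phases


# Linear history A-B-C-D-E as it stands before the unbundle below.
parents = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
phases = {"A": PUBLIC, "B": DRAFT, "C": DRAFT, "D": DRAFT, "E": SECRET}
apply_phase_heads(phases, parents, [("E", DRAFT)])
print(phases)
# {'A': 0, 'B': 1, 'C': 1, 'D': 1, 'E': 1}

# Ancestors advance too: a secret parent under a draft head becomes draft.
phases2 = {"A": PUBLIC, "B": SECRET, "C": SECRET}
apply_phase_heads(phases2, {"B": ["A"], "C": ["B"]}, [("C", DRAFT)])
print(phases2)
# {'A': 0, 'B': 1, 'C': 1}
```

Taking the minimum is what keeps A public while E moves from secret to draft, matching the log output that follows.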
$ hg phase --draft --force A
$ hg phase --draft E
$ hg bundle -a bundle
5 changesets found
$ hg phase --public A
$ hg phase --secret --force E
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o E draft
|
o D draft
|
o C draft
|
o B draft
|
o A public
Unbundling a change in the middle of a stack does not affect later changes
$ hg strip --no-backup E
$ hg phase --secret --force D
$ hg log -G -T '{desc} {phase}\n'
o D secret
|
o C draft
|
o B draft
|
o A public
$ hg bundle --base A -r B bundle
1 changesets found
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{desc} {phase}\n'
o D secret
|
o C draft
|
o B draft
|
o A public
$ cd ..
Set up repo with non-linear history
$ hg init non-linear
$ cd non-linear
$ hg debugdrawdag <<'EOF'
> D E
> |\|
> B C
> |/
> A
> EOF
$ hg phase --public C
$ hg phase --force --secret B
$ hg log -G -T '{node|short} {desc} {phase}\n'
o 03ca77807e91 E draft
|
| o 4e4f9194f9f1 D secret
|/|
o | dc0947a82db8 C public
| |
| o 112478962961 B secret
|/
o 426bada5c675 A public
Restore bundle of entire repo
$ hg bundle -a bundle
5 changesets found
$ hg debugbundle bundle
Stream params: {Compression: BZ}
changegroup -- {nbchanges: 5, targetphase: 2, version: 02} (mandatory: True)
426bada5c67598ca65036d57d9e4b64b0c1ce7a0
112478962961147124edd43549aedd1a335e44bf
dc0947a82db884575bb76ea10ac97b08536bfa03
4e4f9194f9f181c57f62e823e8bdfa46ab9e4ff4
03ca77807e919db8807c3749086dc36fb478cac0
cache:rev-branch-cache -- {} (mandatory: False)
phase-heads -- {} (mandatory: True)
dc0947a82db884575bb76ea10ac97b08536bfa03 public
03ca77807e919db8807c3749086dc36fb478cac0 draft
$ hg strip --no-backup A
$ hg unbundle -q bundle
$ rm bundle
$ hg log -G -T '{node|short} {desc} {phase}\n'
o 03ca77807e91 E draft
|
| o 4e4f9194f9f1 D secret
|/|
o | dc0947a82db8 C public
| |
| o 112478962961 B secret
|/
o 426bada5c675 A public
$ hg bundle --base 'A + C' -r D bundle
2 changesets found
$ hg debugbundle bundle
Stream params: {Compression: BZ}
changegroup -- {nbchanges: 2, targetphase: 2, version: 02} (mandatory: True)
112478962961147124edd43549aedd1a335e44bf
4e4f9194f9f181c57f62e823e8bdfa46ab9e4ff4
cache:rev-branch-cache -- {} (mandatory: False)
phase-heads -- {} (mandatory: True)
$ rm bundle
$ hg bundle --base A -r D bundle
3 changesets found
$ hg debugbundle bundle
Stream params: {Compression: BZ}
changegroup -- {nbchanges: 3, targetphase: 2, version: 02} (mandatory: True)
112478962961147124edd43549aedd1a335e44bf
dc0947a82db884575bb76ea10ac97b08536bfa03
4e4f9194f9f181c57f62e823e8bdfa46ab9e4ff4
cache:rev-branch-cache -- {} (mandatory: False)
phase-heads -- {} (mandatory: True)
dc0947a82db884575bb76ea10ac97b08536bfa03 public
$ rm bundle
$ hg bundle --base 'B + C' -r 'D + E' bundle
2 changesets found
$ hg debugbundle bundle
Stream params: {Compression: BZ}
changegroup -- {nbchanges: 2, targetphase: 2, version: 02} (mandatory: True)
4e4f9194f9f181c57f62e823e8bdfa46ab9e4ff4
03ca77807e919db8807c3749086dc36fb478cac0
cache:rev-branch-cache -- {} (mandatory: False)
phase-heads -- {} (mandatory: True)
03ca77807e919db8807c3749086dc36fb478cac0 draft
$ rm bundle