annotate contrib/undumprevlog @ 40056:324b4b10351e
revlog: rewrite censoring logic

I was able to corrupt a revlog relatively easily with the existing
censoring code. The underlying problem is that the existing code
doesn't fully take delta chains into account. When copying revisions
that occur after the censored revision, the delta base can refer
to a censored revision. At read time, reconstruction then fails because
the revision data is not a compressed delta.
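The delta-chain problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use Mercurial's on-disk format, `Entry`, `apply_delta`, and `resolve` are hypothetical names, and a delta is modeled as a plain `(start, end, replacement)` triple.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    base: Optional[int]  # index of the delta base, or None for a full snapshot
    payload: object      # bytes for a snapshot, (start, end, bytes) for a delta

def apply_delta(text, delta):
    # Replace text[start:end] with the replacement bytes.
    start, end, replacement = delta
    return text[:start] + replacement + text[end:]

def resolve(store, rev):
    # Walk the delta chain back to a snapshot, then apply deltas forward.
    entry = store[rev]
    if entry.base is None:
        return entry.payload
    return apply_delta(resolve(store, entry.base), entry.payload)

store = [
    Entry(None, b"secret password\n"),  # rev 0: full snapshot
    Entry(0, (0, 6, b"public")),        # rev 1: delta against rev 0
]
before = resolve(store, 1)

# Naive censoring: overwrite rev 0 in place and leave rev 1's delta untouched.
store[0] = Entry(None, b"<censored>\n")
after = resolve(store, 1)

# rev 1 was never censored, yet its content has changed.
print(before, after)
```

In this toy the later revision silently resolves to different content. The commit message describes the real-world symptom as a failure at read time instead.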
This commit rewrites the revlog censoring code to take a higher-level
approach. We now create a new revlog instance pointing at temp files.
We iterate through each revision in the source revlog and insert
those revisions into the new revlog, replacing the censored revision's
data along the way.
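The higher-level approach (read every revision from the source, re-insert it into a fresh store, and substitute the censored revision's data) might be sketched as follows. This is a toy illustration, not the revlog implementation: the store layout, the prefix/suffix delta format, the "previous revision as base" rule, and all function names are assumptions made for the example.

```python
def make_delta(old, new):
    # Toy delta: trim the common prefix and suffix, keep the differing middle.
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return (p, len(old) - s, new[p:len(new) - s])

def apply_delta(old, delta):
    start, end, replacement = delta
    return old[:start] + replacement + old[end:]

def read(store, rev):
    # A store is a list of (base, payload); base None means full snapshot.
    base, payload = store[rev]
    if base is None:
        return payload
    return apply_delta(read(store, base), payload)

def rewrite_with_censor(source, censored_rev, tombstone):
    new = []
    for rev in range(len(source)):
        # Resolve the full text from the still-intact source store.
        text = tombstone if rev == censored_rev else read(source, rev)
        prev = rev - 1
        if rev == 0 or rev == censored_rev or prev == censored_rev:
            # Never build a delta on top of (or for) the censored revision.
            new.append((None, text))
        else:
            # Recompute the delta against the new store's own contents.
            new.append((prev, make_delta(read(new, prev), text)))
    return new

texts = [b"secret\n", b"secret\nmore\n", b"secret\nmore\nend\n"]
source = [
    (None, texts[0]),
    (0, make_delta(texts[0], texts[1])),
    (1, make_delta(texts[1], texts[2])),
]
tombstone = b"<censored>"  # placeholder, not Mercurial's tombstone format
new_store = rewrite_with_censor(source, 0, tombstone)
```

Because every delta is recomputed on insertion, no entry in the new store can depend on the censored data. The cost is the extra delta computation mentioned in the following paragraph.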
The new implementation isn't as efficient as the old one. This is
because it will fully engage delta computation on insertion. But I
don't think it matters.

The new implementation is a bit hacky because it attempts to reload
the revlog instance with a new revlog index/data file. This is fragile.
But this is needed because the index (which could be backed by C) would
have a cached copy of the old, possibly changed data and that could
lead to problems accessing index or revision data later.

One benefit of the new approach is that we integrate with the
transaction. The old revlog is backed up and if the transaction is
rolled back, the original revlog is restored.
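The backup-and-restore behaviour can be sketched with plain files. The `BackupTransaction` class and `replace_file` helper below are hypothetical and only mimic the idea; they are not Mercurial's transaction API, and the `.bak` / `.tmp` suffixes are arbitrary choices for the example.

```python
import os
import shutil
import tempfile

class BackupTransaction:
    """Toy transaction: back up files before they change, restore on rollback."""

    def __init__(self):
        self._backups = []  # list of (original path, backup path)

    def add_backup(self, path):
        backup = path + ".bak"
        shutil.copyfile(path, backup)
        self._backups.append((path, backup))

    def close(self):
        # Commit: the new files stay, the backups are discarded.
        for _path, backup in self._backups:
            os.unlink(backup)
        self._backups = []

    def rollback(self):
        # Undo: move each backup over the file that replaced it.
        for path, backup in self._backups:
            os.replace(backup, path)
        self._backups = []

def replace_file(tr, path, new_bytes):
    # Register a backup first, then swap in the rewritten file.
    tr.add_backup(path)
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        fh.write(new_bytes)
    os.replace(tmp, path)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.i")
    with open(path, "wb") as fh:
        fh.write(b"old revlog")
    tr = BackupTransaction()
    replace_file(tr, path, b"rewritten revlog")
    with open(path, "rb") as fh:
        replaced = fh.read()
    tr.rollback()
    with open(path, "rb") as fh:
        restored = fh.read()
```

Calling `close()` instead of `rollback()` would keep the rewritten file and delete the backup.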
As part of this, we had to teach the transaction about the store
vfs. I'm not super keen about this. But this was the easiest way
to hook things up to the transaction. We /could/ just ignore the
transaction like we were doing before. But any file mutation should
be governed by transaction semantics, including undo during rollback.

Differential Revision: https://phab.mercurial-scm.org/D4869
field | value |
---|---|
author | Gregory Szorc <gregory.szorc@gmail.com> |
date | Tue, 02 Oct 2018 17:34:34 -0700 |
parents | a063b84ce064 |
children | 99e231afc29c |
Changesets referenced by the annotation:

rev | changeset | summary | author | parents |
---|---|---|---|---|
6433 | ec5d77eb3431 | add simple dump and undump scripts to contrib/ | Matt Mackall <mpm@selenic.com> | |
6466 | 9c426da6b03b | contrib: fix binary file issues with dumprevlog on Windows | Adrian Buehlmann <adrian@cadifra.com> | 6433 |
19022 | cba222f01056 | tests: run check-code on Python files without .py extension | Mads Kiilerich <madski@unity3d.com> | 14233 |
29167 | 4f76c0c490b3 | py3: make contrib/undumprevlog use absolute_import | Pulkit Goyal <7895pulkit@gmail.com> | 23310 |
31248 | 8d3e8c8c9049 | vfs: use 'vfs' module directly in 'contrib/undumprevlog' | Pierre-Yves David <pierre-yves.david@ens-lyon.org> | 31216 |
33872 | 5d9890d8ca77 | undumprevlog: update to valid Python 3 syntax | Augie Fackler <raf@durin42.com> | 31248 |
37120 | a8a902d7176e | procutil: bulk-replace function calls to point to new module | Yuya Nishihara <yuya@tcha.org> | 33872 |
39947 | a063b84ce064 | py3: byteify contrib/dumprevlog | Matt Harbison <matt_harbison@yahoo.com> | 37120 |

rev | source |
---|---|
6433 | #!/usr/bin/env python |
6433 | # Undump a dump from dumprevlog |
6433 | # $ hg init |
6433 | # $ undumprevlog < repo.dump |
6433 | |
33872 | from __future__ import absolute_import, print_function |
29167 | |
6433 | import sys |
29167 | from mercurial import ( |
39947 |     encoding, |
29167 |     node, |
39947 |     pycompat, |
29167 |     revlog, |
29167 |     transaction, |
31248 |     vfs as vfsmod, |
29167 | ) |
37120 | from mercurial.utils import ( |
37120 |     procutil, |
37120 | ) |
6433 | |
6466 | for fp in (sys.stdin, sys.stdout, sys.stderr): |
37120 |     procutil.setbinary(fp) |
6466 | |
39947 | opener = vfsmod.vfs(b'.', False) |
39947 | tr = transaction.transaction(sys.stderr.write, opener, {b'store': opener}, |
39947 |                              b"undump.journal") |
19022 | while True: |
6433 |     l = sys.stdin.readline() |
6433 |     if not l: |
6433 |         break |
6433 |     if l.startswith("file:"): |
39947 |         f = encoding.strtolocal(l[6:-1]) |
6433 |         r = revlog.revlog(opener, f) |
39947 |         pycompat.stdout.write(b'%s\n' % f) |
6433 |     elif l.startswith("node:"): |
6433 |         n = node.bin(l[6:-1]) |
6433 |     elif l.startswith("linkrev:"): |
6433 |         lr = int(l[9:-1]) |
6433 |     elif l.startswith("parents:"): |
6433 |         p = l[9:-1].split() |
6433 |         p1 = node.bin(p[0]) |
6433 |         p2 = node.bin(p[1]) |
6433 |     elif l.startswith("length:"): |
6433 |         length = int(l[8:-1]) |
6433 |         sys.stdin.readline() # start marker |
39947 |         d = encoding.strtolocal(sys.stdin.read(length)) |
6433 |         sys.stdin.readline() # end marker |
6433 |         r.addrevision(d, tr, lr, p1, p2) |
6433 | |
6433 | tr.close() |
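The parsing loop in the listing can be exercised without a repository. The sketch below is a standalone approximation that works on bytes throughout and collects records instead of calling `addrevision`. The `parse_dump` name, the sample record, and the `-start-` / `-end-` marker text are assumptions for illustration. Only the field prefixes and slice offsets are taken from the script above.

```python
import io
from binascii import unhexlify

def parse_dump(stream):
    """Yield (file, node, linkrev, p1, p2, data) for each revision record."""
    current_file = node = linkrev = p1 = p2 = None
    while True:
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"file:"):
            current_file = line[6:-1]        # skip "file: ", drop the newline
        elif line.startswith(b"node:"):
            node = unhexlify(line[6:-1])
        elif line.startswith(b"linkrev:"):
            linkrev = int(line[9:-1])
        elif line.startswith(b"parents:"):
            parents = line[9:-1].split()
            p1, p2 = unhexlify(parents[0]), unhexlify(parents[1])
        elif line.startswith(b"length:"):
            length = int(line[8:-1])
            stream.readline()                # start marker line
            data = stream.read(length)       # exactly `length` payload bytes
            stream.readline()                # end marker line
            yield current_file, node, linkrev, p1, p2, data
        # Any other line is ignored, as in the original loop.

null = b"00" * 20  # hex form of a 20-byte null node
sample = b"".join([
    b"file: data/a.i\n",
    b"node: " + b"11" * 20 + b"\n",
    b"linkrev: 0\n",
    b"parents: " + null + b" " + null + b"\n",
    b"length: 6\n",
    b"-start-\n",   # assumed marker text
    b"hello\n",     # 6 payload bytes
    b"-end-\n",     # assumed marker text
])
records = list(parse_dump(io.BytesIO(sample)))
```

Reading the payload by its declared length, rather than scanning for the end marker, lets revision data contain arbitrary bytes, including lines that look like field headers.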