filelog: raise CensoredNodeError when hash checks fail with censor metadata
With this change, when a revlog revision hash does not match its content, and
the content is empty with a special metadata key, the integrity failure is
assumed to be intentionally caused to remove sensitive content from repository
history.
To allow different Mercurial functionality to handle this scenario differently
a more specific exception is raised than "ordinary" hash failures.
Alternatives to this approach include, but are not limited to:
- Calling a hook when hashes mismatch to allow arbitrary tombstone validation.
Cons: Irresponsibly easy to disable integrity checking altogether.
- Returning empty revision data eagerly instead of raising, masking the error.
Cons: Push/pull won't roundtrip the tombstone, so client repos are unusable.
- Doing nothing differently at this layer. Callers must do their own detection
of tombstoned data if they want to handle some hash checks and not others.
- Impacts dozens of callsites, many of which don't have the revision data
- Would probably be missing one or two callsites at any given time
- Currently we throw a RevlogError, as do 12 other places in revlog.py.
Callers would need to parse the exception message and/or ensure
RevlogError is not thrown from any other part of their call tree.
$ hg init
$ python -c 'file("a", "wb").write("confuse str.splitlines\nembedded\rnewline\n")'
$ hg ci -Ama -d '1 0'
adding a
$ echo clean diff >> a
$ hg ci -mb -d '2 0'
$ hg diff -r0 -r1
diff -r 107ba6f817b5 -r 310ce7989cdc a
--- a/a Thu Jan 01 00:00:01 1970 +0000
+++ b/a Thu Jan 01 00:00:02 1970 +0000
@@ -1,2 +1,3 @@
confuse str.splitlines
embedded\r (no-eol) (esc)
newline
+clean diff