similar: compare between actual file contents for exact identity
Before this patch, the similarity detection logic (for addremove and
automv) depended entirely on SHA-1 digests. This causes incorrect
rename detection if:
- file A is removed and file B is added in the same commit, and
- the SHA-1 hash values of files A and B are the same even though
  their contents differ
For example, this may prevent security experts from managing sample
files for the SHAttered issue in a Mercurial repository.
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
https://shattered.it/
A hash collision itself is not that serious for the core repository
functionality of Mercurial, though, as mpm explains at the link below.
https://www.mercurial-scm.org/wiki/mpm/SHA1
This patch compares the actual file contents after the hash comparison
to confirm exact identity.
Even after this patch, SHA-1 is still used, because it is good enough
for quickly detecting the existence of an "(almost) identical" file:
- replacing SHA-1 would decrease performance, and
- it is not yet clear what should replace it
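As a minimal sketch of the two-step check described above (not the
actual Mercurial code), the digest can act as a cheap pre-filter while a
byte-for-byte comparison confirms each candidate. The function name
`find_exact_renames`, the plain dicts of bytes standing in for file
contexts, and the pluggable `digest` parameter are all assumptions made
for illustration.

```python
import hashlib


def _sha1(data):
    """Default digest: raw SHA-1 of the file content."""
    return hashlib.sha1(data).digest()


def find_exact_renames(removed, added, digest=_sha1):
    """Yield (old_name, new_name) pairs whose contents are byte-identical.

    `removed` and `added` map file names to bytes (hypothetical stand-ins
    for file contexts). `digest` is pluggable only so that a collision
    can be simulated.
    """
    # Index removed files by digest: a cheap first-pass filter.
    by_digest = {}
    for name, data in sorted(removed.items()):
        by_digest.setdefault(digest(data), []).append((name, data))

    for name, data in sorted(added.items()):
        for old_name, old_data in by_digest.get(digest(data), []):
            # A digest match only nominates a candidate; the byte-for-byte
            # comparison is what confirms the rename.
            if old_data == data:
                yield old_name, name
                break
```

Passing a deliberately weak `digest` (for example, one that returns a
constant) simulates a collision: files that collide but differ in
content are not reported as renames.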
Getting the content of the removed file (= rfctx.data()) at each exact
comparison should be cheap enough, even though getting the content of
the added one is costly.
======= ============== =====================
file    fctx           data() reads from
======= ============== =====================
removed filectx        in-memory revlog data
added   workingfilectx storage
======= ============== =====================
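The cost asymmetry in the table suggests reading the added file once and
calling data() on the removed side per comparison. The sketch below
illustrates that access pattern only; `CountingCtx` and `matches_any`
are hypothetical names, not part of Mercurial.

```python
class CountingCtx:
    """Hypothetical stand-in for a file context that counts data() calls."""

    def __init__(self, content):
        self._content = content
        self.reads = 0

    def data(self):
        self.reads += 1
        return self._content


def matches_any(added_ctx, removed_ctxs):
    """Return True if any removed context has exactly the added content."""
    # Read the expensive (working directory) side exactly once.
    adata = added_ctx.data()
    # Call data() on the cheap (cached) side once per comparison;
    # any() stops at the first exact match.
    return any(adata == rctx.data() for rctx in removed_ctxs)
```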
  $ hg init
  $ touch a
  $ hg add a
  $ hg ci -m "a"
  $ echo 123 > b
  $ hg add b
  $ hg diff --nodates
  diff -r 3903775176ed b
  --- /dev/null
  +++ b/b
  @@ -0,0 +1,1 @@
  +123
  $ hg diff --nodates -r tip
  diff -r 3903775176ed b
  --- /dev/null
  +++ b/b
  @@ -0,0 +1,1 @@
  +123
  $ echo foo > a
  $ hg diff --nodates
  diff -r 3903775176ed a
  --- a/a
  +++ b/a
  @@ -0,0 +1,1 @@
  +foo
  diff -r 3903775176ed b
  --- /dev/null
  +++ b/b
  @@ -0,0 +1,1 @@
  +123
  $ hg diff -r ""
  hg: parse error: empty query
  [255]
  $ hg diff -r tip -r ""
  hg: parse error: empty query
  [255]
Remove a file that was added via merge. Since the file is not in parent 1,
it should not be in the diff.
  $ hg ci -m 'a=foo' a
  $ hg co -Cq null
  $ echo 123 > b
  $ hg add b
  $ hg ci -m "b"
  created new head
  $ hg merge 1
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  (branch merge, don't forget to commit)
  $ hg rm -f a
  $ hg diff --nodates
Rename a file that was added via merge. Since the rename source is not in
parent 1, the diff should be relative to /dev/null.
  $ hg co -Cq 2
  $ hg merge 1
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  (branch merge, don't forget to commit)
  $ hg mv a a2
  $ hg diff --nodates
  diff -r cf44b38435e5 a2
  --- /dev/null
  +++ b/a2
  @@ -0,0 +1,1 @@
  +foo
  $ hg diff --nodates --git
  diff --git a/a2 b/a2
  new file mode 100644
  --- /dev/null
  +++ b/a2
  @@ -0,0 +1,1 @@
  +foo