similar: compare actual file contents for exact identity
Before this patch, the similarity detection logic (for addremove and
automv) depends entirely on SHA-1 digests. This causes incorrect
rename detection if:
- file A is removed and file B is added in the same commit, and
- the SHA-1 hash values of files A and B are the same
For example, this may prevent security experts from managing sample
files for the SHAttered issue in a Mercurial repository.
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
https://shattered.it/
A hash collision by itself isn't so serious for the core repository
functionality of Mercurial, though, as described by mpm at the link below.
https://www.mercurial-scm.org/wiki/mpm/SHA1
This patch compares the actual file contents after the hash comparison
to confirm exact identity.
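The idea can be illustrated with a minimal sketch. This is not the
code of the patch itself: find_exact_matches, its dict arguments and
the digest parameter are hypothetical names used for illustration
only, and plain dicts of bytes stand in for file contexts.

```python
import hashlib


def sha1_digest(data):
    """Return the raw SHA-1 digest of a bytes object."""
    return hashlib.sha1(data).digest()


def find_exact_matches(removed, added, digest=sha1_digest):
    """Yield (removed_name, added_name) pairs with byte-identical content.

    `removed` and `added` are hypothetical dicts mapping a file name to
    its content as bytes. `digest` is injectable only so that a
    collision can be simulated in a test.
    """
    # Index removed files by digest. For simplicity this sketch keeps
    # only one removed file per digest value.
    by_digest = {}
    for rname, rdata in removed.items():
        by_digest[digest(rdata)] = (rname, rdata)

    for aname, adata in added.items():
        candidate = by_digest.get(digest(adata))
        if candidate is None:
            continue  # no removed file shares this digest
        rname, rdata = candidate
        # The digest only narrows down candidates. The byte-for-byte
        # comparison decides whether the files are really identical.
        if rdata == adata:
            yield rname, aname
```

With a real SHA-1 collision, the two files would share a digest but
fail the final comparison, so they would not be reported as a rename.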
Even after this patch, SHA-1 is still used, because it is reasonable
enough for quickly detecting the existence of an "(almost) same" file.
- replacing SHA-1 would decrease performance, and
- it is not yet clear what should replace it
Getting the content of the removed file (= rfctx.data()) at each exact
comparison should be cheap enough, even though getting the content of
the added one is expensive.
======= ============== =====================
file    fctx           data() reads from
======= ============== =====================
removed filectx        in-memory revlog data
added   workingfilectx storage
======= ============== =====================
#!/usr/bin/env bash
hg init rebase
cd rebase
# @ 7: 'H'
# |
# | o 6: 'G'
# |/|
# o | 5: 'F'
# | |
# | o 4: 'E'
# |/
# | o 3: 'D'
# | |
# | o 2: 'C'
# | |
# | o 1: 'B'
# |/
# o 0: 'A'
echo A > A
hg ci -Am A
echo B > B
hg ci -Am B
echo C > C
hg ci -Am C
echo D > D
hg ci -Am D
hg up -q -C 0
echo E > E
hg ci -Am E
hg up -q -C 0
echo F > F
hg ci -Am F
hg merge -r 4
hg ci -m G
hg up -q -C 5
echo H > H
hg ci -Am H
hg bundle -a ../rebase.hg
cd ..
rm -Rf rebase