share: fix typo to drop 'shared' requirement on unshare
This must be a typo and it seems correct to drop the requirement since the
repo is no longer a shared repository.
similar: compare between actual file contents for exact identity
Before this patch, similarity detection logic (for addremove and
automv) depends entirely on SHA-1 digesting. But this causes incorrect
rename detection, if:
- removing file A and adding file B occur at same committing, and
- SHA-1 hash values of file A and B are same
This may prevent security experts from managing sample files for
SHAttered issue in Mercurial repository, for example.
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
https://shattered.it/
Hash collision itself isn't so serious for core repository
functionality of Mercurial, described by mpm as below, though.
https://www.mercurial-scm.org/wiki/mpm/SHA1
This patch compares between actual file contents after hash comparison
for exact identity.
Even after this patch, SHA-1 is still used, because it is reasonable
enough to quickly detect existence of "(almost) same" file.
- replacing SHA-1 causes decreasing performance, and
- replacement of it has ambiguity, yet
Getting content of removed file (= rfctx.data()) at each exact
comparison should be cheap enough, even though getting content of
added one costs much.
======= ============== =====================
file fctx data() reads from
======= ============== =====================
removed filectx in-memory revlog data
added workingfilectx storage
======= ============== =====================
localrepo: handle rename with hardlinks properly
In "aftertrans", we rename "journal.*" to "undo.*". We expect "journal.*"
files to disappear after renaming.
However, if "journal.foo" and "undo.foo" refer to a same file (hardlink),
rename may be a no-op, leaving both files on disk, according to Linux
manpage [1]:
If oldpath and newpath are existing hard links referring to the same
file, then rename() does nothing, and returns a suc‐ cess status.
The POSIX specification [2] is not very clear about what to do.
To be safe, remove "undo.*" before the rename so "journal.*" cannot be left
on disk.
[1]: http://man7.org/linux/man-pages/man2/rename.2.html
[2]: http://pubs.opengroup.org/onlinepubs/
9699919799/
dirstate: avoid unnecessary load+dump during backup
Previously, dirstate.savebackup unconditionally dumps the dirstate map to
disk. It may require loading dirstate first to be able to dump it. Those
operations could be expensive if the dirstate is big, and could be avoided
if we know the dirstate file is up-to-date.
This patch avoids the read and write if the dirstate is clean. In that case,
we just do a plain copy without any serialization.
This should make commands which use transactions but do not touch dirstate
faster. For example, "hg bookmark -r REV NAME".
dirstate: try to use hardlink to backup dirstate
This should be more efficient once util.copyfile has real hardlink support.
dirstate: track updated files to improve write time
Previously, dirstate.write() would iterate over the entire dirstate to find any
entries that needed to be marked 'lookup' (i.e. if they have the same timestamp
as now). This was O(working copy) and slow in large repos. It was most visible
when rebasing or histediting multiple commits, since it gets executed once per
commit, even if the entire rebase/histedit is wrapped in a transaction.
The fix is to track which files have been editted, and only check those to see
if they need to be marked as 'lookup'. This saves 25% on histedit times in very
large repositories.
I tested this by adding temporary debug logic to verify that the old files
processed in the loop matched the new files processed in the loop and running
the test suite.