Fri, 03 Mar 2017 00:11:18 +0900 share: fix typo to drop 'shared' requirement on unshare
Yuya Nishihara <yuya@tcha.org> [Fri, 03 Mar 2017 00:11:18 +0900] rev 31211
share: fix typo to drop 'shared' requirement on unshare This must be a typo and it seems correct to drop the requirement since the repo is no longer a shared repository.
Fri, 03 Mar 2017 02:57:06 +0900 similar: compare between actual file contents for exact identity
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Fri, 03 Mar 2017 02:57:06 +0900] rev 31210
similar: compare between actual file contents for exact identity Before this patch, similarity detection logic (for addremove and automv) depends entirely on SHA-1 digesting. But this causes incorrect rename detection, if: - removing file A and adding file B occur at same committing, and - SHA-1 hash values of file A and B are same This may prevent security experts from managing sample files for SHAttered issue in Mercurial repository, for example. https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html https://shattered.it/ Hash collision itself isn't so serious for core repository functionality of Mercurial, described by mpm as below, though. https://www.mercurial-scm.org/wiki/mpm/SHA1 This patch compares between actual file contents after hash comparison for exact identity. Even after this patch, SHA-1 is still used, because it is reasonable enough to quickly detect existence of "(almost) same" file. - replacing SHA-1 causes decreasing performance, and - replacement of it has ambiguity, yet Getting content of removed file (= rfctx.data()) at each exact comparison should be cheap enough, even though getting content of added one costs much. ======= ============== ===================== file fctx data() reads from ======= ============== ===================== removed filectx in-memory revlog data added workingfilectx storage ======= ============== =====================
Thu, 02 Mar 2017 21:49:30 -0800 localrepo: handle rename with hardlinks properly
Jun Wu <quark@fb.com> [Thu, 02 Mar 2017 21:49:30 -0800] rev 31209
localrepo: handle rename with hardlinks properly In "aftertrans", we rename "journal.*" to "undo.*". We expect "journal.*" files to disappear after renaming. However, if "journal.foo" and "undo.foo" refer to a same file (hardlink), rename may be a no-op, leaving both files on disk, according to Linux manpage [1]: If oldpath and newpath are existing hard links referring to the same file, then rename() does nothing, and returns a suc‐ cess status. The POSIX specification [2] is not very clear about what to do. To be safe, remove "undo.*" before the rename so "journal.*" cannot be left on disk. [1]: http://man7.org/linux/man-pages/man2/rename.2.html [2]: http://pubs.opengroup.org/onlinepubs/9699919799/
Wed, 01 Mar 2017 18:21:06 -0800 dirstate: avoid unnecessary load+dump during backup
Jun Wu <quark@fb.com> [Wed, 01 Mar 2017 18:21:06 -0800] rev 31208
dirstate: avoid unnecessary load+dump during backup Previously, dirstate.savebackup unconditionally dumps the dirstate map to disk. It may require loading dirstate first to be able to dump it. Those operations could be expensive if the dirstate is big, and could be avoided if we know the dirstate file is up-to-date. This patch avoids the read and write if the dirstate is clean. In that case, we just do a plain copy without any serialization. This should make commands which use transactions but do not touch dirstate faster. For example, "hg bookmark -r REV NAME".
Wed, 01 Mar 2017 17:59:21 -0800 dirstate: try to use hardlink to backup dirstate
Jun Wu <quark@fb.com> [Wed, 01 Mar 2017 17:59:21 -0800] rev 31207
dirstate: try to use hardlink to backup dirstate This should be more efficient once util.copyfile has real hardlink support.
Sun, 05 Mar 2017 16:20:07 -0800 dirstate: track updated files to improve write time
Durham Goode <durham@fb.com> [Sun, 05 Mar 2017 16:20:07 -0800] rev 31206
dirstate: track updated files to improve write time Previously, dirstate.write() would iterate over the entire dirstate to find any entries that needed to be marked 'lookup' (i.e. if they have the same timestamp as now). This was O(working copy) and slow in large repos. It was most visible when rebasing or histediting multiple commits, since it gets executed once per commit, even if the entire rebase/histedit is wrapped in a transaction. The fix is to track which files have been editted, and only check those to see if they need to be marked as 'lookup'. This saves 25% on histedit times in very large repositories. I tested this by adding temporary debug logic to verify that the old files processed in the loop matched the new files processed in the loop and running the test suite.
(0) -30000 -10000 -3000 -1000 -300 -100 -30 -10 -6 +6 +10 +30 +100 +300 +1000 +3000 +10000 tip