automv: use 95 as the default similarity threshold
author Martijn Pieters <mjpieters@fb.com>
Tue, 16 Feb 2016 15:58:32 +0000
changeset 28183 e07daee83029
parent 28182 e4fe4e903e97
child 28184 11c2f8af09c2
The motivation for the change from 100 to 95 is included in a comment.

* Updated the tests to include a change to a moved file that still should be
  caught as a move.
* Use ui.configint() to handle non-integer configuration entries more
  gracefully. Also complain if a similarity outside of the acceptable range
  is set.
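The validation described above can be sketched in plain Python, without Mercurial itself. This is a minimal illustration, not the extension's code: `parse_similarity` is a hypothetical helper, and `ValueError` stands in for the error types Mercurial would raise. It assumes the raw setting arrives as a string, or as `None` when unset.

```python
def parse_similarity(raw, default=95):
    """Parse a similarity setting into an integer percentage.

    Hypothetical stand-in for the config handling in this change:
    an unset value falls back to the default, a non-integer value is
    rejected, and anything outside 0..100 is rejected as well.
    """
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError('automv.similarity is not an integer: %r' % (raw,))
    if not 0 <= value <= 100:
        raise ValueError('automv.similarity must be between 0 and 100')
    return value


if __name__ == '__main__':
    print(parse_similarity(None))   # unset: falls back to the default
    print(parse_similarity('80'))   # a valid percentage
    for bad in ('110', '95.5'):
        try:
            parse_similarity(bad)
        except ValueError as exc:
            print('rejected %r: %s' % (bad, exc))
```

Rejecting out-of-range values up front means a typo such as `similarity=110` fails loudly instead of silently disabling or distorting move detection.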
hgext/automv.py
tests/test-automv.t
--- a/hgext/automv.py	Fri Feb 19 22:28:09 2016 +0100
+++ b/hgext/automv.py	Tue Feb 16 15:58:32 2016 +0000
@@ -11,14 +11,25 @@
 
 The threshold at which a file is considered a move can be set with the
 ``automv.similarity`` config option. This option takes a percentage between 0
-(disabled) and 100 (files must be identical), the default is 100.
+(disabled) and 100 (files must be identical), the default is 95.
 
 """
+
+# Using 95 as the default similarity is based on an analysis of the
+# cpython, mozilla-central & mercurial repositories, as well as 2 very
+# large facebook repositories. At 95, 50% of all potential missed moves
+# would be caught, and the result would correspond with 87% of all
+# explicitly marked moves.  Together, 80% of moved files are 95% similar
+# or more.
+#
+# See http://markmail.org/thread/5pxnljesvufvom57 for context.
+
 from __future__ import absolute_import
 
 from mercurial import (
     commands,
     copies,
+    error,
     extensions,
     scmutil,
     similar
@@ -37,7 +48,9 @@
     renames = None
     disabled = opts.pop('no_automv', False)
     if not disabled:
-        threshold = float(ui.config('automv', 'similarity', '100'))
+        threshold = ui.configint('automv', 'similarity', 95)
+        if not 0 <= threshold <= 100:
+            raise error.Abort(_('automv.similarity must be between 0 and 100'))
         if threshold > 0:
             match = scmutil.match(repo[None], pats, opts)
             added, removed = _interestingfiles(repo, match)
--- a/tests/test-automv.t	Fri Feb 19 22:28:09 2016 +0100
+++ b/tests/test-automv.t	Tue Feb 16 15:58:32 2016 +0000
@@ -13,7 +13,7 @@
 
 Test automv command for commit
 
-  $ echo 'foo' > a.txt
+  $ printf 'foo\nbar\nbaz\n' > a.txt
   $ hg add a.txt
   $ hg commit -m 'init repo with a'
 
@@ -37,6 +37,24 @@
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit -m 'msg'
+  detected move of 1 files
+  created new head
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -161,6 +179,29 @@
   $ mv a.txt b.txt
   $ hg rm a.txt
   $ hg add b.txt
+  $ printf '\n' >> b.txt
+  $ hg status -C
+  A b.txt
+  R a.txt
+  $ hg commit --amend -m 'amended'
+  detected move of 1 files
+  saved backup bundle to $TESTTMP/repo/.hg/strip-backup/*-amend-backup.hg (glob)
+  $ hg status --change . -C
+  A b.txt
+    a.txt
+  A c.txt
+  R a.txt
+  $ hg up -r 0
+  1 files updated, 0 files merged, 2 files removed, 0 files unresolved
+
+mv/rm/add/modif
+  $ echo 'c' > c.txt
+  $ hg add c.txt
+  $ hg commit -m 'revision to amend to'
+  created new head
+  $ mv a.txt b.txt
+  $ hg rm a.txt
+  $ hg add b.txt
   $ printf '\nfoo\n' >> b.txt
   $ hg status -C
   A b.txt
@@ -285,3 +326,13 @@
   $ hg status --change . -C
   A b.txt
   R a.txt
+
+error conditions
+
+  $ cat >> $HGRCPATH << EOF
+  > [automv]
+  > similarity=110
+  > EOF
+  $ hg commit -m 'revision to amend to'
+  abort: automv.similarity must be between 0 and 100
+  [255]