convert: replace revision references in messages if they are >= short hashes
Convert will try to find references to revisions in commit messages and replace
them with references to the converted revision. It will take any string that
looks like a hash (and thus also decimal numbers) and look it up in the source
repo. If it finds anything, it will use that in the commit message instead.
It would do that for all hex digit sequences of 6 to 40 characters. That was
usually no problem for small repos where it was unlikely that there would be a
matching 6 'digit' hash prefix. It was also no problem on repos with less than
100000 changesets where numbers with 6 or more digits not would match any
revision number. With more than 100000 revisions random numbers in commit
messages would be replaced with a "random" hash. For example, 'handle 100000
requests' would be changed to to 'handle 9117c6 requests'. Convert could thus
not really be used on real repositories with more than 100000 changesets.
The default hash length shown by Mercurial is 12 'digits'. It is unexpected and
unwanted that convert by default tries to replace revision references that use
less than that amount of 'digits'.
To fix this, don't match strings that are less than the default hash size of 12
characters.
--- a/hgext/convert/hg.py Fri Jan 30 04:59:05 2015 +0900
+++ b/hgext/convert/hg.py Fri Jan 30 18:51:20 2015 +0100
@@ -26,7 +26,7 @@
from common import NoRepo, commit, converter_source, converter_sink
import re
-sha1re = re.compile(r'\b[0-9a-f]{6,40}\b')
+sha1re = re.compile(r'\b[0-9a-f]{12,40}\b')
class mercurial_sink(converter_sink):
def __init__(self, ui, path):