# HG changeset patch
# User David Greenaway
# Date 1270256296 -39600
# Node ID ef4aa90b1e58ff75bfe53e8bd180aec8612d407e
# Parent f6dcbeb5babe7278531e127b91c55f2d94a0c8b9
Move 'findrenames' code into its own file.

The next few patches will increase the size of the "findrenames"
functionality. This patch simply moves the function into its own file to
avoid clutter building up in 'cmdutil.py'.

diff -r f6dcbeb5babe -r ef4aa90b1e58 mercurial/cmdutil.py
--- a/mercurial/cmdutil.py Sat May 01 14:32:50 2010 +0200
+++ b/mercurial/cmdutil.py Sat Apr 03 11:58:16 2010 +1100
@@ -10,6 +10,7 @@
 import os, sys, errno, re, glob, tempfile
 import mdiff, bdiff, util, templater, patch, error, encoding, templatekw
 import match as _match
+import similar
 
 revrangesep = ':'
 
@@ -286,52 +287,6 @@
 def matchfiles(repo, files):
     return _match.exact(repo.root, repo.getcwd(), files)
 
-def findrenames(repo, added, removed, threshold):
-    '''find renamed files -- yields (before, after, score) tuples'''
-    copies = {}
-    ctx = repo['.']
-    for i, r in enumerate(removed):
-        repo.ui.progress(_('searching'), i, total=len(removed))
-        if r not in ctx:
-            continue
-        fctx = ctx.filectx(r)
-
-        # lazily load text
-        @util.cachefunc
-        def data():
-            orig = fctx.data()
-            return orig, mdiff.splitnewlines(orig)
-
-        def score(text):
-            if not len(text):
-                return 0.0
-            if not fctx.cmp(text):
-                return 1.0
-            if threshold == 1.0:
-                return 0.0
-            orig, lines = data()
-            # bdiff.blocks() returns blocks of matching lines
-            # count the number of bytes in each
-            equal = 0
-            matches = bdiff.blocks(text, orig)
-            for x1, x2, y1, y2 in matches:
-                for line in lines[y1:y2]:
-                    equal += len(line)
-
-            lengths = len(text) + len(orig)
-            return equal * 2.0 / lengths
-
-        for a in added:
-            bestscore = copies.get(a, (None, threshold))[1]
-            myscore = score(repo.wread(a))
-            if myscore >= bestscore:
-                copies[a] = (r, myscore)
-    repo.ui.progress(_('searching'), None)
-
-    for dest, v in copies.iteritems():
-        source, score = v
-        yield source, dest, score
-
 def addremove(repo, pats=[], opts={}, dry_run=None, similarity=None):
     if dry_run is None:
         dry_run = opts.get('dry_run')
@@ -366,8 +321,8 @@
             added.append(abs)
     copies = {}
     if similarity > 0:
-        for old, new, score in findrenames(repo, added + unknown,
-                                           removed + deleted, similarity):
+        for old, new, score in similar.findrenames(repo,
+                added + unknown, removed + deleted, similarity):
             if repo.ui.verbose or not m.exact(old) or not m.exact(new):
                 repo.ui.status(_('recording removal of %s as rename to %s '
                                  '(%d%% similar)\n') %
diff -r f6dcbeb5babe -r ef4aa90b1e58 mercurial/similar.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/mercurial/similar.py Sat Apr 03 11:58:16 2010 +1100
@@ -0,0 +1,59 @@
+# similar.py - mechanisms for finding similar files
+#
+# Copyright 2005-2007 Matt Mackall
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+from i18n import _
+import util
+import mdiff
+import bdiff
+
+def findrenames(repo, added, removed, threshold):
+    '''find renamed files -- yields (before, after, score) tuples'''
+    copies = {}
+    ctx = repo['.']
+    for i, r in enumerate(removed):
+        repo.ui.progress(_('searching'), i, total=len(removed))
+        if r not in ctx:
+            continue
+        fctx = ctx.filectx(r)
+
+        # lazily load text
+        @util.cachefunc
+        def data():
+            orig = fctx.data()
+            return orig, mdiff.splitnewlines(orig)
+
+        def score(text):
+            if not len(text):
+                return 0.0
+            if not fctx.cmp(text):
+                return 1.0
+            if threshold == 1.0:
+                return 0.0
+            orig, lines = data()
+            # bdiff.blocks() returns blocks of matching lines
+            # count the number of bytes in each
+            equal = 0
+            matches = bdiff.blocks(text, orig)
+            for x1, x2, y1, y2 in matches:
+                for line in lines[y1:y2]:
+                    equal += len(line)
+
+            lengths = len(text) + len(orig)
+            return equal * 2.0 / lengths
+
+        for a in added:
+            bestscore = copies.get(a, (None, threshold))[1]
+            myscore = score(repo.wread(a))
+            if myscore >= bestscore:
+                copies[a] = (r, myscore)
+    repo.ui.progress(_('searching'), None)
+
+    for dest, v in copies.iteritems():
+        source, score = v
+        yield source, dest, score
+
+
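
Reviewer note: the score computed inside 'findrenames' is twice the number of bytes in matching lines, divided by the combined length of the two texts. The standalone sketch below illustrates that formula outside of Mercurial. It is not part of the patch: the name `similarity` is hypothetical, and the standard library's `difflib.SequenceMatcher` is swapped in for `bdiff.blocks()`, so the matched blocks (and therefore the scores) may differ from what Mercurial reports.

```python
import difflib


def similarity(text, orig):
    """Illustrative stand-in for the nested score() helper in findrenames.

    Assumption: difflib replaces Mercurial's bdiff.blocks(), so results
    are only approximately comparable to Mercurial's own scores.
    """
    if not text:
        # An empty candidate never counts as a rename target.
        return 0.0
    if text == orig:
        # Identical content is a perfect match.
        return 1.0

    # Split into lines, keeping the line endings so byte counts stay exact.
    new_lines = text.splitlines(True)
    old_lines = orig.splitlines(True)

    matcher = difflib.SequenceMatcher(None, new_lines, old_lines,
                                      autojunk=False)

    # Count the bytes in every block of lines common to both texts.
    equal = 0
    for block in matcher.get_matching_blocks():
        for line in old_lines[block.b:block.b + block.size]:
            equal += len(line)

    # Each matched byte appears in both texts, hence the factor of two.
    return equal * 2.0 / (len(text) + len(orig))


if __name__ == "__main__":
    old = b"a\nb\nc\n"
    new = b"a\nx\nc\n"
    print(similarity(new, old))
```

With this formula, two files that share every line score 1.0 and two files with no common lines score 0.0; the 'threshold' argument of 'findrenames' is compared against values on that same scale.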