annotate mercurial/similar.py @ 31626:0febf8e4e2ce

repair: use context manager for lock management If repo.lock() raised inside of the try block, 'tr' would have been None in the finally block where it tries to release(). Modernize the syntax instead of just winching the lock out of the try block. I found several other instances of acquiring the lock inside of the 'try', but those finally blocks handle None references. I also started switching some trivial try/finally blocks to context managers, but didn't get them all because indenting over 3x for lock, wlock and transaction would have spilled over 80 characters. That got me wondering if there should be a repo.rwlock(), to handle locking and unlocking in the proper order. It also looks like py27 supports supports multiple context managers for a single 'with' statement. Should I hold off on the rest until py26 is dropped?
author Matt Harbison <matt_harbison@yahoo.com>
date Thu, 23 Mar 2017 23:47:23 -0400
parents 985a98c6bad0
children ded48ad55146
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
1 # similar.py - mechanisms for finding similar files
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
2 #
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
3 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
4 #
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
6 # GNU General Public License version 2 or any later version.
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
7
27359
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
8 from __future__ import absolute_import
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
9
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
10 from .i18n import _
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
11 from . import (
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
12 bdiff,
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
13 mdiff,
a56c47ed3885 similar: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 16683
diff changeset
14 )
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
15
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
16 def _findexactmatches(repo, added, removed):
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
17 '''find renamed files that have no changes
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
18
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
19 Takes a list of new filectxs and a list of removed filectxs, and yields
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
20 (before, after) tuples of exact matches.
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
21 '''
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
22 numfiles = len(added) + len(removed)
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
23
31584
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
24 # Build table of removed files: {hash(fctx.data()): [fctx, ...]}.
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
25 # We use hash() to discard fctx.data() from memory.
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
26 hashes = {}
31584
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
27 for i, fctx in enumerate(removed):
28468
0d6b3630b9a3 similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net>
parents: 27359
diff changeset
28 repo.ui.progress(_('searching for exact renames'), i, total=numfiles,
0d6b3630b9a3 similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net>
parents: 27359
diff changeset
29 unit=_('files'))
31584
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
30 h = hash(fctx.data())
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
31 if h not in hashes:
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
32 hashes[h] = [fctx]
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
33 else:
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
34 hashes[h].append(fctx)
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
35
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
36 # For each added file, see if it corresponds to a removed file.
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
37 for i, fctx in enumerate(added):
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
38 repo.ui.progress(_('searching for exact renames'), i + len(removed),
28468
0d6b3630b9a3 similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net>
parents: 27359
diff changeset
39 total=numfiles, unit=_('files'))
31210
e1d035905b2e similar: compare between actual file contents for exact identity
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 30809
diff changeset
40 adata = fctx.data()
31584
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
41 h = hash(adata)
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
42 for rfctx in hashes.get(h, []):
31210
e1d035905b2e similar: compare between actual file contents for exact identity
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 30809
diff changeset
43 # compare between actual file contents for exact identity
e1d035905b2e similar: compare between actual file contents for exact identity
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 30809
diff changeset
44 if adata == rfctx.data():
e1d035905b2e similar: compare between actual file contents for exact identity
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 30809
diff changeset
45 yield (rfctx, fctx)
31584
985a98c6bad0 similar: use cheaper hash() function to test exact matches
Yuya Nishihara <yuya@tcha.org>
parents: 31583
diff changeset
46 break
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
47
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
48 # Done
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
49 repo.ui.progress(_('searching for exact renames'), None)
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
50
30805
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
51 def _ctxdata(fctx):
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
52 # lazily load text
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
53 orig = fctx.data()
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
54 return orig, mdiff.splitnewlines(orig)
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
55
30809
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
56 def _score(fctx, otherdata):
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
57 orig, lines = otherdata
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
58 text = fctx.data()
30805
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
59 # bdiff.blocks() returns blocks of matching lines
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
60 # count the number of bytes in each
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
61 equal = 0
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
62 matches = bdiff.blocks(text, orig)
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
63 for x1, x2, y1, y2 in matches:
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
64 for line in lines[y1:y2]:
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
65 equal += len(line)
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
66
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
67 lengths = len(text) + len(orig)
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
68 return equal * 2.0 / lengths
0ae287eb6a4f similar: move score function to module level
Sean Farley <sean@farley.io>
parents: 30791
diff changeset
69
30809
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
70 def score(fctx1, fctx2):
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
71 return _score(fctx1, _ctxdata(fctx2))
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
72
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
73 def _findsimilarmatches(repo, added, removed, threshold):
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
74 '''find potentially renamed files based on similar file content
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
75
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
76 Takes a list of new filectxs and a list of removed filectxs, and yields
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
77 (before, after, score) tuples of partial matches.
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
78 '''
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
79 copies = {}
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
80 for i, r in enumerate(removed):
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 11085
diff changeset
81 repo.ui.progress(_('searching for similar files'), i,
28468
0d6b3630b9a3 similar: specify unit for ui.progress when operating on files
Anton Shestakov <av6@dwimlabs.net>
parents: 27359
diff changeset
82 total=len(removed), unit=_('files'))
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
83
30809
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
84 data = None
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
85 for a in added:
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
86 bestscore = copies.get(a, (None, threshold))[1]
30809
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
87 if data is None:
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
88 data = _ctxdata(r)
8614546154cb similar: remove caching from the module level
Pierre-Yves David <pierre-yves.david@ens-lyon.org>
parents: 30805
diff changeset
89 myscore = _score(a, data)
31583
2efd9771323e similar: take the first match instead of the last
Yuya Nishihara <yuya@tcha.org>
parents: 31582
diff changeset
90 if myscore > bestscore:
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
91 copies[a] = (r, myscore)
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
92 repo.ui.progress(_('searching'), None)
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
93
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
94 for dest, v in copies.iteritems():
30791
ada160a8cfd8 similar: rename local variable to not collide with previous
Sean Farley <sean@farley.io>
parents: 29341
diff changeset
95 source, bscore = v
ada160a8cfd8 similar: rename local variable to not collide with previous
Sean Farley <sean@farley.io>
parents: 29341
diff changeset
96 yield source, dest, bscore
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
97
31582
2e254165a37c similar: do not look up and create filectx more than once
Yuya Nishihara <yuya@tcha.org>
parents: 31581
diff changeset
98 def _dropempty(fctxs):
2e254165a37c similar: do not look up and create filectx more than once
Yuya Nishihara <yuya@tcha.org>
parents: 31581
diff changeset
99 return [x for x in fctxs if x.size() > 0]
2e254165a37c similar: do not look up and create filectx more than once
Yuya Nishihara <yuya@tcha.org>
parents: 31581
diff changeset
100
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
101 def findrenames(repo, added, removed, threshold):
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
102 '''find renamed files -- yields (before, after, score) tuples'''
31581
b1528d195a13 similar: use common names for changectx variables
Yuya Nishihara <yuya@tcha.org>
parents: 31580
diff changeset
103 wctx = repo[None]
b1528d195a13 similar: use common names for changectx variables
Yuya Nishihara <yuya@tcha.org>
parents: 31580
diff changeset
104 pctx = wctx.p1()
11059
ef4aa90b1e58 Move 'findrenames' code into its own file.
David Greenaway <hg-dev@davidgreenaway.com>
parents:
diff changeset
105
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
106 # Zero length files will be frequently unrelated to each other, and
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
107 # tracking the deletion/addition of such a file will probably cause more
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
108 # harm than good. We strip them out here to avoid matching them later on.
31582
2e254165a37c similar: do not look up and create filectx more than once
Yuya Nishihara <yuya@tcha.org>
parents: 31581
diff changeset
109 addedfiles = _dropempty(wctx[fp] for fp in sorted(added))
2e254165a37c similar: do not look up and create filectx more than once
Yuya Nishihara <yuya@tcha.org>
parents: 31581
diff changeset
110 removedfiles = _dropempty(pctx[fp] for fp in sorted(removed) if fp in pctx)
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
111
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
112 # Find exact matches.
31580
d3e2af4e0128 similar: get rid of quadratic addedfiles.remove()
Yuya Nishihara <yuya@tcha.org>
parents: 31579
diff changeset
113 matchedfiles = set()
d3e2af4e0128 similar: get rid of quadratic addedfiles.remove()
Yuya Nishihara <yuya@tcha.org>
parents: 31579
diff changeset
114 for (a, b) in _findexactmatches(repo, addedfiles, removedfiles):
d3e2af4e0128 similar: get rid of quadratic addedfiles.remove()
Yuya Nishihara <yuya@tcha.org>
parents: 31579
diff changeset
115 matchedfiles.add(b)
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
116 yield (a.path(), b.path(), 1.0)
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
117
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
118 # If the user requested similar files to be matched, search for them also.
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
119 if threshold < 1.0:
31580
d3e2af4e0128 similar: get rid of quadratic addedfiles.remove()
Yuya Nishihara <yuya@tcha.org>
parents: 31579
diff changeset
120 addedfiles = [x for x in addedfiles if x not in matchedfiles]
31579
3a383caa97f4 similar: sort files not by object id but by path for stable result
Yuya Nishihara <yuya@tcha.org>
parents: 31210
diff changeset
121 for (a, b, score) in _findsimilarmatches(repo, addedfiles,
3a383caa97f4 similar: sort files not by object id but by path for stable result
Yuya Nishihara <yuya@tcha.org>
parents: 31210
diff changeset
122 removedfiles, threshold):
11060
e6df01776e08 findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes.
David Greenaway <hg-dev@davidgreenaway.com>
parents: 11059
diff changeset
123 yield (a.path(), b.path(), score)