annotate mercurial/filelog.py @ 37443:65250a66b55c

revlog: move censor logic into main revlog class

Previously, the revlog class implemented dummy methods for various censor-related functionality. Revision censoring was (and will continue to be) only possible on filelog instances. So filelog implemented these methods to perform something reasonable.

A problem with implementing censoring on filelog is that it assumes filelog is a revlog. Upcoming work to formalize the filelog interface will make this not true.

Furthermore, the censoring logic is security-sensitive. I think action-at-a-distance with custom implementation of core revlog APIs in derived classes is a bit dangerous. I think at a minimum the censor logic should live in revlog.py.

I was tempted to create a "censored revlog" class that basically pulled these methods out of filelog. But I wasn't a huge fan of overriding core methods in child classes. A reason to do that would be performance. However, the censoring code only comes into play when:

* hash verification fails
* delta generation
* applying deltas from changegroups

The new code is conditional on an instance attribute. So the overhead for running the censored code when the revlog isn't censorable is an attribute lookup. All of these operations are at least an order of magnitude slower than a Python attribute lookup. So there shouldn't be a performance concern.

Differential Revision: https://phab.mercurial-scm.org/D3151
author Gregory Szorc <gregory.szorc@gmail.com>
date Thu, 05 Apr 2018 16:31:45 -0700
parents 0596d27457c6
children 1541e1a8e87d
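
The commit message above argues that gating the censor logic on an instance attribute costs non-censorable revlogs only an attribute lookup. A minimal sketch of that pattern follows. It is not Mercurial's implementation: the class names `SketchRevlog` and `SketchFilelog`, the `_censored_revs` set, and `mark_censored()` are hypothetical and exist only for illustration. The `censorable` constructor argument and the `iscensored()` method name are the only names taken from the listing.

```python
class SketchRevlog:
    """Toy stand-in for a revision log; not Mercurial's real class."""

    def __init__(self, censorable=False):
        # Assumption: a single boolean instance attribute gates all
        # censor-related behaviour.
        self._censorable = censorable
        self._censored_revs = set()  # hypothetical bookkeeping

    def mark_censored(self, rev):
        # Hypothetical helper; refuses to censor on a non-censorable log.
        if not self._censorable:
            raise ValueError("this log does not support censoring")
        self._censored_revs.add(rev)

    def iscensored(self, rev):
        # Non-censorable logs pay only for this attribute lookup.
        if not self._censorable:
            return False
        return rev in self._censored_revs


class SketchFilelog(SketchRevlog):
    """File history log: opts in to censoring at construction time."""

    def __init__(self):
        super().__init__(censorable=True)
```

With this shape the subclass overrides no core methods. It only passes a flag to the base constructor, which mirrors the `censorable=True` argument that `filelog.__init__` passes in the listing.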
rev   line source
1089
142b5d5ec9cc Break apart hg.py
mpm@selenic.com
parents: 1072
diff changeset
1 # filelog.py - file history class for mercurial
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
2 #
4635
63b9d2deed48 Updated copyright notices and add "and others" to "hg version"
Thomas Arendsen Hein <thomas@intevation.de>
parents: 4258
diff changeset
3 # Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
4 #
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7634
diff changeset
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8531
diff changeset
6 # GNU General Public License version 2 or any later version.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
7
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
8 from __future__ import absolute_import
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
9
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
10 from .thirdparty.zope import (
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
11 interface as zi,
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
12 )
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
13 from . import (
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
14 repository,
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
15 revlog,
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
16 )
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
17
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
18 @zi.implementer(repository.ifilestorage)
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
19 class filelog(revlog.revlog):
4258
b11a2fb59cf5 revlog: simplify revlog version handling
Matt Mackall <mpm@selenic.com>
parents: 4257
diff changeset
20 def __init__(self, opener, path):
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
21 super(filelog, self).__init__(opener,
37443
65250a66b55c revlog: move censor logic into main revlog class
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37442
diff changeset
22 "/".join(("data", path + ".i")),
65250a66b55c revlog: move censor logic into main revlog class
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37442
diff changeset
23 censorable=True)
35567
07769a04bc66 filelog: add the ability to report the user facing name
Matt Harbison <matt_harbison@yahoo.com>
parents: 34023
diff changeset
24 # full name of the user visible file, relative to the repository root
07769a04bc66 filelog: add the ability to report the user facing name
Matt Harbison <matt_harbison@yahoo.com>
parents: 34023
diff changeset
25 self.filename = path
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
26
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
27 def read(self, node):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
28 t = self.revision(node)
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
29 if not t.startswith('\1\n'):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
30 return t
2579
0875cda033fd use __contains__, index or split instead of str.find
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 2470
diff changeset
31 s = t.index('\1\n', 2)
10282
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
32 return t[s + 2:]
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
33
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
34 def add(self, text, meta, transaction, link, p1=None, p2=None):
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
35 if meta or text.startswith('\1\n'):
37442
0596d27457c6 revlog: move parsemeta() and packmeta() from filelog (API)
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37441
diff changeset
36 text = revlog.packmeta(meta, text)
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
37 return self.addrevision(text, transaction, link, p1, p2)
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
38
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
39 def renamed(self, node):
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
40 if self.parents(node)[0] != revlog.nullid:
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
41 return False
13240
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
42 t = self.revision(node)
37442
0596d27457c6 revlog: move parsemeta() and packmeta() from filelog (API)
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37441
diff changeset
43 m = revlog.parsemeta(t)[0]
5915
d0576d065993 Prefer i in d over d.has_key(i)
Christian Ebert <blacktrash@gmx.net>
parents: 4635
diff changeset
44 if m and "copy" in m:
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
45 return (m["copy"], revlog.bin(m["copyrev"]))
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
46 return False
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
47
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
48 def size(self, rev):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
49 """return the size of a given revision"""
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
50
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
51 # for revisions with renames, we have to go the slow way
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
52 node = self.node(rev)
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
53 if self.renamed(node):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
54 return len(self.read(node))
24118
76f6ae06ddf5 revlog: add "iscensored()" to revlog public API
Mike Edgar <adgar@google.com>
parents: 24117
diff changeset
55 if self.iscensored(rev):
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
56 return 0
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
57
11540
2370e270a29a filelog: test behaviour for data starting with "\1\n"
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11539
diff changeset
58 # XXX if self.read(node).startswith("\1\n"), this returns (size+4)
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
59 return super(filelog, self).size(rev)
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
60
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
61 def cmp(self, node, text):
11539
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
62 """compare text with a given file revision
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
63
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
64 returns True if text is different than what is stored.
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
65 """
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
66
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
67 t = text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
68 if text.startswith('\1\n'):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
69 t = '\1\n\1\n' + text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
70
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
71 samehashes = not super(filelog, self).cmp(node, t)
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
72 if samehashes:
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
73 return False
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
74
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
75 # censored files compare against the empty file
24118
76f6ae06ddf5 revlog: add "iscensored()" to revlog public API
Mike Edgar <adgar@google.com>
parents: 24117
diff changeset
76 if self.iscensored(self.rev(node)):
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
77 return text != ''
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
78
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
79 # renaming a file produces a different hash, even if the data
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
80 # remains unchanged. Check if it's the case (slow):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
81 if self.renamed(node):
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
82 t2 = self.read(node)
2895
21631c2c09a5 filelog.cmp: return 0 for equality
Matt Mackall <mpm@selenic.com>
parents: 2890
diff changeset
83 return t2 != text
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
84
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
85 return True
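
The `read()`, `add()`, `size()` and `cmp()` methods in the listing all depend on the `\1\n` metadata envelope. The sketch below illustrates that envelope in Python 3 with bytes. The function names `pack_meta_sketch` and `strip_meta_sketch` are hypothetical, and the `key: value` header layout is an assumption about what `revlog.packmeta()` emits. Only the marker and the stripping logic are taken directly from `read()` in the listing.

```python
META_MARKER = b"\x01\n"  # the '\1\n' marker tested for in read() and add()


def pack_meta_sketch(meta, text):
    """Wrap text in a metadata envelope (header layout is an assumption)."""
    header = b"".join(b"%s: %s\n" % (key, meta[key]) for key in sorted(meta))
    return META_MARKER + header + META_MARKER + text


def strip_meta_sketch(stored):
    """Mirror read(): return the file data without the metadata envelope."""
    if not stored.startswith(META_MARKER):
        return stored
    # Search from offset 2 so the opening marker is not matched again.
    end = stored.index(META_MARKER, 2)
    return stored[end + 2:]


# A copy record, similar in spirit to what renamed() looks for.
packed = pack_meta_sketch({b"copy": b"old.txt"}, b"hello")
assert strip_meta_sketch(packed) == b"hello"

# File data that itself starts with the marker must be wrapped in an
# empty envelope, otherwise it would be misread as metadata.
tricky = b"\x01\nnot metadata"
wrapped = pack_meta_sketch({}, tricky)
assert wrapped == b"\x01\n\x01\n" + tricky
assert strip_meta_sketch(wrapped) == tricky
assert len(wrapped) - len(tricky) == 4
```

The last assertion shows where the `size+4` in the `XXX` comment in `size()` comes from: an empty envelope adds two 2-byte markers to the stored text. It also shows why `cmp()` prepends `'\1\n\1\n'` to such text before comparing hashes.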