mercurial/filelog.py
author Gregory Szorc <gregory.szorc@gmail.com>
Thu, 05 Apr 2018 16:31:45 -0700
changeset 37443 65250a66b55c
parent 37442 0596d27457c6
child 37497 1541e1a8e87d
permissions -rw-r--r--
revlog: move censor logic into main revlog class Previously, the revlog class implemented dummy methods for various censor-related functionality. Revision censoring was (and will continue to be) only possible on filelog instances. So filelog implemented these methods to perform something reasonable. A problem with implementing censoring on filelog is that it assumes filelog is a revlog. Upcoming work to formalize the filelog interface will make this not true. Furthermore, the censoring logic is security-sensitive. I think action-at-a-distance with custom implementation of core revlog APIs in derived classes is a bit dangerous. I think at a minimum the censor logic should live in revlog.py. I was tempted to created a "censored revlog" class that basically pulled these methods out of filelog. But, I wasn't a huge fan of overriding core methods in child classes. A reason to do that would be performance. However, the censoring code only comes into play when: * hash verification fails * delta generation * applying deltas from changegroups The new code is conditional on an instance attribute. So the overhead for running the censored code when the revlog isn't censorable is an attribute lookup. All of these operations are at least a magnitude slower than a Python attribute lookup. So there shouldn't be a performance concern. Differential Revision: https://phab.mercurial-scm.org/D3151
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
1089
142b5d5ec9cc Break apart hg.py
mpm@selenic.com
parents: 1072
diff changeset
     1
# filelog.py - file history class for mercurial
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     2
#
4635
63b9d2deed48 Updated copyright notices and add "and others" to "hg version"
Thomas Arendsen Hein <thomas@intevation.de>
parents: 4258
diff changeset
     3
# Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     4
#
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7634
diff changeset
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8531
diff changeset
     6
# GNU General Public License version 2 or any later version.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     7
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
     8
from __future__ import absolute_import
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
     9
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
    10
from .thirdparty.zope import (
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
    11
    interface as zi,
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
    12
)
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
    13
from . import (
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
    14
    repository,
25948
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
    15
    revlog,
34bd1a5eef5b filelog: use absolute_import
Gregory Szorc <gregory.szorc@gmail.com>
parents: 24255
diff changeset
    16
)
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    17
37441
a3202fa83aff filelog: declare that filelog implements a storage interface
Gregory Szorc <gregory.szorc@gmail.com>
parents: 35567
diff changeset
    18
@zi.implementer(repository.ifilestorage)
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    19
class filelog(revlog.revlog):
4258
b11a2fb59cf5 revlog: simplify revlog version handling
Matt Mackall <mpm@selenic.com>
parents: 4257
diff changeset
    20
    def __init__(self, opener, path):
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
    21
        super(filelog, self).__init__(opener,
37443
65250a66b55c revlog: move censor logic into main revlog class
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37442
diff changeset
    22
                        "/".join(("data", path + ".i")),
65250a66b55c revlog: move censor logic into main revlog class
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37442
diff changeset
    23
                                      censorable=True)
35567
07769a04bc66 filelog: add the ability to report the user facing name
Matt Harbison <matt_harbison@yahoo.com>
parents: 34023
diff changeset
    24
        # full name of the user visible file, relative to the repository root
07769a04bc66 filelog: add the ability to report the user facing name
Matt Harbison <matt_harbison@yahoo.com>
parents: 34023
diff changeset
    25
        self.filename = path
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    26
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    27
    def read(self, node):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    28
        t = self.revision(node)
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
    29
        if not t.startswith('\1\n'):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    30
            return t
2579
0875cda033fd use __contains__, index or split instead of str.find
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 2470
diff changeset
    31
        s = t.index('\1\n', 2)
10282
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
    32
        return t[s + 2:]
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    33
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    34
    def add(self, text, meta, transaction, link, p1=None, p2=None):
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
    35
        if meta or text.startswith('\1\n'):
37442
0596d27457c6 revlog: move parsemeta() and packmeta() from filelog (API)
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37441
diff changeset
    36
            text = revlog.packmeta(meta, text)
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    37
        return self.addrevision(text, transaction, link, p1, p2)
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    38
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    39
    def renamed(self, node):
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    40
        if self.parents(node)[0] != revlog.nullid:
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    41
            return False
13240
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    42
        t = self.revision(node)
37442
0596d27457c6 revlog: move parsemeta() and packmeta() from filelog (API)
Gregory Szorc <gregory.szorc@gmail.com>
parents: 37441
diff changeset
    43
        m = revlog.parsemeta(t)[0]
5915
d0576d065993 Prefer i in d over d.has_key(i)
Christian Ebert <blacktrash@gmx.net>
parents: 4635
diff changeset
    44
        if m and "copy" in m:
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    45
            return (m["copy"], revlog.bin(m["copyrev"]))
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    46
        return False
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    47
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    48
    def size(self, rev):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    49
        """return the size of a given revision"""
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    50
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    51
        # for revisions with renames, we have to go the slow way
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    52
        node = self.node(rev)
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    53
        if self.renamed(node):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    54
            return len(self.read(node))
24118
76f6ae06ddf5 revlog: add "iscensored()" to revlog public API
Mike Edgar <adgar@google.com>
parents: 24117
diff changeset
    55
        if self.iscensored(rev):
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
    56
            return 0
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    57
11540
2370e270a29a filelog: test behaviour for data starting with "\1\n"
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11539
diff changeset
    58
        # XXX if self.read(node).startswith("\1\n"), this returns (size+4)
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
    59
        return super(filelog, self).size(rev)
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    60
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    61
    def cmp(self, node, text):
11539
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    62
        """compare text with a given file revision
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    63
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    64
        returns True if text is different than what is stored.
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    65
        """
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    66
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    67
        t = text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    68
        if text.startswith('\1\n'):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    69
            t = '\1\n\1\n' + text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    70
19148
3bda242bf244 filelog: use super() for calling base functions
Durham Goode <durham@fb.com>
parents: 14287
diff changeset
    71
        samehashes = not super(filelog, self).cmp(node, t)
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    72
        if samehashes:
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    73
            return False
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    74
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
    75
        # censored files compare against the empty file
24118
76f6ae06ddf5 revlog: add "iscensored()" to revlog public API
Mike Edgar <adgar@google.com>
parents: 24117
diff changeset
    76
        if self.iscensored(self.rev(node)):
22597
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
    77
            return text != ''
58ec36686f0e filelog: censored files compare against empty data, have 0 size
Mike Edgar <adgar@google.com>
parents: 22596
diff changeset
    78
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    79
        # renaming a file produces a different hash, even if the data
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    80
        # remains unchanged. Check if it's the case (slow):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    81
        if self.renamed(node):
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    82
            t2 = self.read(node)
2895
21631c2c09a5 filelog.cmp: return 0 for equality
Matt Mackall <mpm@selenic.com>
parents: 2890
diff changeset
    83
            return t2 != text
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    84
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    85
        return True