mercurial/filelog.py
author Greg Ward <greg@gerg.ca>
Sun, 20 Mar 2011 17:41:09 -0400
changeset 13704 a464763e99f1
parent 13240 e5060aa22043
child 14074 e8271159c8c2
permissions -rw-r--r--
dirstate: avoid a race with multiple commits in the same process (issue2264, issue2516) The race happens when two commits in a row change the same file without changing its size, *if* those two commits happen in the same second in the same process while holding the same repo lock. For example: commit 1: M a M b commit 2: # same process, same second, same repo lock M b # modify b without changing its size M c This first manifested in transplant, which is the most common way to do multiple commits in the same process. But it can manifest in any script or extension that does multiple commits under the same repo lock. (Thus, the test script tests both transplant and a custom script.) The problem was that dirstate.status() failed to notice the change to b when localrepo is about to do the second commit, meaning that change gets left in the working directory. In the context of transplant, that means either a crash ("RuntimeError: nothing committed after transplant") or a silently inaccurate transplant, depending on whether any other files were modified by the second transplanted changeset. The fix is to make status() work a little harder when we have previously marked files as clean (state 'normal') in the same process. Specifically, dirstate.normal() adds files to self._lastnormal, and other state-changing methods remove them. Then dirstate.status() puts any files in self._lastnormal into state 'lookup', which will make localrepository.status() read file contents to see if it has really changed. So we pay a small performance penalty for the second (and subsequent) commits in the same process, without affecting the common case. Anything that does lots of status updates and checks in the same process could suffer a performance hit. Incidentally, there is a simpler fix: call dirstate.normallookup() on every file updated by commit() at the end of the commit. The trouble with that solution is that it imposes a performance penalty on the common case: it means the next status-dependent hg command after every "hg commit" will be a little bit slower. The patch here is more complex, but only affects performance for the uncommon case.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
1089
142b5d5ec9cc Break apart hg.py
mpm@selenic.com
parents: 1072
diff changeset
     1
# filelog.py - file history class for mercurial
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     2
#
4635
63b9d2deed48 Updated copyright notices and add "and others" to "hg version"
Thomas Arendsen Hein <thomas@intevation.de>
parents: 4258
diff changeset
     3
# Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     4
#
8225
46293a0c7e9f updated license to be explicit about GPL version 2
Martin Geisler <mg@lazybytes.net>
parents: 7634
diff changeset
     5
# This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 8531
diff changeset
     6
# GNU General Public License version 2 or any later version.
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     7
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
     8
import revlog
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
     9
13240
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    10
def _parsemeta(text):
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    11
    if not text.startswith('\1\n'):
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    12
        return {}
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    13
    s = text.index('\1\n', 2)
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    14
    mt = text[2:s]
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    15
    m = {}
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    16
    for l in mt.splitlines():
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    17
        k, v = l.split(": ", 1)
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    18
        m[k] = v
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    19
    return m
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    20
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    21
class filelog(revlog.revlog):
4258
b11a2fb59cf5 revlog: simplify revlog version handling
Matt Mackall <mpm@selenic.com>
parents: 4257
diff changeset
    22
    def __init__(self, opener, path):
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    23
        revlog.revlog.__init__(self, opener,
8531
810387f59696 filelog encoding: move the encoding/decoding into store
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 8225
diff changeset
    24
                        "/".join(("data", path + ".i")))
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    25
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    26
    def read(self, node):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    27
        t = self.revision(node)
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
    28
        if not t.startswith('\1\n'):
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    29
            return t
2579
0875cda033fd use __contains__, index or split instead of str.find
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 2470
diff changeset
    30
        s = t.index('\1\n', 2)
10282
08a0f04b56bd many, many trivial check-code fixups
Matt Mackall <mpm@selenic.com>
parents: 10263
diff changeset
    31
        return t[s + 2:]
360
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    32
10519e4cbd02 filelog: add metadata support
mpm@selenic.com
parents: 358
diff changeset
    33
    def add(self, text, meta, transaction, link, p1=None, p2=None):
686
d7d68d27ebe5 Reapply startswith() changes that got lost with stale edit
Matt Mackall <mpm@selenic.com>
parents: 681
diff changeset
    34
        if meta or text.startswith('\1\n'):
10705
194342b34870 filelog: no need to optimize an uncommon case, assume meta = {}
Benoit Boissinot <benoit.boissinot@ens-lyon.org>
parents: 10490
diff changeset
    35
            mt = ["%s: %s\n" % (k, v) for k, v in sorted(meta.iteritems())]
1540
8ca9f5b17257 minor optimization: save some string trash
twaldmann@thinkmo.de
parents: 1117
diff changeset
    36
            text = "\1\n%s\1\n%s" % ("".join(mt), text)
0
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    37
        return self.addrevision(text, transaction, link, p1, p2)
9117c6561b0b Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff changeset
    38
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    39
    def renamed(self, node):
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    40
        if self.parents(node)[0] != revlog.nullid:
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    41
            return False
13240
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    42
        t = self.revision(node)
e5060aa22043 filelog: move metadata parsing to a helper function
Matt Mackall <mpm@selenic.com>
parents: 11541
diff changeset
    43
        m = _parsemeta(t)
5915
d0576d065993 Prefer i in d over d.has_key(i)
Christian Ebert <blacktrash@gmx.net>
parents: 4635
diff changeset
    44
        if m and "copy" in m:
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    45
            return (m["copy"], revlog.bin(m["copyrev"]))
1116
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    46
        return False
0cdd73b0767c Add some rename debugging support
mpm@selenic.com
parents: 1089
diff changeset
    47
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    48
    def size(self, rev):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    49
        """return the size of a given revision"""
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    50
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    51
        # for revisions with renames, we have to go the slow way
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    52
        node = self.node(rev)
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    53
        if self.renamed(node):
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    54
            return len(self.read(node))
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    55
11540
2370e270a29a filelog: test behaviour for data starting with "\1\n"
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11539
diff changeset
    56
        # XXX if self.read(node).startswith("\1\n"), this returns (size+4)
7634
14a4337a9b9b revlog: kill from-style imports
Matt Mackall <mpm@selenic.com>
parents: 7622
diff changeset
    57
        return revlog.revlog.size(self, rev)
2898
db397c38005d merge: use file size stored in revlog index
Matt Mackall <mpm@selenic.com>
parents: 2895
diff changeset
    58
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    59
    def cmp(self, node, text):
11539
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    60
        """compare text with a given file revision
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    61
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    62
        returns True if text is different than what is stored.
a463e3c50212 cmp: document the fact that we return True if content is different
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 10706
diff changeset
    63
        """
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    64
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    65
        t = text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    66
        if text.startswith('\1\n'):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    67
            t = '\1\n\1\n' + text
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    68
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    69
        samehashes = not revlog.revlog.cmp(self, node, t)
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    70
        if samehashes:
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    71
            return False
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    72
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    73
        # renaming a file produces a different hash, even if the data
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    74
        # remains unchanged. Check if it's the case (slow):
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    75
        if self.renamed(node):
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    76
            t2 = self.read(node)
2895
21631c2c09a5 filelog.cmp: return 0 for equality
Matt Mackall <mpm@selenic.com>
parents: 2890
diff changeset
    77
            return t2 != text
2887
05257fd28591 filelog: add hash-based comparisons
Matt Mackall <mpm@selenic.com>
parents: 2859
diff changeset
    78
11541
ab9fa7a85dd9 filelog: cmp: don't read data if hashes are identical (issue2273)
Nicolas Dumazet <nicdumz.commits@gmail.com>
parents: 11540
diff changeset
    79
        return True