tests/md5sum.py
author Durham Goode <durham@fb.com>
Tue, 27 Jan 2015 11:26:27 -0800
changeset 26013 38f92d12357c
parent 25660 328739ea70c3
child 29485 6a98f9408a50
permissions -rwxr-xr-x
copy: add flag for disabling copy tracing Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in large repos (hundreds of thousands of files). This provides the option of turning off the majority of copy tracing. It does not turn off _forwardcopies() since that is used to carry copy information inside a commit across a rebase. This will affect the situation where a user edits a file, then rebases on top of commits that have moved that file. The move will not be detected and the user will have to manually resolve the issue (possibly by redoing the rebase with this flag off). The reason to have a flag instead of trying to fix the actual copy tracing performance is that copy tracing is fundamentally an O(number of files in the repo) operation. In order to know if file X in the rebase source was copied anywhere, we have to walk the filelog for every new file that exists in the rebase destination (i.e. a file in the destination that is not in the common ancestor). Without an index that lets us trace forward (i.e. from file Y in the common ancestor forward to the rebase destination), it will never be an O(number of changes in my branch) operation. In mozilla-central, rebasing a 3 commit stack across 20,000 revs goes from 39s to 11s.

#!/usr/bin/env python
#
# Based on python's Tools/scripts/md5sum.py
#
# This software may be used and distributed according to the terms
# of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
# GPL-compatible.

import sys, os

try:
    from hashlib import md5
except ImportError:
    from md5 import md5

try:
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
except ImportError:
    pass

for filename in sys.argv[1:]:
    try:
        fp = open(filename, 'rb')
    except IOError as msg:
        sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
        sys.exit(1)

    m = md5()
    try:
        while True:
            data = fp.read(8192)
            if not data:
                break
            m.update(data)
    except IOError as msg:
        sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
        sys.exit(1)
    sys.stdout.write('%s  %s\n' % (m.hexdigest(), filename))

sys.exit(0)