view tests/md5sum.py @ 28000:d4247c306d82

copies: optimize forward copy detection logic for rebases Forward copy detection (i.e. detecting what files have been moved/copied in commit X since ancestor Y) previously required diff'ing the manifests of both X and Y. This was expensive since it required reading both entire manifests and doing a set difference (they weren't already in a set because of the lazymanifest work). This cost almost 1 second on very large repositories, and happens N times for a rebase of N commits. This patch optimizes it for the case of rebase. In a rebase, we are comparing a commit against it's immediate parent, and therefore we can know what files changed by looking at ctx.files(). This lets us drastically decrease the size of the set comparison, and makes it O(# of changes) instead of O(size of manifest). This makes it take 1ms instead of 1000ms.
author Durham Goode <durham@fb.com>
date Fri, 05 Feb 2016 13:23:24 -0800
parents 328739ea70c3
children 6a98f9408a50
line wrap: on
line source

#!/usr/bin/env python
#
# Based on python's Tools/scripts/md5sum.py
#
# This software may be used and distributed according to the terms
# of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
# GPL-compatible.

import sys, os

try:
    from hashlib import md5
except ImportError:
    from md5 import md5

try:
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
except ImportError:
    pass

for filename in sys.argv[1:]:
    try:
        fp = open(filename, 'rb')
    except IOError as msg:
        sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
        sys.exit(1)

    m = md5()
    try:
        while True:
            data = fp.read(8192)
            if not data:
                break
            m.update(data)
    except IOError as msg:
        sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
        sys.exit(1)
    sys.stdout.write('%s  %s\n' % (m.hexdigest(), filename))

sys.exit(0)