tests/md5sum.py
author David Greenaway <hg-dev@davidgreenaway.com>
Sat, 03 Apr 2010 11:58:16 +1100
changeset 11060 e6df01776e08
parent 7080 a6477aa893b8
child 14494 1ffeeb91c55d
permissions -rwxr-xr-x
findrenames: Optimise "addremove -s100" by matching files by their SHA1 hashes. We speed up 'findrenames' for the usecase when a user specifies they want a similarity of 100% by matching files by their exact SHA1 hash value. This reduces the number of comparisons required to find exact matches from O(n^2) to O(n). While it would be nice if we could just use mercurial's pre-calculated SHA1 hash for existing files, this hash includes the file's ancestor information making it unsuitable for our purposes. Instead, we calculate the hash of old content from scratch. The following benchmarks were taken on the current head of crew: addremove 100% similarity: rm -rf *; hg up -C; mv tests tests.new hg --time addremove -s100 --dry-run before: real 176.350 secs (user 128.890+0.000 sys 47.430+0.000) after: real 2.130 secs (user 1.890+0.000 sys 0.240+0.000) addremove 75% similarity: rm -rf *; hg up -C; mv tests tests.new; \ for i in tests.new/*; do echo x >> $i; done hg --time addremove -s75 --dry-run before: real 264.560 secs (user 215.130+0.000 sys 49.410+0.000) after: real 218.710 secs (user 172.790+0.000 sys 45.870+0.000)
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
4122
306055f5b65c Unified #! paths for python scripts and removed them for test modules.
Thomas Arendsen Hein <thomas@intevation.de>
parents: 3223
diff changeset
     1
#!/usr/bin/env python
1928
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     2
#
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     3
# Based on python's Tools/scripts/md5sum.py
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     4
#
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     5
# This software may be used and distributed according to the terms
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     6
# of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     7
# GPL-compatible.
50e1c90b0fcf clarify license on md5sum.py
Peter van Dijk <peter@dataloss.nl>
parents: 1924
diff changeset
     8
7080
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
     9
import sys, os
6470
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    10
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    11
try:
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    12
    from hashlib import md5
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    13
except ImportError:
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    14
    from md5 import md5
1924
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    15
7080
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    16
try:
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    17
    import msvcrt
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    18
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    19
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    20
except ImportError:
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    21
    pass
a6477aa893b8 tests: Windows compatibility fixes
Patrick Mezard <pmezard@gmail.com>
parents: 6470
diff changeset
    22
1924
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    23
for filename in sys.argv[1:]:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    24
    try:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    25
        fp = open(filename, 'rb')
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    26
    except IOError, msg:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    27
        sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    28
        sys.exit(1)
3223
53e843840349 Whitespace/Tab cleanup
Thomas Arendsen Hein <thomas@intevation.de>
parents: 1928
diff changeset
    29
6470
ac0bcd951c2c python 2.6 compatibility: compatibility wrappers for hash functions
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 6212
diff changeset
    30
    m = md5()
1924
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    31
    try:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    32
        while 1:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    33
            data = fp.read(8192)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    34
            if not data:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    35
                break
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    36
            m.update(data)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    37
    except IOError, msg:
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    38
        sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    39
        sys.exit(1)
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    40
    sys.stdout.write('%s  %s\n' % (m.hexdigest(), filename))
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    41
46fb38ef9a91 add md5sum.py required by fix in previous changeset
Peter van Dijk <peter@dataloss.nl>
parents:
diff changeset
    42
sys.exit(0)