tests/md5sum.py
author Na'Tosha Bard <natosha@unity3d.com>
Mon, 13 Feb 2012 18:37:07 +0100
changeset 16120 47ee41fcf42b
parent 14494 1ffeeb91c55d
child 25660 328739ea70c3
permissions -rwxr-xr-x
largefiles: optimize update speed by only updating changed largefiles

Historically, during 'hg update', every largefile in the working copy was
hashed (which is a very expensive operation on big files) and any largefiles
that did not have a hash that matched their standin were updated.

This patch optimizes 'hg update' by keeping track of what standins have
changed between the old and new revisions, and only updating the largefiles
that have changed. This saves a lot of time by avoiding the unnecessary
calculation of a list of sha1 hashes for big files.

With this patch, the time 'hg update' takes to complete is a function of how
many largefiles need to be updated and what their size is.

Performance tests on a repository with about 80 largefiles ranging from a few
MB to about 97 MB are shown below. The tests show how long it takes to run
'hg update' with no changes actually being updated.

Mercurial 2.1 release:

  $ time hg update
  0 files updated, 0 files merged, 0 files removed, 0 files unresolved
  getting changed largefiles
  0 largefiles updated, 0 removed

  real    0m10.045s
  user    0m9.367s
  sys     0m0.674s

With this patch:

  $ time hg update
  0 files updated, 0 files merged, 0 files removed, 0 files unresolved

  real    0m0.965s
  user    0m0.845s
  sys     0m0.115s

The same repository, without the largefiles extension enabled:

  $ time hg update
  0 files updated, 0 files merged, 0 files removed, 0 files unresolved

  real    0m0.799s
  user    0m0.684s
  sys     0m0.111s

So before the patch, 'hg update' with no changes was approximately 9.25s
slower with largefiles enabled. With this patch, it is approximately 0.165s
slower.
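The idea described in the commit message (compare standins between the old and new revisions instead of hashing every largefile) can be illustrated with a minimal sketch. This is not the extension's actual code: the function name `changed_standins`, the dictionary inputs, and the sample paths and hashes are all hypothetical, chosen only to show why the cost depends on the number of changed largefiles rather than on the total size of the working copy.

```python
def changed_standins(old, new):
    """Return the sorted standin paths whose recorded hash differs.

    `old` and `new` map a standin path to the hash stored in that standin
    for the old and new revision. These are hypothetical inputs for
    illustration, not Mercurial's internal data structures.
    """
    paths = set(old) | set(new)
    # A standin counts as changed if it was added, removed, or records a
    # different hash. No largefile content is read or hashed here.
    return sorted(p for p in paths if old.get(p) != new.get(p))


# Made-up example data: one unchanged, one modified, one removed, one added.
old_rev = {".hglf/a.bin": "aaa", ".hglf/b.bin": "bbb", ".hglf/c.bin": "ccc"}
new_rev = {".hglf/a.bin": "aaa", ".hglf/b.bin": "b2b", ".hglf/d.bin": "ddd"}

to_update = changed_standins(old_rev, new_rev)
```

Only the largefiles behind the paths in `to_update` would need to be touched. When the two revisions record identical standins the list is empty, which corresponds to the "no changes" case measured in the timings above.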
#!/usr/bin/env python
#
# Based on python's Tools/scripts/md5sum.py
#
# This software may be used and distributed according to the terms
# of the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2, which is
# GPL-compatible.

import sys, os

try:
    from hashlib import md5
except ImportError:
    from md5 import md5

try:
    import msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)
except ImportError:
    pass

for filename in sys.argv[1:]:
    try:
        fp = open(filename, 'rb')
    except IOError, msg:
        sys.stderr.write('%s: Can\'t open: %s\n' % (filename, msg))
        sys.exit(1)

    m = md5()
    try:
        while True:
            data = fp.read(8192)
            if not data:
                break
            m.update(data)
    except IOError, msg:
        sys.stderr.write('%s: I/O error: %s\n' % (filename, msg))
        sys.exit(1)
    sys.stdout.write('%s  %s\n' % (m.hexdigest(), filename))

sys.exit(0)