contrib/casesmash.py
author Siddharth Agarwal <sid0@fb.com>
Fri, 24 Oct 2014 11:39:39 -0700
branch stable
changeset 23097 30124c40d11f
parent 19378 9de689d20230
child 28351 42a7301fb4d5
permissions -rw-r--r--
util.fspath: use a dict rather than a linear scan for lookups

Previously, we'd scan through the entire directory listing looking for a
normalized match. This is O(N) in the number of files in the directory. If we
decide to call util.fspath on each file in it, the overall complexity works out
to O(N^2). This becomes a problem with directories a few thousand files or
larger.

Switch to using a dictionary instead. There is a slightly higher upfront cost
to pay, but for cases like the above this is amortized O(1). Plus there is a
lower constant factor because generator comprehensions are faster than for
loops, so overall it works out to be a very small loss in performance for 1
file, and a huge gain when there's more.

For a large repo with around 200k files in it on a case-insensitive file
system, for a large directory with over 30,000 files in it, the following
command was tested:

    ls | shuf -n $COUNT | xargs hg status

This command leads to util.fspath being called on $COUNT files in the
directory.

    COUNT  before  after
        1  0.77s   0.78s
      100  1.42s   0.80s
     1000  6.3s    0.96s

I also tested with COUNT=10000, but before took too long so I gave up.
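
The trade-off described in the commit message can be illustrated with a minimal sketch. This is not the actual util.fspath code: the helper names (build_case_map, find_linear, find_in_map) are hypothetical, normalization is assumed to be plain lowercasing, and the sketch is written for Python 3 although the file below targets Python 2.

```python
def build_case_map(entries, normcase=str.lower):
    """Build the lookup table once: normalized name -> on-disk spelling.

    This is the upfront O(N) cost. If two entries normalize to the same
    key, the later one wins (an assumption of this sketch).
    """
    return {normcase(name): name for name in entries}


def find_linear(entries, wanted, normcase=str.lower):
    """Scan the whole listing for a normalized match: O(N) per lookup."""
    target = normcase(wanted)
    for name in entries:
        if normcase(name) == target:
            return name
    return None


def find_in_map(case_map, wanted, normcase=str.lower):
    """Look the normalized name up in the prebuilt dict: O(1) on average."""
    return case_map.get(normcase(wanted))


entries = ['README', 'Makefile', 'setup.py']  # stand-in for os.listdir(...)
case_map = build_case_map(entries)
```

With N lookups against the same directory, find_linear costs O(N^2) in total, while building case_map once and calling find_in_map N times costs O(N).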

import os, __builtin__
from mercurial import util

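def lowerwrap(scope, funcname):
    """Replace scope.funcname with a wrapper that resolves the file name
    case-insensitively against its directory listing."""
    f = getattr(scope, funcname)
    def wrap(fname, *args, **kwargs):
        d, base = os.path.split(fname)
        try:
            files = os.listdir(d or '.')
        except OSError:
            # missing or unreadable directory: fall through to the original call
            files = []
        if base in files:
            # exact match on disk, no case folding needed
            return f(fname, *args, **kwargs)
        for fn in files:
            if fn.lower() == base.lower():
                # first entry that differs only by case wins
                return f(os.path.join(d, fn), *args, **kwargs)
        # no match at all: let the original function handle (or fail on) fname
        return f(fname, *args, **kwargs)
    scope.__dict__[funcname] = wrap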

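def normcase(path):
    return path.lower()

# make os.path.normcase fold case so that path comparisons ignore it
os.path.normcase = normcase

# Python 2 builtins that take a file name as their first argument
for f in 'file open'.split():
    lowerwrap(__builtin__, f)

# os-level functions that take a path as their first argument
for f in "chmod chown open lstat stat remove unlink".split():
    lowerwrap(os, f)

for f in "exists lexists".split():
    lowerwrap(os.path, f)

# Mercurial's own file-opening helper
lowerwrap(util, 'posixfile')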
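
To see what the wrapping does without patching os or the builtins globally, here is a rough Python 3 sketch. It applies the same idea to a throwaway namespace; wrap_case_insensitive and fake are hypothetical names introduced for this illustration and are not part of the script above.

```python
import os
import tempfile
import types


def wrap_case_insensitive(scope, funcname):
    """Same approach as lowerwrap above, using setattr on any namespace."""
    f = getattr(scope, funcname)

    def wrap(fname, *args, **kwargs):
        d, base = os.path.split(fname)
        try:
            files = os.listdir(d or '.')
        except OSError:
            files = []
        if base in files:
            return f(fname, *args, **kwargs)
        for fn in files:
            if fn.lower() == base.lower():
                # Redirect the call to the spelling that exists on disk.
                return f(os.path.join(d, fn), *args, **kwargs)
        return f(fname, *args, **kwargs)

    setattr(scope, funcname, wrap)


# Stand-in namespace, so the real os.path.exists stays untouched.
fake = types.SimpleNamespace(exists=os.path.exists)
wrap_case_insensitive(fake, 'exists')

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'ReadMe.txt'), 'w'):
        pass
    # The name differs from the on-disk spelling only by case.
    found = fake.exists(os.path.join(tmp, 'README.TXT'))
    # No entry matches this name under any casing.
    missing = fake.exists(os.path.join(tmp, 'nope.txt'))
```

On a case-sensitive file system the unwrapped os.path.exists would report the first path as absent; the wrapped version finds it by matching the lowercased names.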