view contrib/check-commit @ 35114:db5038525718

bundle2: implement consume() API on unbundlepart We want bundle parts to not be seekable by default. That means eliminating the generic seek() method. A common pattern in bundle2.py is to seek to the end of the part data. This is mainly used by the part iteration code to ensure the underlying stream is advanced to the next bundle part. In this commit, we establish a dedicated API for consuming a bundle2 part data. We switch users of seek() to it. The old implementation of seek(0, os.SEEK_END) would effectively call self.read(). The new implementation calls self.read(32768) in a loop. The old implementation would therefore assemble a buffer to hold all remaining data being seeked over. For seeking over large bundle parts, this would involve a large allocation and a lot of overhead to collect intermediate data! This overhead can be seen in the results for `hg perfbundleread`: ! bundle2 iterparts() ! wall 10.891305 comb 10.820000 user 7.990000 sys 2.830000 (best of 3) ! wall 8.070791 comb 8.060000 user 7.180000 sys 0.880000 (best of 3) ! bundle2 part seek() ! wall 12.991478 comb 10.390000 user 7.720000 sys 2.670000 (best of 3) ! wall 10.370142 comb 10.350000 user 7.430000 sys 2.920000 (best of 3) Of course, skipping over large payload data isn't likely very common. So I doubt the performance wins will be observed in the wild. Differential Revision: https://phab.mercurial-scm.org/D1388
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 13 Nov 2017 20:03:02 -0800
parents 2fb3ae89e4e1
children 47084b5ffd80
line wrap: on
line source

#!/usr/bin/env python
#
# Copyright 2014 Matt Mackall <mpm@selenic.com>
#
# A tool/hook to run basic sanity checks on commits/patches for
# submission to Mercurial. Install by adding the following to your
# .hg/hgrc:
#
# [hooks]
# pretxncommit = contrib/check-commit
#
# The hook can be temporarily bypassed with:
#
# $ BYPASS= hg commit
#
# See also: https://mercurial-scm.org/wiki/ContributingChanges

from __future__ import absolute_import, print_function

import os
import re
import sys

commitheader = r"^(?:# [^\n]*\n)*"
afterheader = commitheader + r"(?!#)"
beforepatch = afterheader + r"(?!\n(?!@@))"

errors = [
    (beforepatch + r".*[(]bc[)]", "(BC) needs to be uppercase"),
    (beforepatch + r".*[(]issue \d\d\d",
     "no space allowed between issue and number"),
    (beforepatch + r".*[(]bug(\d|\s)", "use (issueDDDD) instead of bug"),
    (commitheader + r"# User [^@\n]+\n", "username is not an email address"),
    (commitheader + r"(?!merge with )[^#]\S+[^:] ",
     "summary line doesn't start with 'topic: '"),
    (afterheader + r"[A-Z][a-z]\S+", "don't capitalize summary lines"),
    (afterheader + r"[^\n]*: *[A-Z][a-z]\S+", "don't capitalize summary lines"),
    (afterheader + r"\S*[^A-Za-z0-9-_]\S*: ",
     "summary keyword should be most user-relevant one-word command or topic"),
    (afterheader + r".*\.\s*\n", "don't add trailing period on summary line"),
    (afterheader + r".{79,}", "summary line too long (limit is 78)"),
    (r"\n\+\n( |\+)\n", "adds double empty line"),
    (r"\n \n\+\n", "adds double empty line"),
    # Forbid "_" in function name.
    #
    # We skip the check for cffi related functions. They use names mapping the
    # name of the C function. C function names may contain "_".
    (r"\n\+[ \t]+def (?!cffi)[a-z]+_[a-z]",
     "adds a function with foo_bar naming"),
]

word = re.compile('\S')
def nonempty(first, second):
    if word.search(first):
        return first
    return second

def checkcommit(commit, node=None):
    exitcode = 0
    printed = node is None
    hits = []
    signtag = (afterheader +
          r'Added (tag [^ ]+|signature) for changeset [a-f0-9]{12}')
    if re.search(signtag, commit):
        return 0
    for exp, msg in errors:
        for m in re.finditer(exp, commit):
            end = m.end()
            trailing = re.search(r'(\\n)+$', exp)
            if trailing:
                end -= len(trailing.group()) / 2
            hits.append((end, exp, msg))
    if hits:
        hits.sort()
        pos = 0
        last = ''
        for n, l in enumerate(commit.splitlines(True)):
            pos += len(l)
            while len(hits):
                end, exp, msg = hits[0]
                if pos < end:
                    break
                if not printed:
                    printed = True
                    print("node: %s" % node)
                print("%d: %s" % (n, msg))
                print(" %s" % nonempty(l, last)[:-1])
                if "BYPASS" not in os.environ:
                    exitcode = 1
                del hits[0]
            last = nonempty(l, last)

    return exitcode

def readcommit(node):
    return os.popen("hg export %s" % node).read()

if __name__ == "__main__":
    exitcode = 0
    node = os.environ.get("HG_NODE")

    if node:
        commit = readcommit(node)
        exitcode = checkcommit(commit)
    elif sys.argv[1:]:
        for node in sys.argv[1:]:
            exitcode |= checkcommit(readcommit(node), node)
    else:
        commit = sys.stdin.read()
        exitcode = checkcommit(commit)
    sys.exit(exitcode)