doc/docchecker
author Pulkit Goyal <pulkit@yandex-team.ru>
Fri, 23 Nov 2018 18:58:16 +0300
changeset 40770 0728d87a8631
parent 29169 c9ab5a0bc7c5
child 41016 9bfbb9fc5871
permissions -rwxr-xr-x
store: append to fncache if there are only new files to write Before this patch, if we have to add a new entry to fncache, we write the whole fncache again which slows things down on large fncache which have millions of entries. Addition of a new entry is common operation while pulling new files or commiting a new file. This patch adds a new fncache.addls set which keeps track of the additions happening and store them. When we write the fncache, we will just read the addls set and append those entries at the end of fncache. We make sure that the entries are new entries by loading the fncache and making sure entry does not exists there. In future if we can check if an entry is new without loading the fncache, that will speed up things more. Performance numbers for commiting a new file: mercurial repo before: 0.08784651756286621 after: 0.08474504947662354 mozilla-central before: 1.83314049243927 after: 1.7054164409637451 netbeans before: 0.7953150272369385 after: 0.7202838659286499 pypy before: 0.17805707454681396 after: 0.13431048393249512 In our internal repo, the performance improvement is in seconds. I have used octobus's ASV perf benchmark thing to get the above numbers. I also see some minute perf improvements related to creating a new commit without a new file, but I believe that's just some noise. Differential Revision: https://phab.mercurial-scm.org/D5301
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     1
#!/usr/bin/env python
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     2
#
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     3
# docchecker - look for problematic markup
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     4
#
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     5
# Copyright 2016 timeless <timeless@mozdev.org> and others
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     6
#
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     7
# This software may be used and distributed according to the terms of the
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
     8
# GNU General Public License version 2 or any later version.
29168
8f2805ce93d9 py3: make doc/docchecker use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 28811
diff changeset
     9
29169
c9ab5a0bc7c5 py3: make doc/docchecker use print_function
Pulkit Goyal <7895pulkit@gmail.com>
parents: 29168
diff changeset
    10
from __future__ import absolute_import, print_function
29168
8f2805ce93d9 py3: make doc/docchecker use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 28811
diff changeset
    11
8f2805ce93d9 py3: make doc/docchecker use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 28811
diff changeset
    12
import re
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    13
import sys
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    14
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    15
leadingline = re.compile(r'(^\s*)(\S.*)$')
28810
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    16
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    17
checks = [
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    18
  (r""":hg:`[^`]*'[^`]*`""",
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    19
    """warning: please avoid nesting ' in :hg:`...`"""),
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    20
  (r'\w:hg:`',
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    21
    'warning: please have a space before :hg:'),
28811
1a623585a658 docchecker: try to reject single quotes
timeless <timeless@mozdev.org>
parents: 28810
diff changeset
    22
  (r"""(?:[^a-z][^'.])hg ([^,;"`]*'(?!hg)){2}""",
1a623585a658 docchecker: try to reject single quotes
timeless <timeless@mozdev.org>
parents: 28810
diff changeset
    23
    '''warning: please use " instead of ' for hg ... "..."'''),
28810
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    24
]
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    25
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    26
def check(line):
28810
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    27
    messages = []
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    28
    for match, msg in checks:
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    29
        if re.search(match, line):
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    30
            messages.append(msg)
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    31
    if messages:
28049
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    32
        print(line)
28810
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    33
        for msg in messages:
9934362978e1 docchecker: report context line at most once
timeless <timeless@mozdev.org>
parents: 28049
diff changeset
    34
            print(msg)
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    35
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    36
def work(file):
28049
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    37
    (llead, lline) = ('', '')
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    38
28049
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    39
    for line in file:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    40
        # this section unwraps lines
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    41
        match = leadingline.match(line)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    42
        if not match:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    43
            check(lline)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    44
            (llead, lline) = ('', '')
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    45
            continue
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    46
28049
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    47
        lead, line = match.group(1), match.group(2)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    48
        if (lead == llead):
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    49
            if (lline != ''):
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    50
                lline += ' ' + line
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    51
            else:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    52
                lline = line
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    53
        else:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    54
            check(lline)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    55
            (llead, lline) = (lead, line)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    56
    check(lline)
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    57
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    58
def main():
28049
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    59
    for f in sys.argv[1:]:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    60
        try:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    61
            with open(f) as file:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    62
                work(file)
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    63
        except BaseException as e:
c00f67c15c5a docchecker: use indentation of 4 spaces
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 28048
diff changeset
    64
            print("failed to process %s: %s" % (f, e))
27730
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    65
ed86fb2a4187 docchecker: introduce a way to check for poor markup
timeless <timeless@mozdev.org>
parents:
diff changeset
    66
main()