doc/docchecker
author Gregory Szorc <gregory.szorc@gmail.com>
Sat, 13 May 2017 16:26:43 -0700
changeset 32332 0ad0d26ff703
parent 29169 c9ab5a0bc7c5
child 41016 9bfbb9fc5871
permissions -rwxr-xr-x
changelog: load pending file directly

When changelogs are written, a copy of the index (or inline revlog) may be written to an 00changelog.i.a file so that hooks and other processes can access the pending data before it is finalized.

The way it works today, the localrepo class loads the changelog like normal. Then, if it detects a pending transaction, it asks the changelog class to load a pending changelog. The changelog class looks for a 00changelog.i.a file. If it exists, it is loaded and internal data structures on the new revlog class are copied to the original instance.

The existing mechanism is inefficient because it loads two revlog files. The index, node map, and chunk cache for 00changelog.i are thrown away and replaced by those for 00changelog.i.a.

The existing mechanism is also brittle because reaching into those internal data structures from outside the revlog is a layering violation. For example, the code copies the "chunk cache" because for inline revlogs this cache contains the raw revision chunks and allows the original changelog/revlog instance to access revision data for these pending revisions. This whole behavior of course relies on the revlog constructor reading the entirety of an inline revlog into memory and caching it. That's why it is brittle. (I discovered all this as part of modifying behavior of the chunk cache.)

This patch streamlines the loading of a pending 00changelog.i.a revlog by doing it directly in the changelog constructor if told to do so. When this code path is active, we no longer load the 00changelog.i file at all.

The only negative outcome I see from this change is if loading 00changelog.i was somehow serving a purpose. But I can't imagine what that would be. We throw away its data: the index data structures are replaced, and inline revision data is replaced via the chunk cache. And since 00changelog.i.a is a copy of 00changelog.i, the file content should be identical, so there should be no meaningful file integrity checking at play. I think this was all just sub-optimal code.
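
The commit message describes moving the choice of index file into the changelog constructor. The sketch below illustrates that idea under a few assumptions. It uses a plain directory on disk instead of Mercurial's vfs opener, and it stands in for real index parsing by reading raw bytes. `ToyChangelog`, `storedir`, and `trypending` are hypothetical names chosen for illustration; they are not Mercurial's actual changelog API.

```python
import os


class ToyChangelog(object):
    """Hypothetical stand-in for a changelog; not Mercurial's real class."""

    INDEX = '00changelog.i'
    PENDING = '00changelog.i.a'

    def __init__(self, storedir, trypending=False):
        pending = os.path.join(storedir, self.PENDING)
        if trypending and os.path.exists(pending):
            # A transaction has written a pending copy: read only that file,
            # instead of parsing the committed index just to discard it.
            self.indexfile = self.PENDING
        else:
            # Normal case, or no pending file exists: use the committed index.
            self.indexfile = self.INDEX
        with open(os.path.join(storedir, self.indexfile), 'rb') as fh:
            # Placeholder for index parsing: this sketch keeps the raw bytes.
            self.data = fh.read()
```

The file is chosen before anything is read, so only one index is loaded. As a result, no caches or index structures have to be copied from one instance to another.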

#!/usr/bin/env python
#
# docchecker - look for problematic markup
#
# Copyright 2016 timeless <timeless@mozdev.org> and others
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import, print_function

import re
import sys

leadingline = re.compile(r'(^\s*)(\S.*)$')

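# Each entry is (regular expression, warning printed when it matches).
checks = [
    # single quotes nested inside a :hg:`...` role
    (r""":hg:`[^`]*'[^`]*`""",
     """warning: please avoid nesting ' in :hg:`...`"""),
    # a :hg: role glued to the preceding word, e.g. "see:hg:`help`"
    (r'\w:hg:`',
     'warning: please have a space before :hg:'),
    # an "hg ..." command line whose arguments are quoted with ' (two or
    # more single quotes before the next , ; " or backtick)
    (r"""(?:[^a-z][^'.])hg ([^,;"`]*'(?!hg)){2}""",
     '''warning: please use " instead of ' for hg ... "..."'''),
]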

def check(line):
    """Print `line` followed by every warning whose pattern matches it."""
    messages = []
    for match, msg in checks:
        if re.search(match, line):
            messages.append(msg)
    if messages:
        print(line)
        for msg in messages:
            print(msg)

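def work(file):
    """Run check() on every logical line read from `file`.

    Consecutive lines with identical leading whitespace are joined into one
    logical line, so markup split across a line break is still caught. A
    blank line or a change in indentation ends the current logical line.
    """
    (llead, lline) = ('', '')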

    for line in file:
        # this section unwraps lines
        match = leadingline.match(line)
        if not match:
            check(lline)
            (llead, lline) = ('', '')
            continue

        lead, line = match.group(1), match.group(2)
        if lead == llead:
            if lline != '':
                lline += ' ' + line
            else:
                lline = line
        else:
            check(lline)
            (llead, lline) = (lead, line)
    check(lline)

def main():
    for f in sys.argv[1:]:
        try:
            with open(f) as file:
                work(file)
        except BaseException as e:
            print("failed to process %s: %s" % (f, e))

if __name__ == '__main__':
    main()
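
The script reports problems by printing them, which makes it hard to reuse or test. The sketch below is a side-effect-free variant that returns its findings instead. It assumes the same three patterns and the same unwrapping rule as `work()`. `CHECKS`, `LEADING`, `unwrap`, and `find_problems` are hypothetical names introduced for illustration; they are not part of Mercurial.

```python
import re

# Same patterns and messages as the script's `checks` list.
CHECKS = [
    (r""":hg:`[^`]*'[^`]*`""",
     """warning: please avoid nesting ' in :hg:`...`"""),
    (r'\w:hg:`',
     'warning: please have a space before :hg:'),
    (r"""(?:[^a-z][^'.])hg ([^,;"`]*'(?!hg)){2}""",
     '''warning: please use " instead of ' for hg ... "..."'''),
]

# Splits a physical line into (leading whitespace, text); blank lines fail.
LEADING = re.compile(r'(^\s*)(\S.*)$')


def unwrap(lines):
    """Yield logical lines, joining same-indentation runs with one space."""
    lead, buf = '', ''
    for raw in lines:
        m = LEADING.match(raw)
        if not m:
            # Blank or whitespace-only line: end the current logical line.
            if buf:
                yield buf
            lead, buf = '', ''
            continue
        cur_lead, text = m.group(1), m.group(2)
        if cur_lead == lead:
            # Same indentation: continuation of the current logical line.
            buf = buf + ' ' + text if buf else text
        else:
            # Indentation changed: flush and start a new logical line.
            if buf:
                yield buf
            lead, buf = cur_lead, text
    if buf:
        yield buf


def find_problems(lines):
    """Return (logical_line, [warnings]) pairs instead of printing them."""
    problems = []
    for logical in unwrap(lines):
        messages = [msg for pattern, msg in CHECKS
                    if re.search(pattern, logical)]
        if messages:
            problems.append((logical, messages))
    return problems
```

Joining lines before matching is what lets the first pattern catch a `:hg:` role whose quoted argument wraps onto the next physical line. For example, the lines "Use :hg:`commit -m" and "'msg'` to record." are checked as the single line "Use :hg:`commit -m 'msg'` to record.".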