tests/svnxml.py
author FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
Mon, 18 May 2015 02:52:55 +0900
changeset 25174 86298718b01c
parent 16512 c58bdecdb800
child 28947 812eb3b7dc43
permissions -rw-r--r--
import-checker: make imported_modules yield absolute dotted_name_of_path This patch makes `imported_modules()` always yield absolute `dotted_name_of_path()`-ed name by strict detection with `fromlocal()`. This change improves circular detection in some points: - locally defined modules, of which name collides against one of standard library, can be examined correctly For example, circular import related to `commands` is overlooked before this patch. - names not useful for circular detection are ignored Names below are also yielded before this patch: - module names of standard library (= not locally defined one) - non-module names (e.g. `node.nullid` of `from node import nullid`) These redundant names decrease performance of circular detection. For example, with files at 1ef96a3b8b89, average loops per file in `checkmod()` is reduced from 165 to 109. - `__init__` can be handled correctly in `checkmod()` For example, current implementation has problems below: - `from xxx import yyy` doesn't recognize `xxx.__init__` as imported - `xxx.__init__` imported via `import xxx` is treated as `xxx`, and circular detection is aborted, because `key` of such module name is not `xxx` but `xxx.__init__` - it is easy to enhance for `from . import xxx` style or so (in the future) Module name detection in `imported_modules()` can use information in `ast.ImportFrom` fully. It is assumed that all locally defined modules are correctly specified to `import-checker.py` at once. Strictly speaking, when `from foo.bar.baz import module1` imports `foo.bar.baz.module1` module, current `imported_modules()` yields only `foo.bar.baz.__init__`, even though also `foo.__init__` and `foo.bar.__init__` should be yielded to detect circular import exactly. But this limitation is reasonable one for improvement in this patch, because current `__init__` files in Mercurial seems to be implemented carefully.

# Read the output of a "svn log --xml" command on stdin, parse it and
# print a subset of attributes common to all svn versions tested by
# hg.
import xml.dom.minidom, sys

def xmltext(e):
    return ''.join(c.data for c
                   in e.childNodes
                   if c.nodeType == c.TEXT_NODE)

def parseentry(entry):
    e = {}
    e['revision'] = entry.getAttribute('revision')
    e['author'] = xmltext(entry.getElementsByTagName('author')[0])
    e['msg'] = xmltext(entry.getElementsByTagName('msg')[0])
    e['paths'] = []
    paths = entry.getElementsByTagName('paths')
    if paths:
        paths = paths[0]
        for p in paths.getElementsByTagName('path'):
            action = p.getAttribute('action')
            path = xmltext(p)
            frompath = p.getAttribute('copyfrom-path')
            fromrev = p.getAttribute('copyfrom-rev')
            e['paths'].append((path, action, frompath, fromrev))
    return e

def parselog(data):
    entries = []
    doc = xml.dom.minidom.parseString(data)
    for e in doc.getElementsByTagName('logentry'):
        entries.append(parseentry(e))
    return entries

def printentries(entries):
    fp = sys.stdout
    for e in entries:
        for k in ('revision', 'author', 'msg'):
            fp.write(('%s: %s\n' % (k, e[k])).encode('utf-8'))
        for path, action, fpath, frev in sorted(e['paths']):
            frominfo = ''
            if frev:
                frominfo = ' (from %s@%s)' % (fpath, frev)
            p = ' %s %s%s\n' % (action, path, frominfo)
            fp.write(p.encode('utf-8'))

if __name__ == '__main__':
    data = sys.stdin.read()
    entries = parselog(data)
    printentries(entries)