tests/svnxml.py
author Gregory Szorc <gregory.szorc@gmail.com>
Mon, 22 Aug 2016 21:48:50 -0700
changeset 29830 92ac2baaea86
parent 28947 812eb3b7dc43
child 40216 c17d73bf6a4d
permissions -rw-r--r--
revlog: use an LRU cache for delta chain bases Profiling using statprof revealed a hotspot during changegroup application calculating delta chain bases on generaldelta repos. Essentially, revlog._addrevision() was performing a lot of redundant work tracing the delta chain as part of determining when the chain distance was acceptable. This was most pronounced when adding revisions to manifests, which can have delta chains thousands of revisions long. There was a delta chain base cache on revlogs before, but it only captured a single revision. This was acceptable before generaldelta, when _addrevision would build deltas from the previous revision and thus we'd pretty much guarantee a cache hit when resolving the delta chain base on a subsequent _addrevision call. However, it isn't suitable for generaldelta because parent revisions aren't necessarily the last processed revision. This patch converts the delta chain base cache to an LRU dict cache. The cache can hold multiple entries, so generaldelta repos have a higher chance of getting a cache hit. The impact of this change when processing changegroup additions is significant. On a generaldelta conversion of the "mozilla-unified" repo (which contains heads of the main Firefox repositories in chronological order - this means there are lots of transitions between heads in revlog order), this change has the following impact when performing an `hg unbundle` of an uncompressed bundle of the repo: before: 5:42 CPU time after: 4:34 CPU time Most of this time is saved when applying the changelog and manifest revlogs: before: 2:30 CPU time after: 1:17 CPU time That nearly a 50% reduction in CPU time applying changesets and manifests! Applying a gzipped bundle of the same repo (effectively simulating a `hg clone` over HTTP) showed a similar speedup: before: 5:53 CPU time after: 4:46 CPU time Wall time improvements were basically the same as CPU time. I didn't measure explicitly, but it feels like most of the time is saved when processing manifests. This makes sense, as large manifests tend to have very long delta chains and thus benefit the most from this cache. So, this change effectively makes changegroup application (which is used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and various other commands) significantly faster when delta chains are long (which can happen on repos with large numbers of files and thus large manifests). In theory, this change can result in more memory utilization. However, we're caching a dict of ints. At most we have 200 ints + Python object overhead per revlog. And, the cache is really only populated when performing read-heavy operations, such as adding changegroups or scanning an individual revlog. For memory bloat to be an issue, we'd need to scan/read several revisions from several revlogs all while having active references to several revlogs. I don't think there are many operations that do this, so I don't think memory bloat from the cache will be an issue.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
16512
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     1
# Read the output of a "svn log --xml" command on stdin, parse it and
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     2
# print a subset of attributes common to all svn versions tested by
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     3
# hg.
28947
812eb3b7dc43 py3: use absolute_import in svnxml.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 16512
diff changeset
     4
from __future__ import absolute_import
812eb3b7dc43 py3: use absolute_import in svnxml.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 16512
diff changeset
     5
import sys
812eb3b7dc43 py3: use absolute_import in svnxml.py
Robert Stanca <robert.stanca7@gmail.com>
parents: 16512
diff changeset
     6
import xml.dom.minidom
16512
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     7
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     8
def xmltext(e):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
     9
    return ''.join(c.data for c
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    10
                   in e.childNodes
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    11
                   if c.nodeType == c.TEXT_NODE)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    12
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    13
def parseentry(entry):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    14
    e = {}
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    15
    e['revision'] = entry.getAttribute('revision')
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    16
    e['author'] = xmltext(entry.getElementsByTagName('author')[0])
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    17
    e['msg'] = xmltext(entry.getElementsByTagName('msg')[0])
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    18
    e['paths'] = []
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    19
    paths = entry.getElementsByTagName('paths')
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    20
    if paths:
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    21
        paths = paths[0]
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    22
        for p in paths.getElementsByTagName('path'):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    23
            action = p.getAttribute('action')
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    24
            path = xmltext(p)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    25
            frompath = p.getAttribute('copyfrom-path')
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    26
            fromrev = p.getAttribute('copyfrom-rev')
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    27
            e['paths'].append((path, action, frompath, fromrev))
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    28
    return e
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    29
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    30
def parselog(data):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    31
    entries = []
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    32
    doc = xml.dom.minidom.parseString(data)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    33
    for e in doc.getElementsByTagName('logentry'):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    34
        entries.append(parseentry(e))
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    35
    return entries
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    36
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    37
def printentries(entries):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    38
    fp = sys.stdout
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    39
    for e in entries:
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    40
        for k in ('revision', 'author', 'msg'):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    41
            fp.write(('%s: %s\n' % (k, e[k])).encode('utf-8'))
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    42
        for path, action, fpath, frev in sorted(e['paths']):
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    43
            frominfo = ''
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    44
            if frev:
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    45
                frominfo = ' (from %s@%s)' % (fpath, frev)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    46
            p = ' %s %s%s\n' % (action, path, frominfo)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    47
            fp.write(p.encode('utf-8'))
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    48
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    49
if __name__ == '__main__':
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    50
    data = sys.stdin.read()
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    51
    entries = parselog(data)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    52
    printentries(entries)
c58bdecdb800 test-convert-svn-sink: add helper to smooth svn xml output
Patrick Mezard <patrick@mezard.eu>
parents:
diff changeset
    53