Mercurial > hg
view hgext/strip.py @ 29830:92ac2baaea86
revlog: use an LRU cache for delta chain bases
Profiling using statprof revealed a hotspot during changegroup
application calculating delta chain bases on generaldelta repos.
Essentially, revlog._addrevision() was performing a lot of redundant
work tracing the delta chain as part of determining when the chain
distance was acceptable. This was most pronounced when adding
revisions to manifests, which can have delta chains thousands of
revisions long.
There was a delta chain base cache on revlogs before, but it only
captured a single revision. This was acceptable before generaldelta,
when _addrevision would build deltas from the previous revision and
thus we'd pretty much guarantee a cache hit when resolving the delta
chain base on a subsequent _addrevision call. However, it isn't
suitable for generaldelta because parent revisions aren't necessarily
the last processed revision.
This patch converts the delta chain base cache to an LRU dict cache.
The cache can hold multiple entries, so generaldelta repos have a
higher chance of getting a cache hit.
The impact of this change when processing changegroup additions is
significant. On a generaldelta conversion of the "mozilla-unified"
repo (which contains heads of the main Firefox repositories in
chronological order - this means there are lots of transitions between
heads in revlog order), this change has the following impact when
performing an `hg unbundle` of an uncompressed bundle of the repo:
before: 5:42 CPU time
after: 4:34 CPU time
Most of this time is saved when applying the changelog and manifest
revlogs:
before: 2:30 CPU time
after: 1:17 CPU time
That nearly a 50% reduction in CPU time applying changesets and
manifests!
Applying a gzipped bundle of the same repo (effectively simulating a
`hg clone` over HTTP) showed a similar speedup:
before: 5:53 CPU time
after: 4:46 CPU time
Wall time improvements were basically the same as CPU time.
I didn't measure explicitly, but it feels like most of the time
is saved when processing manifests. This makes sense, as large
manifests tend to have very long delta chains and thus benefit the
most from this cache.
So, this change effectively makes changegroup application (which is
used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and
various other commands) significantly faster when delta chains are
long (which can happen on repos with large numbers of files and thus
large manifests).
In theory, this change can result in more memory utilization. However,
we're caching a dict of ints. At most we have 200 ints + Python object
overhead per revlog. And, the cache is really only populated when
performing read-heavy operations, such as adding changegroups or
scanning an individual revlog. For memory bloat to be an issue, we'd
need to scan/read several revisions from several revlogs all while
having active references to several revlogs. I don't think there are
many operations that do this, so I don't think memory bloat from the
cache will be an issue.
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Mon, 22 Aug 2016 21:48:50 -0700 |
parents | a0939666b836 |
children | d5883fd055c6 |
line wrap: on
line source
"""strip changesets and their descendants from history This extension allows you to strip changesets and all their descendants from the repository. See the command help for details. """ from __future__ import absolute_import from mercurial.i18n import _ from mercurial import ( bookmarks as bookmarksmod, cmdutil, error, hg, lock as lockmod, merge, node as nodemod, repair, scmutil, util, ) nullid = nodemod.nullid release = lockmod.release cmdtable = {} command = cmdutil.command(cmdtable) # Note for extension authors: ONLY specify testedwith = 'internal' for # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should # be specifying the version(s) of Mercurial they are tested with, or # leave the attribute unspecified. testedwith = 'internal' def checksubstate(repo, baserev=None): '''return list of subrepos at a different revision than substate. Abort if any subrepos have uncommitted changes.''' inclsubs = [] wctx = repo[None] if baserev: bctx = repo[baserev] else: bctx = wctx.parents()[0] for s in sorted(wctx.substate): wctx.sub(s).bailifchanged(True) if s not in bctx.substate or bctx.sub(s).dirty(): inclsubs.append(s) return inclsubs def checklocalchanges(repo, force=False, excsuffix=''): cmdutil.checkunfinished(repo) s = repo.status() if not force: if s.modified or s.added or s.removed or s.deleted: _("local changes found") # i18n tool detection raise error.Abort(_("local changes found" + excsuffix)) if checksubstate(repo): _("local changed subrepos found") # i18n tool detection raise error.Abort(_("local changed subrepos found" + excsuffix)) return s def strip(ui, repo, revs, update=True, backup=True, force=None, bookmarks=None): wlock = lock = None try: wlock = repo.wlock() lock = repo.lock() if update: checklocalchanges(repo, force=force) urev, p2 = repo.changelog.parents(revs[0]) if (util.safehasattr(repo, 'mq') and p2 != nullid and p2 in [x.node for x in repo.mq.applied]): urev = p2 hg.clean(repo, urev) repo.dirstate.write(repo.currenttransaction()) repair.strip(ui, repo, revs, backup) repomarks = repo._bookmarks if bookmarks: with repo.transaction('strip') as tr: if repo._activebookmark in bookmarks: bookmarksmod.deactivate(repo) for bookmark in bookmarks: del repomarks[bookmark] repomarks.recordchange(tr) for bookmark in sorted(bookmarks): ui.write(_("bookmark '%s' deleted\n") % bookmark) finally: release(lock, wlock) @command("strip", [ ('r', 'rev', [], _('strip specified revision (optional, ' 'can specify revisions without this ' 'option)'), _('REV')), ('f', 'force', None, _('force removal of changesets, discard ' 'uncommitted changes (no backup)')), ('', 'no-backup', None, _('no backups')), ('', 'nobackup', None, _('no backups (DEPRECATED)')), ('n', '', None, _('ignored (DEPRECATED)')), ('k', 'keep', None, _("do not modify working directory during " "strip")), ('B', 'bookmark', [], _("remove revs only reachable from given" " bookmark"))], _('hg strip [-k] [-f] [-B bookmark] [-r] REV...')) def stripcmd(ui, repo, *revs, **opts): """strip changesets and all their descendants from the repository The strip command removes the specified changesets and all their descendants. If the working directory has uncommitted changes, the operation is aborted unless the --force flag is supplied, in which case changes will be discarded. If a parent of the working directory is stripped, then the working directory will automatically be updated to the most recent available ancestor of the stripped parent after the operation completes. Any stripped changesets are stored in ``.hg/strip-backup`` as a bundle (see :hg:`help bundle` and :hg:`help unbundle`). They can be restored by running :hg:`unbundle .hg/strip-backup/BUNDLE`, where BUNDLE is the bundle file created by the strip. Note that the local revision numbers will in general be different after the restore. Use the --no-backup option to discard the backup bundle once the operation completes. Strip is not a history-rewriting operation and can be used on changesets in the public phase. But if the stripped changesets have been pushed to a remote repository you will likely pull them again. Return 0 on success. """ backup = True if opts.get('no_backup') or opts.get('nobackup'): backup = False cl = repo.changelog revs = list(revs) + opts.get('rev') revs = set(scmutil.revrange(repo, revs)) with repo.wlock(): bookmarks = set(opts.get('bookmark')) if bookmarks: repomarks = repo._bookmarks if not bookmarks.issubset(repomarks): raise error.Abort(_("bookmark '%s' not found") % ','.join(sorted(bookmarks - set(repomarks.keys())))) # If the requested bookmark is not the only one pointing to a # a revision we have to only delete the bookmark and not strip # anything. revsets cannot detect that case. nodetobookmarks = {} for mark, node in repomarks.iteritems(): nodetobookmarks.setdefault(node, []).append(mark) for marks in nodetobookmarks.values(): if bookmarks.issuperset(marks): rsrevs = repair.stripbmrevset(repo, marks[0]) revs.update(set(rsrevs)) if not revs: lock = tr = None try: lock = repo.lock() tr = repo.transaction('bookmark') for bookmark in bookmarks: del repomarks[bookmark] repomarks.recordchange(tr) tr.close() for bookmark in sorted(bookmarks): ui.write(_("bookmark '%s' deleted\n") % bookmark) finally: release(lock, tr) if not revs: raise error.Abort(_('empty revision set')) descendants = set(cl.descendants(revs)) strippedrevs = revs.union(descendants) roots = revs.difference(descendants) update = False # if one of the wdir parent is stripped we'll need # to update away to an earlier revision for p in repo.dirstate.parents(): if p != nullid and cl.rev(p) in strippedrevs: update = True break rootnodes = set(cl.node(r) for r in roots) q = getattr(repo, 'mq', None) if q is not None and q.applied: # refresh queue state if we're about to strip # applied patches if cl.rev(repo.lookup('qtip')) in strippedrevs: q.applieddirty = True start = 0 end = len(q.applied) for i, statusentry in enumerate(q.applied): if statusentry.node in rootnodes: # if one of the stripped roots is an applied # patch, only part of the queue is stripped start = i break del q.applied[start:end] q.savedirty() revs = sorted(rootnodes) if update and opts.get('keep'): urev, p2 = repo.changelog.parents(revs[0]) if (util.safehasattr(repo, 'mq') and p2 != nullid and p2 in [x.node for x in repo.mq.applied]): urev = p2 uctx = repo[urev] # only reset the dirstate for files that would actually change # between the working context and uctx descendantrevs = repo.revs("%s::." % uctx.rev()) changedfiles = [] for rev in descendantrevs: # blindly reset the files, regardless of what actually changed changedfiles.extend(repo[rev].files()) # reset files that only changed in the dirstate too dirstate = repo.dirstate dirchanges = [f for f in dirstate if dirstate[f] != 'n'] changedfiles.extend(dirchanges) repo.dirstate.rebuild(urev, uctx.manifest(), changedfiles) repo.dirstate.write(repo.currenttransaction()) # clear resolve state merge.mergestate.clean(repo, repo['.'].node()) update = False strip(ui, repo, revs, backup=backup, update=update, force=opts.get('force'), bookmarks=bookmarks) return 0