view mercurial/bookmarks.py @ 30442:41a8106789ca

util: implement zstd compression engine

Now that zstd is vendored and being built (in some configurations), we can
implement a compression engine for zstd!

The zstd engine is a little different from existing engines. Because it may
not always be present, we have to defer loading the module in case importing
it fails. We facilitate this via a cached property that holds a reference to
the module or None. The "available" method is implemented to reflect reality.

The zstd engine declares its ability to handle bundles using the "zstd" human
name and the "ZS" internal name. The latter was chosen because internal names
are 2 characters (only by convention, I think) and "ZS" seems reasonable.

The engine, like others, supports specifying the compression level. However,
no consumers of this API yet pass in that argument. I have plans to change
that, so stay tuned.

Since all we need to do to support bundle generation with a new compression
engine is implement and register the compression engine, bundle generation
with zstd "just works!" Tests demonstrating this have been added.

How does performance of zstd for bundle generation compare? On the
mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following
on my i7-6700K on Linux:

  engine        CPU time   bundle size     vs orig size   throughput
  none            97.0s    4,054,405,584      100.0%       41.8 MB/s
  bzip2 (l=9)    393.6s      975,343,098       24.0%       10.3 MB/s
  gzip (l=6)     184.0s    1,140,533,074       28.1%       22.0 MB/s
  zstd (l=1)     108.2s    1,119,434,718       27.6%       37.5 MB/s
  zstd (l=2)     111.3s    1,078,328,002       26.6%       36.4 MB/s
  zstd (l=3)     113.7s    1,011,823,727       25.0%       35.7 MB/s
  zstd (l=4)     116.0s    1,008,965,888       24.9%       35.0 MB/s
  zstd (l=5)     121.0s      977,203,148       24.1%       33.5 MB/s
  zstd (l=6)     131.7s      927,360,198       22.9%       30.8 MB/s
  zstd (l=7)     139.0s      912,808,505       22.5%       29.2 MB/s
  zstd (l=12)    198.1s      854,527,714       21.1%       20.5 MB/s
  zstd (l=18)    681.6s      789,750,690       19.5%        5.9 MB/s

On compression, zstd for bundle generation delivers:

* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly
  faster than gzip
* the ability to aggressively tune the compression level to achieve
  significantly smaller bundles

That last point is important. With clone bundles, a server can pre-generate a
bundle file, upload it to a static file server, and redirect clients to
transparently download it during clone. The server could choose to produce a
zstd bundle with the highest compression settings possible. This would take a
very long time - an order of magnitude longer than a typical zstd bundle
generation - but the result would be hundreds of megabytes smaller! For the
clone volume we do at Mozilla, this could translate to petabytes of bandwidth
savings per year and faster clones (due to smaller transfer size).

I don't have detailed numbers to report on decompression. However, zstd
decompression is fast: >1 GB/s output throughput on this machine, even through
the Python bindings. And it can do that regardless of the compression level of
the input. By the time you have enough data to worry about the overhead of
decompression, you have plenty of other things to worry about performance
wise.

zstd is a win all around. I can't wait to implement support for it on the wire
protocol and in revlogs.
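
As an illustration only - this is not the code added by this change, and it
uses the standalone python-zstandard module ("zstandard") rather than
Mercurial's vendored copy or its internal compression engine API - the
deferred-import pattern described above might look roughly like this:

    class zstdengine(object):
        """Sketch of a compression engine whose module may be missing."""

        def __init__(self):
            self._module = None  # cached module reference; False if import failed

        def _zstd(self):
            # Defer the import so that merely creating the engine never fails.
            if self._module is None:
                try:
                    import zstandard
                    self._module = zstandard
                except ImportError:
                    self._module = False
            return self._module or None

        def available(self):
            # Reflect reality: True only if the bindings actually import.
            return self._zstd() is not None

        def bundletype(self):
            # Human-readable bundle name and 2-character internal name.
            return 'zstd', 'ZS'

        def compress(self, data, level=3):
            zstd = self._zstd()
            if zstd is None:
                raise RuntimeError('zstd support is not available')
            # One-shot compression; a real engine would stream chunks.
            return zstd.ZstdCompressor(level=level).compress(data)

With the real engine registered, a zstd bundle can be requested via the
"<engine>-v2" form above, e.g. `hg bundle --all -t zstd-v2 out.hg`.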
author Gregory Szorc <gregory.szorc@gmail.com>
date Fri, 11 Nov 2016 01:10:07 -0800
parents af849596752c
children 0a3b11a7489a

# Mercurial bookmark support code
#
# Copyright 2008 David Soria Parra <dsp@php.net>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

import errno
import os

from .i18n import _
from .node import (
    bin,
    hex,
)
from . import (
    encoding,
    error,
    lock as lockmod,
    obsolete,
    util,
)

def _getbkfile(repo):
    """Hook so that extensions that mess with the store can hook bm storage.

    For core, this just handles whether we should see pending
    bookmarks or the committed ones. Other extensions (like share)
    may need to tweak this behavior further.
    """
    bkfile = None
    if 'HG_PENDING' in os.environ:
        try:
            bkfile = repo.vfs('bookmarks.pending')
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
    if bkfile is None:
        bkfile = repo.vfs('bookmarks')
    return bkfile


class bmstore(dict):
    """Storage for bookmarks.

    This object should do all bookmark-related reads and writes, so
    that it's fairly simple to replace the storage underlying
    bookmarks without having to clone the logic surrounding
    bookmarks. This type also should manage the active bookmark, if
    any.

    This particular bmstore implementation stores bookmarks as
    {hash}\s{name}\n (the same format as localtags) in
    .hg/bookmarks. The mapping is stored as {name: nodeid}.
    """

    def __init__(self, repo):
        dict.__init__(self)
        self._repo = repo
        try:
            bkfile = _getbkfile(repo)
            for line in bkfile:
                line = line.strip()
                if not line:
                    continue
                if ' ' not in line:
                    repo.ui.warn(_('malformed line in .hg/bookmarks: %r\n')
                                 % line)
                    continue
                sha, refspec = line.split(' ', 1)
                refspec = encoding.tolocal(refspec)
                try:
                    self[refspec] = repo.changelog.lookup(sha)
                except LookupError:
                    pass
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                raise
        self._clean = True
        self._active = _readactive(repo, self)
        self._aclean = True

    @property
    def active(self):
        return self._active

    @active.setter
    def active(self, mark):
        if mark is not None and mark not in self:
            raise AssertionError('bookmark %s does not exist!' % mark)

        self._active = mark
        self._aclean = False

    def __setitem__(self, *args, **kwargs):
        self._clean = False
        return dict.__setitem__(self, *args, **kwargs)

    def __delitem__(self, key):
        self._clean = False
        return dict.__delitem__(self, key)

    def recordchange(self, tr):
        """record that bookmarks have been changed in a transaction

        The transaction is then responsible for updating the file content."""
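        # Typical usage (see update() below): open a transaction with
        # repo.transaction('bookmark'), call marks.recordchange(tr) and
        # then tr.close().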
        tr.addfilegenerator('bookmarks', ('bookmarks',), self._write,
                            location='plain')
        tr.hookargs['bookmark_moved'] = '1'

    def _writerepo(self, repo):
        """Factored out for extensibility"""
        rbm = repo._bookmarks
        if rbm.active not in self:
            rbm.active = None
            rbm._writeactive()

        with repo.wlock():
            file_ = repo.vfs('bookmarks', 'w', atomictemp=True,
                             checkambig=True)
            try:
                self._write(file_)
            except: # re-raises
                file_.discard()
                raise
            finally:
                file_.close()

    def _writeactive(self):
        if self._aclean:
            return
        with self._repo.wlock():
            if self._active is not None:
                f = self._repo.vfs('bookmarks.current', 'w', atomictemp=True,
                                   checkambig=True)
                try:
                    f.write(encoding.fromlocal(self._active))
                finally:
                    f.close()
            else:
                try:
                    self._repo.vfs.unlink('bookmarks.current')
                except OSError as inst:
                    if inst.errno != errno.ENOENT:
                        raise
        self._aclean = True

    def _write(self, fp):
        for name, node in self.iteritems():
            fp.write("%s %s\n" % (hex(node), encoding.fromlocal(name)))
        self._clean = True
        self._repo.invalidatevolatilesets()

    def expandname(self, bname):
        if bname == '.':
            if self.active:
                return self.active
            else:
                raise error.Abort(_("no active bookmark"))
        return bname

def _readactive(repo, marks):
    """
    Get the active bookmark. We can have an active bookmark that updates
    itself as we commit. This function returns the name of that bookmark.
    It is stored in .hg/bookmarks.current
    """
    mark = None
    try:
        file = repo.vfs('bookmarks.current')
    except IOError as inst:
        if inst.errno != errno.ENOENT:
            raise
        return None
    try:
        # No readline() in osutil.posixfile, reading everything is
        # cheap.
        # Note that it's possible for readlines() here to raise
        # IOError, since we might be reading the active mark over
        # static-http which only tries to load the file when we try
        # to read from it.
        mark = encoding.tolocal((file.readlines() or [''])[0])
        if mark == '' or mark not in marks:
            mark = None
    except IOError as inst:
        if inst.errno != errno.ENOENT:
            raise
        return None
    finally:
        file.close()
    return mark

def activate(repo, mark):
    """
    Set the given bookmark to be 'active', meaning that this bookmark will
    follow new commits that are made.
    The name is recorded in .hg/bookmarks.current
    """
    repo._bookmarks.active = mark
    repo._bookmarks._writeactive()

def deactivate(repo):
    """
    Unset the active bookmark in this repository.
    """
    repo._bookmarks.active = None
    repo._bookmarks._writeactive()

def isactivewdirparent(repo):
    """
    Tell whether the 'active' bookmark (the one that follows new commits)
    points to one of the parents of the current working directory (wdir).

    While this is normally the case, it can on occasion be false; for example,
    immediately after a pull, the active bookmark can be moved to point
    to a changeset that is not a parent of the wdir. This is solved by
    running `hg update`.
    """
    mark = repo._activebookmark
    marks = repo._bookmarks
    parents = [p.node() for p in repo[None].parents()]
    return (mark in marks and marks[mark] in parents)

def deletedivergent(repo, deletefrom, bm):
    '''Delete divergent versions of bm on nodes in deletefrom.

    Return True if at least one bookmark was deleted, False otherwise.'''
    deleted = False
    marks = repo._bookmarks
    divergent = [b for b in marks if b.split('@', 1)[0] == bm.split('@', 1)[0]]
    for mark in divergent:
        if mark == '@' or '@' not in mark:
            # can't be divergent by definition
            continue
        if mark and marks[mark] in deletefrom:
            if mark != bm:
                del marks[mark]
                deleted = True
    return deleted

def calculateupdate(ui, repo, checkout):
    '''Return a tuple (targetrev, movemarkfrom) indicating the rev to
    check out and where to move the active bookmark from, if needed.'''
    movemarkfrom = None
    if checkout is None:
        activemark = repo._activebookmark
        if isactivewdirparent(repo):
            movemarkfrom = repo['.'].node()
        elif activemark:
            ui.status(_("updating to active bookmark %s\n") % activemark)
            checkout = activemark
    return (checkout, movemarkfrom)

def update(repo, parents, node):
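    """Move the active bookmark to ``node`` if it currently points to one of
    ``parents`` and the move is a valid update, deleting divergent bookmarks
    that the move makes obsolete.

    Returns True if the bookmark store was changed (and the change recorded
    in a transaction), False otherwise."""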
    deletefrom = parents
    marks = repo._bookmarks
    update = False
    active = marks.active
    if not active:
        return False

    if marks[active] in parents:
        new = repo[node]
        divs = [repo[b] for b in marks
                if b.split('@', 1)[0] == active.split('@', 1)[0]]
        anc = repo.changelog.ancestors([new.rev()])
        deletefrom = [b.node() for b in divs if b.rev() in anc or b == new]
        if validdest(repo, repo[marks[active]], new):
            marks[active] = new.node()
            update = True

    if deletedivergent(repo, deletefrom, active):
        update = True

    if update:
        lock = tr = None
        try:
            lock = repo.lock()
            tr = repo.transaction('bookmark')
            marks.recordchange(tr)
            tr.close()
        finally:
            lockmod.release(tr, lock)
    return update

def listbookmarks(repo):
    # We may try to list bookmarks on a repo type that does not
    # support it (e.g., statichttprepository).
    marks = getattr(repo, '_bookmarks', {})

    d = {}
    hasnode = repo.changelog.hasnode
    for k, v in marks.iteritems():
        # don't expose local divergent bookmarks
        if hasnode(v) and ('@' not in k or k.endswith('@')):
            d[k] = hex(v)
    return d

def pushbookmark(repo, key, old, new):
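    """Update bookmark ``key`` from hex node ``old`` to hex node ``new``.

    The update only happens if the bookmark's current value matches ``old``
    (or already equals ``new``); an empty ``new`` deletes the bookmark.
    Returns True on success, False otherwise."""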
    w = l = tr = None
    try:
        w = repo.wlock()
        l = repo.lock()
        tr = repo.transaction('bookmarks')
        marks = repo._bookmarks
        existing = hex(marks.get(key, ''))
        if existing != old and existing != new:
            return False
        if new == '':
            del marks[key]
        else:
            if new not in repo:
                return False
            marks[key] = repo[new].node()
        marks.recordchange(tr)
        tr.close()
        return True
    finally:
        lockmod.release(tr, l, w)

def compare(repo, srcmarks, dstmarks,
            srchex=None, dsthex=None, targets=None):
    '''Compare bookmarks between srcmarks and dstmarks

    This returns the tuple "(addsrc, adddst, advsrc, advdst, diverge,
    differ, invalid, same)", where each element is a list of bookmarks:

    :addsrc:  added on the src side (perhaps removed on the dst side)
    :adddst:  added on the dst side (perhaps removed on the src side)
    :advsrc:  advanced on the src side
    :advdst:  advanced on the dst side
    :diverge: diverged
    :differ:  changed, but the changeset referred to on src is unknown on dst
    :invalid: unknown on both sides
    :same:    same on both sides

    Each element of these lists is a tuple "(bookmark name, changeset ID
    on the source side, changeset ID on the destination side)". Each
    changeset ID is a 40-digit hexadecimal string or None.

    Changeset IDs of tuples in the "addsrc", "adddst", "differ" or
    "invalid" lists may be unknown to repo.

    This function expects that "srcmarks" and "dstmarks" return the
    changeset ID of a given bookmark as a 40-digit hexadecimal string. If
    that is not the case (e.g. the bmstore "repo._bookmarks" returns binary
    values), "srchex" or "dsthex" should be specified to convert the values
    into that form.

    If "targets" is specified, only the bookmarks listed in it are
    examined.
    '''
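    # Hypothetical example: a bookmark "feature-x" that only moved forward on
    # the source side yields ('feature-x', <40-hex src id>, <40-hex dst id>)
    # in the "advsrc" list.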
    if not srchex:
        srchex = lambda x: x
    if not dsthex:
        dsthex = lambda x: x

    if targets:
        bset = set(targets)
    else:
        srcmarkset = set(srcmarks)
        dstmarkset = set(dstmarks)
        bset = srcmarkset | dstmarkset

    results = ([], [], [], [], [], [], [], [])
    addsrc = results[0].append
    adddst = results[1].append
    advsrc = results[2].append
    advdst = results[3].append
    diverge = results[4].append
    differ = results[5].append
    invalid = results[6].append
    same = results[7].append

    for b in sorted(bset):
        if b not in srcmarks:
            if b in dstmarks:
                adddst((b, None, dsthex(dstmarks[b])))
            else:
                invalid((b, None, None))
        elif b not in dstmarks:
            addsrc((b, srchex(srcmarks[b]), None))
        else:
            scid = srchex(srcmarks[b])
            dcid = dsthex(dstmarks[b])
            if scid == dcid:
                same((b, scid, dcid))
            elif scid in repo and dcid in repo:
                sctx = repo[scid]
                dctx = repo[dcid]
                if sctx.rev() < dctx.rev():
                    if validdest(repo, sctx, dctx):
                        advdst((b, scid, dcid))
                    else:
                        diverge((b, scid, dcid))
                else:
                    if validdest(repo, dctx, sctx):
                        advsrc((b, scid, dcid))
                    else:
                        diverge((b, scid, dcid))
            else:
                # it is too expensive to examine in detail, in this case
                differ((b, scid, dcid))

    return results

def _diverge(ui, b, path, localmarks, remotenode):
    '''Return an appropriate diverged bookmark name for the specified ``path``

    This returns None if it fails to assign any divergent bookmark name.

    This reuses an already existing name with an "@number" suffix if that
    name refers to ``remotenode``.
    '''
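    # For example (hypothetical), a divergent remote bookmark "foo" pulled
    # from a path configured under the alias "upstream" is stored as
    # "foo@upstream"; with no matching path alias it falls back to "foo@1",
    # "foo@2", and so on.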
    if b == '@':
        b = ''
    # try to use an @pathalias suffix
    # if an @pathalias already exists, we overwrite (update) it
    if path.startswith("file:"):
        path = util.url(path).path
    for p, u in ui.configitems("paths"):
        if u.startswith("file:"):
            u = util.url(u).path
        if path == u:
            return '%s@%s' % (b, p)

    # assign a unique "@number" suffix newly
    for x in range(1, 100):
        n = '%s@%d' % (b, x)
        if n not in localmarks or localmarks[n] == remotenode:
            return n

    return None

def updatefromremote(ui, repo, remotemarks, path, trfunc, explicit=()):
    ui.debug("checking for updated bookmarks\n")
    localmarks = repo._bookmarks
    (addsrc, adddst, advsrc, advdst, diverge, differ, invalid, same
     ) = compare(repo, remotemarks, localmarks, dsthex=hex)

    status = ui.status
    warn = ui.warn
    if ui.configbool('ui', 'quietbookmarkmove', False):
        status = warn = ui.debug

    explicit = set(explicit)
    changed = []
    for b, scid, dcid in addsrc:
        if scid in repo: # add remote bookmarks for changes we already have
            changed.append((b, bin(scid), status,
                            _("adding remote bookmark %s\n") % (b)))
        elif b in explicit:
            explicit.remove(b)
            ui.warn(_("remote bookmark %s points to locally missing %s\n")
                    % (b, scid[:12]))

    for b, scid, dcid in advsrc:
        changed.append((b, bin(scid), status,
                        _("updating bookmark %s\n") % (b)))
    # remove normal movement from explicit set
    explicit.difference_update(d[0] for d in changed)

    for b, scid, dcid in diverge:
        if b in explicit:
            explicit.discard(b)
            changed.append((b, bin(scid), status,
                            _("importing bookmark %s\n") % (b)))
        else:
            snode = bin(scid)
            db = _diverge(ui, b, path, localmarks, snode)
            if db:
                changed.append((db, snode, warn,
                                _("divergent bookmark %s stored as %s\n") %
                                (b, db)))
            else:
                warn(_("warning: failed to assign numbered name "
                       "to divergent bookmark %s\n") % (b))
    for b, scid, dcid in adddst + advdst:
        if b in explicit:
            explicit.discard(b)
            changed.append((b, bin(scid), status,
                            _("importing bookmark %s\n") % (b)))
    for b, scid, dcid in differ:
        if b in explicit:
            explicit.remove(b)
            ui.warn(_("remote bookmark %s points to locally missing %s\n")
                    % (b, scid[:12]))

    if changed:
        tr = trfunc()
        for b, node, writer, msg in sorted(changed):
            localmarks[b] = node
            writer(msg)
        localmarks.recordchange(tr)

def incoming(ui, repo, other):
    '''Show bookmarks incoming from other to repo
    '''
    ui.status(_("searching for changed bookmarks\n"))

    r = compare(repo, other.listkeys('bookmarks'), repo._bookmarks,
                dsthex=hex)
    addsrc, adddst, advsrc, advdst, diverge, differ, invalid, same = r

    incomings = []
    if ui.debugflag:
        getid = lambda id: id
    else:
        getid = lambda id: id[:12]
    if ui.verbose:
        def add(b, id, st):
            incomings.append("   %-25s %s %s\n" % (b, getid(id), st))
    else:
        def add(b, id, st):
            incomings.append("   %-25s %s\n" % (b, getid(id)))
    for b, scid, dcid in addsrc:
        # i18n: "added" refers to a bookmark
        add(b, scid, _('added'))
    for b, scid, dcid in advsrc:
        # i18n: "advanced" refers to a bookmark
        add(b, scid, _('advanced'))
    for b, scid, dcid in diverge:
        # i18n: "diverged" refers to a bookmark
        add(b, scid, _('diverged'))
    for b, scid, dcid in differ:
        # i18n: "changed" refers to a bookmark
        add(b, scid, _('changed'))

    if not incomings:
        ui.status(_("no changed bookmarks found\n"))
        return 1

    for s in sorted(incomings):
        ui.write(s)

    return 0

def outgoing(ui, repo, other):
    '''Show bookmarks outgoing from repo to other
    '''
    ui.status(_("searching for changed bookmarks\n"))

    r = compare(repo, repo._bookmarks, other.listkeys('bookmarks'),
                srchex=hex)
    addsrc, adddst, advsrc, advdst, diverge, differ, invalid, same = r

    outgoings = []
    if ui.debugflag:
        getid = lambda id: id
    else:
        getid = lambda id: id[:12]
    if ui.verbose:
        def add(b, id, st):
            outgoings.append("   %-25s %s %s\n" % (b, getid(id), st))
    else:
        def add(b, id, st):
            outgoings.append("   %-25s %s\n" % (b, getid(id)))
    for b, scid, dcid in addsrc:
        # i18n: "added refers to a bookmark
        add(b, scid, _('added'))
    for b, scid, dcid in adddst:
        # i18n: "deleted" refers to a bookmark
        add(b, ' ' * 40, _('deleted'))
    for b, scid, dcid in advsrc:
        # i18n: "advanced" refers to a bookmark
        add(b, scid, _('advanced'))
    for b, scid, dcid in diverge:
        # i18n: "diverged" refers to a bookmark
        add(b, scid, _('diverged'))
    for b, scid, dcid in differ:
        # i18n: "changed" refers to a bookmark
        add(b, scid, _('changed'))

    if not outgoings:
        ui.status(_("no changed bookmarks found\n"))
        return 1

    for s in sorted(outgoings):
        ui.write(s)

    return 0

def summary(repo, other):
    '''Compare bookmarks between repo and other for "hg summary" output

    This returns "(# of incoming, # of outgoing)" tuple.
    '''
    r = compare(repo, other.listkeys('bookmarks'), repo._bookmarks,
                dsthex=hex)
    addsrc, adddst, advsrc, advdst, diverge, differ, invalid, same = r
    return (len(addsrc), len(adddst))

def validdest(repo, old, new):
    """Is the new bookmark destination a valid update from the old one"""
    repo = repo.unfiltered()
    if old == new:
        # Old == new -> nothing to update.
        return False
    elif not old:
        # old is nullrev, anything is valid.
        # (new != nullrev has been excluded by the previous check)
        return True
    elif repo.obsstore:
        return new.node() in obsolete.foreground(repo, [old.node()])
    else:
        # still an independent clause as it is lazier (and therefore faster)
        return old.descendant(new)