view mercurial/namespaces.py @ 42285:65b3ef162b39

automation: initial support for running Linux tests Building on top of our Windows automation support, this commit implements support for performing automated tasks on remote Linux machines. Specifically, we implement support for running tests on ephemeral EC2 instances. This seems to be a worthwhile place to start, as building packages on Linux is more or less a solved problem because we already have facilities for building in Docker containers, which provide "good enough" reproducibility guarantees. The new `run-tests-linux` command works similarly to `run-tests-windows`: it ensures an AMI with hg dependencies is available, provisions a temporary EC2 instance with this AMI, pushes local changes to that instance via SSH, then invokes `run-tests.py`. Using this new command, I am able to run the entire test harness substantially faster then I am on my local machine courtesy of access to massive core EC2 instances: wall: 16:20 ./run-tests.py -l (i7-6700K) wall: 14:00 automation.py run-tests-linux --ec2-instance c5.2xlarge wall: 8:30 automation.py run-tests-linux --ec2-instance m5.4xlarge wall: 8:04 automation.py run-tests-linux --ec2-instance c5.4xlarge wall: 4:30 automation.py run-tests-linux --ec2-instance c5.9xlarge wall: 3:57 automation.py run-tests-linux --ec2-instance m5.12xlarge wall: 3:05 automation.py run-tests-linux --ec2-instance m5.24xlarge wall: 3:02 automation.py run-tests-linux --ec2-instance c5.18xlarge ~3 minute wall time to run pretty much the entire test harness is not too bad! The AMIs install multiple versions of Python. And the run-tests-linux command specifies which one to use: automation.py run-tests-linux --python system3 automation.py run-tests-linux --python 3.5 automation.py run-tests-linux --python pypy2.7 By default, the system Python 2.7 is used. Using this functionality, I was able to identity some unexpected test failures on PyPy! Included in the feature is support for running with alternate filesystems. You can simply pass --filesystem to the command to specify the type of filesystem to run tests on. When the ephemeral instance is started, a new filesystem will be created and tests will run from it: wall: 4:30 automation.py run-tests-linux --ec2-instance c5.9xlarge wall: 4:20 automation.py run-tests-linux --ec2-instance c5d.9xlarge --filesystem xfs wall: 4:24 automation.py run-tests-linux --ec2-instance c5d.9xlarge --filesystem tmpfs wall: 4:26 automation.py run-tests-linux --ec2-instance c5d.9xlarge --filesystem ext4 We also support multiple Linux distributions: $ automation.py run-tests-linux --distro debian9 total time: 298.1s; setup: 60.7s; tests: 237.5s; setup overhead: 20.4% $ automation.py run-tests-linux --distro ubuntu18.04 total time: 286.1s; setup: 61.3s; tests: 224.7s; setup overhead: 21.4% $ automation.py run-tests-linux --distro ubuntu18.10 total time: 278.5s; setup: 58.2s; tests: 220.3s; setup overhead: 20.9% $ automation.py run-tests-linux --distro ubuntu19.04 total time: 265.8s; setup: 42.5s; tests: 223.3s; setup overhead: 16.0% Debian and Ubuntu are supported because those are what I use and am most familiar with. It should be easy enough to add support for other distros. Unlike the Windows AMIs, Linux EC2 instances bill per second. So the cost to instantiating an ephemeral instance isn't as severe. That being said, there is some overhead, as it takes several dozen seconds for the instance to boot, push local changes, and build Mercurial. During this time, the instance is largely CPU idle and wasting money. Even with this inefficiency, running tests is relatively cheap: $0.15-$0.25 per full test run. A machine running tests as efficiently as these EC2 instances would cost say $6,000, so you can run the test harness a >20,000 times for the cost of an equivalent machine. Running tests in EC2 is almost certainly cheaper than buying a beefy machine for developers to use :) # no-check-commit because foo_bar function names Differential Revision: https://phab.mercurial-scm.org/D6319
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 27 Apr 2019 11:48:26 -0700
parents 4c0683655599
children 2372284d9457
line wrap: on
line source

from __future__ import absolute_import

from .i18n import _
from . import (
    registrar,
    templatekw,
    util,
)

def tolist(val):
    """
    a convenience method to return an empty list instead of None
    """
    if val is None:
        return []
    else:
        return [val]

class namespaces(object):
    """provides an interface to register and operate on multiple namespaces. See
    the namespace class below for details on the namespace object.

    """

    _names_version = 0

    def __init__(self):
        self._names = util.sortdict()
        columns = templatekw.getlogcolumns()

        # we need current mercurial named objects (bookmarks, tags, and
        # branches) to be initialized somewhere, so that place is here
        bmknames = lambda repo: repo._bookmarks.keys()
        bmknamemap = lambda repo, name: tolist(repo._bookmarks.get(name))
        bmknodemap = lambda repo, node: repo.nodebookmarks(node)
        n = namespace("bookmarks", templatename="bookmark",
                      logfmt=columns['bookmark'],
                      listnames=bmknames,
                      namemap=bmknamemap, nodemap=bmknodemap,
                      builtin=True)
        self.addnamespace(n)

        tagnames = lambda repo: [t for t, n in repo.tagslist()]
        tagnamemap = lambda repo, name: tolist(repo._tagscache.tags.get(name))
        tagnodemap = lambda repo, node: repo.nodetags(node)
        n = namespace("tags", templatename="tag",
                      logfmt=columns['tag'],
                      listnames=tagnames,
                      namemap=tagnamemap, nodemap=tagnodemap,
                      deprecated={'tip'},
                      builtin=True)
        self.addnamespace(n)

        bnames = lambda repo: repo.branchmap().keys()
        bnamemap = lambda repo, name: tolist(repo.branchtip(name, True))
        bnodemap = lambda repo, node: [repo[node].branch()]
        n = namespace("branches", templatename="branch",
                      logfmt=columns['branch'],
                      listnames=bnames,
                      namemap=bnamemap, nodemap=bnodemap,
                      builtin=True)
        self.addnamespace(n)

    def __getitem__(self, namespace):
        """returns the namespace object"""
        return self._names[namespace]

    def __iter__(self):
        return self._names.__iter__()

    def items(self):
        return self._names.iteritems()

    iteritems = items

    def addnamespace(self, namespace, order=None):
        """register a namespace

        namespace: the name to be registered (in plural form)
        order: optional argument to specify the order of namespaces
               (e.g. 'branches' should be listed before 'bookmarks')

        """
        if order is not None:
            self._names.insert(order, namespace.name, namespace)
        else:
            self._names[namespace.name] = namespace

        # we only generate a template keyword if one does not already exist
        if namespace.name not in templatekw.keywords:
            templatekeyword = registrar.templatekeyword(templatekw.keywords)
            @templatekeyword(namespace.name, requires={'repo', 'ctx'})
            def generatekw(context, mapping):
                return templatekw.shownames(context, mapping, namespace.name)

    def singlenode(self, repo, name):
        """
        Return the 'best' node for the given name. What's best is defined
        by the namespace's singlenode() function. The first match returned by
        a namespace in the defined precedence order is used.

        Raises a KeyError if there is no such node.
        """
        for ns, v in self._names.iteritems():
            n = v.singlenode(repo, name)
            if n:
                return n
        raise KeyError(_('no such name: %s') % name)

class namespace(object):
    """provides an interface to a namespace

    Namespaces are basically generic many-to-many mapping between some
    (namespaced) names and nodes. The goal here is to control the pollution of
    jamming things into tags or bookmarks (in extension-land) and to simplify
    internal bits of mercurial: log output, tab completion, etc.

    More precisely, we define a mapping of names to nodes, and a mapping from
    nodes to names. Each mapping returns a list.

    Furthermore, each name mapping will be passed a name to lookup which might
    not be in its domain. In this case, each method should return an empty list
    and not raise an error.

    This namespace object will define the properties we need:
      'name': the namespace (plural form)
      'templatename': name to use for templating (usually the singular form
                      of the plural namespace name)
      'listnames': list of all names in the namespace (usually the keys of a
                   dictionary)
      'namemap': function that takes a name and returns a list of nodes
      'nodemap': function that takes a node and returns a list of names
      'deprecated': set of names to be masked for ordinary use
      'builtin': bool indicating if this namespace is supported by core
                 Mercurial.
    """

    def __init__(self, name, templatename=None, logname=None, colorname=None,
                 logfmt=None, listnames=None, namemap=None, nodemap=None,
                 deprecated=None, builtin=False, singlenode=None):
        """create a namespace

        name: the namespace to be registered (in plural form)
        templatename: the name to use for templating
        logname: the name to use for log output; if not specified templatename
                 is used
        colorname: the name to use for colored log output; if not specified
                   logname is used
        logfmt: the format to use for (i18n-ed) log output; if not specified
                it is composed from logname
        listnames: function to list all names
        namemap: function that inputs a name, output node(s)
        nodemap: function that inputs a node, output name(s)
        deprecated: set of names to be masked for ordinary use
        builtin: whether namespace is implemented by core Mercurial
        singlenode: function that inputs a name, output best node (or None)
        """
        self.name = name
        self.templatename = templatename
        self.logname = logname
        self.colorname = colorname
        self.logfmt = logfmt
        self.listnames = listnames
        self.namemap = namemap
        self.nodemap = nodemap
        if singlenode:
            self.singlenode = singlenode

        # if logname is not specified, use the template name as backup
        if self.logname is None:
            self.logname = self.templatename

        # if colorname is not specified, just use the logname as a backup
        if self.colorname is None:
            self.colorname = self.logname

        # if logfmt is not specified, compose it from logname as backup
        if self.logfmt is None:
            # i18n: column positioning for "hg log"
            self.logfmt = ("%s:" % self.logname).ljust(13) + "%s\n"

        if deprecated is None:
            self.deprecated = set()
        else:
            self.deprecated = deprecated

        self.builtin = builtin

    def names(self, repo, node):
        """method that returns a (sorted) list of names in a namespace that
        match a given node"""
        return sorted(self.nodemap(repo, node))

    def nodes(self, repo, name):
        """method that returns a list of nodes in a namespace that
        match a given name.

        """
        return sorted(self.namemap(repo, name))

    def singlenode(self, repo, name):
        """returns the best node for the given name

        By default, the best node is the node from nodes() with the highest
        revision number. It can be overriden by the namespace."""
        n = self.namemap(repo, name)
        if n:
            # return max revision number
            if len(n) > 1:
                cl = repo.changelog
                maxrev = max(cl.rev(node) for node in n)
                return cl.node(maxrev)
            return n[0]
        return None