contrib/automation/README.rst
author Gregory Szorc <gregory.szorc@gmail.com>
Fri, 15 Mar 2019 11:24:08 -0700
changeset 42024 b05a3e28cf24
automation: perform tasks on remote machines

Sometimes you don't have access to a machine in order to do something. For
example, you may not have access to a Windows machine required to build
Windows binaries or run tests on that platform.

This commit introduces a pile of code intended to help "automate" common
tasks, like building release artifacts.

In its current form, the automation code provides functionality for
performing tasks on Windows EC2 instances.

The hgautomation.aws module provides functionality for integrating with AWS.
It manages EC2 resources such as IAM roles, EC2 security groups, AMIs, and
instances.

The hgautomation.windows module provides a higher-level interface for
performing tasks on remote Windows machines.

The hgautomation.cli module provides a command-line interface to these
higher-level primitives.

I attempted to structure Windows remote machine interaction around Windows
Remoting / PowerShell. This is kinda/sorta like SSH + shell, but for Windows.
In theory, most of the functionality is cloud provider agnostic, as we should
be able to use any established WinRM connection to interact with a remote. In
reality, we're tightly coupled to AWS at the moment because I didn't want to
prematurely add abstractions for a 2nd cloud provider. (1 was hard enough to
implement.)

In the aws module is code for creating an image with a fully functional
Mercurial development environment. It contains VC9, VC2017, msys, and other
dependencies. The image is fully capable of building all the existing
Mercurial release artifacts and running tests.

There are a few things that don't work. For example, running Windows tests
with Python 3. But building the Windows release artifacts does work. And that
was an impetus for this work. (Although we don't yet support code signing.)

Getting this functionality to work was extremely time consuming. It took
hours debugging permissions failures and other wonky behavior due to
PowerShell Remoting. (The permissions model for PowerShell is crazy and you
brush up against all kinds of issues because of the user/privileges of the
user running the PowerShell and the permissions of the PowerShell session
itself.)

The functionality around AWS resource management could use some improving. In
theory we support shared tenancy via resource name prefixing. In reality, we
don't offer a way to configure this.

Speaking of AWS resource management, I thought about using a tool like
Terraform to manage resources. But at our scale, writing a few dozen lines of
code to manage resources seemed acceptable. Maybe we should reconsider this if
things grow out of control. Time will tell.

Currently, emphasis is placed on Windows. But I only started there because it
was likely to be the most difficult to implement. It should be relatively
trivial to automate tasks on remote Linux machines. In fact, I have a ~1 year
old script to run tests on a remote EC2 instance. I will likely be porting
that to this new "framework" in the near future.

# no-check-commit because foo_bar functions

Differential Revision: https://phab.mercurial-scm.org/D6142

====================
Mercurial Automation
====================

This directory contains code and utilities for building and testing Mercurial
on remote machines.

The ``automation.py`` Script
============================

``automation.py`` is an executable Python script (requires Python 3.5+)
that serves as a driver for common automation tasks.

When executed, the script will *bootstrap* a virtualenv in
``<source-root>/build/venv-automation`` then re-execute itself using
that virtualenv. So there is no need for the caller to have a virtualenv
explicitly activated. This virtualenv will be populated with various
dependencies (as defined by the ``requirements.txt`` file).
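
The bootstrap-and-re-execute pattern described above can be sketched roughly
as follows. This is an illustration only, not the code in ``automation.py``:
the function names are hypothetical, and it assumes the standard library
``venv`` module and ``pip`` are used to create and populate the environment.

.. code-block:: python

   import os
   import pathlib
   import subprocess
   import sys
   import venv


   def venv_python(venv_dir):
       """Return the path of the interpreter inside ``venv_dir``."""
       if os.name == "nt":
           return venv_dir / "Scripts" / "python.exe"
       return venv_dir / "bin" / "python"


   def running_in(venv_dir):
       """Report whether the current interpreter lives in ``venv_dir``."""
       return os.path.realpath(sys.prefix) == os.path.realpath(str(venv_dir))


   def bootstrap(source_root, requirements):
       """Create the virtualenv if needed, then re-execute inside it."""
       venv_dir = source_root / "build" / "venv-automation"
       if running_in(venv_dir):
           return  # Already re-executed; carry on with the real work.

       python = venv_python(venv_dir)
       if not python.exists():
           venv.EnvBuilder(with_pip=True).create(str(venv_dir))
           subprocess.run(
               [str(python), "-m", "pip", "install", "-r", str(requirements)],
               check=True,
           )

       # Replace the current process with the same script, run by the
       # virtualenv's interpreter.
       os.execv(str(python), [str(python)] + sys.argv)

A script using this sketch would call ``bootstrap()`` before importing any
third-party dependency, so that those imports only happen inside the
virtualenv.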

To see what you can do with this script, simply run it::

   $ ./automation.py

Local State
===========

By default, local state required to interact with remote servers is stored
in the ``~/.hgautomation`` directory.

We attempt to limit persistent state to this directory. Even when
performing tasks that may have side-effects, we try to limit those
side-effects so they don't impact the local system. For example, when we SSH
into a remote machine, we create a temporary directory for the SSH
config so the user's known hosts file isn't updated.
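
One way to keep SSH from touching the user's known hosts file is sketched
below. It is a minimal illustration under stated assumptions, not the
project's implementation: the helper name and the ``hg`` username are
hypothetical, and only the standard OpenSSH options ``-F``, ``-l``,
``UserKnownHostsFile``, and ``StrictHostKeyChecking`` are used.

.. code-block:: python

   import contextlib
   import pathlib
   import tempfile


   @contextlib.contextmanager
   def temporary_ssh_command(host, username="hg"):
       """Yield an ``ssh`` argument list backed by a throwaway config."""
       with tempfile.TemporaryDirectory() as td:
           td = pathlib.Path(td)
           config = td / "ssh_config"
           known_hosts = td / "known_hosts"

           # Host keys are recorded in the temporary directory, which is
           # deleted when the context manager exits.
           config.write_text(
               "Host *\n"
               '  UserKnownHostsFile "{}"\n'
               "  StrictHostKeyChecking no\n".format(known_hosts)
           )

           yield ["ssh", "-F", str(config), "-l", username, host]

The yielded list can be passed to ``subprocess.run()`` with the remote
command appended. Disabling strict host key checking is a trade-off that
only makes sense for short-lived machines whose host keys are not known in
advance.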

AWS Integration
===============

Various automation tasks integrate with AWS to provide access to
resources such as EC2 instances for generic compute.

This obviously requires an AWS account and credentials to work.

We use the ``boto3`` library for interacting with AWS APIs. We do not employ
any special functionality for telling ``boto3`` where to find AWS credentials. See
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
for how ``boto3`` locates credentials. Once you have configured your
environment such that ``boto3`` can find credentials, interaction with AWS
should *just work*.

.. hint::

   Typically you have a ``~/.aws/credentials`` file containing AWS
   credentials. If you manage multiple credentials, you can override which
   *profile* to use at run-time by setting the ``AWS_PROFILE`` environment
   variable.
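
To check that ``boto3`` can find credentials before running a task,
something like the following sketch may help. The function name is
hypothetical and ``boto3`` is assumed to be installed; only
``boto3.session.Session`` and its ``get_credentials()`` method are used.

.. code-block:: python

   def open_session(profile=None):
       """Return a boto3 session, failing early if no credentials exist."""
       import boto3  # Third-party; assumed to be installed.

       # profile=None lets boto3 apply its normal lookup rules, which
       # include the AWS_PROFILE environment variable.
       session = boto3.session.Session(profile_name=profile)
       if session.get_credentials() is None:
           raise RuntimeError(
               "boto3 could not find AWS credentials; "
               "check ~/.aws/credentials or AWS_PROFILE"
           )
       return session

Calling ``open_session()`` with no arguments relies on the environment,
while ``open_session("some-profile")`` selects a named profile explicitly.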

Resource Management
-------------------

Depending on the task being performed, various AWS services will be accessed.
This of course requires AWS credentials with permissions to access these
services.

The following AWS services can be accessed by automation tasks:

* EC2
* IAM
* Simple Systems Manager (SSM)

Performing these tasks also creates AWS resources, so the credentials must
have permission to create them.

The following AWS resources can be created by automation tasks:

* EC2 key pairs
* EC2 security groups
* EC2 instances
* IAM roles and instance profiles
* SSM command invocations

When possible, we prefix resource names with ``hg-`` so they can easily
be identified as belonging to Mercurial.
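
As a rough illustration of how such a prefix can be applied and later used
to find resources, consider the sketch below. The helper names are
hypothetical. The filter structure follows the ``Filters`` parameter that
the ``boto3`` EC2 client accepts, which supports ``*`` wildcards in values.

.. code-block:: python

   PREFIX = "hg-"


   def managed_name(name, prefix=PREFIX):
       """Return ``name`` with the prefix applied exactly once."""
       return name if name.startswith(prefix) else prefix + name


   def is_managed(name, prefix=PREFIX):
       """Report whether a resource name carries the prefix."""
       return name.startswith(prefix)


   def security_group_filter(prefix=PREFIX):
       """Build an EC2 filter matching security groups with the prefix."""
       return [{"Name": "group-name", "Values": [prefix + "*"]}]

The result of ``security_group_filter()`` could be passed as the
``Filters`` argument of the EC2 client's ``describe_security_groups()``
call to list only prefixed groups.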

.. important::

   We currently assume that AWS accounts utilized by *us* are single
   tenancy. If multiple users of ``automation.py`` share one AWS account
   (including one user sharing credentials across machines), they can
   interfere with each other and break things.

Cost of Operation
-----------------

``automation.py`` tries to be frugal with regard to utilization of remote
resources. Persistent remote resources are minimized in order to keep costs
in check. For example, EC2 instances are often ephemeral and only live as long
as the operation being performed.

Under normal operation, recurring costs are limited to:

* Storage costs for AMI / EBS snapshots. This should be just a few pennies
  per month.

When running EC2 instances, you'll be billed accordingly. By default, we
use *small* instances, like ``t3.medium``. This instance type costs ~$0.07 per
hour.

.. note::

   When running Windows EC2 instances, AWS bills at the full hourly cost, even
   if the instance doesn't run for a full hour (per-second billing doesn't
   apply to Windows AMIs).
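
The effect of hourly rounding on cost can be estimated with a short
calculation. This sketch assumes the approximate ``t3.medium`` rate quoted
above and the whole-hour billing described in the note. The function name
is hypothetical and actual AWS pricing may differ.

.. code-block:: python

   import math

   HOURLY_RATE_USD = 0.07  # Approximate t3.medium rate quoted above.


   def estimate_windows_cost(runtime_seconds, hourly_rate=HOURLY_RATE_USD):
       """Estimate the cost of one instance billed in whole hours."""
       billed_hours = math.ceil(runtime_seconds / 3600)
       return billed_hours * hourly_rate

Under these assumptions a 10 minute run is billed as one hour (about
$0.07), and a run of one hour and one second is billed as two hours (about
$0.14).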

Managing Remote Resources
-------------------------

Occasionally, there may be an error purging a temporary resource. Or you
may wish to forcefully purge remote state. Commands can be invoked to manually
purge remote resources.

To terminate all EC2 instances that we manage::

   $ ./automation.py terminate-ec2-instances

To purge all EC2 resources that we manage::

   $ ./automation.py purge-ec2-resources