changeset 42307:e570106beda1
automation: shore up rebooting behavior
There was a race condition in the old code. Use
instance.stop()/instance.start() to eliminate it.
As part of debugging this, I also found another race condition
related to PowerShell permissions after the reboot. Unfortunately,
I'm not sure of the best way to work around it. I've added a comment
for now.
Differential Revision: https://phab.mercurial-scm.org/D6288
author | Gregory Szorc <gregory.szorc@gmail.com> |
---|---|
date | Fri, 19 Apr 2019 07:34:55 -0700 |
parents | f30184484dd1 |
children | 4274b1369b75 |
files | contrib/automation/hgautomation/aws.py |
diffstat | 1 files changed, 18 insertions(+), 2 deletions(-) |
--- a/contrib/automation/hgautomation/aws.py	Fri Apr 19 06:07:00 2019 -0700
+++ b/contrib/automation/hgautomation/aws.py	Fri Apr 19 07:34:55 2019 -0700
@@ -808,10 +808,26 @@
         )
 
         # Reboot so all updates are fully applied.
+        #
+        # We don't use instance.reboot() here because it is asynchronous and
+        # we don't know when exactly the instance has rebooted. It could take
+        # a while to stop and we may start trying to interact with the instance
+        # before it has rebooted.
         print('rebooting instance %s' % instance.id)
-        ec2client.reboot_instances(InstanceIds=[instance.id])
+        instance.stop()
+        ec2client.get_waiter('instance_stopped').wait(
+            InstanceIds=[instance.id],
+            WaiterConfig={
+                'Delay': 5,
+            })
 
-        time.sleep(15)
+        instance.start()
+        wait_for_ip_addresses([instance])
+
+        # There is a race condition here between the User Data PS script running
+        # and us connecting to WinRM. This can manifest as
+        # "AuthorizationManager check failed" failures during run_powershell().
+        # TODO figure out a workaround.
 
         print('waiting for Windows Remote Management to come back...')
         client = wait_for_winrm(instance.public_ip_address, 'Administrator',
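The stop/wait/start sequence in this change can be read as one reusable step. The following is a minimal sketch of that pattern, not code from hgautomation: `restart_and_wait` is a hypothetical helper name, `instance` is assumed to behave like a boto3 `ec2.Instance` resource, and `ec2client` like a boto3 EC2 client. In place of the repository's `wait_for_ip_addresses()` helper, the sketch assumes the `instance_running` waiter followed by `instance.reload()` is an acceptable substitute.

```python
def restart_and_wait(instance, ec2client, delay=5):
    """Stop an EC2 instance, wait for it to stop, then start it again.

    Hypothetical helper: ``instance`` is assumed to look like a boto3
    ec2.Instance resource and ``ec2client`` like a boto3 EC2 client.
    """
    ids = [instance.id]
    waiter_config = {'Delay': delay}

    instance.stop()
    # Block until EC2 reports the instance as stopped, polling every
    # ``delay`` seconds. This removes the guesswork of a fixed sleep.
    ec2client.get_waiter('instance_stopped').wait(
        InstanceIds=ids, WaiterConfig=waiter_config)

    instance.start()
    # Block until EC2 reports the instance as running again.
    ec2client.get_waiter('instance_running').wait(
        InstanceIds=ids, WaiterConfig=waiter_config)

    # Refresh cached attributes such as public_ip_address, which can
    # change across a stop/start cycle.
    instance.reload()
```

Passing the client and instance in as arguments keeps the helper free of any direct boto3 import, so it can be exercised with stand-in objects. It does not address the WinRM/User Data race noted in the TODO comment; a running instance is not necessarily ready for PowerShell commands.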