server: add an error feedback mechanism for when the daemon fails to launch
There's a recurring problem on Windows where `hg serve -d` will randomly fail to
spawn a detached process. The reason for the failure is completely hidden, and
it takes hours to get a single failure on my laptop. All this does is redirect
stdout/stderr of the child to a file until the lock file is freed, and then the
parent dumps it out if it fails to spawn.
I chose to put the output into the lock file because that is always cleaned up.
There's no way to report errors after that anyway. On Windows, killdaemons.py
is roughly `kill -9`, so this ensures that junk won't pile up.
This may end up being a case of EADDRINUSE. At least that's what I saw spit out
a few times (among other odd errors and missing output on Windows). But I also
managed to get the same thing on Fedora 26 by running test-hgwebdir.t with
--loop -j10 for several hours. Running `netstat` immediately after killing that
run printed a wall of sockets in the TIME_WAIT state, which were gone a couple
seconds later. I couldn't match up ports that failed, because --loop doesn't
print out the message about the port that was used. So maybe the fix is to
rotate the use of HGPORT[12] in the tests. But, let's collect some more data
first.
--- a/mercurial/server.py Fri Mar 30 20:53:36 2018 -0400
+++ b/mercurial/server.py Wed Mar 28 00:11:09 2018 -0400
@@ -30,6 +30,27 @@
runargs=None, appendpid=False):
'''Run a command as a service.'''
+ # When daemonized on Windows, redirect stdout/stderr to the lockfile (which
+ # gets cleaned up after the child is up and running), so that the parent can
+ # read and print the error if this child dies early. See 594dd384803c. On
+ # other platforms, the child can write to the parent's stdio directly, until
+ # it is redirected prior to runfn().
+ if pycompat.iswindows and opts['daemon_postexec']:
+ for inst in opts['daemon_postexec']:
+ if inst.startswith('unlink:'):
+ lockpath = inst[7:]
+ if os.path.exists(lockpath):
+ procutil.stdout.flush()
+ procutil.stderr.flush()
+
+ fd = os.open(lockpath,
+ os.O_WRONLY | os.O_APPEND | os.O_BINARY)
+ try:
+ os.dup2(fd, 1)
+ os.dup2(fd, 2)
+ finally:
+ os.close(fd)
+
def writepid(pid):
if opts['pid_file']:
if appendpid:
@@ -61,6 +82,12 @@
return not os.path.exists(lockpath)
pid = procutil.rundetached(runargs, condfn)
if pid < 0:
+ # If the daemonized process managed to write out an error msg,
+ # report it.
+ if pycompat.iswindows and os.path.exists(lockpath):
+ with open(lockpath) as log:
+ for line in log:
+ procutil.stderr.write(line)
raise error.Abort(_('child process failed to start'))
writepid(pid)
finally:
@@ -81,10 +108,11 @@
os.setsid()
except AttributeError:
pass
+
+ lockpath = None
for inst in opts['daemon_postexec']:
if inst.startswith('unlink:'):
lockpath = inst[7:]
- os.unlink(lockpath)
elif inst.startswith('chdir:'):
os.chdir(inst[6:])
elif inst != 'none':
@@ -107,6 +135,11 @@
if logfile and logfilefd not in (0, 1, 2):
os.close(logfilefd)
+ # Only unlink after redirecting stdout/stderr, so Windows doesn't
+ # complain about a sharing violation.
+ if lockpath:
+ os.unlink(lockpath)
+
if runfn:
return runfn()