changeset 37214:f09a2eab11cf
server: add an error feedback mechanism for when the daemon fails to launch
There's a recurring problem on Windows where `hg serve -d` will randomly fail to
spawn a detached process. The reason for the failure is completely hidden, and
it takes hours to get a single failure on my laptop. All this does is redirect
the child's stdout/stderr to a file until the lock file is freed; if the child
fails to spawn, the parent dumps that file's contents.
I chose to put the output into the lock file because that is always cleaned up.
There's no way to report errors after that anyway. On Windows, killdaemons.py
is roughly `kill -9`, so this ensures that junk won't pile up.
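
The feedback mechanism described above can be sketched outside of Mercurial roughly as follows. This is only an illustration under stated assumptions: `CHILD_SOURCE` and `spawn_and_collect` are hypothetical names, the child is a throwaway script that fails on purpose, and `subprocess.run` stands in for the detached spawn that `hg serve -d` actually performs. Unlike the change itself, the sketch redirects on every platform, not only on Windows.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: point fds 1 and 2 at the lock file, then die before
# removing it, simulating a daemon that fails during startup.
CHILD_SOURCE = r"""
import os, sys
lockpath = sys.argv[1]
sys.stdout.flush()
sys.stderr.flush()
# O_BINARY only exists on Windows; fall back to 0 elsewhere.
flags = os.O_WRONLY | os.O_APPEND | getattr(os, 'O_BINARY', 0)
fd = os.open(lockpath, flags)
try:
    os.dup2(fd, 1)
    os.dup2(fd, 2)
finally:
    os.close(fd)
sys.stderr.write('simulated startup failure\n')
sys.stderr.flush()
sys.exit(1)
"""


def spawn_and_collect(child_source=CHILD_SOURCE):
    """Run the child; return (exit code, text left behind in the lock file)."""
    lockfd, lockpath = tempfile.mkstemp(prefix='demo-service-')
    os.close(lockfd)
    try:
        proc = subprocess.run([sys.executable, '-c', child_source, lockpath])
        captured = ''
        # A surviving lock file means the child never finished starting up,
        # so whatever it wrote there is the error report.
        if proc.returncode != 0 and os.path.exists(lockpath):
            with open(lockpath) as log:
                captured = log.read()
        return proc.returncode, captured
    finally:
        # The lock file is always cleaned up, so the output never piles up.
        if os.path.exists(lockpath):
            os.unlink(lockpath)
```

Calling `spawn_and_collect()` returns the child's exit status together with whatever it managed to write before dying, which the parent can then print to its own stderr.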
This may end up being a case of EADDRINUSE. At least that's what I saw spit out
a few times (among other odd errors and missing output on Windows). But I also
managed to get the same thing on Fedora 26 by running test-hgwebdir.t with
--loop -j10 for several hours. Running `netstat` immediately after killing that
run printed a wall of sockets in the TIME_WAIT state, which were gone a couple
seconds later. I couldn't match up ports that failed, because --loop doesn't
print out the message about the port that was used. So maybe the fix is to
rotate the use of HGPORT[12] in the tests. But, let's collect some more data
first.
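
For reference, the error named above is easy to provoke deliberately. The sketch below (with the hypothetical helper `bind_twice`) binds a second socket to a port that another socket is already listening on and returns the resulting errno. It does not reproduce the TIME_WAIT situation seen in the test runs; it only shows what a plain port collision reports.

```python
import errno
import socket


def bind_twice():
    """Bind two sockets to the same local port; return the second's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port.
        first.bind(('127.0.0.1', 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(('127.0.0.1', port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == '__main__':
    print(bind_twice() == errno.EADDRINUSE)
```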
author | Matt Harbison <matt_harbison@yahoo.com> |
---|---|
date | Wed, 28 Mar 2018 00:11:09 -0400 |
parents | 77f9e95fe3c4 |
children | 893ff8c3bc57 |
files | mercurial/server.py |
diffstat | 1 files changed, 34 insertions(+), 1 deletions(-) |
--- a/mercurial/server.py	Fri Mar 30 20:53:36 2018 -0400
+++ b/mercurial/server.py	Wed Mar 28 00:11:09 2018 -0400
@@ -30,6 +30,27 @@
                runargs=None, appendpid=False):
     '''Run a command as a service.'''
 
+    # When daemonized on Windows, redirect stdout/stderr to the lockfile (which
+    # gets cleaned up after the child is up and running), so that the parent can
+    # read and print the error if this child dies early. See 594dd384803c. On
+    # other platforms, the child can write to the parent's stdio directly, until
+    # it is redirected prior to runfn().
+    if pycompat.iswindows and opts['daemon_postexec']:
+        for inst in opts['daemon_postexec']:
+            if inst.startswith('unlink:'):
+                lockpath = inst[7:]
+                if os.path.exists(lockpath):
+                    procutil.stdout.flush()
+                    procutil.stderr.flush()
+
+                    fd = os.open(lockpath,
+                                 os.O_WRONLY | os.O_APPEND | os.O_BINARY)
+                    try:
+                        os.dup2(fd, 1)
+                        os.dup2(fd, 2)
+                    finally:
+                        os.close(fd)
+
     def writepid(pid):
         if opts['pid_file']:
             if appendpid:
@@ -61,6 +82,12 @@
                 return not os.path.exists(lockpath)
             pid = procutil.rundetached(runargs, condfn)
             if pid < 0:
+                # If the daemonized process managed to write out an error msg,
+                # report it.
+                if pycompat.iswindows and os.path.exists(lockpath):
+                    with open(lockpath) as log:
+                        for line in log:
+                            procutil.stderr.write(line)
                 raise error.Abort(_('child process failed to start'))
             writepid(pid)
         finally:
@@ -81,10 +108,11 @@
             os.setsid()
         except AttributeError:
             pass
+
+        lockpath = None
         for inst in opts['daemon_postexec']:
             if inst.startswith('unlink:'):
                 lockpath = inst[7:]
-                os.unlink(lockpath)
             elif inst.startswith('chdir:'):
                 os.chdir(inst[6:])
             elif inst != 'none':
@@ -107,6 +135,11 @@
         if logfile and logfilefd not in (0, 1, 2):
             os.close(logfilefd)
 
+        # Only unlink after redirecting stdout/stderr, so Windows doesn't
+        # complain about a sharing violation.
+        if lockpath:
+            os.unlink(lockpath)
+
     if runfn:
         return runfn()
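
The third hunk changes the `daemon_postexec` loop so that it only records the lock path and leaves the `os.unlink()` for later, after stdout/stderr have been redirected. A standalone sketch of that "parse first, act later" shape might look like the following. The function name `parse_postexec` is hypothetical, `ValueError` stands in for `error.Abort`, the error text is made up, and collecting the `chdir:` targets instead of calling `os.chdir()` right away is a simplification for illustration.

```python
def parse_postexec(instructions):
    """Split daemon-postexec style instructions into (lockpath, chdirs).

    Nothing is unlinked here; the caller decides when that is safe.
    """
    lockpath = None
    chdirs = []
    for inst in instructions:
        if inst.startswith('unlink:'):
            # Remember the path only; removal is deferred to the caller.
            lockpath = inst[7:]
        elif inst.startswith('chdir:'):
            chdirs.append(inst[6:])
        elif inst != 'none':
            # Hypothetical message, not Mercurial's actual wording.
            raise ValueError('unrecognized instruction: %s' % inst)
    return lockpath, chdirs


if __name__ == '__main__':
    print(parse_postexec(['unlink:/tmp/hg-service-demo', 'chdir:/srv', 'none']))
```

The caller would then redirect its standard streams first and unlink `lockpath` last, mirroring the order the final hunk establishes.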