Sat, 12 Nov 2016 03:06:07 +0000 worker: stop using a separate thread waiting for children
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:06:07 +0000] rev 30416
worker: stop using a separate thread waiting for children Now that we have a SIGCHLD hander, and it could get executed when waiting for I/O. It's no longer necessary to have a separated waitpid thread. So just remove it.
Sat, 12 Nov 2016 03:07:22 +0000 worker: add a SIGCHLD handler to collect worker immediately
Jun Wu <quark@fb.com> [Sat, 12 Nov 2016 03:07:22 +0000] rev 30415
worker: add a SIGCHLD handler to collect worker immediately As planned by previous patches, add a SIGCHLD handler to get notifications about worker exits, and deals with worker failure immediately. Note that the SIGCHLD handler gets unregistered before killworkers(), so SIGCHLD won't interrupt "killworkers" - making it harder to send kill signals to waited processes.
Tue, 15 Nov 2016 02:12:16 +0000 worker: make waitforworkers reentrant
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 02:12:16 +0000] rev 30414
worker: make waitforworkers reentrant We are going to use it in the SIGCHLD handler. The handler will be executed in the main thread with the non-blocking version of waitpid, while the waitforworkers thread runs the blocking version. It's possible that one of them collects a worker and makes the other error out (no child to wait). This patch handles these errors: ECHILD is ignored. EINTR needs a retry. The "pids" set is designed to be only modifiable by "waitforworkers". And we only remove items after a successful waitpid. Since a child process can only be "waitpid"-ed once. It's guaranteed that "pids.remove(p)" won't be called with duplicated "p"s. And once a "p" is removed from "pids", that "p" does not need to be killed or waited any more.
Tue, 15 Nov 2016 02:10:40 +0000 worker: change "pids" to a set
Jun Wu <quark@fb.com> [Tue, 15 Nov 2016 02:10:40 +0000] rev 30413
worker: change "pids" to a set There is no need to keep any order of the "pids" array. A set is more efficient for the "remove" operation. And the following patch will use that.
Thu, 28 Jul 2016 20:57:07 +0100 worker: allow waitforworkers to be non-blocking
Jun Wu <quark@fb.com> [Thu, 28 Jul 2016 20:57:07 +0100] rev 30412
worker: allow waitforworkers to be non-blocking This patch adds a boolean flag to waitforworkers and makes it non-blocking if set to True. This is to make it possible that we can reap our workers while keep other unrelated children untouched, after receiving SIGCHLD.
Thu, 28 Jul 2016 20:51:20 +0100 worker: wait worker pid explicitly
Jun Wu <quark@fb.com> [Thu, 28 Jul 2016 20:51:20 +0100] rev 30411
worker: wait worker pid explicitly Before this patch, waitforworkers uses os.wait() to collect child workers, and only wait len(pids) processes. This can have serious issues if other code spawns new processes and does not reap them: 1. worker.py may get wrong exit code and kill innocent workers. 2. worker.py may continue without waiting for all workers to complete. This patch fixes the issue by using waitpid to wait worker pid explicitly. However, this patch introduces a new issue: worker failure may not be handled immediately. The issue will be addressed in next patches.
(0) -30000 -10000 -3000 -1000 -300 -100 -30 -10 -6 +6 +10 +30 +100 +300 +1000 +3000 +10000 tip