changeset 29602:4fc4b8cc9957
chg: handle EOF reading data block
We recently discovered a case in production where chg uses 100% CPU and
tries to read data forever:
recvfrom(4, "", 1814012019, 0, NULL, NULL) = 0
Inspection with gdb suggests that readchannel() received wrong data. It was
stuck in an infinite loop because rsize == 0 does not exit the loop, even
though the server process had already ended.
(gdb) bt
#0 ... in recv () at /lib64/libc.so.6
#1 ... in readchannel (...) at /usr/include/bits/socket2.h:45
#2 ... in readchannel (hgc=...) at hgclient.c:129
#3 ... in handleresponse (hgc=...) at hgclient.c:255
#4 ... in hgc_runcommand (hgc=..., args=<optimized>, argsize=<optimized>)
#5 ... in main (argc=...486922636, argv=..., envp=...) at chg.c:661
(gdb) frame 2
(gdb) p *hgc
$1 = {sockfd = 4, pid = 381152, ctx = {ch = 108 'l',
data = 0x7fb05164f010 "st):\nTraceback (most recent call last):\n"
"Traceback (most recent call last):\ne", maxdatasize = 1814065152,"
" datasize = 1814064225}, capflags = 16131}
This patch addresses the infinite loop by detecting consecutive empty
responses and aborting in that case.
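The guard described above can be sketched as a standalone read loop. This is a minimal illustration rather than the chg code itself: readblock(), demotruncated() and MAXEMPTY are hypothetical names, the function returns -1 where chg calls abortmsg(), and EINTR handling is omitted for brevity.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical limit, mirroring the "> 20" threshold used in the patch. */
#define MAXEMPTY 20

/* Read exactly size bytes from sockfd into buf.
 * Returns 0 on success, or -1 on a recv() error or when more than
 * MAXEMPTY consecutive empty reads suggest that the peer is gone. */
int readblock(int sockfd, char *buf, size_t size)
{
	size_t cursize = 0;
	int emptycount = 0;
	while (cursize < size) {
		ssize_t rsize = recv(sockfd, buf + cursize, size - cursize, 0);
		if (rsize < 0)
			return -1;
		/* reset the counter whenever any data arrives */
		emptycount = (rsize == 0 ? emptycount + 1 : 0);
		if (emptycount > MAXEMPTY)
			return -1;
		cursize += (size_t)rsize;
	}
	return 0;
}

/* Hypothetical demo: the writer sends 3 of the 10 expected bytes and then
 * closes its end, so every later recv() on the reader returns 0. */
int demotruncated(void)
{
	int fds[2];
	char buf[10];
	int r;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return -2;
	if (write(fds[1], "abc", 3) != 3) {
		close(fds[0]);
		close(fds[1]);
		return -2;
	}
	close(fds[1]);
	r = readblock(fds[0], buf, sizeof(buf));
	close(fds[0]);
	return r;
}
```

Without the emptycount check, demotruncated() would never return, because cursize stops growing once the peer has closed its end. With the check, it gives up after 21 empty reads and returns -1.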
Note that datasize (1814064225, or 0x6c206c61) decodes to the bytes
['l', ' ', 'l', 'a']. Concatenating ch, datasize and data gives
"ll last):\n...", which is part of "Traceback (most recent call last):".
This may indicate a server-side channeledoutput issue. If it is a race
condition, we may want to use flock to protect the channels.
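The datasize observation can be checked by splitting the value into bytes in big-endian order, which is the order of the length field on the command server wire. The helper names be32bytes() and printdatasize() below are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 32-bit length field into its four bytes, most significant
 * first, which is the order in which they arrive on the wire. */
void be32bytes(uint32_t value, char out[5])
{
	out[0] = (char)((value >> 24) & 0xff);
	out[1] = (char)((value >> 16) & 0xff);
	out[2] = (char)((value >> 8) & 0xff);
	out[3] = (char)(value & 0xff);
	out[4] = '\0';
}

/* Hypothetical helper: print the datasize from the gdb session as text.
 * Prints: datasize bytes: "l la" */
void printdatasize(void)
{
	char buf[5];
	be32bytes(1814064225u, buf);
	printf("datasize bytes: \"%s\"\n", buf);
}
```

Prefixed with ch = 'l' and followed by the start of data ("st):\n"), these bytes read "ll last):\n". This supports the suspicion that plain traceback text was parsed as a channel header.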
author | Jun Wu <quark@fb.com> |
---|---|
date | Mon, 18 Jul 2016 18:55:06 +0100 |
parents | 6cff2ac0ccb9 |
children | b181a650a886 |
files | contrib/chg/hgclient.c |
diffstat | 1 files changed, 6 insertions(+), 1 deletions(-) |
--- a/contrib/chg/hgclient.c	Mon Jul 18 11:27:27 2016 -0700
+++ b/contrib/chg/hgclient.c	Mon Jul 18 18:55:06 2016 +0100
@@ -126,10 +126,15 @@
 		return;  /* assumes input request */
 
 	size_t cursize = 0;
+	int emptycount = 0;
 	while (cursize < hgc->ctx.datasize) {
 		rsize = recv(hgc->sockfd, hgc->ctx.data + cursize,
 			     hgc->ctx.datasize - cursize, 0);
-		if (rsize < 0)
+		/* rsize == 0 normally indicates EOF, while it's also a valid
+		 * packet size for unix socket. treat it as EOF and abort if
+		 * we get many empty responses in a row. */
+		emptycount = (rsize == 0 ? emptycount + 1 : 0);
+		if (rsize < 0 || emptycount > 20)
 			abortmsg("failed to read data block");
 		cursize += rsize;
 	}