From: "Williams, David"
To: "cygwin AT cygwin DOT com"
Date: Wed, 30 Apr 2008 10:16:02 -0700
Subject: RE: Problem with cygserver and sysv message queues: msgsnd() blocks forever.
References: <20080430105846 DOT GO23852 AT calimero DOT vinschen DOT de>
In-Reply-To: <20080430105846.GO23852@calimero.vinschen.de>

Yes, I can patch and build the sources, and will test the patch.

I can see that this will work, and is probably the least disruptive way to fix it. I'm bothered a little bit by the fixed timeout value, although this is an exceptional case, which shouldn't occur in a properly tuned and managed system.

My thoughts for a fix were centered around replacing the msqptr ident parameter with a resource specific identifier that would allow freeing a resource by one queue to wake another. However, such a fix would require much regression testing, and STILL might need a timeout like this as an ultimate safety net. Besides, we likely want to continue tracking the BSD source.

I'm currently building and testing using the cygwin-1.5.25-12 release tarball. Would it be more helpful for me to pull the CVS head down to test this?

Thanks for the quick reply. I'm glad to be of some help.
Dave Williams

-----Original Message-----
From: cygwin-owner AT cygwin DOT com [mailto:cygwin-owner AT cygwin DOT com] On Behalf Of Corinna Vinschen
Sent: Wednesday, April 30, 2008 3:59 AM
To: cygwin AT cygwin DOT com
Subject: Re: Problem with cygserver and sysv message queues: msgsnd() blocks forever.

On Apr 29 17:57, Williams, David wrote:
> I've been debugging a problem with msgsnd() hanging. If
> there are no free msghdrs available, msgsnd() blocks with
> msleep(). Unfortunately, the only way it can unblock is
> if that specific queue frees a msghdr. If the queue in
> question is empty, this never occurs.
> [...]
> It's possible to work around this by using the flag IPC_NOWAIT
> in msgsnd, and polling until the message is sent, but my feeling
> is that the library call should not hang like this.
> [...]
> The call to msleep() above passes msqptr (the queue handle)
> as the Ident pointer. Each of the calls to wakeup() in
> sysv_msg.cc also passes msqptr as the ident. This means that
> if the msghdr resource is freed by a queue other than the one
> blocked, it won't wake up msgsnd(). Since doqueue's queue is
> empty, there is no way to wake up msgsnd().
> [...]
> I haven't been able to spot a way to fix this behavior without
> significantly changing the block/release mechanism. Has anyone
> seen this before? Have I missed something? Is this simply a known
> limitation, with IPC_NOWAIT the only way to deal with it?

Right now, yes.

As you have probably seen when examining the sources, the code is pretty much the FreeBSD version, just with a thin and almost tasteless Cygwin topping. The code is basically the version 1.52 of the original FreeBSD code with a few patches applied up to version 1.60. FreeBSD is at 1.71.

I inspected the FreeBSD ChangeLogs and found this change in version 1.65:

  Fix msgsnd(3)/msgrcv(3) deadlock under heavy resource pressure by
  timing out msgsnd and rechecking resources.
  This problem was found while I was running Linux Test Project test
  suite (test cases: msgctl08, msgctl09).
  [...]

This appears to be their solution to the above problem. The basic change is the call to msleep. The last parameter is changed from 0 (no timeout) to a value called hz. See

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sysv_msg.c.diff?r1=1.64;r2=1.65

hz is an external variable in the code which is the system's clock frequency.

Are you set up to build the Cygwin sources? Would you mind rebuilding cygserver with this patch applied and testing without IPC_NOWAIT again?

Index: sysv_msg.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygserver/sysv_msg.cc,v
retrieving revision 1.3
diff -u -p -r1.3 sysv_msg.cc
--- sysv_msg.cc	9 Jan 2006 15:10:14 -0000	1.3
+++ sysv_msg.cc	30 Apr 2008 10:57:58 -0000
@@ -722,10 +722,14 @@ msgsnd(struct thread *td, struct msgsnd_
 			}
 			DPRINTF(("goodnight\n"));
 			error = msleep(msqptr, &msq_mtx, (PZERO - 4) | PCATCH,
-			    "msgwait", 0);
+			    "msgsnd", 50);
 			DPRINTF(("good morning, error=%d\n", error));
 			if (we_own_it)
 				msqptr->msg_perm.mode &= ~MSG_LOCKED;
+			if (error == EWOULDBLOCK) {
+				DPRINTF(("timed out\n"));
+				continue;
+			}
 			if (error != 0) {
 				DPRINTF(("msgsnd: interrupted system call\n"));
 #ifdef __CYGWIN__
@@ -1079,11 +1083,11 @@ msgrcv(struct thread *td, struct msgrcv_
 
 		DPRINTF(("msgrcv: goodnight\n"));
 		error = msleep(msqptr, &msq_mtx, (PZERO - 4) | PCATCH,
-		    "msgwait", 0);
+		    "msgrcv", 0);
 		DPRINTF(("msgrcv: good morning (error=%d)\n", error));
 
 		if (error != 0) {
-			DPRINTF(("msgsnd: interrupted system call\n"));
+			DPRINTF(("msgrcv: interrupted system call\n"));
 #ifdef __CYGWIN__
 			if (error != EIDRM)
 #endif /* __CYGWIN__ */

Thanks for the report,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
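
For readers who cannot rebuild cygserver, the IPC_NOWAIT workaround mentioned in the quoted report could look roughly like the sketch below. It is only an illustration, not code from Cygwin or FreeBSD: the helper name send_with_polling and its max_tries and delay_ms parameters are made up for this example, and the choice to report ETIMEDOUT after the last refused attempt is an assumption. It relies only on the standard msgsnd() behavior of failing with EAGAIN when IPC_NOWAIT is set and the message cannot be queued immediately.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose nanosleep() and the XSI message API */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <time.h>

/*
 * Hypothetical helper (not part of Cygwin or any library).
 *
 * Calls msgsnd() with IPC_NOWAIT so the call can never sleep inside
 * cygserver. While msgsnd() reports EAGAIN (the message cannot be queued
 * right now), sleep delay_ms milliseconds and try again, at most
 * max_tries times in total.
 *
 * Returns 0 on success. Returns -1 with errno set by msgsnd() on any
 * other error, or with errno = ETIMEDOUT (an assumption of this sketch)
 * when every attempt was refused with EAGAIN.
 */
int
send_with_polling(int msqid, const void *msgp, size_t msgsz,
    int max_tries, long delay_ms)
{
	struct timespec delay;
	int i;

	delay.tv_sec = delay_ms / 1000;
	delay.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < max_tries; i++) {
		if (msgsnd(msqid, msgp, msgsz, IPC_NOWAIT) == 0)
			return 0;
		if (errno != EAGAIN)
			return -1;	/* real error, e.g. EINVAL or EIDRM */
		nanosleep(&delay, NULL);	/* back off, then recheck */
	}
	errno = ETIMEDOUT;
	return -1;
}
```

A caller would pass the same message buffer it would hand to msgsnd(), for example send_with_polling(qid, &msg, sizeof(msg.mtext), 100, 50) to retry for about five seconds. This does in user space what the patch does inside cygserver (give up waiting after a while and recheck), so it should be unnecessary once a cygserver with the msleep() timeout is running.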