Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com Message-ID: <400D106E.7040503@hhschmidt.de> Date: Tue, 20 Jan 2004 12:26:38 +0100 From: "H. Henning Schmidt" User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 MIME-Version: 1.0 To: cygwin AT cygwin DOT com Subject: Re: cygwin source-patch fixing deadlock while writing to serial port Content-Type: multipart/mixed; boundary="------------070903060102040708090002" X-Provags-ID: kundenserver.de abuse AT kundenserver DOT de auth:e9b0f294f4b6d6da7efd4652948020da --------------070903060102040708090002 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit > > > > On Mon, Jan 19, 2004 at 10:08:39PM +0100, H. Henning Schmidt wrote: > > > > >I found a potential deadlock while writing to a serial port (e.g. > > > > >/dev/com1) that has been opened as O_RDWR. The deadlock occurs from time > > > > >to time (not sure about exact conditions) when I write to that port, > > > > >while there is data coming in (e.g. from an external device) and I do > > > > >not read away that data fast enough from the port. > > > > > > > > > >I did provide a test case a while ago in > > > > >http://sources.redhat.com/ml/cygwin/2003-03/msg01529.html. I digged into > > > > >the issue some more now and found that the executing thread got > > > > >sometimes deadlocked in fhandler_serial::raw_write(). It basically ends > > > > >up in a for(;;) loop and just never hits the break; > > > > > > > > Exactly. When the input buffer overflows, all serial communications cease > > > and calls exit with ERROR_OPERATION_ABORTED. If you only call write, then > > > the ClearCommError() necessary to start things up again is never called, > > > and you stick in that infinite loop. > > > > Ok, I will be happy to learn what the proper fix looks like in your opinion. > > > When I can get to that code, I'll post it. > > > However, I have a hard time accepting the fact that I cannot write OUT the > > serial port because my INput buffer overflows. At least when I have switched > > off all kinds of flow control (which I have), then I consider these to be two > > independent streams that happen to share one filedesc. Or should I perhaps > > open two independent filedescriptors in this case, one write-only, one > > read-only? > > > > Are you saying that the underlying Win-API enforces what you state above, or does > > this really make sense in some way and I just didn't get it yet? Your statement > > sounds to me like I am forced to keep a second thread around, and if only to get > > me out of that trap, once I get hung there ... does not sound too elegant to me ... > > > Yes. I was stating the limitations of the Win-API. > > > > > >The applied patch adds a safety exit to that for(;;) loop. > > > > >This fixes the testcase referenced above. > > > > > > > > Yuck! No, this is not the proper fix. > > > > > > > >This might not be the last problem lingering in the serial access code > > > > >(there are some FIXME tokens still around ...), but it is definitely an > > > > >improvement for me. I thought I'd share that with you. > > > > > > > > Can you convince me that this isn't just a band-aid? I don't understand > > > > why cygwin *shouldn't* hang in a situation like this. There are > > > > certainly similar situations where this happens on linux. > > > > > > > > Perhaps we need a low_priority_sleep (10) in the loop in that situation > > > > or something. > > > > > > > No. I have a partial patch for the above, but I am in the process of > > > getting a new Windows box and shuffling all my data. I'll try to submit > > > it when things settle if no one beats me to it. > > > > who am I to beat you ... > > but be assured that I'll be waiting eagerly for this patch. > > This issue has been kicking me for a long time (letting my app hang up every once in > > a while), and I really want get this fixed. So now that I know that there is a better > > way to do it ... that's what I'm longing for then :-) > > > While you're waiting, why don't you try what I said. Put a ClearCommError > call in the switch for the error stated. I bet you get 90+% of what you > want. ok. did this. it works. thanks a lot for the tip. With this change, my deadlock-prevention (which I still left in there, paranoyed as I am) just never comes into action. I attach my modified patch to this message for reference (to be applied to the original 1.5.5-1 sources) ;Henning --------------070903060102040708090002 Content-Type: text/plain; name="patch.cygwin-1.5.5-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch.cygwin-1.5.5-1" diff -cr cygwin-1.5.5-1.orig/winsup/cygwin/fhandler_serial.cc cygwin-1.5.5-1/winsup/cygwin/fhandler_serial.cc *** cygwin-1.5.5-1.orig/winsup/cygwin/fhandler_serial.cc Sat Jun 21 02:12:35 2003 --- cygwin-1.5.5-1/winsup/cygwin/fhandler_serial.cc Tue Jan 20 10:57:31 2004 *************** *** 153,174 **** int fhandler_serial::raw_write (const void *ptr, size_t len) { ! DWORD bytes_written; OVERLAPPED write_status; memset (&write_status, 0, sizeof (write_status)); write_status.hEvent = CreateEvent (&sec_none_nih, TRUE, FALSE, NULL); ProtectHandle (write_status.hEvent); ! for (;;) { if (WriteFile (get_handle (), ptr, len, &bytes_written, &write_status)) break; switch (GetLastError ()) { ! case ERROR_OPERATION_ABORTED: ! continue; case ERROR_IO_PENDING: break; default: --- 153,179 ---- int fhandler_serial::raw_write (const void *ptr, size_t len) { ! DWORD bytes_written = 0; OVERLAPPED write_status; memset (&write_status, 0, sizeof (write_status)); write_status.hEvent = CreateEvent (&sec_none_nih, TRUE, FALSE, NULL); ProtectHandle (write_status.hEvent); ! for (int prevent_deadlock=0 ; prevent_deadlock<10 ; prevent_deadlock++) { if (WriteFile (get_handle (), ptr, len, &bytes_written, &write_status)) break; switch (GetLastError ()) { ! case ERROR_OPERATION_ABORTED: { ! DWORD ev; ! COMSTAT st; ! if (!ClearCommError (get_handle (), &ev, &st)) ! __seterrno (); ! bytes_written = 0; ! continue; } case ERROR_IO_PENDING: break; default: --------------070903060102040708090002 Content-Type: text/plain; charset=us-ascii -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/ --------------070903060102040708090002--