Mail Archives: cygwin/2006/02/23/16:33:58

X-Spam-Check-By: sourceware.org
MIME-Version: 1.0
Subject: RE: Shells hang during script execution
Date: Thu, 23 Feb 2006 16:33:45 -0500
Message-ID: <A7E7241463A43B46B90F37197A667AE3055185@STEELPO.steeleye.com>
From: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>
To: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>, <cygwin AT cygwin DOT com>
Cc: "Paul Clements" <Paul DOT Clements AT steeleye DOT com>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id k1NLXvDF010469

We've identified two hang conditions and developed fixes for both.  This message describes the first of the two, along with a patch; I'll follow up with a description and patch for the second.


If a signal can't be handled because it is blocked, it gets queued (on 
the process's "sigq") to be handled later. Whenever the process's 
signal mask changes (e.g., the signal in question gets unblocked), an 
attempt is made to handle all the queued signals (i.e., a signal flush 
occurs). However, if the blocked signal is queued right after the 
signal mask change, the flush has already run and the signal is 
missed. The signal is on the queue, but the process doesn't know to 
check for it, so the process hangs until another signal gets sent to 
it.

The workaround is to force the signal queue to be rescanned (flushed) 
whenever we add something to it, so a queued signal is never missed.
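
To illustrate the ordering problem outside of Cygwin, here is a minimal 
single-threaded sketch in C++. It is a toy model under stated 
assumptions, not the sigproc.cc code: ToySignalState, flush, unblock, 
enqueue and delivered_after_race are all hypothetical names, and 20 is 
an arbitrary number standing in for SIGCHLD. It replays the 
interleaving described above (blocked check, then mask change, then 
late queueing) with and without a rescan after the add.

```cpp
#include <deque>
#include <set>
#include <vector>

// Toy model only; every name here is hypothetical, not Cygwin's API.
struct ToySignalState {
    std::set<int> blocked;       // currently blocked signal numbers
    std::deque<int> pending;     // stand-in for the process's "sigq"
    std::vector<int> delivered;  // signals that reached a handler

    // Deliver every queued signal that is no longer blocked.
    void flush() {
        std::deque<int> still_blocked;
        for (int sig : pending) {
            if (blocked.count(sig))
                still_blocked.push_back(sig);
            else
                delivered.push_back(sig);
        }
        pending.swap(still_blocked);
    }

    // A mask change triggers a flush of the queue.
    void unblock(int sig) {
        blocked.erase(sig);
        flush();
    }

    // Queue a signal that was found blocked; optionally rescan afterwards.
    void enqueue(int sig, bool rescan_after_add) {
        pending.push_back(sig);
        if (rescan_after_add)
            flush();
    }
};

// Replays the problematic ordering and reports whether the signal
// was ever delivered.
bool delivered_after_race(bool rescan_after_add) {
    const int sig = 20;  // arbitrary stand-in for SIGCHLD
    ToySignalState st;
    st.blocked.insert(sig);

    bool was_blocked = st.blocked.count(sig) > 0;  // 1: signal seen as blocked
    st.unblock(sig);                               // 2: mask change flushes an empty queue
    if (was_blocked)
        st.enqueue(sig, rescan_after_add);         // 3: signal queued too late

    return !st.delivered.empty();
}
```

In this model, delivered_after_race(false) leaves the signal sitting on 
the queue, while delivered_after_race(true) delivers it, which is the 
effect the "goto flush" in the patch below is meant to have.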


--- sigproc.cc.ORIG	2006-02-16 14:02:42.814320000 -0500
+++ sigproc.cc	2006-02-22 10:55:20.327209900 -0500
@@ -1130,6 +1130,7 @@
 	case __SIGNOHOLD:
 	case __SIGFLUSH:
 	case __SIGFLUSHFAST:
+flush:
 	  sigq.reset ();
 	  while ((q = sigq.next ()))
 	    {
@@ -1150,6 +1151,8 @@
 	  else
 	    {
 	      int sig = pack.si.si_signo;
+	      if (sig == SIGCHLD)
+		clearwait = true;
 	      // FIXME: REALLY not right when taking threads into consideration.
 	      // We need a per-thread queue since each thread can have its own
 	      // list of blocked signals.  CGF 2005-08-24
@@ -1165,10 +1168,11 @@
 			system_printf ("Failed to arm signal %d from pid %d", pack.sig, pack.pid);
 #endif
 		      sigq.add (pack);	// FIXME: Shouldn't add this in !sh condition
+		      goto flush; // signal may have become unblocked while
+		                  // we were processing it (before we added
+			          // it to the sigq) -- flush sigq to be sure	
 		    }
 		}
-	      if (sig == SIGCHLD)
-		clearwait = true;
 	    }
 	  break;
 	}

> -----Original Message-----
> From: Ernie Coskrey 
> Sent: Friday, February 10, 2006 1:31 PM
> To: Ernie Coskrey; 'cygwin AT cygwin DOT com'
> Subject: RE: Shells hang during script execution
> 
> 
> We've been able to narrow this down some more.  The shell 
> gets hung in sigsuspend(), waiting for SIGCHLD.  We've 
> verified that the process that's executed as part of the 
> command substitution does complete, and returns EOF, and the 
> shell (we're testing with pdksh) goes into sigsuspend and 
> never comes out.
> 
> If we execute "kill -CHLD <pid>", the shell resumes its processing.
> 
> I'm going to continue to look into this - if anybody has any 
> insight into how SIGCHLD might be getting lost, please let me 
> know.  Thanks!
> 
> Ernie Coskrey
> 
> 
> -----Original Message-----
> From: Ernie Coskrey
> Sent: Wed 2/1/2006 3:27 PM
> To: 'cygwin AT cygwin DOT com'
> Subject: Shells hang during script execution
>  
> I've run into problems with shell scripts hanging during 
> execution for no apparent reason.  I've narrowed down my test 
> case to two simple shell scripts.  To reproduce the problem, 
> I ran three instances of the "top.sh" script included here, 
> and after a bit (30 minutes to an hour or so) I'll see that 
> one or two of the shells have just stopped in their tracks.
> 
> Here are the scripts:
> 
> ----<top.sh>----
> dir=$1
> loops=$2
> 
> for loop in `seq 1 $loops`
> do
>         x=`./subtest.sh $dir`
>         date
>         echo loop $loop
> done
> 
> ----<subtest.sh>----
> for j in `ls $1`
> do
>         if [ `echo $j | egrep -i "A|B" | wc -l` -ne 0 ]
>         then
>                 echo $j
>         fi
> done
> echo subtest1 done >&2
> 
> --------
> 
> I then ran three bash shells.  The commands I ran, 
> simultaneously, were:
> 
> 1) ./top.sh C:/ 600
> 2) ./top.sh C:/windows 300
> 3) ./top.sh C:/windows/system32 100
> 
> These ran for about 45 minutes, and then I noticed that two 
> of them (1 and 2 above) had stopped printing any output.  The 
> third was still moving along.  The third completed, but the 
> first two never progressed any further.  I used Process 
> Explorer from ntinternals.com, and saw that the two hung 
> shells were not using any CPU, and did not have any child 
> processes created; they were simply stopped.  If a process 
> dump would be helpful, I can generate one with Windbg or gdb.
> 
> -----
> Ernie Coskrey       SteelEye Technology, Inc.    803-461-3875
> 
> 

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

