Mail Archives: cygwin/2006/02/23/16:35:22

Subject: RE: Shells hang during script execution
Date: Thu, 23 Feb 2006 16:35:12 -0500
Message-ID: <A7E7241463A43B46B90F37197A667AE3055186@STEELPO.steeleye.com>
From: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>
To: "Ernie Coskrey" <Ernie DOT Coskrey AT steeleye DOT com>, <cygwin AT cygwin DOT com>
Cc: "Paul Clements" <Paul DOT Clements AT steeleye DOT com>

Here's a description of a second hang condition we were encountering, along with a patch for it.


The application (pdksh in this case) reads from a pipe, which eventually calls fhandler_pipe::read (in pipe.cc) on Thread 1.  This creates a new cygthread that runs "read_pipe()", and then calls th->detach(read_state).

When the hang occurs, the new thread is terminated early, before cygthread::stub() can call "callfunc()", and the error message "erroneous thread activation" appears.  I'm not sure what causes the thread to fail activation, but the result is that the read_state semaphore is never signalled.

Thread 1 then enters cygthread::detach(read_state).  The first thing that happens is that signal_arrived is set.  The old code would then set n=1 but leave howlong=INFINITE, so the following wait blocks forever because read_state is never signalled.  My change sets howlong=100 in this case.  Then, when WAIT_TIMEOUT occurs, we check whether __name is non-NULL.  Since the thread was terminated, its name is now NULL, so i is not decremented, and eventually the loop exits and cleanup proceeds as expected.



--- cygthread.cc.ORIG	2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc	2006-02-23 15:50:23.894461500 -0500
@@ -374,10 +374,12 @@
 		break;
 	      case WAIT_OBJECT_0 + 1:
 		n = 1;
-		if (i--)
-		  howlong = 50;
+		i--;
+		howlong = 100;
 		break;
 	      case WAIT_TIMEOUT:
+		if(!i && __name)
+			i--;
 		break;
 	      default:
 		if (!exiting)
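
To make the control flow above easier to follow, here is a small stand-alone model of the wait loop, not the actual Cygwin code.  It rests on several assumptions that the message does not confirm: the enclosing loop is taken to be "for (int i = 0; i < 2; i++)", signal_arrived is taken to fire exactly once, and read_state is never signalled (the failed-activation case).  The names simulate_detach, Status, Outcome, Wait and kInfinite are hypothetical, and thread_has_name stands in for "__name != NULL".

```cpp
#include <cassert>

// Possible outcomes of the modelled detach() wait loop.
enum class Status { Returned, HangsForever, KeepsPolling };

struct Outcome {
  Status status;
  int waits;  // number of simulated waits that completed
};

// Hypothetical stand-ins for the wait results and for INFINITE.
enum class Wait { SignalArrived, Timeout };
constexpr unsigned kInfinite = 0xFFFFFFFFu;

// Models the switch from the patch. ASSUMPTION: the loop header is
// `for (int i = 0; i < 2; i++)`; the diff shows only part of the switch.
// `patched` selects the old or new logic; `thread_has_name` models
// `__name != NULL`. read_state is never signalled in this model, so the
// WAIT_OBJECT_0 case is omitted.
Outcome simulate_detach(bool patched, bool thread_has_name, int max_waits = 20) {
  assert(max_waits > 0);
  unsigned n = 2;                // number of handles being waited on
  unsigned howlong = kInfinite;  // wait timeout in milliseconds
  bool signal_pending = true;    // signal_arrived is set on entry
  int waits = 0;

  for (int i = 0; i < 2; i++) {
    // Cap the simulation so a polling loop is reported instead of spinning.
    if (waits >= max_waits) return {Status::KeepsPolling, waits};

    // Fake wait: only signal_arrived can ever be signalled here.
    Wait res;
    if (n == 2 && signal_pending) {
      res = Wait::SignalArrived;
      signal_pending = false;
    } else if (howlong == kInfinite) {
      return {Status::HangsForever, waits};  // nothing can wake this wait
    } else {
      res = Wait::Timeout;
    }
    ++waits;

    switch (res) {
      case Wait::SignalArrived:  // corresponds to WAIT_OBJECT_0 + 1
        n = 1;
        if (patched) {
          i--;
          howlong = 100;
        } else if (i--) {
          howlong = 50;
        }
        break;
      case Wait::Timeout:  // corresponds to WAIT_TIMEOUT
        if (patched && !i && thread_has_name) i--;
        break;
    }
  }
  return {Status::Returned, waits};
}
```

Under these assumptions, simulate_detach(false, false) reports HangsForever (the original behaviour), simulate_detach(true, false) returns after three waits (the patched behaviour when the thread has already been terminated), and simulate_detach(true, true) keeps polling with the 100 ms timeout while the thread is still alive, which the model reports as KeepsPolling once the cap is reached.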

> -----Original Message-----
> From: Ernie Coskrey 
> Sent: Friday, February 10, 2006 1:31 PM
> To: Ernie Coskrey; 'cygwin AT cygwin DOT com'
> Subject: RE: Shells hang during script execution
> 
> 
> We've been able to narrow this down some more.  The shell 
> gets hung in sigsuspend(), waiting for SIGCHLD.  We've 
> verified that the process that's executed as part of the 
> command substitution does complete, and returns EOF, and the 
> shell (we're testing with pdksh) goes into sigsuspend and 
> never comes out.
> 
> If we execute "kill -CHLD <pid>", the shell resumes its processing.
> 
> I'm going to continue to look into this - if anybody has any 
> insight into how SIGCHLD might be getting lost, please let me 
> know.  Thanks!
> 
> Ernie Coskrey
> 
> 
> -----Original Message-----
> From: Ernie Coskrey
> Sent: Wed 2/1/2006 3:27 PM
> To: 'cygwin AT cygwin DOT com'
> Subject: Shells hang during script execution
>  
> I've run into problems with shell scripts hanging during 
> execution for no apparent reason.  I've narrowed down my test 
> case to two simple shell scripts.  To reproduce the problem, 
> I ran three instances of the "top.sh" script included here, 
> and after a bit (30 minutes to an hour or so) I'll see that 
> one or two of the shells have just stopped in their tracks.
> 
> Here are the scripts:
> 
> ----<top.sh>----
> dir=$1
> loops=$2
> 
> for loop in `seq 1 $loops`
> do
>         x=`./subtest.sh $dir`
>         date
>         echo loop $loop
> done
> 
> ----<subtest.sh>----
> for j in `ls $1`
> do
>         if [ `echo $j | egrep -i "A|B" | wc -l` -ne 0 ]
>         then
>                 echo $j
>         fi
> done
> echo subtest1 done >&2
> 
> --------
> 
> I then ran three bash shells.  The commands I ran, 
> simultaneously, were:
> 
> 1) ./top.sh C:/ 600
> 2) ./top.sh C:/windows 300
> 3) ./top.sh C:/windows/system32 100
> 
> These ran for about 45 minutes, and then I noticed that two 
> of them (1 and 2 above) had stopped printing any output.  The 
> third was still moving along.  The third completed, but the 
> first two never progressed any further.  I used Process 
> Explorer from ntinternals.com, and saw that the two hung 
> shells were not using any CPU, and did not have any child 
> processes created; they were simply stopped.  If a process 
> dump would be helpful, I can generate one with Windbg or gdb.
> 
> 
> -----
> Ernie Coskrey       SteelEye Technology, Inc.    803-461-3875
> 
> 
