Mail Archives: cygwin/2006/02/21/16:56:35

In-Reply-To: <00fc01c6371d$789be620$a501a8c0@CAM.ARTIMI.COM>
References: <00fc01c6371d$789be620$a501a8c0 AT CAM DOT ARTIMI DOT COM>
Message-Id: <897AB4F9-DECD-4714-8C45-5DA39C096F99@rehley.net>
Cc: "'Cygwin List'" <cygwin AT cygwin DOT com>
From: Peter Rehley <peter AT rehley DOT net>
Subject: Re: Hanging at GetModuleFileName in inside_kernel function
Date: Tue, 21 Feb 2006 13:56:18 -0800
To: "Dave Korn" <dave DOT korn AT artimi DOT com>

On Feb 21, 2006, at 11:31 AM, Dave Korn wrote:

> On 21 February 2006 19:06, Peter Rehley wrote:
>
>> Hi,
>>
>> Well, for my particular hang issue cygwin is hanging inside the
>> inside_kernel function on the GetModuleFileName call.  I tracked this
>> down by adding debug statements (strace.prntf) until I got to the
>> point where the debug print before GetModuleFileName would appear and
>> the ones after it didn't. This is consistent.  Each hang is happening
>> at this spot.
>>
>> However, this doesn't explain what is happening, but only where.
>>
>> I also observed that the times it hung were the only times
>> inside_kernel was actually called.
>>
>> I'm still trying to get more information.
>> Peter
>>
>> p.s. using cygwin snapshot 1.5.19-20060205.
>
>
>   http://cygwin.com/acronyms#PPAST.
>
>   Seriously.  Nobody can debug your code by ESP or remote control.  We
> can't even be sure that what you report is correct if we can't reproduce
> it.
Dang it.  I forgot to include the reference.  This is in reference to
the hanging issue I mentioned earlier:
http://cygwin.com/ml/cygwin/2006-01/msg00549.html

Basically, when I use a configure script in a loop, at some point one  
of the subshells launched will hang and never return.  Usually it  
takes several hours for the script to hang, but when I run another  
configure script in a different bash window I can get the first  
script to hang within a few minutes.

When the script hangs it can't be stopped by using ctrl-c, and can't
be killed using the cygwin kill.  It can be killed using task
manager, and it can be resumed using the process program from
http://www.beyondlogic.org/solutions/processutil/processutil.htm.

This only happens on our dual-Pentium Windows 2000 SP4 machines.
The other Windows machines we use never hang: those are Windows XP Pro
SP1, Windows XP Pro SP2, and Windows 2000 SP4 machines.

> If you don't show us your code, we don't even know if you've literally
> bracketed the GetModuleFileName call with debug prints or if you've just
> placed one before and one after the if...else if .. ladder, in which case
> maybe it's strncasematch going wrong.
Here is my modified inside_kernel function.  I did have to rearrange  
the conditional so I could add additional debugging information.

static bool
inside_kernel (CONTEXT *cx)
{
   int res;
   MEMORY_BASIC_INFORMATION m;

   strace.prntf (_STRACE_SYSTEM, NULL, "\tChecking virtual");
   memset (&m, 0, sizeof m);
   if (!VirtualQuery ((LPCVOID) cx->Eip, &m, sizeof m))
     sigproc_printf ("couldn't get memory info, pc %p, %E", cx->Eip);
   strace.prntf (_STRACE_SYSTEM, NULL, "\tDone virtual check");

   char *checkdir = (char *) alloca (windows_system_directory_length + 4);
   memset (checkdir, 0, sizeof (checkdir));
   strace.prntf (_STRACE_SYSTEM, NULL, "\tDone alloca");

# define h ((HMODULE) m.AllocationBase)
   /* Apparently Windows 95 can sometimes return bogus addresses from
      GetThreadContext.  These resolve to a strange allocation base.
      These should *never* be treated as interruptible. */
   if (!h || m.State != MEM_COMMIT)
     {
     strace.prntf (_STRACE_SYSTEM, NULL, "\tno h or not MEM_COMMIT");
     res = false;
     }
   else
     {
     strace.prntf (_STRACE_SYSTEM, NULL, "\tchecking module");
     if (h == user_data->hmodule)
       {
       strace.prntf (_STRACE_SYSTEM, NULL, "\th == user_data->hmodule");
       res = true;
       }
     else
       {
       strace.prntf (_STRACE_SYSTEM, NULL, "\tchecking getmodulename");
       if (!GetModuleFileName (h, checkdir,
                               windows_system_directory_length + 2))
         {
         strace.prntf (_STRACE_SYSTEM, NULL, "\tGetModuleFileName %d", res);
         res = true;
         }
       else
         {
         strace.prntf (_STRACE_SYSTEM, NULL, "\tnone of the above");
         res = !strncasematch (windows_system_directory, checkdir,
                           windows_system_directory_length);
         }
       }
     }

   sigproc_printf ("pc %p, h %p, interruptible %d", cx->Eip, h, res);
   strace.prntf (_STRACE_SYSTEM, NULL, "\tDone inside_kernel");
# undef h
   return res;
}
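
One detail in the listing above: sizeof (checkdir) is the size of the
pointer, not of the alloca'd buffer, so that memset clears only the first
4 or 8 bytes.  Whether this has anything to do with the hang is not
established.  The following is a minimal, portable sketch of the
difference; the helper names and the std::vector standing in for the
alloca'd buffer are assumptions made for illustration, not Cygwin code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Count how many bytes in the buffer are zero.
std::size_t count_zero_bytes(const std::vector<char> &buf)
{
  return static_cast<std::size_t>(std::count(buf.begin(), buf.end(), '\0'));
}

// Clear only as many bytes as a pointer occupies, which is what
// memset (checkdir, 0, sizeof (checkdir)) does when checkdir is a char *.
// Returns the number of zero bytes in the buffer afterwards.
std::size_t clear_with_pointer_size(std::vector<char> &buf)
{
  char *p = buf.data();
  const std::size_t n = std::min(sizeof p, buf.size());
  std::memset(p, 0, n);
  return count_zero_bytes(buf);
}

// Clear the whole buffer by passing its real length.
// Returns the number of zero bytes in the buffer afterwards.
std::size_t clear_with_buffer_length(std::vector<char> &buf)
{
  std::memset(buf.data(), 0, buf.size());
  return count_zero_bytes(buf);
}
```

With a 68-byte buffer pre-filled with a non-zero character, the first
helper leaves all but the leading pointer-sized bytes untouched, while the
second zeroes all 68.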


> How do you know it hung in the function, rather than returning from the
> function and then going wrong, just as a for-instance?  How do you know
> it's really hung, rather than taking a long time to time-out querying a
> no-longer-present network drive or something like that?
It's sort of hung.  It won't return even after a few days, but using
the process program to resume it lets the hung program continue.
However, when it resumes it doesn't print the next debug line.  I don't
know what happens after that point, but the script continues with no
errors.

> How do we know whether something earlier in your code hasn't trashed the
> contents of memory so that GetModuleFileName goes off into lala-land?
>
>   This is why posting a testcase is worthwhile, and a report that says
> "Umm it don't work" is no use at all.  What, were you really expecting
> someone to pipe up with "Oh, GetModuleFileName just doesn't work, that's
> well known"?
>
>   I mean, ultimately, either Cygwin is calling the function correctly with
> valid parameters, in which case it's a bug in windows, or it isn't, in
> which case the bug is in cygwin.  You should have used some %-specifiers
> in those printfs to dump the values of some of the variables, then you
> might have some information to go on.  Or run the whole thing under a
> debugger and /see/ where it actually goes.

I'll check the parameters with the %-specifiers.  I've tried gdb
already and didn't make any progress with it.  When I attach gdb to
the hung program, gdb hangs too, at least until I resume the attached
program.  I've tried setting breakpoints around the areas I think are
hang locations, but either the program exits without hitting the
breakpoint or I end up in code that I can't step through.  In the
latest area, gdb exits without hitting the breakpoints even though I
set them on the hang line and on the debug statements after the hang.
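
As a rough illustration of bracketing a single call with prints that dump
the arguments and the return value, here is a portable sketch.  It does not
use the Cygwin internals: trace() is a hypothetical stand-in for
strace.prntf that collects lines in memory, and fake_get_module_name() is a
hypothetical stand-in for GetModuleFileName with a made-up module path.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory log, standing in for the strace output.
std::vector<std::string> &trace_log()
{
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for strace.prntf: format one line and record it.
void trace(const char *fmt, ...)
{
  char line[256];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(line, sizeof line, fmt, ap);
  va_end(ap);
  trace_log().push_back(line);
}

// Hypothetical stand-in for GetModuleFileName: copies a fixed name into
// buf and returns the number of characters stored, or 0 on failure.
unsigned fake_get_module_name(void *module, char *buf, unsigned size)
{
  if (!module || size == 0)
    return 0;
  int n = std::snprintf(buf, size, "%s", "C:\\WINNT\\system32\\kernel32.dll");
  if (n < 0)
    return 0;
  return static_cast<unsigned>(n) < size ? static_cast<unsigned>(n) : size - 1;
}

// Bracket exactly one call: log the inputs immediately before it and the
// result immediately after it, with nothing else in between.
unsigned traced_get_module_name(void *module, char *buf, unsigned size)
{
  trace("before: module=%p buf=%p size=%u",
        module, static_cast<void *>(buf), size);
  unsigned n = fake_get_module_name(module, buf, size);
  trace("after: ret=%u name=%s", n, n ? buf : "(none)");
  return n;
}
```

If the "before" line appears in the log with sane values and the "after"
line never does, the call itself is where control was lost; if both appear,
the problem lies further on.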

Thanks for your feedback.
Peter


