From: Peter Rehley
To: "Dave Korn"
Cc: "'Cygwin List'"
Subject: Re: Hanging at GetModuleFileName in inside_kernel function
Date: Tue, 21 Feb 2006 13:56:18 -0800

On Feb 21, 2006, at 11:31 AM, Dave Korn wrote:

> On 21 February 2006 19:06, Peter Rehley wrote:
>
>> Hi,
>>
>> Well, for my particular hang issue cygwin is hanging inside the
>> inside_kernel function on the GetModuleFileName call.  I tracked this
>> down by adding debug statements (strace.prntf) until I got to the
>> point where the debug print before GetModuleFileName would appear and
>> the ones after it didn't.  This is consistent.  Each hang is happening
>> at this spot.
>>
>> However, this doesn't explain what is happening, but only where.
>>
>> I also observed that the times it hung were the only times
>> inside_kernel was actually called.
>>
>> I'm still trying to get more information.
>> Peter
>>
>> p.s. using cygwin snapshot 1.5.19-20060205.
>
>
>   http://cygwin.com/acronyms#PPAST.
>
>   Seriously.  Nobody can debug your code by ESP or remote control.  We can't
> even be sure that what you report is correct if we can't reproduce it.

Dang it.  I forgot to include the reference.  This is in reference to the hanging issue I mentioned earlier.
http://cygwin.com/ml/cygwin/2006-01/msg00549.html

Basically, when I run a configure script in a loop, at some point one of
the subshells it launches hangs and never returns.  Usually it takes
several hours for the script to hang, but when I run another configure
script in a different bash window I can get the first script to hang
within a few minutes.

When the script hangs it can't be stopped with ctrl-c and can't be killed
with the cygwin kill.  It can be killed using Task Manager, and it can be
resumed using the process program from
http://www.beyondlogic.org/solutions/processutil/processutil.htm.

This only happens on our dual Pentium Windows 2000 SP4 machines.  The
other Windows machines we use never hang.  These are Windows XP Pro SP1,
Windows XP Pro SP2, and Windows 2000 SP4 machines.

> If you
> don't show us your code, we don't even know if you've literally bracketed the
> GetModuleFileName call with debug prints or if you've just placed one before
> and one after the if...else if .. ladder, in which case maybe it's
> strncasematch going wrong.

Here is my modified inside_kernel function.  I did have to rearrange the
conditional so I could add additional debugging information.

static bool
inside_kernel (CONTEXT *cx)
{
  int res;
  MEMORY_BASIC_INFORMATION m;

  strace.prntf (_STRACE_SYSTEM, NULL, "\tChecking virtual");
  memset (&m, 0, sizeof m);
  if (!VirtualQuery ((LPCVOID) cx->Eip, &m, sizeof m))
    sigproc_printf ("couldn't get memory info, pc %p, %E", cx->Eip);
  strace.prntf (_STRACE_SYSTEM, NULL, "\tDone virtual check");

  char *checkdir = (char *) alloca (windows_system_directory_length + 4);
  /* checkdir is a pointer, so sizeof (checkdir) would clear only
     pointer-size bytes; pass the allocated length instead.  */
  memset (checkdir, 0, windows_system_directory_length + 4);
  strace.prntf (_STRACE_SYSTEM, NULL, "\tDone alloca");

# define h ((HMODULE) m.AllocationBase)
  /* Apparently Windows 95 can sometimes return bogus addresses from
     GetThreadContext.  These resolve to a strange allocation base.
     These should *never* be treated as interruptible.
  */
  if (!h || m.State != MEM_COMMIT)
    {
      strace.prntf (_STRACE_SYSTEM, NULL, "\tno h or not MEM_COMMIT");
      res = false;
    }
  else
    {
      strace.prntf (_STRACE_SYSTEM, NULL, "\tchecking module");
      if (h == user_data->hmodule)
        {
          strace.prntf (_STRACE_SYSTEM, NULL, "\th == user_data->hmodule");
          res = true;
        }
      else
        {
          strace.prntf (_STRACE_SYSTEM, NULL, "\tchecking getmodulename");
          if (!GetModuleFileName (h, checkdir,
                                  windows_system_directory_length + 2))
            {
              /* Set res before printing it; it is uninitialized until here.  */
              res = true;
              strace.prntf (_STRACE_SYSTEM, NULL, "\tGetModuleFileName %d", res);
            }
          else
            {
              strace.prntf (_STRACE_SYSTEM, NULL, "\tnone of the above");
              res = !strncasematch (windows_system_directory, checkdir,
                                    windows_system_directory_length);
            }
        }
    }
  sigproc_printf ("pc %p, h %p, interruptible %d", cx->Eip, h, res);
  strace.prntf (_STRACE_SYSTEM, NULL, "\tDone inside_kernel");
# undef h
  return res;
}

> How do you know it hung in the function, rather
> than returning from the function and then going wrong, just as a for-instance?
> How do you know it's really hung, rather than taking a long time to time-out
> querying a no-longer-present network drive or something like that?

It's sort of hung.  It won't return even after a few days, but using the
process program to resume it lets the hung program continue.  However,
when it resumes it doesn't print the next debug line.  I don't know what
happens after that point, but the script continues with no errors.

> How do we
> know whether something earlier in your code hasn't trashed the contents of
> memory so that GetModuleFileName goes off into lala-land?
>
>   This is why posting a testcase is worthwhile, and a report that says "Umm it
> don't work" is no use at all.  What, were you really expecting someone to pipe
> up with "Oh, GetModuleFileName just doesn't work, that's well known"?
>
>   I mean, ultimately, either Cygwin is calling the function correctly with
> valid parameters, in which case it's a bug in windows, or it isn't, in which
> case the bug is in cygwin.
> You should have used some %-specifiers in those
> printfs to dump the values of some of the variables, then you might have some
> information to go on.  Or run the whole thing under a debugger and /see/ where
> it actually goes.

I'll check the parameters with the %-specifiers.

I've tried gdb already and didn't make any progress with it.  When I
attach gdb to the hung program, gdb hangs too, at least until I resume
the attached program.  I've tried setting breakpoints around the areas I
think are hang locations, but either the program exits without hitting
the breakpoint or I end up in code that I can't step through.  In the
latest area gdb exits without hitting breakpoints even though I set them
on the hang line and on the debug statements after the hang.

Thanks for your feedback.
Peter
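
For reference, a rough sketch of the kind of debug line the %-specifier suggestion points at: dumping the three values handed to GetModuleFileName just before the call.  The helper name format_gmfn_args and its signature are made up for illustration and are not part of Cygwin; it only uses standard snprintf so it can be tried outside the DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Cygwin): formats the module handle,
   buffer address and buffer length that would be passed to
   GetModuleFileName, so they can be logged before the call.
   Returns the number of characters written, or -1 if the text did not
   fit in `out`.  */
static int
format_gmfn_args (char *out, size_t outlen,
                  const void *module, const void *buf, unsigned long len)
{
  assert (out != NULL && outlen > 0);

  /* %p expects a void pointer, hence the casts.  */
  int n = snprintf (out, outlen, "\tGetModuleFileName h=%p buf=%p len=%lu",
                    (void *) module, (void *) buf, len);

  /* snprintf reports the length it wanted; treat truncation as failure.  */
  if (n < 0 || (size_t) n >= outlen)
    return -1;
  return n;
}
```

Inside inside_kernel the same format string could presumably be passed straight to strace.prntf with h, checkdir and windows_system_directory_length + 2 as arguments, assuming strace.prntf accepts %p and %lu the way sigproc_printf accepts %p in the function above.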