From: "Michelsen, Robert"
Subject: Race condition spawning childs/pipe stuff?
Date: Wed, 19 Oct 2005 10:43:26 +0200

Hello,

I seem to be hitting a race condition when running large recursive build
processes (make). Occasionally the build hangs with a spawned child (sh.exe)
consuming 100% user CPU. It seems the build command itself (the spawned make)
has finished, but the child/parent(?) shell doesn't exit. When I kill sh.exe
manually, the (recursive) build process continues and finishes. I suspect
some kind of race condition somewhere in the pipe handling. The condition
itself is not reproducible.

Cygwin DLL: 1.5.19, API version: 0.138, build date: 2005-10-03 13:32

I attached gdb to the process and examined its threads:

----------- snip ----
$ ./gdb
GNU gdb 6.3.50.20050926
....
(gdb) attach 3048
Attaching to process 3048
[Switching to thread 3048.0xca8]
(gdb) info threads
* 3 thread 3048.0xca8  0x7c911231 in ntdll!DbgUiConnectToDbg () from /cygdrive/c/WINDOWS/system32/ntdll.dll
  2 thread 3048.0xd04  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
  1 thread 3048.0xf90  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) thread 1
[Switching to thread 1 (thread 3048.0xf90)]#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c91ea53 in ntdll!ZwYieldExecution () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x7c81e956 in SwitchToThread () from /cygdrive/c/WINDOWS/system32/kernel32.dll
#3  0x61054215 in low_priority_sleep (secs=0) at /netrel/src/cygwin-snapshot-20051003-1/winsup/cygwin/miscfuncs.cc:245
#4  0xfffffffe in ?? ()
(gdb) thread 2
[Switching to thread 2 (thread 3048.0xd04)]#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c91e288 in ntdll!ZwReadFile () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x7c801875 in ReadFile () from /cygdrive/c/WINDOWS/system32/kernel32.dll
#3  0x0000074c in ?? ()
(gdb) thread 3
[Switching to thread 3 (thread 3048.0xca8)]#0  0x7c911231 in ntdll!DbgUiConnectToDbg () from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c911231 in ntdll!DbgUiConnectToDbg () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c9607a8 in ntdll!KiIntSystemCall () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x00000005 in ?? ()
(gdb) q
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , Pid 3048
----------- snip ----

Thread 1 seems to be the eater.

gdb doesn't reveal much, so I used my favorite Win32 user-mode debugger,
OllyDbg:

----------- snip ----
Threads
Ident     Entry     Data block  Last error                Status  Priority  User time   System time
00000388  7C96077B  7FFDD000    ERROR_SUCCESS (00000000)  Active  32 + 0     0.0000 s    0.0000 s
00000D04  7C810856  7FFDE000    ERROR_SUCCESS (00000000)  Active  32 + 0     0.0000 s    0.0000 s
00000F90  00000000  7FFDF000    ERROR_SUCCESS (00000000)  Active  32 + 0    52.8437 s   94.5156 s
----------- snip ----

As you can see, the (main) thread 0xf90 is eating all the CPU. I examined the
call stack and used gdb's "l/info" commands to get symbols (I have the
appropriate .dbg file). I manually added the symbols as comments "(xxxx)":

----------- snip ----
Call stack of main thread
Address   Stack     Procedure                                                   Called from                                            Frame
0022DD84  7C91EA53  Includes ntdll.KiFastSystemCallRet                          ntdll.7C91EA51
0022DD88  7C81E956  ntdll.ZwYieldExecution                                      kernel32.7C81E950
0022DD8C  61054215  cygwin1.610F5138                                            cygwin1.61054210 (low_priority_sleep + 80)
0022DDAC  6106DF57  cygwin1.610541C0 (low_priority_sleep, miscfuncs.cc:230)     cygwin1.6106DF52 (_pinfo::sync_proc_pipe() + 34)
0022DDBC  61095984  cygwin1.6106DF30 (_pinfo::sync_proc_pipe(), pinfo.cc:977)   cygwin1.6109597F (spawn_guts(char const* ...) + 5263)
0022E99C  61095E35  ? cygwin1.610944F0 (spawn_guts(char const* ...), spawn.cc)  cygwin1.61095E30 (spawnve + 224)                       0022E998
0022E9CC  610188AB  cygwin1.61095D50 (spawnve)                                  cygwin1.610188A6 (execve + 38)                         0022E9C8
----------- snip ----

I searched the current Cygwin sources and found the following snippets ...

----- snip spawn.cc ----
static int __stdcall
spawn_guts (const char * prog_arg, const char *const *argv,
            const char *const envp[], int mode)
{
  ...
  /* If wr_proc_pipe doesn't exist then this process was not started by a
     cygwin process.  So, we need to wait around until the process we've just
     "execed" dies.  Use our own wait facility to wait for our own pid to exit
     (there is some minor special case code in proc_waiter and friends to
     accommodate this).

     If wr_proc_pipe exists, then it should be duplicated to the child.
     If the child has exited already, that's ok.  The parent will pick up
     on this fact when we exit.  dup_proc_pipe will close our end of the pipe.
     Note that wr_proc_pipe may also be == INVALID_HANDLE_VALUE.  That will
     make dup_proc_pipe essentially a no-op.  */
  if (!newargv.win16_exe && myself->wr_proc_pipe)
    {
      myself->sync_proc_pipe ();  /* Make sure that we own wr_proc_pipe
                                     just in case we've been previously
                                     execed. */
      myself.zap_cwd ();
      myself->dup_proc_pipe (pi.hProcess);
    }

----- snip pinfo.cc ----
void
_pinfo::sync_proc_pipe ()
{
  if (wr_proc_pipe && wr_proc_pipe != INVALID_HANDLE_VALUE)
    while (wr_proc_pipe_owner != GetCurrentProcessId ())
      low_priority_sleep (0);
}
---------------------------

It seems sync_proc_pipe is looping forever: the loop condition
"wr_proc_pipe_owner != GetCurrentProcessId ()" stays true, so the loop never
exits. I have updated the Cygwin core several times, but this error persists.

What gives?

Regards,
Robert Michelsen
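
P.S. To make the suspected hang concrete, here is a minimal standalone sketch of the same spin-wait pattern. It is not Cygwin code: `pipe_owner` is a hypothetical stand-in for wr_proc_pipe_owner, `wait_for_ownership` is a name made up for illustration, and std::this_thread::yield() only approximates what low_priority_sleep (0) does. The one difference from the loop quoted above is a deadline, so that a stuck owner value shows up as a failed wait instead of a thread spinning at 100% CPU.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the shared wr_proc_pipe_owner field.
std::atomic<unsigned long> pipe_owner{0};

// Spin until pipe_owner equals my_pid, yielding the CPU on each pass,
// much like the quoted sync_proc_pipe loop. Unlike that loop, give up
// once `timeout` has elapsed.
// Returns true if ownership was observed, false if the wait timed out.
bool wait_for_ownership(unsigned long my_pid,
                        std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (pipe_owner.load() != my_pid) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;           // owner never became us: report, don't spin
        std::this_thread::yield();  // rough analogue of low_priority_sleep (0)
    }
    return true;
}
```

If nothing ever stores the caller's id into `pipe_owner`, a call such as `wait_for_ownership(42, std::chrono::milliseconds(50))` returns false after roughly 50 ms. Without the deadline it would spin indefinitely, which matches what I see in thread 0xf90.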