Subject: Re: Race condition spawning childs/pipe stuff?
From: Max Kaehn
To: cygwin AT cygwin DOT com
Date: Fri, 10 Feb 2006 10:27:13 -0800
Message-Id: <1139596033.4045.8.camel@fulgurite>

Robert Michelson wrote:
> I seem to encounter a race condition when running large recursive build
> processes (make).
> Occasionally, the build process hangs with a spawned child (sh.exe)
> eating with 100% user cpu.
> It seems the build command itself (spawned make) finished but
> child/parent? shell doesnt exit.
>
> When i kill sh.exe manually, the (recursive) build process continues and
> finishes.
> I suspect some kind of race condition somewhere in pipe stuff.

I've been getting the same thing with Cygwin 1.5.19-3, but it's
extremely hard to reproduce: in endurance testing that repeatedly builds
Mozilla, it has happened only once, on build #518 of 570 (thus far).

Like Robert, I've found the problem to be in sync_proc_pipe(), looping
on low_priority_sleep(). This is the stack trace once I munge $ebp and
$eip to the highest-on-the-stack pair that makes any sense:

#0  0x6106f0c7 in _pinfo::sync_proc_pipe () from /usr/bin/cygwin1.dll
#1  0x610972a9 in spawn_guts () from /usr/bin/cygwin1.dll
#2  0x61097655 in spawnve () from /usr/bin/cygwin1.dll
#3  0x61018c6b in execve () from /usr/bin/cygwin1.dll
#4  0x6108dd7f in _sigfe () from /usr/bin/cygwin1.dll
#5  0x004714c8 in ?? ()
#6  0x004715ec in ?? ()
#7  0xfffffffd in ?? ()
#8  0x00000002 in ?? ()
#9  0x0022eb98 in ?? ()
#10 0x004035c1 in fhandler_pipe::get_guard ()
#11 0x004715ec in ?? ()
#12 0x00470a24 in ?? ()
#13 0x004714c8 in ?? ()
#14 0x0041007c in fhandler_pipe::get_guard ()
#15 0x00000124 in ?? ()
#16 0x00000000 in ?? () from

Can anyone suggest other useful data to gather for the next time this
happens?

Thanks,
Max
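
For anyone who wants to try the same kind of endurance run (rebuilding over and over until the hang shows up), here is a rough sketch in Python. It is only an illustration, not the harness used for the numbers above: the function name run_until_hang, the plain "make" command, and the four-hour timeout are all made up for the example. The one deliberate choice is that a run that exceeds the timeout is left alive rather than killed, so a debugger can still be attached to the stuck process tree.

```python
import subprocess


def run_until_hang(cmd, max_runs, timeout_s):
    """Run cmd repeatedly, one run at a time.

    Returns (run_number, process) for the first run that is still going
    after timeout_s seconds. That process is NOT killed, so it can be
    inspected. Returns (None, None) if every run finished in time.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.Popen(cmd)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Leave the suspected hang alone and report which run it was.
            return run, proc
    return None, None


if __name__ == "__main__":
    # Hypothetical settings: adjust the command, run count, and timeout
    # to match the real build.
    run, proc = run_until_hang(["make"], max_runs=570, timeout_s=4 * 60 * 60)
    if run is None:
        print("no hang observed")
    else:
        print(f"run {run} is still going after the timeout, pid {proc.pid}")
```

A timeout only says that a run took too long, not why, so the stuck process still has to be checked by hand (for example, confirming that a child sh.exe is spinning) before counting the run as a reproduction.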