X-Recipient: archive-cygwin AT delorie DOT com X-SWARE-Spam-Status: No, hits=-0.2 required=5.0 tests=AWL,BAYES_50,TW_DB,TW_GJ,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Message-ID: <4DA702B4.2070407@ece.cmu.edu> Date: Thu, 14 Apr 2011 10:20:36 -0400 From: Ryan Johnson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: Ryan Johnson CC: Jon TURNEY , cygwin AT cygwin DOT com Subject: Re: Debugging help for fork failure: resource temporarily unavailable References: <4DA621E7 DOT 8040905 AT ece DOT cmu DOT edu> In-Reply-To: <4DA621E7.8040905@ece.cmu.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com On 2:59 PM, Ryan Johnson wrote: > I wrote a very simple program whose main() prints out the contents of > /proc/self/maps, forks, calls foo() and bar(), and finally (if the > parent) calls wait(). > > The trick is, foo() and bar() reside in cygfoo.dll and cygbar.dll > respectively, which I compiled to have the same base address: 0x66000000. > > The running binary often, but not always, results in those annoying > "exception::handle: Exception: STATUS_ACCESS_VIOLATION" messages (the > process otherwise appears to complete normally most of the time). > However, once in a while the child fails to spawn, with no particular > error message to advertise that fact. After downloading windbg, I was able to capture the stack traces of access violations. It turns out there's no retry code at all; it's just some access violations cause the exception handler to crash with a variable number of additional violations. At this point I'm officially in over my head. The output below is hopefully useful to somebody who knows a lot more than I do about the code which is involved here. Help? Ideas? Ryan The first access violation occurred in la-la land: WARNING: FrameIP not in any known module. Following frames may be wrong. 610203ea 0x5c0073 6111ccca cygwin1!dlerror+0x6aa 610050ab cygwin1!wordfree+0x4d4a 61007040 cygwin1!setprogname+0x36bb 61004c96 cygwin1!dll_crt0__FP11per_process+0x710 61004d3b cygwin1!setprogname+0x32a6 004013c2 cygwin1!setprogname+0x334b 00401015 image00000000_00400000+0x13c2 756633ca image00000000_00400000+0x1015 77619ed2 kernel32!BaseThreadInitThunk+0x12 77619ea5 ntdll32!RtlInitializeExceptionChain+0x63 00000000 ntdll32!RtlInitializeExceptionChain+0x36 Passing the above through gdb's disassembler: 0x5c0073 <_ZN10per_module9run_dtorsEv+24>: call *%eax : call 0x61144970 <__call_exitprocs> : call 0x6111ccb0 <_cygwin_exit_return+526>: call 0x61005090 : call *0x405120 : call 0x4013a0 kernel32!BaseThreadInitThunk+0x12 ... ntdll calls ... Continuing with windbg from there gave the following: cYgSiGw00f 11 0x2030 0x28CE8CHEAP[fork.exe]: HEAP: Free Heap block 2b3ac0 modified at 2b4000 after it was freed ntdll!EtwpCreateEtwThread+0x1721 Continuing on again: WARNING: Stack unwind information not available. Following frames may be wrong. 660010d9 cyggcc_s_1!_deregister_frame_info_bases+0x48 610203ea cygbar+0x10d9 6111ccca cygwin1!dlerror+0x6aa 77638fba ntdll32!RtlQueryEnvironmentVariable+0x241 77638e5c ntdll32!LdrShutdownProcess+0x141 75667a25 ntdll32!RtlExitUserProcess+0x74 6109d683 kernel32!ExitProcess+0x15 61005360 cygwin1!cygwin32_posix_path_list_p+0xa2f3 610283e1 cygwin1!setprogname+0x3970 61029034 cygwin1!getenv+0x4611 77626ab9 cygwin1!getenv+0x5264 77626a8b ntdll32!RtlDosSearchPath_Ustr+0xada 775f0143 ntdll32!RtlDosSearchPath_Ustr+0xaac 6111ccca ntdll32!KiUserExceptionDispatcher+0xf 610050ab cygwin1!wordfree+0x4d4a 61007040 cygwin1!setprogname+0x36bb (rest continues as in the first stack trace) Again decoding with gdb: <__gcc_deregister_frame+48>: movl $0x66003000,(%esp) <_ZN10per_module9run_dtorsEv+24>: call *%eax : call 0x61144970 <__call_exitprocs> ... ntdll calls... kernel32!ExitProcess+0x15 <_ZN5pinfo4exitEm+270>: call 0x61159c90 <_Z7do_exiti AT 4+587>: call 0x6109d570 <_ZN5pinfo4exitEm> <_ZN7_cygtls11signal_exitEi+284>: call 0x61005110 <_Z7do_exiti AT 4> <_ZN9exception6handleEP17_EXCEPTION_RECORDP15_exception_listP8_CONTEXTPv+1295>: call 0x610282c0 <_ZN7_cygtls11signal_exitEi> ... ntdll calls ... ntdll32!KiUserExceptionDispatcher+0xf ... below this is the stack trace from the original access violation ... Continuing yet again gives a third access violation inside ntdll32, due to a second call to KiUserExceptionDispatcher: WARNING: Stack unwind information not available. Following frames may be wrong. 0028c080 77626ab9 ntdll32!RtlUlonglongByteSwap+0x1605d 0028c0a4 77626a8b ntdll32!RtlDosSearchPath_Ustr+0xada 0028c154 775f0143 ntdll32!RtlDosSearchPath_Ustr+0xaac 0028c4bc 660010d9 ntdll32!KiUserExceptionDispatcher+0xf 0028c4cc 610203ea cygbar+0x10d9 ... continues as before ... At this point we're in an endless loop of access violations in the user exception dispatcher leading to the exception dispatcher raising an access violation... I figured this would lead to a stack overflow and detached. Surprisingly, the output of the cygwin app contained no mention of access violations. However, the errors prevented the child process from running global destructors in its dlls: Child exiting * * * fork.cpp fini Parent after fork (child: 7944) Parent exiting * * * fork.cpp fini * * * foo.cpp fini * * * bar.cpp fini Rerunning the process a few times turned up a different access violation, this one definitely fork-related: WARNING: Stack unwind information not available. Following frames may be wrong. 0028c7dc 61020d38 cygwin1!dlerror+0x74b 0028c7fc 61004c96 cygwin1!dlfork+0x808 0028ce64 00000000 cygwin1!setprogname+0x32a6 Decoded: <_ZN3dll4initEv+27>: mov %eax,(%edx) <_Z13dll_dllcrt0_1Pv+307>: call 0x61020470 <_ZN3dll4initEv> <_ZN7_cygtls5call2EPFmPvS0_ES0_S0_+52>: call *%ebp Continuing twice more: cYgSiGw00f 11 0x20B4 0x28CE8CModLoad: 00000000`76980000 00000000`76985000 C:\Windows\SysWOW64\psapi.dll (1918.20b4): Invalid handle - code c0000008 (first chance) ntdll!KiRaiseUserExceptionDispatcher+0x3a: 00000000`774512f7 8b8424c0000000 mov eax,dword ptr [rsp+0C0h] ss:00000000`0008e310=c0000008 ... (1918.20b4): Invalid handle - code c0000008 (!!! second chance !!!) ntdll32!ZwClose+0x12: 775ff9e2 83c404 add esp,4 Sure enough, cygwin noticed this time: 1 [main] fork 6424 exception::handle: Exception: STATUS_ACCESS_VIOLATION 2997 [main] fork 6424 open_stackdumpfile: Dumping stack trace to fork.exe.stackdump 1 [main] fork 9204 child_info::sync: wait failed, pid 6424, Win32 error 1812 221888192 [main] fork 9204 fork: child -1 - died waiting for longjmp before initialization, retry 10, exit code 0x1000000, errno 11 Parent after fork (child: -1) Parent exiting * * * fork.cpp fini * * * foo.cpp fini * * * bar.cpp fini -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple