From: Tom Honermann
Date: Wed, 2 Jan 2013 14:15:15 -0500
Subject: Re: Intermittent failures retrieving process exit codes - snapshot test requested
Message-ID: <50E48743.5040401@coverity.com>
In-Reply-To: <20130101053606.GB18911@ednor.casa.cgf.cx>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Delivered-To: mailing list cygwin AT cygwin DOT com

On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS.  The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself.  I have no way to know how
> you are building this.  Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to
specify debug source paths and such.  I believe my builds are not
unconventional.  I used options that disabled frame pointer omission so
that the resulting binaries could be debugged with non-gcc debuggers.

$ mkdir build
$ cd build
$ ../src/configure \
     CFLAGS="-g" \
     CXXFLAGS="-g" \
     CFLAGS_FOR_TARGET="-g" \
     CXXFLAGS_FOR_TARGET="-g" \
     --enable-debugging \
     --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file.  Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really
> useful.  If you are not going to dive in and attempt to fix code
> yourself then all we normally need is a simple test case.  WinDbg
> is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough.
Lines with frame numbers come directly from WinDbg.  Since WinDbg is
unable to resolve symbols to gcc generated debug info, the symbol
references within the cygwin1 module are incorrect.  In those cases, I
manually resolved the instruction pointer address using the RetAddr
value from the prior frame and searching the linker generated
cygwin1.map file.
I then pasted the mangled name on a line following the WinDbg line (with
the incorrect symbol name) and, if the symbol is a C++ one, the
unmangled name on an additional line.  For the stack fragment below,
address 610f1553 == strtosigno+0x357 == __ZN4muto7acquireEm ==
muto::acquire(unsigned long).  I did not translate offsets for the
functions as I resolved them, nor did I try and verify they are correct
(i.e., that the return address is not for a static function that is not
represented in the .map file).

>>  # ChildEBP RetAddr
>> 00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
>> 01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
>> 02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
>> 03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
>> 04 00288cb8 6118e54d cygwin1!strtosigno+0x357
>>                      __ZN4muto7acquireEm
>>                      muto::acquire(unsigned long)
>> [snip]

The reason for using WinDbg is that, from what I understand, gdb is
unable to produce accurate stack traces when the call stack includes
frames for functions that omit the frame pointer and do not have debug
info that gdb can process.  I believe many Microsoft provided functions
in ntdll, kernel32, kernelbase, etc... do omit the frame pointer and
only provide debug info in the PDB format - which gdb is unable to use.
Compiling Cygwin without frame pointer omission, and using WinDbg,
therefore provides the most accurate stack trace.  If I am incorrect
about any of this, I would very much appreciate a correction and/or
explanation.

I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able
to reproduce several issues which are described below.  All of these
issues occur when using ctrl-c to interrupt the infinite loop in the
test case(s) I've been using to debug inconsistent exit codes.

When ctrl-c is pressed, I've observed the following:

1) Programs are (generally) terminated as expected.  cmd.exe prompts to
   "Terminate batch job" as expected.
2) An access violation occurs and a processor context is dumped to the
   console.  I do not yet have stack traces for these cases.
3) One of the processes hangs.

Access violations occur in ~20% of test runs.  Hangs occur in ~5% of
test runs.

I did not provide a test case previously because I don't have an
automated reproducer at present.  All sources needed to reproduce the
issues are below.  The test case uses a .bat file to avoid dependencies
on bash so as to minimally isolate the problem.

To reproduce the issues, copy test.bat, false-cygwin32.exe, and
expect-false-execve-cygwin32.exe to a Cygwin bin directory and run
test.bat from a cmd.exe console.  Press ctrl-c to interrupt the test.
Repeat until problems are observed.  I have not been able to reproduce
these symptoms when running the test via a MinTTY console.

I have been unable to get useful stack traces from hung processes using
gdb.  gdb reports that the debug information in
cygwin1-20130102.dbg.bz2 does not match (CRC mismatch) the cygwin1.dll
module in cygwin-inst-20130102.tar.bz2.

$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    pid_t child_pid, wait_pid;
    int result, child_status;

    if (argc != 2) {
        fprintf(stderr, "expect-false: Missing or too many arguments\n");
        return 127;
    }

    child_pid = fork();
    if (child_pid == -1) {
        fprintf(stderr, "expect-false: fork failed. errno=%d\n", errno);
        return 127;
    } else if (child_pid == 0) {
        /* Child: replace this process with the program named in argv[1]. */
        result = execlp(argv[1], argv[1], (char *)NULL);
        if (result == -1) {
            fprintf(stderr, "expect-false: execlp failed. errno=%d\n", errno);
        }
        _exit(127);
    }

    /* Parent: retry on EINTR and wait until the child has terminated. */
    do {
        wait_pid = waitpid(child_pid, &child_status, 0);
    } while(
        (wait_pid == -1 && errno == EINTR) ||
        (wait_pid == child_pid &&
         !(WIFEXITED(child_status) || WIFSIGNALED(child_status)))
    );

    if (wait_pid == -1) {
        fprintf(stderr, "expect-false: waitpid failed. errno=%d\n", errno);
        return 127;
    }

    if (!WIFEXITED(child_status)) {
        fprintf(stderr, "expect-false: child process did not exit normally\n");
        return 127;
    }

    /* Report the decoded exit code rather than the raw wait status. */
    if (WEXITSTATUS(child_status) != 1) {
        fprintf(stderr, "expect-false: unexpected exit code: %d\n",
                WEXITSTATUS(child_status));
    }

    return WEXITSTATUS(child_status);
}

$ cat false.c
#include <stdio.h>

int main() {
    printf("myfalse\n");
    return 1;
}

$ cat test.bat
@echo off
setlocal
set PATH=%CD%;%PATH%

:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop

$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

From a cmd.exe console: (press ctrl-c once the test is running)

C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...

Tom.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
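
The manual symbol resolution described in this message (take a return
address from the WinDbg trace, then find the closest preceding symbol in
cygwin1.map) amounts to a nearest-lower-address lookup.  The following is
a minimal sketch of that lookup in C, not the procedure actually used.
It does not parse a real .map file; the tables are tiny in-memory
stand-ins.  The strtosigno address is derived from the trace
(0x610f1553 - 0x357), while the address given for __ZN4muto7acquireEm is
purely hypothetical, and the names struct sym, nearest_symbol,
exports_only and from_map are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry from a symbol listing: start address and (mangled) name. */
struct sym {
    uint32_t addr;
    const char *name;
};

/*
 * Return the entry with the greatest address <= target, or NULL if the
 * target lies below every entry.  The table must be sorted by ascending
 * address.  Static functions missing from the table are attributed to
 * whichever listed symbol precedes them, which is the caveat noted in
 * the message.
 */
const struct sym *
nearest_symbol(const struct sym *table, size_t count, uint32_t target)
{
    size_t lo = 0, hi = count;          /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= target)
            lo = mid + 1;               /* candidate; look further right */
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &table[lo - 1];
}

/*
 * Exported symbols only: roughly what a debugger sees when it cannot
 * read the gcc debug info.  Address derived from strtosigno+0x357.
 */
const struct sym exports_only[] = {
    { 0x610f11fc, "strtosigno" },
};

/*
 * Same region with a non-exported symbol added, as a linker map might
 * list it.  The second address is HYPOTHETICAL, chosen for illustration.
 */
const struct sym from_map[] = {
    { 0x610f11fc, "strtosigno" },
    { 0x610f14f0, "__ZN4muto7acquireEm" },
};
```

Looking up 0x610f1553 in exports_only yields strtosigno, matching the
misleading name WinDbg printed for frame 04; the same lookup in from_map
yields __ZN4muto7acquireEm instead.  A binary search is used only
because map listings can be long; a linear scan would give the same
answer.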