Message-ID: <55A55E83.8010800@cornell.edu>
Date: Tue, 14 Jul 2015 15:09:55 -0400
From: Ken Brown
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: [ANNOUNCEMENT] TEST RELEASE: Cygwin 2.1.0-0.1
References: <20150627145259 DOT GB23036 AT calimero DOT vinschen DOT de>
 <20150630195547 DOT GG2918 AT calimero DOT vinschen DOT de>
 <5592F86E DOT 8070803 AT cornell DOT edu>
 <20150701104748 DOT GH2918 AT calimero DOT vinschen DOT de>
 <20150701135749 DOT GN2918 AT calimero DOT vinschen DOT de>
 <559449AF DOT 9010804 AT cornell DOT edu>
 <55949D9A DOT 7060900 AT cornell DOT edu>
 <20150702121301 DOT GA25423 AT calimero DOT vinschen DOT de>
 <20150702122047 DOT GS2918 AT calimero DOT vinschen DOT de>
 <55959036 DOT 8070300 AT cornell DOT edu>
 <20150703104741 DOT GZ2918 AT calimero DOT vinschen DOT de>
 <55968996 DOT 9030402 AT cornell DOT edu>
In-Reply-To: <55968996.9030402@cornell.edu>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes

On 7/3/2015 9:09 AM, Ken Brown wrote:
> On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
>> On Jul 2 15:25, Ken Brown wrote:
>>> On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
>>>> On Jul 2 14:13, Corinna Vinschen wrote:
>>>>> On Jul 1 22:10, Ken Brown wrote:
>>>>>> I may have spoken too soon.  As I repeat the experiment on a
>>>>>> different computer, with a build from a slightly different
>>>>>> snapshot of the emacs trunk, emacs crashes when I type 'C-x d'
>>>>>> with the following stack dump:
>>>>>>
>>>>>> Stack trace:
>>>>>> Frame        Function    Args
>>>>>> 00100A3E240  00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
>>>>>> 00030000002  001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
>>>>>> 00000000000  00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>>> 00000000000  21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>>> End of stack trace
>>>>>>
>>>>>> $ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exception.h:175
>>>>>>
>>>>>> $ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exceptions.cc:1639
>>>>>
>>>>> That points to a crash while setting up the alternate stack.  This
>>>>> is always a possibility because, in contrast to the kernel signal
>>>>> handler in a real POSIX system, the Cygwin exception handler is
>>>>> still running on the stack which triggered the crash up to the
>>>>> point where we call the signal handler function.  Depending on how
>>>>> the stack overflow occurred, this additional stack usage may be
>>>>> enough to kill the process for good.
>>>>>
>>>>> Out of curiosity, can you add this to the init_sigsegv() function:
>>>>>
>>>>>   #include
>>>>>   [...]
>>>>>   init_sigsegv (void)
>>>>>   {
>>>>>     [...]
>>>>>     SetThreadStackGuarantee (65536);
>>>>
>>>> Of course this only works "per thread", so if init_sigsegv is
>>>> called for the main thread, only the main thread gets this
>>>> treatment.  For testing this should be enough, though.
>>>
>>> That didn't make any difference.
>>
>> It should have.  If you don't also tweak STACK_DANGER_ZONE
>> accordingly, handle_sigsegv should fail to call siglongjmp.  Either
>> way, I tested it locally as well, and it doesn't work.
>>
>> In the meantime I found that there's another problem.  Assuming you
>> longjmp out of handle_sigsegv, the stack will still be "broken".
>> It doesn't have the usual guard pages anymore, and the next time
>> you have a stack overflow, NTDLL will simply terminate the process.
>>
>> I created a wrapper function which resets the stack so it has valid
>> guard pages again, and then the stack overflow can be handled
>> repeatedly.
>>
>> While I was at it, I found that the setup for pthread stacks is not
>> quite right, either, so right now I'm hacking on this stuff to make
>> it behave as expected in the usual cases.
>>
>>> But I do have a little more information.
>>> I tried running emacs under gdb with a breakpoint at handle_sigsegv.
>>> The breakpoint is hit when I deliberately trigger the stack
>>> overflow.  Then I continue, emacs says it has recovered from the
>>> stack overflow, and I type 'C-x d'.  At this point there's a second
>>> SIGSEGV and handle_sigsegv is called again.  But this time garbage
>>> collection is in progress, and handle_sigsegv just gives up.
>>
>> Sounds right to me.
>>
>>> I don't know what caused the second SIGSEGV but I'll try to figure
>>> that out when I next have a chance to look at this.  I also don't
>>> know why the stack dump pointed to a crash while setting up the
>>> alternate stack, since the fatal crash actually seems to have
>>> happened later.  But maybe the stack was just completely messed up
>>> after the second SIGSEGV and the stack dump can't be trusted.
>
> I think I found the cause of that second SIGSEGV, and, if I'm right, it
> has nothing to do with Cygwin.  I think the problem was that in my
> testing, I forgot to reset max-specpdl-size and max-lisp-eval-depth to
> reasonable values after the recovery from stack overflow.  If I do that,
> then I can no longer reproduce the crash.
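
For archive readers unfamiliar with the pattern discussed above (a SIGSEGV handler that runs on an alternate stack and leaves via siglongjmp), here is a rough sketch in plain POSIX C. It is only an illustration under stated assumptions, not the emacs or Cygwin code: the names alt_stack_demo, on_sigsegv, recover_point and ALT_STACK_SIZE are made up for this example, and the signal is delivered with raise() rather than by a real stack overflow, so it does not exercise the Windows guard-page behaviour described in the thread.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size chosen for this sketch; real code may size it differently. */
#define ALT_STACK_SIZE (64 * 1024)

static sigjmp_buf recover_point;
static char *alt_stack_base;
static volatile sig_atomic_t handler_on_alt_stack;

/* Runs on the alternate stack because the handler is installed with SA_ONSTACK. */
static void on_sigsegv(int signo)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t lo = (uintptr_t)alt_stack_base;

    (void)signo;
    /* Record whether a local variable really lives inside the alternate stack. */
    handler_on_alt_stack = (here >= lo && here < lo + ALT_STACK_SIZE);
    /* Leave the handler and resume at the recovery point on the normal stack. */
    siglongjmp(recover_point, 1);
}

/* Returns 1 if the handler ran on the alternate stack and control came back
   through siglongjmp, 0 if it ran elsewhere, -1 on a setup error. */
int alt_stack_demo(void)
{
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;
    int result;

    alt_stack_base = malloc(ALT_STACK_SIZE);
    if (alt_stack_base == NULL)
        return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0) {
        free(alt_stack_base);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigsegv;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old_sa) != 0) {
        sigaltstack(&old_ss, NULL);
        free(alt_stack_base);
        return -1;
    }

    handler_on_alt_stack = 0;
    /* Save the signal mask too, so SIGSEGV is unblocked again after the jump. */
    if (sigsetjmp(recover_point, 1) == 0) {
        raise(SIGSEGV);   /* stand-in for a real fault */
        result = -1;      /* only reached if the handler did not jump */
    } else {
        result = handler_on_alt_stack ? 1 : 0;
    }

    /* Restore the previous handler and alternate-stack settings. */
    sigaction(SIGSEGV, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    free(alt_stack_base);
    return result;
}
```

Calling alt_stack_demo() more than once in a row should work on a POSIX system, because each call installs and removes its own handler and alternate stack. On Cygwin, recovering repeatedly from real overflows additionally depends on the guard-page reset described in the quoted text.
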

Just for the sake of the archives, it turned out that I could reproduce
that second crash after all.  But it was an emacs bug, which has now
been fixed:

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20996

So there are no loose ends; everything I know how to test involving the
alternate stack works.

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple