Mail Archives: cygwin/2015/07/14/15:10:18
On 7/3/2015 9:09 AM, Ken Brown wrote:
> On 7/3/2015 6:47 AM, Corinna Vinschen wrote:
>> On Jul 2 15:25, Ken Brown wrote:
>>> On 7/2/2015 8:20 AM, Corinna Vinschen wrote:
>>>> On Jul 2 14:13, Corinna Vinschen wrote:
>>>>> On Jul 1 22:10, Ken Brown wrote:
>>>>>> I may have spoken too soon. As I repeat the experiment on a different
>>>>>> computer, with a build from a slightly different snapshot of the emacs
>>>>>> trunk, emacs crashes when I type 'C-x d' with the following stack dump:
>>>>>>
>>>>>> Stack trace:
>>>>>> Frame Function Args
>>>>>> 00100A3E240 00180071CC3 (00000829630, 000008296D0, 00000000000, 0000082CE00)
>>>>>> 00030000002 001800732BE (00000000000, 00000000002, 00100A48C80, 00000000002)
>>>>>> 00000000000 00000006B40 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>>> 00000000000 21000000003 (00000000002, 00100A48C80, 00000000002, 00100A48768)
>>>>>> End of stack trace
>>>>>>
>>>>>> $ addr2line 00180071CC3 -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exception.h:175
>>>>>>
>>>>>> $ addr2line 001800732BE -e /usr/lib/debug/usr/bin/cygwin1.dbg
>>>>>> /usr/src/debug/cygwin-2.1.0-0.3/winsup/cygwin/exceptions.cc:1639
>>>>>
>>>>> That points to a crash while setting up the alternate stack. This is
>>>>> always a possibility because, in contrast to the kernel signal handler
>>>>> in a real POSIX system, the Cygwin exception handler is still running on
>>>>> the stack which triggered the crash up to the point where we call the
>>>>> signal handler function. Depending on how the stack overflow occurred,
>>>>> this additional stack usage may be enough to kill the process for good.
>>>>>
>>>>> Out of curiosity, can you add this to the init_sigsegv() function:
>>>>>
>>>>> #include <windows.h>
>>>>> [...]
>>>>> init_sigsegv (void)
>>>>> {
>>>>> [...]
>>>>> ULONG guarantee = 65536; /* the API takes a pointer to the size */
>>>>> SetThreadStackGuarantee (&guarantee);
>>>>
>>>> Of course this only works "per thread", so if init_sigsegv is called
>>>> for the main thread, only the main thread gets this treatment. For
>>>> testing this should be enough, though.
>>>
>>> That didn't make any difference.
>>
>> It should have. If you don't also tweak STACK_DANGER_ZONE accordingly,
>> handle_sigsegv should fail to call siglongjmp. Either way, I tested
>> it locally as well, and it doesn't work.
>>
>> In the meantime I found that there's another problem. Assuming you
>> longjmp out of handle_sigsegv, the stack will still be "broken".
>> It doesn't have the usual guard pages anymore, and the next time
>> you have a stack overflow, NTDLL will simply terminate the process.
>>
>> I created a wrapper function which resets the stack so it has valid guard
>> pages again, so that a stack overflow can be handled repeatedly.
>>
>> While I was at it, I found that the setup for pthread stacks is not
>> quite right, either, so right now I'm hacking on this stuff to make
>> it behave as expected in the usual cases.
>>
>>> But I do have a little more information.
>>> I tried running emacs under gdb with a breakpoint at handle_sigsegv. The
>>> breakpoint is hit when I deliberately trigger the stack overflow. Then I
>>> continue, emacs says it has recovered from the stack overflow, and I type
>>> 'C-x d'. At this point there's a second SIGSEGV and handle_sigsegv is
>>> called again. But this time garbage collection is in progress, and
>>> handle_sigsegv just gives up.
>>
>> Sounds right to me.
>>
>>> I don't know what caused the second SIGSEGV but I'll try to figure that out
>>> when I next have a chance to look at this. I also don't know why the stack
>>> dump pointed to a crash while setting up the alternate stack, since the
>>> fatal crash actually seems to have happened later. But maybe the stack was
>>> just completely messed up after the second SIGSEGV and the stack dump can't
>>> be trusted.
>
> I think I found the cause of that second SIGSEGV, and, if I'm right, it
> has nothing to do with Cygwin. I think the problem was that in my
> testing, I forgot to reset max-specpdl-size and max-lisp-eval-depth to
> reasonable values after the recovery from stack overflow. If I do that,
> then I can no longer reproduce the crash.
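
For readers of the archive: the remark above about tweaking STACK_DANGER_ZONE is easier to follow with a concrete check in front of you. The sketch below shows one plausible shape for such a test: a fault counts as a stack overflow only if its address lies inside the stack and within a fixed distance of one of the stack's ends. The function name, parameters, and addresses are hypothetical and chosen for illustration only; this is not the actual emacs or Cygwin code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if FAULT lies strictly inside the
   stack range (LO, HI) and closer than ZONE bytes to either end.
   A handler could use a test like this to decide whether a SIGSEGV
   looks like a stack overflow worth recovering from.  */
static bool
near_stack_boundary (uintptr_t fault, uintptr_t lo, uintptr_t hi, size_t zone)
{
  assert (lo < hi);
  if (fault <= lo || fault >= hi)
    return false;               /* not on this stack at all */
  return fault - lo < zone || hi - fault < zone;
}
```

Under these assumptions, reserving an extra 64K with SetThreadStackGuarantee would move the faulting address roughly 64K further from the end of the stack. A fault at lo + 70000 is then outside a 16K zone but inside a 128K one, which is consistent with the advice to enlarge the zone along with the guarantee.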
Just for the sake of the archives, it turned out that I could reproduce
that second crash after all. But it was an emacs bug, which has now
been fixed:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20996
So there are no loose ends; everything I know how to test involving the
alternate stack works.
Ken
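
For anyone finding this thread later, here is a minimal POSIX sketch of the general technique discussed above: run the SIGSEGV handler on an alternate stack, and siglongjmp back to a safe point unless recovery is known to be unsafe. All names (recovery_point, unsafe_to_recover, on_fault, install_overflow_handler, burn_stack, survive_overflow) are made up for this example, and the unsafe_to_recover flag is only a stand-in for a condition like "garbage collection in progress". It is not the emacs or Cygwin implementation.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;                /* where to resume after a fault */
static volatile sig_atomic_t unsafe_to_recover;  /* stand-in for "GC in progress" */

/* Runs on the alternate stack because of SA_ONSTACK.  */
static void
on_fault (int sig)
{
  if (!unsafe_to_recover)
    siglongjmp (recovery_point, 1);
  /* Give up: restore the default action and let the signal kill us.  */
  signal (sig, SIG_DFL);
  raise (sig);
}

static void
install_overflow_handler (void)
{
  static char alt_stack[64 * 1024];  /* fixed size; SIGSTKSZ may not be constant */
  stack_t ss;
  struct sigaction sa;
  int rc;

  memset (&ss, 0, sizeof ss);
  ss.ss_sp = alt_stack;
  ss.ss_size = sizeof alt_stack;
  rc = sigaltstack (&ss, NULL);
  assert (rc == 0);

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_fault;
  sa.sa_flags = SA_ONSTACK;
  sigemptyset (&sa.sa_mask);
  rc = sigaction (SIGSEGV, &sa, NULL);
  assert (rc == 0);
  rc = sigaction (SIGBUS, &sa, NULL);   /* some systems report overflow as SIGBUS */
  assert (rc == 0);
}

/* Recurse FRAMES levels deep; the volatile pad keeps each frame large
   and prevents the call from being turned into a loop.  */
static int
burn_stack (unsigned long frames)
{
  volatile char pad[1024];
  if (frames == 0)
    return 0;
  pad[0] = (char) frames;
  return burn_stack (frames - 1) + pad[0];
}

/* Returns 1 if a stack overflow was caught and recovered from,
   0 if the recursion returned normally.  */
static int
survive_overflow (void)
{
  volatile unsigned long frames = (unsigned long) -1;
  if (sigsetjmp (recovery_point, 1) != 0)
    return 1;                   /* arrived here via siglongjmp */
  (void) burn_stack (frames);
  return 0;
}
```

A caller would run install_overflow_handler() once and then check the result of survive_overflow(). The sketch assumes a bounded stack size limit. As noted earlier in the thread, on Windows the guard pages also have to be restored after the jump before a second overflow can be handled, which this sketch does not attempt.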
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple