
Mailing-List: contact cygwin-developers-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-developers-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin-developers/>
List-Post: <mailto:cygwin-developers AT cygwin DOT com>
List-Help: <mailto:cygwin-developers-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-developers-owner AT cygwin DOT com
Delivered-To: mailing list cygwin-developers AT cygwin DOT com
Message-ID: <3D38949C.3090200@hekimian.com>
Date: Fri, 19 Jul 2002 18:37:16 -0400
X-Sybari-Trust: f522ac42 b923d9bf 0879ee9b 00000109
From: Joe Buehler <jbuehler AT hekimian DOT com>
Reply-To: joseph DOT buehler AT spirentcom DOT com
Organization: Spirent Communications
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.0.0) Gecko/20020530
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: cygwin-developers AT cygwin DOT com
Subject: Re: cygwin hang problem
References: <3D32FC00 DOT 5090108 AT hekimian DOT com> <20020719050925 DOT GA24259 AT redhat DOT com> <3D37F0E5 DOT 50F3669B AT yahoo DOT com> <20020719141242 DOT GB27697 AT redhat DOT com>

I have "hang" problems, and I have core dumps.
The behavior is very inconsistent: I have a dual-processor NT machine
that has been running continuous builds for about 3 days without
stopping, while a single-processor XP machine ran for less than a day
before it hung with a core dump.

I spent the last few hours examining that core dump
(generated using dumper.exe), and it appears that the
Cygwin DLL jumped to never-never land after calling CopySid() in
cygsid::assign() in security.h.  Here's the gdb trace for the
thread that caused the dump (I modified handle_exceptions
to wait for dumper to finish the dump):

#0  0x77e72e9f in _libkernel32_a_iname ()
#1  0x61013ef5 in try_to_debug (waitloop=true)
     at /usr/local/cygwin-src/src/winsup/cygwin/exceptions.cc:396
#2  0x6101453e in handle_exceptions (e=0x72f6f8, in=0x72f714)
     at /usr/local/cygwin-src/src/winsup/cygwin/exceptions.cc:537
#3  0x77f833a0 in _libkernel32_a_iname ()
#4  0x77f83372 in _libkernel32_a_iname ()
#5  0x77f510a6 in _libkernel32_a_iname ()
#6  0x610d8b8c in cygsid::operator= (this=0x72fa5c, nsid=0x61610294)
     at /usr/local/cygwin-src/src/winsup/cygwin/security.h:47
#7  0x61070c37 in __sec_user (sa_buf=0x72fae8, sid2=0x0, inherit=0)
     at /usr/local/cygwin-src/src/winsup/cygwin/sec_helper.cc:473
#8  0x610db98c in sec_user_nih (sa_buf=0x72fae8 "", sid=0x0)
     at /usr/local/cygwin-src/src/winsup/cygwin/security.h:214
#9  0x61082da5 in getsem (p=0x0,
     str=0x610eb248 "cygwin1S3-2002-07-11 10:28.sigcatch.23002002-07-11 10:28", init=0,
     max=2147483647) at /usr/local/cygwin-src/src/winsup/cygwin/sigproc.cc:948
#10 0x6108352b in wait_sig () at /usr/local/cygwin-src/src/winsup/cygwin/sigproc.cc:1091
#11 0x61007961 in thread_stub (arg=0x610e23a0)
     at /usr/local/cygwin-src/src/winsup/cygwin/debug.cc:98
#12 0x77e802ed in _libkernel32_a_iname ()

Note that the "str" argument in frame 9 is not correct; it appears to be corrupted.

Here is a trace for the main thread:

#0  0x7ffe0304 in ?? ()
#1  0x77e79d6a in _libkernel32_a_iname ()
#2  0x610a000f in wait4 (intpid=-1, status=0x22cdbc, options=2, r=0x0)
     at /usr/local/cygwin-src/src/winsup/cygwin/wait.cc:86
#3  0x6109fd3d in waitpid (intpid=-1, status=0x22cdbc, options=2)
     at /usr/local/cygwin-src/src/winsup/cygwin/wait.cc:32
#4  0x00419148 in job_waitsafe (sig=0) at /usr/local/ast-src/src/cmd/ksh93/sh/jobs.c:201
#5  0x0041abcb in job_wait (pid=3568) at /usr/local/ast-src/src/cmd/ksh93/sh/jobs.c:1215
#6  0x00413c7d in sh_exec (t=0xa05e4d8, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:709
#7  0x004144f7 in sh_exec (t=0xa05e528, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:952
#8  0x004144db in sh_exec (t=0xa05e590, flags=0) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:951
#9  0x00414f8e in sh_exec (t=0xa05e3e8, flags=4)
     at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:1193
#10 0x0041447f in sh_exec (t=0xa05e910, flags=4) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:940
#11 0x00414faa in sh_exec (t=0xa05e298, flags=4)
     at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:1194
#12 0x0041447f in sh_exec (t=0xa05e998, flags=5) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:940
#13 0x004140cc in sh_exec (t=0xa05ee70, flags=5) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:833
#14 0x00413ef0 in sh_exec (t=0xa05ee80, flags=4) at /usr/local/ast-src/src/cmd/ksh93/sh/xec.c:779
#15 0x00401f7f in exfile (iop=0xa053950, fno=6) at /usr/local/ast-src/src/cmd/ksh93/sh/main.c:520
#16 0x00401810 in sh_main (ac=2, av=0x616107ac, userinit=0)
     at /usr/local/ast-src/src/cmd/ksh93/sh/main.c:318
#17 0x0040106a in main (argc=2, argv=0x616107ac)
     at /usr/local/ast-src/src/cmd/ksh93/sh/pmain.c:33
#18 0x610065b0 in dll_crt0_1 () at /usr/local/cygwin-src/src/winsup/cygwin/dcrt0.cc:774
#19 0x61006a59 in _dll_crt0 () at /usr/local/cygwin-src/src/winsup/cygwin/dcrt0.cc:872
#20 0x61006ab1 in dll_crt0 (uptr=0x0) at /usr/local/cygwin-src/src/winsup/cygwin/dcrt0.cc:885
#21 0x0045c44e in cygwin_crt0 ()
#22 0x0040103c in mainCRTStartup ()
#23 0x77e7eb69 in _libkernel32_a_iname ()

It is difficult to tell exactly what happened, but it looks like
the CopySid call did not return.  Based on the stack, something
may have gone wrong in the DLL linkage code that loads advapi32
and then calls the real CopySid.  Execution never got as far as
overwriting the original mov/call instruction sequence in that
linkage code.
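
To make the linkage idea concrete for anyone who hasn't looked at it, here is a rough portable sketch in C. It is only a model: the real stub overwrites x86 instructions in place, while this version redirects a function pointer, and the names resolve_symbol, real_copy, lazy_stub and copy_sid_ptr are hypothetical (resolve_symbol stands in for loading advapi32 and looking up CopySid). What it shows is that the pointer is only redirected after resolution succeeds, so anything that goes wrong inside the stub leaves every caller still routed through the stub.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signature loosely modeled on CopySid(len, dest, src). */
typedef int (*copy_fn)(unsigned len, void *dest, const void *src);

/* Stand-in for the "real" function exported by the DLL. */
static int real_copy(unsigned len, void *dest, const void *src)
{
    memcpy(dest, src, len);
    return 1; /* nonzero = success, like a Win32 BOOL */
}

/* Hypothetical resolver playing the role of loading the DLL and
   looking up the exported symbol by name. */
static copy_fn resolve_symbol(const char *name)
{
    if (strcmp(name, "CopySid") == 0)
        return real_copy;
    return NULL;
}

static int resolve_count = 0;
static int lazy_stub(unsigned len, void *dest, const void *src);

/* All callers go through this pointer; it starts out at the stub. */
static copy_fn copy_sid_ptr = lazy_stub;

static int lazy_stub(unsigned len, void *dest, const void *src)
{
    copy_fn real = resolve_symbol("CopySid");
    resolve_count++;
    if (real == NULL)
        return 0;         /* resolution failed: linkage is left pointing at the stub */
    copy_sid_ptr = real;  /* "patch" the linkage so later calls go direct */
    return real(len, dest, src);
}
```

On the first call copy_sid_ptr is redirected to real_copy; later calls skip the stub and the resolver entirely.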

One interesting point I haven't figured out yet: the exception
address passed to Cygwin's exception handler is almost exactly 2x
(as in shifted left by 1) the address of the _win32_CopySid@12 code.
I checked the IA32 exception stack frame format, and it looks like
the exception NT got was due to a jump into the weeds: the EIP pushed
on the stack is the same as the exception address passed to the
Cygwin exception handler.
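
A quick way to quantify "almost exactly 2x" is to compare the fault address with the target address shifted left by 1, both bitwise and arithmetically. This is a sketch with hypothetical function names; the actual addresses would come from the dump and are not reproduced here.

```c
#include <stdint.h>

/* XOR of the fault address with (target << 1): zero means the fault
   address is exactly twice the target; set bits show where they differ.
   uint32_t models a 32-bit IA32 address, so the shift drops bit 31. */
static uint32_t shifted_xor(uint32_t fault_addr, uint32_t target_addr)
{
    return fault_addr ^ (uint32_t)(target_addr << 1);
}

/* Signed distance from exactly 2x the target, computed in 64 bits so
   it does not wrap the way the 32-bit shift does. */
static int64_t doubled_delta(uint32_t fault_addr, uint32_t target_addr)
{
    return (int64_t)fault_addr - 2 * (int64_t)target_addr;
}
```

The XOR view matches the "left shift 1" reading (a stray or missing low bit shows up directly), while the delta says how many bytes off the doubled address the fault was.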

I noticed a comment in the source about replacing CopySid with
memcpy.  Does anyone remember why this was done?  Is there something
flaky about CopySid?
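
For reference, CopySid(nDestinationSidLength, pDestinationSid, pSourceSid) just copies a variable-length SID, so a memcpy replacement only has to compute the SID's length and check the destination size. This is a sketch, not what the comment's author had in mind: sid_sketch is a local mirror of the Win32 SID layout so it compiles off Windows, and sid_length and copy_sid_memcpy are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the Win32 SID layout (hypothetical name). */
typedef struct {
    uint8_t  Revision;
    uint8_t  SubAuthorityCount;
    uint8_t  IdentifierAuthority[6];
    uint32_t SubAuthority[15];   /* SID_MAX_SUB_AUTHORITIES is 15 */
} sid_sketch;

/* Same formula as GetSidLengthRequired(): 8-byte header plus
   4 bytes per sub-authority. */
static size_t sid_length(const sid_sketch *sid)
{
    return 8u + 4u * (size_t)sid->SubAuthorityCount;
}

/* memcpy-based stand-in for CopySid: returns 1 on success, 0 if the
   source SID is malformed or the destination buffer is too small
   (CopySid also fails when the buffer is too small). */
static int copy_sid_memcpy(size_t dest_len, void *dest, const sid_sketch *src)
{
    size_t len;
    if (src->SubAuthorityCount > 15)
        return 0;
    len = sid_length(src);
    if (len > dest_len)
        return 0;
    memcpy(dest, src, len);
    return 1;
}
```

Only sid_length(src) bytes are copied, never the whole fixed-size structure, which is the detail any hand-rolled replacement has to get right.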

Something else I wonder about: wait_sig() is still setting up,
and the main thread is in waitpid().  Perhaps a signal came in
while the signal handling thread (wait_sig) was still setting up?
I haven't looked at that code and don't know how it works.

Sorry if I am missing anything obvious -- I am learning Cygwin
internals as I go, and this is a very knotty problem.

Enough for now, time to go home...

Joe Buehler
