Mail Archives: cygwin/2009/10/16/23:41:06
--------------070308000803040805050300
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Charles Wilson wrote:
> 1) look at the parent's stack trace when it is inside fork(). Ditto
> its return address.
> 2) after the child longjmp's back into fork() from dll_crt0_1,
> look at its stack trace and return address. (although I can't
> really catch it that early. I can only catch it in the debugger
> just after the CYGWIN_FORK_SLEEP...but at least I'm still
> back inside fork() at that point.
>
> They ought to match in all respects, correct?
I think...maybe...I've located the primary symptom. I don't yet know
the cause, so I don't have fix. But...
The stack traces are "close", but do not match exactly. Here's an
interleaved version, "P" for parent, "C" for child (and edited for
formatting, and to redact private data):
A: I /think/ this is the import thunk for fork, rather than
the /actual/ _fork in the cygwin1.dll. Blame mingw gdb.
B: where I set the breakpoint in child. So that's reassuring.
P #0 0x00439bb4 in fork () [[[[ B ]]]]
C #0 fork () at fork.cc:617 [[[[ A ]]]]
P #1 0x0043843f in pipe_connect_unix (ctx=0x7ff8fa30,
name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
<atfork_cb>, atforkvalue=0x136b948, flags=128)
at assuan-pipe-connect.c:234
C #1 0x610cd03c in _sigfe () from /usr/bin/cygwin1.dll
C #2 0x0137ad68 in ?? ()
C #3 0x00001120 in ?? ()
C #4 0x00000019 in ?? ()
P #2 0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
<atfork_cb>, atforkvalue=0x136b948, flags=128)
at assuan-pipe-connect.c:920
C #5 0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
<atfork_cb>, atforkvalue=0x136b948, flags=128)
at assuan-pipe-connect.c:920
P #3 0x0040cbd0 in start_pinentry (ctrl=0x136b948)
at agent/call-pinentry.c:316
C #6 0x0040cbd0 in start_pinentry (ctrl=0x136b948)
at agent/call-pinentry.c:316
P #4 0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
retpass=0x7ff8fec0,
desc=0x136bae9 "Please enter the passphrase to unlock the
secret key for the OpenPGP certificate:REDACTED"...,
prompt=0x0,errtext=0x0, with_qualitybar=0)
at agent/call-pinentry.c:809
C #7 0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
retpass=0x7ff8fec0,
desc=0x136bae9 "Please enter the passphrase to unlock the
secret key for the OpenPGP certificate:REDACTED"...,
prompt=0x0, errtext=0x0, with_qualitybar=0)
at agent/call-pinentry.c:809
P #5 0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
line=0x136babc "REDACTED")
at agent/command.c:1111
C #8 0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
line=0x136babc "REDACTED")
at agent/command.c:1111
P #6 0x00435115 in dispatch_command (ctx=0x136ba50,
line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
at assuan-handler.c:497
C #9 0x00435115 in dispatch_command (ctx=0x136ba50,
line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
at assuan-handler.c:497
P #7 0x004355a9 in process_request (ctx=0x136ba50)
at assuan-handler.c:720
C #10 0x004355a9 in process_request (ctx=0x136ba50)
at assuan-handler.c:720
P #8 0x004355d7 in assuan_process (ctx=0x136ba50)
at assuan-handler.c:742
C #11 0x004355d7 in assuan_process (ctx=0x136ba50)
at assuan-handler.c:742
P #9 0x00408519 in start_command_handler (ctrl=0x136b948,
listen_fd=-1, fd=7)
at agent/command.c:1944
C #12 0x00408519 in start_command_handler (ctrl=0x136b948,
listen_fd=-1, fd=7)
at agent/command.c:1944
P #10 0x004041a2 in start_connection_thread (arg=0x136b948)
at agent/gpg-agent.c:1755
C #13 0x004041a2 in start_connection_thread (arg=0x136b948)
at agent/gpg-agent.c:1755
P #11 0x6fa44bba in ?? ()
C #14 0x6fa44bba in pth_exit () from /usr/bin/cygpth-20.dll
P #12 0x0136b948 in ?? ()
C #15 0x00000000 in ?? ()
P #13 0x00000202 in ?? ()
P #14 0x0022c9e8 in ?? ()
The register dump and $esp inspection for both parent and child are in
the attached file, for the truly interested.
So, what's the smoking gun? This, in the child:
(gdb) p *(child_info_fork *)child_proc_info
$5 = {<child_info> = {msv_count = 0, cb = 288, intro = 2936076800,
magic = 3897586042, type = 2, subproc_ready = 0x230, user_h = 0xb4,
parent = 0x234, cygheap = 0x61245650, cygheap_max = 0x6124c1ac,
cygheap_reserve_sz = 0, flag = 2 '\002', fhandler_union_cb = 464,
retry = 10, exit_code = 0, static retry_count = 10},
forker_finished = 0x238, stacksize = 0, jmp = {1, 22793784, 0, 0, 7,
4210969, 2147022580, 2147022140, 1627853804, 3866659, 2293760,
2281064, 2284368, 0 <repeats 39 times>}, stacktop = 0x7ff8f53c,
stackbottom = 0x230000, filler = "\000\000\000"}
fork_info->stacksize is zero. According to fork.cc in frok::child:
/* If we've played with the stack, stacksize != 0. That means that
fork() was invoked from other than the main thread. Make sure that
the threadinfo information is properly set up. */
if (fork_info->stacksize)
{
_main_tls = &_my_tls;
_main_tls->init_thread (NULL, NULL);
_main_tls->local_clib = *_impure_ptr;
_impure_ptr = &_main_tls->local_clib;
}
So, since in this child, we have stacksize == 0, that must mean that
fork() was invoked from the main thread (of the parent, since a child
/never/ "invokes" fork. It can't; parents invoke fork). Right?
Wrong: look at frame #10(parent)/#13(child)
0x004041a2 in start_connection_thread (arg=0x136b948)
at agent/gpg-agent.c:1755
Here's the start of that function:
/* This is the standard connection thread's main function. */
static void *
start_connection_thread (void *arg)
{
...
}
It is launched inside:
/* Connection handler loop. Wait for connection requests and spawn a
thread after accepting a connection. */
static void
handle_connections (gnupg_fd_t listen_fd, gnupg_fd_t listen_fd_ssh)
{
...
for (;;)
{
...
/* We now might create new threads and because we don't want any
signals (as we are handling them here) to be delivered to a
new thread. Thus we need to block those signals. */
pth_sigmask (SIG_BLOCK, &sigs, NULL);
if (!shutdown_pending && FD_ISSET (FD2INT (listen_fd),
&read_fdset))
{
>>> HERE <<< if (!pth_spawn (tattr, start_connection_thread, ctrl))
{
log_error ("error spawning connection handler: %s\n",
strerror (errno) );
assuan_sock_close (fd);
xfree (ctrl);
}
}
...
}
...
}
So, we have a thread, which is not the main thread, which calls fork.
Yet, when the forkee (child) checks that, it thinks that it WAS launched
from the main thread, and does not do the main_tls/_my_tls fixup stuff
for the stack.
Right now the "CYGWIN_FORK_SLEEP" check comes AFTER the main_tls fixup
stuff, so there's another cygwin kernel rebuild in my future, if I want
to step into the child early enough to look at the main_tls fixup stuff.
However, even if I poke at fork_info->stacksize and make it non-zero, so
that the main_tls/my_tls fixup happens --- I still need to figure out
exactly WHAT non-zero value it should have. I suspect that's almost as
hard as figuring out why cygwin didn't do it in the first place -- and
if I knew that, I'd just fix it there, rather than poking at it manually
using gdb...
I have a hunch that an STC (okay, less-hellaciously-complicated test
case) could be developed, using just GNU pth and avoiding all the
libassuan/gnupg gobbledygook.
Anyway, I'm going to be AFK (or, at least AF-This-Problem) for the weekend.
I've attached the entire log of relevant data from parent and child,
gzipped.
--
Chuck
--------------070308000803040805050300
Content-Type: application/gzip;
name="fork-data.tar.gz"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
filename="fork-data.tar.gz"
H4sIAIBwb0UAA+1ZbW/cNhL2Z/0Kwm5RO9kXvomSDDtF0SZFv6RGkftwuDsI
FEXt6qzVqpLWXrfIf78h9bLaN7RB7MQ4mHCw4gxn+HAoDudRClnqvB7HspaT
el2fPEXD0DyMzS/xXGJ/4c/KoQlX8BNCqBCUYY/TE0wYcdkJwk+CZqetqlqW
CJ2o+zSrlvnRce+Xuf4SeL5wO8MI4TXGnAVRxFGao2RZ3qLzC+eMdBqfs8Ro
irTQoVrmuVZ1uMrTNTpX9foar70k8RPJ8AjlcqFBQJgnY+Gj0+mqKqdRmk+L
NIf3rHwYz+rbMZ3otT4dIVnO7npzokYOgpbEoZqnWRxmaVVvtOBc1gYbiDhW
PmboqhGEKnrTKe9ktmoAiCjg/gglmZxV14T6F9a5rJGFlC2VzKZVqaZZGsmq
Wsl88zQmEzxxx2Rb30itrBWYeIzbeEzUJWXcOaN90BJXmKA1Y8Ot2Ol1/RK6
YegCip0z1oYOqyjGJnTmZNZht3wTsTLbADyGapavihltfyBeeEJYC8jKWtFU
zsDtFKyycTcFIGFEOGe8Q6IhFdlNNGPDmQY0sIBiXspK7+IZoVLXRt0FXivc
bEusK9WOkzpApzeZNvbgUpeonms0cFov0SqHxdxaRaUVOEW3+sEcSyv6tdD5
zc83SOmyTpNUyVpf/vb2px9+/PD2p9PJZDJCRblcFGb7Yd91WdbwttmOxXKf
1vPw95XM0vohkuU1vnj0EPo4cM7cLoQiIa4JoVrEBwK47uLiAtgMfHT9SKHT
flnOmeiOlUsad3FaFbJWczhTi4XM411ndrFDh9JDp2N7z6HxuNSFlhAVeET9
LA2ATOfX1KdP8M7PAWamS4gQDzznzOuX5MrAptdyqXRVhaX+faWremdFTwrI
M+fP3wCKvWHqanB9UTycOmdB9wr5Lgk2+aDd8LAdvX8MTfLTeZjE12MCWSy+
9h4pVTQTAzwScEj1BLf4OJF0iM9mtXSZh/W81BJeTUjWj523ZiCwTwaO57oA
hwAckUjOo0gaON9/31zj1MBsJx/KmYUPjWI6lHMrp1QFejDeOZ/F0QX0kyUk
upmJcVk5Wq7RoG3us1ZACfcwXIzYd7TaHQo1oS8S0QpIEAiXMupyR8e7Q5sH
z9HRjgbWpTTzOwHFTPiBSx1dFQeAicTfE4DPQ0MDn+8JwGt6DFi8ozGvBby2
nBIciMDR6c4kbcHVP1yZO/iNo+2tOxhHiQ3Qv9DNO/TDO/TLO/QfR1Vo2xmJ
mtV7TrWrosw+MNeJj6v0cVWyq2LNXG7gzHZV7a7j7mVZTyle36NvYDOcTcAv
u9Fdadn37cvobfqcM0FU329Lo86VhweugE8QirddQfgHU2lM3a7f2AvVuyJb
rkzbHso4OqI3J0t6DPeu6FFXf7ffu2JbsdpD5ePjqJoWDE9t9bCIltkg8M5+
Qf8acWHv16pJYWhiygfneMHbe/PR1Z6719z13zRLWCzvzM/5t5AxLkbfwjne
mMqDpgK3pqa+QGjvpAxeoIPm7vbMZuIRXiuqzs3sF709dw/ae2TbHq99Y1gY
8HIDnh9et8ePrBtMna/N+YbN8oWnpf9/xf89xsiG/wPxx4R6HL/w/y/RDP9v
GX9fk5iSI9Z3Opve6hIqYiu4T/NqVUzVwwyepsZkotSlIF77oUAQrGLMlEke
YZXOEm1cJsBFUM9kG2MyibOsY8od3d3UH2wroW/kfCuxb+Ruf5d8KuO2hdjn
0W70Oby7LQSfBfnuCNbXJ98dL/p88o2+Nvu28w8p+AH6/SSBtBS843OfQsH3
aPMWD390HgVtw/P+NrffR/nsyH3HDZ8Nubfs8BmR+4aVPmNyz54Xuec75L6A
PKLXaX3wkgXlmOLmmoUjNaAHn0rne0b3ROzdjbbYO/UC5lmefYiSu0ztCY6w
d5HwPcFjsneodjzCudwXWHbymtPg/5/LQ/g7fioIUbGKjlJbQSiPYrLHV3tX
fODKDKXqmKu/Ys2uq3YIeD9r0rbeFN6/2NQ7w/Ek6L8wuGLXFdle0B7KgZ4E
HtO8d+X1rpp+oLZdDb4wEFfZWrTXY0YSQZ2DZL6LvAMlwPI+XEKVUlZhvQyr
h1ydX1xehuHNb28/fPhn+O4f73/88Muv78MQaD5hrr/F80tze7YzwAUMWVsv
okyjV4Mp3i9RssobA8iEtQROgKpCKyiFdIxkHJdwl0w6oAV6dd7UxQZyaGnG
q4tGYu4dK3a+cdE1+vNqM/CN6S+qO0i2q7yGDtz4KoJf6sMFkEKJszSdgAns
CR939cBCzlIFCuYHnusLzOkI1Q+FNmNHqFpFdkqTsx+MTzgWYLmqdBnObT/i
raPC/pd4OwaEkFDnWhZWYPbKFaYGaaXhAlJmp1FEdpSgU0NAdHmnw+qPZiEm
JRhE6Lt/Y0y/A0F71ZkPB3Cr2IVy0WGB+rM0cA2XMOkeYhLrxhNcSDUs2I7o
Q0XwR2tpYg0+kzRPqzlsTbMY31qp2yr9o3Xy34VZ159wlXa5d2TE8Oe1CNoU
OGo/5VLXx/0z4fBMBPV8l/kYTJkvhHDNYNgeT5iR1Ce4Xw70OBMAA6Orpkyr
EAtQnS509eZji65eNrHucoy1tZpoWddw1bW7h83WJ2lm6oRrdArxxN2/04/P
6iPLS3tpL+2lvbSX9sza/wDW52qMACgAAA==
--------------070308000803040805050300
Content-Type: text/plain; charset=us-ascii
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
--------------070308000803040805050300--
- Raw text -