Message-ID: <4BF2A0A9.5070201@gmail.com>
Date: Tue, 18 May 2010 15:14:01 +0100
From: Dave Korn
To: Cygwin Mailing List
Subject: pty infinite master control thread spawning problem
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------080809090508040307010100"

--------------080809090508040307010100
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

    Hi,

  I'm having trouble with the latest changes to the pty control code.  In my
case the problem shows up when I run a parallel "make -j check" on GCC; after
a minute or two all the expect processes end up spinning CPU and everything
grinds to a halt.

  When I attach to and debug one of the stuck expect processes, I see 35 threads:

- main thread stuck in a TransactNamedPipe call that never completes:

> Thread 1 (thread 608.0xd38):
> #0  0x77f884ff in ntdll!ZwFsControlFile ()
>    from /win/c/WINNT/system32/ntdll.dll
> #1  0x7c592692 in TransactNamedPipe () from /win/c/WINNT/system32/KERNEL32.dll
> #2  0x7c5927f0 in KERNEL32!CallNamedPipeW ()
>    from /win/c/WINNT/system32/KERNEL32.dll
> #3  0x7c592754 in KERNEL32!CallNamedPipeA ()
>    from /win/c/WINNT/system32/KERNEL32.dll
> #4  0x61061564 in fhandler_pty_master::close (this=0x1)
>     at /gnu/winsup/src/winsup/cygwin/fhandler_tty.cc:1432
> #5  0x610dda7a in close (fd=6)
>     at /gnu/winsup/src/winsup/cygwin/syscalls.cc:1140
> #6  0x610c3a3a in _sigfe () from /usr/bin/cygwin1.dll

- signal thread happily waiting for a signal:

> Thread 2 (thread 608.0xbe0):
> #0  0x77f88a87 in ntdll!ZwReadFile () from /win/c/WINNT/system32/ntdll.dll
> #1  0x7c586381 in ReadFile () from /win/c/WINNT/system32/KERNEL32.dll
> #2  0x610ca395 in wait_sig () at /gnu/winsup/src/winsup/cygwin/sigproc.cc:1194

- 32 threads all stuck in WFSO with no useful backtrace, and one that is
frantically looping in cygthread::callfunc() at this point:

>   if (issimplestub)
>     {
>       /* Wait for main thread to assign 'h' */
>       while (!h)
> 	yield ();

... with "h" never getting set.

  The attached testcase demonstrates the underlying problem.  It appears that
a fresh cygthread is getting created to run the pty_master_thread function
every time we open a new pty master, and for some reason they're never getting
recycled.  Then when the cygthread array fills up and we fall back to the
simplestub mechanism, things go really wrong, and the main thread hangs up in
a call to fhandler_pty_master::close that never completes while the latest
spawned cygthread frantically loops waiting for something that will never happen.

  Compile with "gcc-4 t1.c -o t1", possibly adding "-DFAST" (or "-DRAPID") if
you want to get it to the stuck state quickly.

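  To illustrate the suspected failure mode, here is a toy model of a
fixed-size thread-slot pool in which slots are claimed on every open and
never handed back.  This is not Cygwin code: the names (toy_pool,
toy_pool_claim, toy_simulate) are hypothetical, and the pool size of 32 is an
assumption picked only because 32 idle threads show up in the debugger.  It
shows nothing more than the arithmetic of the leak: once every slot is taken,
each further open has to take the fallback path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool size; chosen to mirror the 32 idle threads seen in
   the debugger, not taken from the Cygwin sources.  */
#define POOL_SLOTS 32

struct toy_pool
{
  bool in_use[POOL_SLOTS];
  int fallbacks;		/* claims that found no free slot */
};

/* Claim the first free slot and return its index.  Return -1 when the
   pool is exhausted, which is where a real implementation would have to
   fall back to some other mechanism.  */
int
toy_pool_claim (struct toy_pool *p)
{
  assert (p != NULL);
  for (int i = 0; i < POOL_SLOTS; i++)
    if (!p->in_use[i])
      {
	p->in_use[i] = true;
	return i;
      }
  p->fallbacks++;
  return -1;
}

/* Hand a slot back to the pool; out-of-range indexes are ignored.  */
void
toy_pool_release (struct toy_pool *p, int slot)
{
  assert (p != NULL);
  if (slot >= 0 && slot < POOL_SLOTS)
    p->in_use[slot] = false;
}

/* Simulate OPENS open/close cycles.  When RELEASE_ON_CLOSE is false the
   slot is leaked on every cycle.  Returns the number of fallbacks.  */
int
toy_simulate (int opens, bool release_on_close)
{
  struct toy_pool p = { { false }, 0 };

  for (int i = 0; i < opens; i++)
    {
      int slot = toy_pool_claim (&p);
      if (release_on_close)
	toy_pool_release (&p, slot);
    }
  return p.fallbacks;
}
```

  With recycling, toy_simulate (100, true) never needs the fallback; with the
leak, toy_simulate (100, false) needs it for every cycle after the pool is
full.  The model says nothing about why the fallback path itself hangs.
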
  After 31 times round the loop, everything locks up, wheels spinning.  And it
can't be Ctrl-C'd at that point any more, nor even "kill -9"ed.  Killing it in
Windows Task Manager, or attaching gdb to it and using gdb's kill command,
both work, and Ctrl-C works if you started it under gdb, just not when you
start it from bash.

  Anyone got any ideas where I should be looking next?

    cheers,
      DaveK

--------------080809090508040307010100
Content-Type: text/plain; name="t1.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="t1.c"

#define _XOPEN_SOURCE
/* The archive stripped the header names from the #include lines.  The
   headers below are the ones this code needs, not a verbatim copy.  */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#ifdef RAPID
#include <windows.h>
#endif

/* -DFAST removes every delay; -DRAPID shortens each second to 50ms.  */
#ifdef FAST
#define sleep(x)
#elif defined (RAPID)
#define sleep(x) Sleep ((x) * 50)
#endif

int main (int argc, const char **argv)
{
  int nruns = 100;
  int i;

  if (argc > 1)
    nruns = atoi (argv[1]);
  printf ("running %d loops...\n", nruns);
  for (i = 0; i < nruns; i++)
    {
      int master_fd, rv;
      pid_t child_pid;
      char *devname;

      printf ("%d:start\n", i);
      fflush (0);
      sleep (1);
      master_fd = posix_openpt (O_RDWR|O_NOCTTY);
      printf ("%d:open ptmx %d\n", i, master_fd);
      if (master_fd < 0)
	{
	  perror ("opening master\n");
	  fflush (0);
	  continue;
	}
      rv = grantpt (master_fd);
      if (rv < 0)
	{
	  perror ("grantpt error\n");
	  fflush (0);
	  close (master_fd);
	  continue;
	}
      rv = unlockpt (master_fd);
      if (rv < 0)
	{
	  perror ("unlockpt error\n");
	  fflush (0);
	  close (master_fd);
	  continue;
	}
      devname = ptsname (master_fd);
      if (!devname)
	{
	  perror ("ptsname error\n");
	  fflush (0);
	  close (master_fd);
	  continue;
	}
      printf ("%d: got slave %s\n", i, devname);
      fflush (0);
      sleep (1);
      printf ("%d: forking...", i);
      fflush (0);
      child_pid = fork ();
      if (child_pid < 0)
	{
	  /* Message corrected: the original reported "ptsname error" here.  */
	  perror ("fork error\n");
	  fflush (0);
	  continue;
	}
      else if (child_pid == 0)
	{
	  /*exit (0);*/
	  int slave_fd, n;
	  char intext[1024];
	  char outtext[1024];
	  char *msg;

	  printf ("%d# in child\n", i);
	  fflush (0);
	  sleep (1);
	  close (master_fd);
	  printf ("%d# child closed master\n", i);
	  sleep (1);
	  slave_fd = open (devname, O_RDWR);
	  if (slave_fd < 0)
	    {
	      perror ("opening slave");
	      exit (0);
	    }
	  /* Read one line from the slave, a byte at a time.  */
	  msg = intext;
	  *msg = 0;
	  while (msg - intext < 1023)
	    {
	      n = read (slave_fd, msg, 1);
	      if (n < 0)
		break;
	      else if (n == 1)
		{
		  if (*msg == 0x0a)
		    break;
		  else
		    msg++;
		}
	    }
	  *msg = 0;
	  printf ("%d# got input '%s'\n", i, intext);
	  fflush (0);
	  n = sprintf (outtext, "Child %d rx '%s'\n", i, intext);
	  printf ("%d# sending %d bytes\n", i, n);
	  fflush (0);
	  n = write (slave_fd, outtext, n);
	  printf ("%d# write %d bytes\n", i, n);
	  fflush (0);
	  sleep (3);
	  close (slave_fd);
	  printf ("%d# child closed slave\n", i);
	  sleep (1);
	  printf ("%d# child exiting\n", i);
	  exit (0);
	}
      else
	{
	  char output[1024];
	  char reply[1024];
	  char *ptr;
	  int n2;
	  char once = 0;

	  printf ("%d: parent, child pid %d\n", i, (int) child_pid);
	  fflush (0);
	  sleep (4);
	  n2 = sprintf (output, "From master %d to child %d\n", i,
			(int) child_pid);
	  printf ("%d# sending %d bytes\n", i, n2);
	  fflush (0);
	  n2 = write (master_fd, output, n2);
	  printf ("%d# wrote %d bytes\n", i, n2);
	  fflush (0);
	  sleep (4);
	  /* Read until the second newline: the first line is the echo of
	     what we just wrote, the second is the child's reply.  */
	  ptr = reply;
	  *ptr = 0;
	  while (ptr - reply < 1023)
	    {
	      n2 = read (master_fd, ptr, 1);
	      if (n2 < 0)
		break;
	      else if (n2 == 1)
		{
		  if (*ptr == 0x0a && once++)
		    break;
		  else
		    ptr++;
		}
	    }
	  *ptr = 0;
	  printf ("%d: got reply '%s'\n", i, reply);
	}
      sleep (1);
      printf ("%d: closing master %d\n", i, master_fd);
      close (master_fd);
      sleep (1);
      printf ("%d: end of loop\n", i);
    }
  sleep (1);
  printf ("LOOP FINISHED\n");
  fflush (0);
  return 0;
}

--------------080809090508040307010100
Content-Type: text/plain; charset=us-ascii

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
--------------080809090508040307010100--
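
  One way to narrow this down might be a reduced loop that only opens and
closes the pty master, with no grantpt, fork or slave I/O.  This is a sketch
under the assumption that opening the master alone is enough to start the
control thread; that has not been confirmed.  The helper name
open_close_masters is hypothetical and is not part of t1.c.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Open and immediately close a pty master N times.  Returns the number
   of opens that succeeded.  If the control-thread leak depends only on
   opening the master, this loop would be expected to get stuck in
   close () in the same way as the full testcase.  */
int
open_close_masters (int n)
{
  int ok = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      int fd = posix_openpt (O_RDWR | O_NOCTTY);
      if (fd < 0)
	{
	  perror ("posix_openpt");
	  continue;
	}
      ok++;
      printf ("%d: opened master %d, closing\n", i, fd);
      fflush (stdout);
      close (fd);
    }
  return ok;
}
```

  Calling open_close_masters with a count comfortably above 31 from a small
main () and watching whether it returns would show whether the fork and the
slave side matter at all.
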