Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
From: "Bob Byrnes"
Date: Thu, 30 Oct 2003 22:31:19 -0500
Organization: Curl Corporation
X-Address: 1 Cambridge Center, 10th Floor, Cambridge, MA 02142-1612
X-Phone: 617-761-1238
X-Fax: 617-761-1201
To: cygwin AT cygwin DOT com
Subject: Cygwin deadlocks due to broken select() when writing to pipes
Message-Id: <20031031033119.611A4E55A@carnage.curl.com>

I have recently discovered that the Cygwin implementation of select()
is broken (or at best incomplete): it incorrectly claims that file
descriptors are *always* ready to write to pipes.

That's bad, because when select() indicates that file descriptors are
ready for writing (or reading), then it is supposed to be guaranteed
that a subsequent write() (or read()) will not block. But writes to a
pipe can certainly block if the pipe happens to be full (i.e., the
process reading from the other end of the pipe is doing so slowly, and
the amount of data in transit exceeds the system-dependent limit on the
buffer size of the pipe).

Many programs (rsync and sshd come to mind) are written to use select()
to avoid blocking write() and read() calls, and if select() misbehaves
as described above, then they can deadlock.

We have observed this happening in a variety of scenarios, but the most
reproducible is to run rsync over ssh to pull data from a Cygwin system
to some other system, like Linux. This has been reported by others to
the rsync mailing list:

    http://www.mail-archive.com/rsync@lists.samba.org/msg07559.html

The strace output reported in this message is consistent with our
experience, and shows that a deadlock occurs when the rsync server
process is looping doing ...

    select(2, NULL, [1], NULL, {60, 0}) = 1 (out [1], left {60, 0})
    write(1, "...", 4096) = 4096

The write() blocks after select() incorrectly claims that fd 1 is ready
for writing. The Cygwin strace output shows this even more clearly:

----------------------------------------
  128 124570283 [main] rsync 940 cygwin_select: 2, 0x0, 0x226A30, 0x0, 0x226A20
  182 124570465 [main] rsync 940 dtable::select_write: fd 1
   95 124570560 [main] rsync 940 cygwin_select: to->tv_sec 60, to->tv_usec 0, ms 60000
   98 124570658 [main] rsync 940 cygwin_select: sel.always_ready 1
  103 124570761 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
  104 124570865 [main] rsync 940 set_bits: me 0x101BA4C0, testing fd 1 ()
  103 124570968 [main] rsync 940 set_bits: ready 1
   96 124571064 [main] rsync 940 select_stuff::poll: returning 1
  101 124571165 [main] rsync 940 select_stuff::cleanup: calling cleanup routines
  101 124571266 [main] rsync 940 select_stuff::~select_stuff: deleting select records
  178 124571444 [main] rsync 940 writev: writev (1, 0x2269F0, 1)
   97 124571541 [main] rsync 940 fhandler_base::write: binary write

... write() blocks here, eventually ...

  140 124571681 [main] rsync 940 fhandler_base::write: 4096 = write (0x226A60, 4096)
  102 124571783 [main] rsync 940 writev: 4096 = write (1, 0x2269F0, 1), errno 0
----------------------------------------

I have also appended a short test program that reproduces the bug. The
program creates a pipe and writes to it in small chunks until the pipe
fills. If it is compiled with -USELECT, then eventually write() blocks,
as expected. However, if we compile with -DSELECT, then on UNIX
systems, one or more write() calls succeed, and eventually select()
starts timing out to indicate that the pipe is full (so the write file
descriptor is not ready). On Cygwin the program blocks in write() even
with -DSELECT, which isn't supposed to happen.

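As a point of reference for "until the pipe fills", the sketch below
measures how many bytes a fresh pipe accepts before it is full, without
relying on select() at all: it puts the write end into non-blocking
mode and counts bytes until write() fails with EAGAIN. It is not part
of the original post. The names pipe_capacity and DEMO_MAIN are
hypothetical, and the code assumes a POSIX system on which O_NONBLOCK
is honored for pipes; whether that holds for the Cygwin release
discussed here is not established by this message.

```c
/* pipe-cap.c: illustrative sketch only (hypothetical helper names). */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Fill a fresh pipe with non-blocking writes of `chunk` bytes each.
   Returns the number of bytes accepted before write() reports that
   the pipe is full (EAGAIN), or -1 on any other error. */
long pipe_capacity(size_t chunk)
{
    static char buf[4096];
    int pfds[2];
    int flags;
    long total = 0;

    assert(chunk > 0 && chunk <= sizeof(buf));

    if (pipe(pfds) == -1)
        return -1;

    /* Non-blocking mode makes a full pipe show up as EAGAIN rather
       than as a write() call that never returns. */
    flags = fcntl(pfds[1], F_GETFL);
    if (flags == -1 || fcntl(pfds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(pfds[0]);
        close(pfds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(pfds[1], buf, chunk);

        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                  /* pipe is full */
        total = -1;                 /* unexpected failure */
        break;
    }

    close(pfds[0]);
    close(pfds[1]);
    return total;
}

#ifdef DEMO_MAIN
int main(void)
{
    long cap = pipe_capacity(1024);

    if (cap == -1) {
        perror("pipe_capacity");
        return 1;
    }
    printf("pipe accepted %ld bytes before filling\n", cap);
    return 0;
}
#endif /* DEMO_MAIN */
```

Compiling with -DDEMO_MAIN gives a standalone program. The number it
prints is system-dependent, which is why the test program in this
message loops until the pipe fills instead of assuming a fixed size.
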
I was a bit surprised not to see any mention of this important
limitation of select() for pipes in the User's Guide (section 1.6.10)
or in the source code. But in winsup/cygwin/select.cc it is clear that
fhandler_pipe::select_write just sets the write_ready field of the
select_record to true, and peek_pipe doesn't do anything for the write
file descriptor case. We can also see that the always_ready field is
set in the strace output above.

It isn't immediately clear how to fix this. I see that PeekNamedPipe()
is used to determine if read descriptors for pipes are ready, but this
obviously won't work for write file descriptors.

Were any other approaches considered and rejected while this code was
being developed, or was the problem not recognized at the time?

--
Bob Byrnes                        e-mail: byrnes AT curl DOT com
Curl Corporation                  phone:  617-761-1200
1 Cambridge Center, 10th Floor    fax:    617-761-1201
Cambridge, MA 02142-1612

----------------------------------------
/* sel-pipe.c */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifdef SELECT
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#endif /* SELECT */

#ifndef CHUNK
#define CHUNK 1024
#endif

static char buf[CHUNK];

int
main(int argc, char **argv)
{
    int pfds[2];
    int count = 0;

    if (pipe(pfds) == -1) {
        perror("pipe");
        exit(2);
    }

    while (1) {
#ifdef SELECT
        int nfds;
        struct timeval timeout;
        fd_set wfds;
        int found;

        /* Ask whether the write end of the pipe is ready, waiting
           at most one second. */
        nfds = pfds[1] + 1;
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;
        FD_ZERO(&wfds);
        FD_SET(pfds[1], &wfds);

        switch (found = select(nfds, NULL, &wfds, NULL, &timeout)) {
        case 1:
            if (!FD_ISSET(pfds[1], &wfds)) {
                fprintf(stderr, "select returned without fd set\n");
                exit(3);
            }
            break;      /* continue with write, below */
        case 0:
            /* Timed out: the pipe is full, so skip the write. */
            printf("pipe is full\n");
            fflush(stdout);
            continue;
        case -1:
            perror("select");
            exit(4);
        default:
            fprintf(stderr, "select returned strange fd count %d\n",
                    found);
            exit(5);
        }
#endif /* SELECT */

        printf("writing chunk #%d ... ", ++count);
        fflush(stdout);

        /* Without a working select(), this blocks once the pipe fills. */
        if (write(pfds[1], buf, sizeof(buf)) == -1) {
            perror("write");
            exit(9);
        }

        printf("done\n");
        fflush(stdout);
    }
}