Subject: Cygwin or openssh socket problem/bug?
From: "Michelsen, Robert"
Date: Wed, 19 Oct 2005 11:11:55 +0200

Hello,

I don't know whether this is a problem in the "openssh" package or a Cygwin-internal problem, so I am posting my findings here...

I use ssh-agent running in the background to get my remote authentication (CVS) done.

In my ".bashrc":

---- snip .bashrc ----
[ -z "$SSH_AUTH_SOCK" ] && eval `ssh-agent -s`
[ -z "$SSH_AGENT_PID" ] || ssh-add -l >/dev/null 2>&1 || ssh-add
---- snip ----

And in ".logout":

---- snip .logout ----
kill $SSH_AGENT_PID
---- snip ----

When I update my Cygwin installation using "setup.exe", ssh-agent occasionally hangs while eating 100% CPU. This happens while the Cygwin Setup Post-Install Script runs. If I renice the "ssh-agent" process, setup (GUI) exits cleanly, but the process keeps eating CPU forever.

$ ssh -V
OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005

Using my favorite win32 user-mode debugger, OllyDbg:

---- snip ----
Threads
Ident     Entry     Data block  Last error                   Status  Priority  User time  System time
0000056C  7C810856  7FFDD000    ERROR_SUCCESS (00000000)     Paused  32 + 0    0.0000 s   0.0000 s
00000C4C  00000000  7FFDF000    ERROR_IO_PENDING (000003E5)  Active  32 - 15   18.6250 s  29.1718 s
00000F9C  7C810856  7FFDE000    ERROR_SUCCESS (00000000)     Paused  32 + 0    0.0000 s   0.0000 s
---- snip ----

Thread 0xc4c is eating CPU forever (priority 32 - 15 because I reniced it to idle priority). I stepped through the disassembly using the Cygwin DLL symbols and the ssh-agent sources (I don't have debug symbols)...
----------------snip ssh-agent.c -----------------------------
http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/src/usr.bin/ssh/ssh-agent.c?rev=1.123&content-type=text/plain

skip:
	new_socket(AUTH_SOCKET, sock);
	if (ac > 0) {
		signal(SIGALRM, check_parent_exists);
		alarm(10);
	}
	idtab_init();
	if (!d_flag)
		signal(SIGINT, SIG_IGN);
	signal(SIGPIPE, SIG_IGN);
	signal(SIGHUP, cleanup_handler);
	signal(SIGTERM, cleanup_handler);
	nalloc = 0;

	while (1) {
		prepare_select(&readsetp, &writesetp, &max_fd, &nalloc);
		if (select(max_fd + 1, readsetp, writesetp, NULL, NULL) < 0) {
			if (errno == EINTR)
				continue;
			fatal("select: %s", strerror(errno));
		}
		after_select(readsetp, writesetp);
	}
	/* NOTREACHED */
}

static void
after_select(fd_set *readset, fd_set *writeset)
{
	struct sockaddr_un sunaddr;
	socklen_t slen;
	char buf[1024];
	int len, sock;
	u_int i;
	uid_t euid;
	gid_t egid;

	for (i = 0; i < sockets_alloc; i++)
		switch (sockets[i].type) {
		case AUTH_UNUSED:
			break;
		case AUTH_SOCKET:
			if (FD_ISSET(sockets[i].fd, readset)) {
				slen = sizeof(sunaddr);
				sock = accept(sockets[i].fd,
				    (struct sockaddr *) &sunaddr, &slen);
				if (sock < 0) {
					error("accept from AUTH_SOCKET: %s",
					    strerror(errno));
					break;
				}
----------------snip ssh-agent.c -----------------------------

0022EE0C   00402F7C  ssh-agen.00402F7C
0022EE10   00000001  |nfds = 1
0022EE14   004754E0  |Readfds = 004754E0
0022EE18   004754F0  |Writefds = 004754F0
0022EE1C   00000000  |Exceptfds = NULL
0022EE20   00000000  \pTimeout = NULL
0022EE24   00350688
0022EE28   0022EF00
0022EE2C   00000764
0022EE30   7C81B808  RETURN to kernel32.7C81B808 from kernel32.7C80250B
0022EE34   0022D238
0022EE38   61133000  ASCII "Cygwin Setup Post-Install Script"
----------------snip -----------------------------

The problem is the following (forever) loop:

----
	while (1) {
		prepare_select(&readsetp, &writesetp, &max_fd, &nalloc);
		if (select(max_fd + 1, readsetp, writesetp, NULL, NULL) < 0) {
			if (errno == EINTR)
				continue;
			fatal("select: %s", strerror(errno));
		}
		after_select(readsetp, writesetp);
	}
----
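A loop of this shape spins whenever select() keeps reporting a descriptor as ready and the handler never consumes the condition. The following is only a minimal sketch of that mechanism, not code from ssh-agent or Cygwin: it uses a pipe holding one unread byte instead of the agent's listening socket (an assumption made for portability), and the helper name count_ready_returns is hypothetical.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ssh-agent): call select() `iterations`
 * times on a pipe that holds one unread byte and count how often select()
 * reports the read end as ready. Returns -1 if the pipe cannot be set up.
 */
int count_ready_returns(int iterations)
{
    int fds[2];
    int ready_count = 0;

    assert(iterations >= 0);
    if (pipe(fds) < 0)
        return -1;
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        fd_set readset;

        /* select() modifies the set, so rebuild it on every pass. */
        FD_ZERO(&readset);
        FD_SET(fds[0], &readset);

        /* NULL timeout, as in the ssh-agent loop: block until ready. */
        if (select(fds[0] + 1, &readset, NULL, NULL, NULL) > 0 &&
            FD_ISSET(fds[0], &readset))
            ready_count++;
        /* Deliberately no read(): the ready condition is never consumed. */
    }

    close(fds[0]);
    close(fds[1]);
    return ready_count;
}
```

Because the byte is never read, every call returns immediately with the descriptor still marked ready. With "while (1)" in place of the bounded loop, this is a busy loop at 100% CPU.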
"int cygwin_select(int, _types_fd_set*, _types_fd_set*, _types_fd_set*, timeval*)" in ssh-agents's main is returning "1" (eax) "after_select" is called which calls "cygwin1.accept()" "accept" returns "-1" (eax) and lasterror/errno is 0x6C errno = 0x6C -> #define ESHUTDOWN 108 /* Cannot send after transport endpoint shutdown */ ----- The main question is: can cygwin's "select" be successful and following "accept" fail due to non-socket? Is the problem openssh or cygwin related? Any thoughts? Regards, Robert Michelsen -- -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/