From: Erik Bray
Date: Mon, 9 Jan 2017 12:01:52 +0100
Subject: Re: Cygwin hanging in pselect
To: cygwin AT cygwin DOT com

On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray wrote:
> Hello, and happy new-ish year,
>
> I've been working on and off over the past few months on bringing
> Python's compatibility with Cygwin up to snuff, including having all
> pertinent tests passing. I've noticed that there are several tests
> (which I currently skip) that cause the process to hang indefinitely,
> and not respond to any signals from Cygwin (it can only be killed from
> Windows). This is Cygwin 64-bit--I have not tested 32-bit.
>
> I finally looked into this problem and found the lockup to be in
> pselect() somewhere. Attached I've provided the most minimal example
> I've been able to come up with so far that reproduces the problem,
> which I'll describe in a bit more detail next. I would attach a
> cygcheck output if requested, but I was also able to reproduce this on
> a recent build from source.
>
> So far as I've been able to tell, the problem only occurs with AF_UNIX
> sockets. In the example I have a 'server' socket and a 'client'
> socket both set to non-blocking. The client connects to the socket,
> returning errno EINPROGRESS as expected. Then I do a pselect on the
> client socket to wait until it is ready to be read from. The hang
> only happens when I pselect on the client socket, and not on the
> server socket. It doesn't seem to make a difference what the timeout is.
> One thing I have not tried is whether the client and server being
> in different processes makes a difference, but the example from the
> Python tests that this reproduces has them both in the same process.
>
> Below is (I think) the most relevant output from strace on the test
> case. It seems to hang somewhere in socket_cleanup, but I haven't
> investigated any further than that.

I made a little bit of progress debugging this, but now I'm stumped.
It seems the problem is this:

For each socket whose fd is passed to select(), a thread_socket is
started which calls peek_socket until there are bits ready on the
socket, or until the timeout is reached. This in turn calls
fhandler_socket::evaluate_events. The reason it only locks up on my
"client" socket (the one on which connect() is called) is that
evaluate_events notes that the socket is waiting to connect, and this
passes control to fhandler_socket::af_local_connect().

af_local_connect() temporarily sets the socket to blocking, then sends
a magic string to the socket (you can see in my strace log that this
succeeds). What's strange, and what I don't understand, is that there
are no FD_READ or FD_OOB events recorded for the WSASendTo call from
af_local_send_secret().

Then, after af_local_send_secret(), it calls af_local_recv_secret().
This calls recv_internal(), which in turn calls recursively into
fhandler_socket::evaluate_events, where it waits for an FD_READ or
FD_OOB event that never arrives. And since it set the socket to
blocking, it just sits in an infinite loop. Meanwhile the timer for
the select() call expires and tries to shut down the thread_socket,
but it can't because the thread never completes.

What I don't understand is why there is no event recorded for the
WSASendTo in send_internal.
I even wrapped it with the following debug code to wait for an FD_READ
event immediately following the WSASendTo:

    else if (get_socket_type () == SOCK_STREAM)
      {
        WSAEventSelect(get_socket (), wsock_evt, EVENT_MASK);
        res = WSASendTo (get_socket (), out_buf, out_idx, &ret, flags,
                         wsamsg->name, wsamsg->namelen, NULL, NULL);
        debug_printf("WSASendTo sent %d bytes; ret: %d", ret, res);
        while (!(res=wait_for_events (FD_READ | FD_OOB, 0)))
          {
            debug_printf("Waiting for socket to be readable");
          }
      }

But the strace at this point just outputs:

    62 108286 [socksel] poll_test 24152 fhandler_socket::af_local_connect: af_local_connect called, no_getpeereid=0
    156 108442 [socksel] poll_test 24152 fhandler_socket::send_internal: WSASendTo sent 16 bytes; ret: 0

It never returns from send_internal. I don't have deep knowledge of
WinSock, but from what I've read ISTM WSASendTo should have triggered
an FD_READ event on the socket, and for some reason it doesn't.
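For anyone reading this without the attachment, the setup described in
the quoted message can be sketched roughly as follows. This is only a
minimal sketch and not the attached test case: the function name
pselect_on_unix_client, the socket path argument, and the millisecond
timeout parameter are assumptions made for illustration. It creates a
non-blocking AF_UNIX server and client in one process, connects the
client (treating both an immediate success and EINPROGRESS as fine),
and then calls pselect() for readability on the client only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper (name, path and timeout are assumptions).
 * Returns pselect()'s result: 0 if it timed out, >0 if the client
 * became readable, or -1 if any setup step or pselect() itself failed.
 */
int pselect_on_unix_client(const char *path, long timeout_ms)
{
    struct sockaddr_un addr;
    struct timespec ts;
    fd_set readfds;
    int server, client, ret = -1;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                       /* remove a stale socket file */

    server = socket(AF_UNIX, SOCK_STREAM, 0);
    client = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server < 0 || client < 0)
        goto out;
    if (set_nonblocking(server) < 0 || set_nonblocking(client) < 0)
        goto out;
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0
        || listen(server, 1) < 0)
        goto out;

    /* A non-blocking connect may succeed at once or report EINPROGRESS. */
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0
        && errno != EINPROGRESS)
        goto out;

    /* Wait for the client (and only the client) to become readable. */
    FD_ZERO(&readfds);
    FD_SET(client, &readfds);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    ret = pselect(client + 1, &readfds, NULL, NULL, &ts, NULL);

out:
    if (client >= 0)
        close(client);
    if (server >= 0)
        close(server);
    unlink(path);
    return ret;
}
```

Calling it as pselect_on_unix_client("/tmp/pselect_repro.sock", 100)
(the path is just a placeholder) should come back once the timeout
elapses wherever pselect() behaves; on an affected Cygwin build I
would expect the call not to return at all.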