Mail Archives: cygwin/2017/01/09/06:02:11

From: Erik Bray <erik DOT m DOT bray AT gmail DOT com>
Date: Mon, 9 Jan 2017 12:01:52 +0100
Message-ID: <CAOTD34Z_58ce-E0wCuJP67UODZLmWncuXaZUGOyNeYX_atXh6w@mail.gmail.com>
Subject: Re: Cygwin hanging in pselect
To: cygwin AT cygwin DOT com

On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray <erik DOT m DOT bray AT gmail DOT com> wrote:
> Hello, and happy new-ish year,
>
> I've been working on and off over the past few months on bringing
> Python's compatibility with Cygwin up to snuff, including having all
> pertinent tests passing.  I've noticed that there are several tests
> (which I currently skip) that cause the process to hang indefinitely,
> and not respond to any signals from Cygwin (it can only be killed from
> Windows).  This is Cygwin 64-bit--I have not tested 32-bit.
>
> I finally looked into this problem and found the lockup to be in
> pselect() somewhere.  Attached I've provided the most minimal example
> I've been able to come up with so far that reproduces the problem,
> which I'll describe in a bit more detail next. I would attach a
> cygcheck output if requested, but I was also able to reproduce this on
> a recent build from source.
>
> So far as I've been able to tell, the problem only occurs with AF_UNIX
> sockets.  In the example I have a 'server' socket and a 'client'
> socket both set to non-blocking.  The client connects to the socket,
> returning errno EINPROGRESS as expected.  Then I do a pselect on the
> client socket to wait until it is ready to be read from.  The hang
> only happens when I pselect on the client socket, and not on the
> server socket.  It doesn't seem to make a difference what the timeout
> is.  One thing I have not tried is if the client and server are
> actually different processes, but the example from the Python tests
> this is reproducing is where they are both in the same process.
>
> Below is (I think) the most relevant output from strace on the test
> case.  It seems to hang somewhere in socket_cleanup, but I haven't
> investigated any further than that.
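
For readers without the attachment, the scenario in the quoted message
can be sketched roughly as below.  This is not the attached test case;
the helper name repro_pselect_unix and its parameters are made up for
illustration, and it only uses plain POSIX calls.  It sets up a
non-blocking AF_UNIX server and client in one process, connects the
client, and then calls pselect() on the client for readability.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper (name and signature are assumptions).
   Returns the result of pselect() on the client socket:
   0 = timed out, >0 = readable, -1 = setup or pselect() failure. */
static int repro_pselect_unix(const char *path, long timeout_ms)
{
    struct sockaddr_un addr;
    struct timespec ts;
    fd_set rfds;
    int server = -1, client = -1, rc = -1;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                       /* remove a stale socket file */

    server = socket(AF_UNIX, SOCK_STREAM, 0);
    client = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server == -1 || client == -1)
        goto out;
    if (set_nonblocking(server) == -1 || set_nonblocking(client) == -1)
        goto out;
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;
    if (listen(server, 1) == -1)
        goto out;

    /* Non-blocking connect: EINPROGRESS is acceptable here. */
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) == -1
        && errno != EINPROGRESS)
        goto out;

    /* Wait for the client to become readable, with a timeout.
       Nobody calls accept() or sends anything, as in the report. */
    FD_ZERO(&rfds);
    FD_SET(client, &rfds);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    rc = pselect(client + 1, &rfds, NULL, NULL, &ts, NULL);

out:
    if (client != -1)
        close(client);
    if (server != -1)
        close(server);
    unlink(path);
    return rc;
}
```

Calling repro_pselect_unix("/tmp/repro.sock", 200) from main() should
return once the timeout expires on a system where pselect() behaves;
the hang described above would show up as the call never returning.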

I made a little bit of progress debugging this, but now I'm stumped.
It seems the problem is this:

For each socket whose fd is passed to select() a thread_socket is
started which calls peek_socket until there are bits ready on the
socket, or until the timeout is reached.  This in turn calls
fhandler_socket::evaluate_events.

The reason it's only locking up on my "client thread" on which
connect() is called, is that evaluate_events notes that the socket is
waiting to connect, and this passes control to
fhandler_socket::af_local_connect().  af_local_connect() temporarily
sets the socket to blocking, then sends a magic string to the socket
(you can see in my strace log that this succeeds).  What's strange,
and what I don't understand, is that there are no FD_READ or FD_OOB
events recorded for the WSASendTo call from af_local_send_secret().
Then, after af_local_send_secret() it calls af_local_recv_secret().
This calls recv_internal() which in turn calls recursively into
fhandler_socket::evaluate_events where it waits for an FD_READ or
FD_OOB event that never arrives.  And since it set the socket to
blocking it just sits in an infinite loop.

Meanwhile the timer for the select() call expires and tries to shut
down the thread_socket but it can't because it never completes.

What I don't understand is why there is not an event recorded for the
WSASendTo in send_internal.  I even wrapped it with the following
debug code to wait for an FD_READ event immediately following the
WSASendTo:

      else if (get_socket_type () == SOCK_STREAM)
      {
        WSAEventSelect (get_socket (), wsock_evt, EVENT_MASK);
        res = WSASendTo (get_socket (), out_buf, out_idx, &ret, flags,
                         wsamsg->name, wsamsg->namelen, NULL, NULL);
        debug_printf ("WSASendTo sent %d bytes; ret: %d", ret, res);
        while (!(res = wait_for_events (FD_READ | FD_OOB, 0))) {
            debug_printf ("Waiting for socket to be readable");
        }
      }



But the strace at this point just outputs:
   62  108286 [socksel] poll_test 24152 fhandler_socket::af_local_connect: af_local_connect called, no_getpeereid=0
  156  108442 [socksel] poll_test 24152 fhandler_socket::send_internal: WSASendTo sent 16 bytes; ret: 0

It never returns from send_internal.  I don't have deep knowledge of
WinSock, but from what I've read ISTM WSASendTo should have triggered
an FD_READ event on the socket, and it doesn't for some reason.
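
One thing that may be worth checking is which end of the connection
should be expected to become readable after a send.  As a point of
comparison only, here is a small sketch using plain POSIX sockets (not
WinSock, and not Cygwin internals); the function names is_readable and
send_and_probe are made up for illustration.  It sends 16 bytes on one
end of an AF_UNIX socketpair and then polls both ends for readability.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error. */
static int is_readable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0);           /* timeout 0: do not block */
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical demo (name and signature are assumptions).
   Sends 16 bytes on sv[0] and reports whether each end is readable.
   Returns 0 on success, -1 on failure. */
static int send_and_probe(int *sender_readable, int *receiver_readable)
{
    int sv[2];
    const char secret[16] = "0123456789abcde";  /* 15 chars + NUL */
    ssize_t sent;
    int rc = -1;

    assert(sender_readable != NULL && receiver_readable != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    sent = send(sv[0], secret, sizeof(secret), 0);
    if (sent == (ssize_t)sizeof(secret)) {
        *sender_readable = is_readable(sv[0]);
        *receiver_readable = is_readable(sv[1]);
        if (*sender_readable >= 0 && *receiver_readable >= 0)
            rc = 0;
    }

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With POSIX sockets the data shows up as readable on the receiving end,
while the sending end stays non-readable until its peer writes
something back.  If WinSock events follow the same pattern, the FD_READ
being waited for would only arrive once the other side responds, but
that is an assumption I have not verified against the WinSock
documentation.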
