Mail Archives: cygwin/2017/01/09/08:29:22
From: Erik Bray <erik DOT m DOT bray AT gmail DOT com>
Date: Mon, 9 Jan 2017 14:29:00 +0100
Message-ID: <CAOTD34aMu6DLP-8kaRXZRoihGxW9Jusf1pDBeM3cTWeNRfLVWw@mail.gmail.com>
Subject: Re: Hangs on connect to UNIX socket being listened on in the same process (was: Cygwin hanging in pselect)
To: cygwin AT cygwin DOT com
On Mon, Jan 9, 2017 at 12:01 PM, Erik Bray <erik DOT m DOT bray AT gmail DOT com> wrote:
> On Fri, Jan 6, 2017 at 12:40 PM, Erik Bray <erik DOT m DOT bray AT gmail DOT com> wrote:
>> Hello, and happy new-ish year,
>>
>> I've been working on and off over the past few months on bringing
>> Python's compatibility with Cygwin up to snuff, including having all
>> pertinent tests passing. I've noticed that there are several tests
>> (which I currently skip) that cause the process to hang indefinitely,
>> and not respond to any signals from Cygwin (it can only be killed from
>> Windows). This is Cygwin 64-bit--I have not tested 32-bit.
>>
>> I finally looked into this problem and found the lockup to be in
>> pselect() somewhere. Attached I've provided the most minimal example
>> I've been able to come up with so far that reproduces the problem,
>> which I'll describe in a bit more detail next. I would attach a
>> cygcheck output if requested, but I was also able to reproduce this on
>> a recent build from source.
>>
>> So far as I've been able to tell, the problem only occurs with AF_UNIX
>> sockets. In the example I have a 'server' socket and a 'client'
>> socket both set to non-blocking. The client connects to the socket,
>> returning errno EINPROGRESS as expected. Then I do a pselect on the
>> client socket to wait until it is ready to be read from. The hang
>> only happens when I pselect on the client socket, and not on the
>> server socket. It doesn't seem to make a difference what the timeout
>> is. One thing I have not tried is whether the client and server are
>> actually different processes, but the example from the Python tests
>> this is reproducing is where they are both in the same process.
>>
>> Below is (I think) the most relevant output from strace on the test
>> case. It seems to hang somewhere in socket_cleanup, but I haven't
>> investigated any further than that.
>
> I made a little bit of progress debugging this, but now I'm stumped.
> It seems the problem is this:
>
> For each socket whose fd is passed to select() a thread_socket is
> started which calls peek_socket until there are bits ready on the
> socket, or until the timeout is reached. This in turn calls
> fhandler_socket::evaluate_events.
>
> The reason it's only locking up on my "client thread" on which
> connect() is called, is that evaluate_events notes that the socket is
> waiting to connect, and this passes control to
> fhandler_socket::af_local_connect(). af_local_connect() temporarily
> sets the socket to blocking, then sends a magic string to the socket
> (you can see in my strace log that this succeeds). What's strange,
> and what I don't understand, is that there are no FD_READ or FD_OOB
> events recorded for the WSASendTo call from af_local_send_secret().
> Then, after af_local_send_secret() it calls af_local_recv_secret().
> This calls recv_internal() which in turn calls recursively into
> fhandler_socket::evaluate_events where it waits for an FD_READ or
> FD_OOB event that never arrives. And since it set the socket to
> blocking it just sits in an infinite loop.
>
> Meanwhile the timer for the select() call expires and tries to shut
> down the thread_socket but it can't because it never completes.
>
> What I don't understand is why there is not an event recorded for the
> WSASendTo in send_internal. I even wrapped it with the following
> debug code to wait for an FD_READ event immediately following the
> WSASendTo:
>
>   else if (get_socket_type () == SOCK_STREAM)
>     {
>       WSAEventSelect (get_socket (), wsock_evt, EVENT_MASK);
>       res = WSASendTo (get_socket (), out_buf, out_idx, &ret, flags,
>                        wsamsg->name, wsamsg->namelen, NULL, NULL);
>       debug_printf ("WSASendTo sent %d bytes; ret: %d", ret, res);
>       while (!(res = wait_for_events (FD_READ | FD_OOB, 0)))
>         {
>           debug_printf ("Waiting for socket to be readable");
>         }
>     }
>
> But the strace at this point just outputs:
>    62  108286 [socksel] poll_test 24152 fhandler_socket::af_local_connect: af_local_connect called, no_getpeereid=0
>   156  108442 [socksel] poll_test 24152 fhandler_socket::send_internal: WSASendTo sent 16 bytes; ret: 0
>
> It never returns from send_internal. I don't have deep knowledge of
> WinSock, but from what I've read ISTM WSASendTo should have triggered
> an FD_READ event on the socket, and it doesn't for some reason.

After playing around with this a bit more I came up with a much
simpler example. It does not directly involve select() at all.

The simplified example is just:

#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    int sock_server, sock_client;
    int retval;
    struct sockaddr_un addr;

    /* Both sockets use the same AF_UNIX address. */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, "@test.sock");

    /* Server side: bind and listen, but never call accept(). */
    sock_server = socket(AF_UNIX, SOCK_STREAM, 0);
    if (bind(sock_server, (struct sockaddr*)&addr, sizeof(addr))) {
        printf("binding server socket failed\n");
        return 1;
    }

    retval = listen(sock_server, 5);
    printf("Ret from listen: %d\n", retval);

    /* Client side: blocking connect() from the same process. */
    sock_client = socket(AF_UNIX, SOCK_STREAM, 0);
    errno = 0;  /* so the value printed below is not a stale one */
    retval = connect(sock_client, (struct sockaddr*)&addr, sizeof(addr));
    printf("Ret from client connect: %d; errno: %d\n", retval, errno);

    return 0;
}

On Linux this example works as I expect: the connect() call returns
immediately. On Cygwin, however, the connect() call hangs after
af_local_send_secret(), as described in my first message.

When I split this example up into separate client and server
processes, it works as expected: the connect() is properly negotiated
and returns immediately.
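
For reference, the split-process variant can be sketched roughly as
follows. This is not the program that was actually run. It is a
minimal sketch that uses fork() to put the client in a separate
process. The function name connect_from_child(), the poll() timeout of
5 seconds and the accept() call in the parent are all assumptions made
for illustration; the message does not say how the server side of the
split test behaved.

```c
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: listen on `path` in the calling process and
 * connect() to it from a forked child process.
 * Returns 0 if the child's connect() succeeded and the parent accepted
 * the connection, -1 otherwise. */
int connect_from_child(const char *path)
{
    struct sockaddr_un addr;
    struct pollfd pfd;
    int sock_server, sock_conn = -1, status = 0;
    pid_t pid;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);  /* remove a stale socket file from an earlier run */

    sock_server = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock_server < 0)
        return -1;
    if (bind(sock_server, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(sock_server, 5) != 0) {
        close(sock_server);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(sock_server);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: the "client" process; its exit status reports connect(). */
        int sock_client = socket(AF_UNIX, SOCK_STREAM, 0);
        close(sock_server);
        if (sock_client < 0 ||
            connect(sock_client, (struct sockaddr *)&addr, sizeof(addr)) != 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: the "server" process. Wait a bounded time for the pending
     * connection (assumed timeout), then accept it. */
    pfd.fd = sock_server;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) > 0)
        sock_conn = accept(sock_server, NULL, NULL);
    if (sock_conn < 0)
        kill(pid, SIGKILL);  /* do not wait forever on a stuck client */

    waitpid(pid, &status, 0);
    if (sock_conn >= 0)
        close(sock_conn);
    close(sock_server);
    unlink(path);

    return (sock_conn >= 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
               ? 0 : -1;
}
```

Calling connect_from_child("/tmp/split-test.sock") from main() and
printing the result gives a quick check that the two-process case
completes; the path is only an example.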