Mail Archives: cygwin-developers/2003/02/20/09:15:45

Mailing-List: contact cygwin-developers-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-developers-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin-developers/>
List-Post: <mailto:cygwin-developers AT cygwin DOT com>
List-Help: <mailto:cygwin-developers-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-developers-owner AT cygwin DOT com
Delivered-To: mailing list cygwin-developers AT cygwin DOT com
Date: Thu, 20 Feb 2003 15:15:39 +0100
From: Corinna Vinschen <vinschen AT redhat DOT com>
To: Cygwin-Developers <cygwin-developers AT cygwin DOT com>
Subject: Re: Threaded socket hang in 1.3.20
Message-ID: <20030220141539.GE2467@cygbert.vinschen.de>
Reply-To: cygwin-developers AT cygwin DOT com
Mail-Followup-To: Cygwin-Developers <cygwin-developers AT cygwin DOT com>
References: <20030218222746 DOT GD2404 AT tishler DOT net>
Mime-Version: 1.0
In-Reply-To: <20030218222746.GD2404@tishler.net>
User-Agent: Mutt/1.4i

On Tue, Feb 18, 2003 at 05:27:47PM -0500, Jason Tishler wrote:
> The attached C++ testcase demonstrates the problem.  In 1.3.20-1, the
> program hangs in the call to socket() in the second thread:
> 
>     Creating thread for fn1
>     fn1 begin
>     fn1: calling accept()...
>     Creating thread for fn2
>     fn2 begin
>     fn2: calling socket()...
> 
> I'm not sure why connect() fails, because a "telnet localhost 54321"
> works just fine.  I'm probably demonstrating my sockets ignorance.

I looked into this problem and it turns out not to be socket-specific.
It is a deadlock in cygheap:

When accept is called, it creates a new file descriptor by calling

  cygheap_fdnew res_fd;

before calling winsock's accept().  This in turn creates an exclusive lock
in cygheap_fdnew():

  cygheap_fdnew (int seed_fd = -1, bool lockit = true)
    {
      if (lockit)
	SetResourceLock (LOCK_FD_LIST, WRITE_LOCK | READ_LOCK, "cygheap_fdnew");
      [...]

which is not released until the calling function returns.

Since accept hangs until a connection is actually made (on blocking
sockets), the lock persists.  The next socket() call also creates a new
file descriptor the same way.  Since the above lock still applies, this
time the creation of the file descriptor hangs in the call to
SetResourceLock().
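
The hang described above can be reproduced in miniature. The following is only a sketch in standard C++, not Cygwin code: `fd_list_lock`, `fd_guard`, `blocking_accept` and the two `accept_*` functions are hypothetical stand-ins for LOCK_FD_LIST, cygheap_fdnew, winsock's accept() and the Cygwin accept wrapper. It assumes the guard locks in its constructor and unlocks in its destructor, which is how the mail describes cygheap_fdnew behaving.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the fd list lock (LOCK_FD_LIST in the mail).
static std::timed_mutex fd_list_lock;

// Hypothetical stand-in for cygheap_fdnew: locks on construction and
// unlocks only when the object goes out of scope.
struct fd_guard {
  fd_guard() { fd_list_lock.lock(); }
  ~fd_guard() { fd_list_lock.unlock(); }
};

// Simulates a blocking accept(): waits until a "connection" arrives.
static void blocking_accept(std::shared_future<void> connection) {
  connection.wait();
}

// Problem pattern: take the lock first, then block while holding it.
static void accept_lock_early(std::shared_future<void> connection,
                              std::promise<void>& started) {
  fd_guard res_fd;              // lock is taken here
  started.set_value();          // tell the caller the lock is held
  blocking_accept(connection);  // lock stays held for the whole wait
}

// Alternative ordering: block first, lock only for the short update.
static void accept_lock_late(std::shared_future<void> connection,
                             std::promise<void>& started) {
  started.set_value();          // tell the caller we are about to block
  blocking_accept(connection);  // no lock held while waiting
  fd_guard res_fd;              // lock only once there is work to do
}

// Plays the role of the second thread calling socket(): returns true if
// it can take the fd list lock while the acceptor is still blocked.
static bool second_caller_gets_lock(bool lock_early) {
  std::promise<void> connection;
  std::promise<void> started;
  std::shared_future<void> conn = connection.get_future().share();
  std::thread acceptor(lock_early ? accept_lock_early : accept_lock_late,
                       conn, std::ref(started));
  started.get_future().wait();
  // Use a timeout instead of lock() so the demo reports the hang
  // rather than hanging itself.
  bool got = fd_list_lock.try_lock_for(std::chrono::milliseconds(200));
  if (got)
    fd_list_lock.unlock();
  connection.set_value();  // let the simulated accept() return
  acceptor.join();
  return got;
}
```

Calling `second_caller_gets_lock(true)` models the 1.3.20 behaviour, where the second thread cannot get the lock while accept is blocked. Calling it with `false` models creating the descriptor only after the blocking call has returned, so the lock is held just briefly.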

Looking through our sources, I found some places where cygheap_fdnew
could possibly cause a hang, where the return value isn't tested, or
where the lock is held unnecessarily long because cygheap_fdnew is called
too early.  I've cleaned that up a bit and committed the changes.

Now back to the test case.  With these changes the socket() call doesn't
hang but now connect() is in trouble.  It hangs for a while until it
returns with error 116, Connection timeout.

I must admit that I haven't found the cause so far.  Help in debugging
this is appreciated.

Corinna


-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Developer                                mailto:cygwin AT cygwin DOT com
Red Hat, Inc.
