Message-ID: <20040226101953.54870.qmail@web60305.mail.yahoo.com>
Date: Thu, 26 Feb 2004 02:19:53 -0800 (PST)
From: Patrick Samson
Subject: Re: select() hangs sometimes, for TCP connections
To: cygwin AT cygwin DOT com
In-Reply-To: <20040213122720.17765.qmail@web60301.mail.yahoo.com>

I finally found the culprit. It seems to be a Tcl extension that was badly built.

The DB replication scripts are written in Tcl. For communication between hosts they use the Tcl-DP extension, with TCP socket channels. The extension is provided as C sources, so I had to build a DLL. The Windows version was intended to be built with VC++. As I didn't have access to that compiler, my aim was to build it with gcc. Nice, isn't it?

But, as there is always a but, it was my first use of the autotools, so I wasn't always aware of what I was doing. I fixed some bugs in the sources, made some adaptations for integration with the Tcl sources, and tried many combinations in the autotools configuration to get a DLL loadable with the Tcl command "package require dp". As there were too many errors with a basic build, the -mno-cygwin version was mandatory.

Unfortunately, before finding a working solution with gcc's -shared option, I was dealing with libtool. It seems that libtool introduces a dependency on cygwin1.dll through the way it imposes the entry point (-Wl,-e,...). And because of the -mno-cygwin option, msvcrt.dll is used. IIRC, mixing cygwin1 and msvcrt at the same time is not advised (even though some claim to have succeeded in building such an executable).
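One way to catch this kind of mix-up early is to look at the DLL's direct imports right after the build. The sketch below is not what was used here; it assumes binutils' `objdump -p` is available and that its import-table section prints one `DLL Name: ...` line per imported DLL. The function names and the sample text are made up for illustration.

```python
import re
import subprocess

# The two C runtimes that, per the message above, should not be mixed.
CONFLICTING = {"cygwin1.dll", "msvcrt.dll"}


def direct_imports(objdump_text):
    """Return lower-cased DLL names from the 'DLL Name:' lines of `objdump -p` output."""
    pattern = re.compile(r"^\s*DLL Name:\s*(\S+)", re.MULTILINE)
    return [m.group(1).lower() for m in pattern.finditer(objdump_text)]


def mixes_runtimes(objdump_text):
    """True if the module imports directly from both cygwin1.dll and msvcrt.dll."""
    return CONFLICTING <= set(direct_imports(objdump_text))


def check_file(path):
    """Run objdump on a built DLL (assumes objdump is on PATH) and test its imports."""
    result = subprocess.run(
        ["objdump", "-p", path], capture_output=True, text=True, check=True
    )
    return mixes_runtimes(result.stdout)


# Hypothetical excerpt of objdump output, for illustration only.
SAMPLE_BAD = """
\tDLL Name: cygwin1.dll
\tDLL Name: tcl84.dll
\tDLL Name: msvcrt.dll
"""
```

Calling `check_file()` on the freshly linked DLL would return True for a build that pulls in both runtimes directly. It only inspects direct imports, so a dependency reached through another DLL (such as tcl84.dll) is not reported.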
I made a version with VC++, and another one with gcc, without libtool or unnecessary options. These two leave the system stable. I can't say for sure that the problem is solved, just that the system is more stable. I ran the replication more than 570 and 220 times.

Here are the dependencies for a working version:

D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll
D:\cygwin\bin\tcl84.dll
C:\WINNT\System32\ADVAPI32.DLL
C:\WINNT\System32\ntdll.dll
C:\WINNT\System32\KERNEL32.dll
C:\WINNT\System32\USER32.dll
C:\WINNT\System32\GDI32.dll
C:\WINNT\System32\RPCRT4.dll
D:\cygwin\bin\cygwin1.dll
C:\WINNT\System32\msvcrt.dll
C:\WINNT\System32\WS2_32.DLL
C:\WINNT\System32\WS2HELP.dll

And for the bad version:

D:/cygwin/usr/share/tcl8.4/dp4.0/win/dp40.dll.gcc.ko
D:\cygwin\bin\cygwin1.dll
^^^^^^^
C:\WINNT\System32\ADVAPI32.DLL
C:\WINNT\System32\ntdll.dll
C:\WINNT\System32\KERNEL32.dll
C:\WINNT\System32\USER32.dll
C:\WINNT\System32\GDI32.dll
C:\WINNT\System32\RPCRT4.dll
D:\cygwin\bin\tcl84.dll
C:\WINNT\System32\msvcrt.dll
^^^^^^^
C:\WINNT\System32\WSOCK32.DLL
C:\WINNT\System32\WS2_32.dll
C:\WINNT\System32\WS2HELP.dll

The functions imported from cygwin1.dll are:

abort
cygwin_detach_dll
cygwin_internal
dll_dllcrt0
pthread_atfork
calloc
malloc
realloc
free

--- Patrick Samson wrote:
> Problem: sometimes select() doesn't return.
>
> Context: I run a DB replication scenario with cron,
> every 5 minutes. There is no change in the DB, so the
> scenario is always the same. Most of the time, it works.
> But eventually, after some time (maybe some minutes or
> hours), a process A keeps waiting forever in select()
> for a response on a TCP socket.
> With gdb I can see that the other end B returned in its
> ReadCommand() function, meaning it has sent its response
> and waits for a new command, so this side should be OK.
>
[snip]