Mail Archives: cygwin/2004/03/11/19:03:53

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
To: cygwin AT cygwin DOT com
From: Frank Seesink <frank AT mail DOT wvnet DOT edu>
Subject: Problems running Jabberd v1.4.3 under Cygwin v1.5.7 (or latest snapshot), and heap allocation error caused by fork()
Date: Thu, 11 Mar 2004 19:03:13 -0500
Lines: 165
Message-ID: <c2quo3$qsq$1@sea.gmane.org>
Mime-Version: 1.0
X-Complaints-To: usenet AT sea DOT gmane DOT org
X-Gmane-NNTP-Posting-Host: franktp.wvn.wvnet.edu
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

QUESTION:

Is there an issue in Cygwin 1.5.7 (and still in the 20040306 snapshot) 
that might cause a program which had been working since November 2003 
to suddenly stop doing so, specifically in its communication with a 
secondary executable which it forks/launches to do DNS resolution?

Second, is there a definitive guide to what it truly means when an 
application kicks up a Win32 error 487 "*** couldn't allocate cygwin 
heap..." type message?  [I am not looking for answers like "try the 
latest cygwin1.dll" or other "throw the wrench in the engine block and 
see if the noise stops" sort of solutions.  I am looking for a deeper 
understanding of the error message.  I like to 'know' what a program is 
doing to cause such an error, if that makes sense.]

DETAILS:
____________________________________________________________
ISSUE #1:  APPLICATION NO LONGER WORKING UNDER LATEST CYGWIN

Please note I last wrote to this list about the same application 
(Jabberd v1.4.3) back on 4 Dec 2003:

	http://article.gmane.org/gmane.os.cygwin/41362

for those looking for more background.  That thread concerned an 
apparent issue (since fixed) where cygrunsrv did not send the proper 
TERMSIG to an entire process group.  (Kudos again to Brian Ford and 
Corinna Vinschen for resolving that.  Cygrunsrv has been a champ ever 
since!)

Now I seem to be dealing with a new issue.  Since the release of Jabberd 
v1.4.3 in Nov2003, I have had it running under Cygwin without issue 
(with the exception of the above).  However, starting around the 
beginning of February, Jabberd can apparently no longer do DNS 
resolution.  Note all other features still work (basically any 
communications between a Jabber client program and the server itself). 
Only when attempts are made which involve an external server (like doing 
server-2-server communication) is there an issue.

And the only thing I can think that may have caused it--after talking to 
our Systems folks to verify no changes were made in our DNS servers or 
their configurations, or anything else which might have "broken" this 
simple server app--was an update to Cygwin to the latest and greatest at 
the time (v1.5.7, released on 31Jan2004).  I tried to roll back to 
v1.5.5 using setup.exe, but that met with some not-so-pleasant messages 
as soon as I fired up BASH.  [I think I need to roll back more than just 
the cygwin package, but I am not sure what else needs to be reverted.  
There is no dependency checking in setup.exe that I can tell.]  I also tried 
the latest Cygwin snapshot (whole install, not just DLL), but alas that 
hasn't fixed things either.

To grasp where I think the issue lies, a quick background on Jabberd. 
Without delving into boring details, Jabberd v1.4.x is an open-source 
Jabber/XMPP server, details here:

	http://jabberd.jabberstudio.org/1.4

[For those inclined to ask, I am not working on/with Jabberd2 at this 
point due to the fact it is a complete rewrite and has its own set of 
issues under Cygwin, and 1.4.x is still in high use.]

The basic architecture of Jabberd 1.4.x is a main process which fires up, 
then loads dynamic library modules to handle various subtasks:

	jabberd.exe
	+---dialback.dll
	+---dnsrv.dll       <---> jabadns.exe    *NOTE:  See below
	+---jsm.dll
	+---pthsock.dll
	+---xdb_file.dll

[Substitute .so for .dll if you're under *nix.]

Note in the diagram above the module 'dnsrv.dll'.  This is the one piece 
left in Jabberd which is completely different under Windows/Cygwin than 
it is under *nix.

In *nix, dnsrv.so is built using libresolv and basically spawns new 
processes via fork() to do resolution.  But between various fork() 
issues in Cygwin in the past and the lack of a libresolv, it appears the 
original authors (I cannot take credit for the work) chose to rewrite 
the asynchronous DNS resolver for Cygwin.  The Windows version 
(dnsrv.dll) basically fires up a second executable called jabadns.exe, 
which in turn does the DNS resolution via Windows calls and then returns 
the results.  Noting the response when simply typing 'jabadns' at the 
BASH prompt:

	$ jabadns
	Syntax: jabadns <read handle> <write handle><debug flag>

it appears that the dynamic library dnsrv.dll communicates with 
jabadns.exe via handles.  That's about all I know.
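
[For readers unfamiliar with this kind of arrangement, here is a minimal 
sketch of how a parent might talk to a helper over two pipes, with the 
handle numbers passed on the helper's command line.  This is NOT the 
Jabberd or jabadns source.  The names helper_main() and 
resolve_via_helper() are hypothetical, the protocol (hostname in, 
numeric address out) is an assumption, and the helper runs in a forked 
child rather than being exec'd so the example stays in one file.]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a helper's main().  The handles arrive as
 * decimal strings: <read handle> <write handle> <debug flag>. */
static int helper_main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "Syntax: helper <read handle> <write handle> <debug flag>\n");
        return 2;
    }
    int rfd = atoi(argv[1]);
    int wfd = atoi(argv[2]);
    int debug = atoi(argv[3]);

    /* Read one hostname request from the parent. */
    char host[256];
    ssize_t n = read(rfd, host, sizeof host - 1);
    if (n <= 0)
        return 1;
    host[n] = '\0';
    host[strcspn(host, "\n")] = '\0';

    /* Resolve to an IPv4 address and format it numerically. */
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    char reply[256] = "";
    if (getaddrinfo(host, NULL, &hints, &res) == 0 && res != NULL) {
        getnameinfo(res->ai_addr, res->ai_addrlen, reply, sizeof reply,
                    NULL, 0, NI_NUMERICHOST);
        freeaddrinfo(res);
    }
    if (debug)
        fprintf(stderr, "helper: %s -> %s\n", host, reply);

    /* An empty reply (zero bytes written) signals "not resolved". */
    return write(wfd, reply, strlen(reply)) < 0 ? 1 : 0;
}

/* Parent side: returns 0 and fills 'out' on success, -1 on failure. */
static int resolve_via_helper(const char *host, char *out, size_t outlen)
{
    assert(host != NULL && out != NULL);
    int to_child[2], from_child[2];
    if (outlen == 0 || pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only its own pipe ends, then act as the helper.
         * A real server would execv() a separate binary here. */
        close(to_child[1]);
        close(from_child[0]);
        char name[] = "helper", dbg[] = "0", rbuf[16], wbuf[16];
        snprintf(rbuf, sizeof rbuf, "%d", to_child[0]);
        snprintf(wbuf, sizeof wbuf, "%d", from_child[1]);
        char *argv[] = { name, rbuf, wbuf, dbg, NULL };
        _exit(helper_main(4, argv));
    }

    /* Parent: send the request, close to signal EOF, read the answer. */
    close(to_child[0]);
    close(from_child[1]);
    ssize_t w = write(to_child[1], host, strlen(host));
    close(to_child[1]);
    ssize_t n = (w < 0) ? -1 : read(from_child[0], out, outlen - 1);
    close(from_child[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    if (n <= 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

[Calling resolve_via_helper("127.0.0.1", buf, sizeof buf) from a main() 
should fill buf with the numeric address.  If either pipe end is not 
inherited correctly by the child, the parent's read() simply sees EOF, 
which is one way a helper like this could fail silently.]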

The key thing is that all of this worked just fine for months, then 
suddenly stopped working around the beginning of Feb 2004.  I looked all 
over, and eventually I noticed that Cygwin v1.5.7 was released on 
31Jan2004, and that I had upgraded not long after that...right around 
the time this DNS resolution started failing.

Anyway, apologies for the long-winded explanation, but hoping something 
in here clicks for the core cygwin developers.  Please note the Jabber 
server I am running is the same one I compiled back in November.  On my 
testbed machine, I tried recompiling under the latest Cygwin DLLs and 
tools, etc., but to no avail.  I'm afraid I'm stumped.

Thoughts/ideas?

____________________________________________________________
ISSUE #2:  MAKING JABBERD BUILD LIKE UNDER *NIX
	(i.e., 'The Bigger Picture')

With this latest issue, I decided to retackle what I alluded to in my 
4Dec2003 post; namely, ripping out the above Windows-centric version of 
the dnsrv module and using the original *nix-based version.  This was 
possible thanks to the inclusion of the 'minires' package in Cygwin, and 
getting Jabberd to compile under Cygwin as if it were any other *nix was 
painless.

The only problem is, I am struggling with a perennial Win32 error 487 
"couldn't allocate cygwin heap..." message, which I have definitely 
traced via gdb to a fork() call within the dnsrv code.

Please understand, I do not believe the issue to be the use of fork() 
itself per se.  Jabberd's main process also calls fork(), as well as 
using GNU Pth 2.0.0 (which in turn calls various spawn() functions from 
what I remember).  But the issue occurs when the one particular fork() 
call is executed in the dynamic library dnsrv.dll.

This made me wonder whether using fork() in a .DLL was a no-no. 
However, a quick test program confirmed that this was not the case.  I 
wrote a simple main() which used dlopen() to open a simple library which 
in turn called fork() and ran different commands in the parent and 
child, and all worked well.

I Googled for information and found various mailing list threads where 
folks have had this Win32 error 487 with various software projects. 
However, the only advice ever given was of the sort "Oh, that version of 
Cygwin1.dll appears to have issues.  Try another."  This does not help 
me track down the root cause, unless in fact cygwin1.dll still has 
issues in this regard.

I again ran across references to Jason Tishler's rebase tool (and 
rebaseall script), so using the modified version of rebaseall which I 
hacked to rebase the Jabberd DLL files, I tried that.  But to no avail. 
No matter what I try, I get this error message.

I found indications that under Cygwin, gcc defaults to a set heap/stack 
size, with a default of 1MB if I read things correctly.  So I tried 
passing arguments like

	-Wl,--heap,5000000,--stack,5000000

to the linker via the gcc line in the Makefile, in an attempt to make 
the default heap/stack size larger.  Again, nothing changed.

At this point, I do not know if I am chasing my tail or not.  When an 
application suffers this Win32 error 487 message, is it usually an 
indication of some glitch in cygwin1.dll, or is it as the message seems 
to indicate, either some sort of issue of not enough stack/heap space, 
or worse, some kind of access violation where the program is attempting 
to access memory it should not?

Again, any and all guidance, advice, wisdom, pointers, etc., welcome. 
Thanks in advance to anyone who read this far...and especially if you 
are still willing to help me. :-)


