Mail Archives: cygwin/2004/03/11/19:03:53
QUESTION:
Is there an issue in Cygwin 1.5.7 (and still in the 20040306 snapshot)
that might cause a program which had been working since November 2003
to suddenly stop doing so, specifically in its communication with a
secondary executable which it forks/launches to do DNS resolution?
Second, is there a definitive guide to what it truly means when an
application kicks up a Win32 error 487 "*** couldn't allocate cygwin
heap..." type message? [I am not looking for answers like "try the
latest cygwin1.dll" or other "throw the wrench in the engine block and
see if the noise stops" sort of solutions. I am looking for a deeper
understanding of the error message. I like to 'know' what a program is
doing to cause such an error, if that makes sense.]
DETAILS:
____________________________________________________________
ISSUE #1: APPLICATION NO LONGER WORKING UNDER LATEST CYGWIN
Please note I last wrote to this list about the same application
(Jabberd v1.4.3) on 4 Dec 2003:
http://article.gmane.org/gmane.os.cygwin/41362
for those looking for more background. That thread concerned an
apparent issue (since fixed) where cygrunsrv did not send the proper
TERMSIG to an entire process group. (Kudos again to Brian Ford and
Corinna Vinschen for resolving that. Cygrunsrv has been a champ ever
since!)
Now I seem to be dealing with a new issue. Since the release of Jabberd
v1.4.3 in November 2003, I have had it running under Cygwin without issue
(with the exception of the above). However, since around the
beginning of February, Jabberd apparently can no longer do DNS
resolution. Note all other features still work (basically any
communications between a Jabber client program and the server itself).
Only when attempts are made which involve an external server (like doing
server-2-server communication) is there an issue.
And the only thing I can think of that may have caused it (after talking
to our Systems folks to verify no changes were made to our DNS servers or
their configurations, or anything else which might have "broken" this
simple server app) was an update of Cygwin to the latest and greatest at
the time (v1.5.7, released on 31 Jan 2004). I tried to roll back to
v1.5.5 using setup.exe, but that produced some not-so-pleasant messages
as soon as I fired up BASH. [I think I need to roll back more than just
the cygwin package, but I am not sure what all needs to be reverted. As
far as I can tell, setup.exe does no dependency checking.] I also tried
the latest Cygwin snapshot (whole install, not just the DLL), but alas
that hasn't fixed things either.
To explain where I think the issue lies, here is some quick background on Jabberd.
Without delving into boring details, Jabberd v1.4.x is an open-source
Jabber/XMPP server, details here:
http://jabberd.jabberstudio.org/1.4
[For those inclined to ask, I am not working on/with Jabberd2 at this
point because it is a complete rewrite and has its own set of
issues under Cygwin, and 1.4.x is still in wide use.]
The basic architecture of Jabberd 1.4.x is a main process which starts up
and then loads dynamic library modules to handle various subtasks:
jabberd.exe
+---dialback.dll
+---dnsrv.dll <---> jabadns.exe *NOTE: See below
+---jsm.dll
+---pthsock.dll
+---xdb_file.dll
[Substitute .so for .dll if you're under *nix.]
Note in the diagram above the module 'dnsrv.dll'. This is the one piece
left in Jabberd which is completely different under Windows/Cygwin than
it is under *nix.
In *nix, dnsrv.so is built using libresolv and basically spawns new
processes via fork() to do resolution. But between various fork()
issues in Cygwin in the past and the lack of a libresolv, it appears the
original authors (I cannot take credit for the work) chose to rewrite
the asynchronous DNS resolver for Cygwin. The Windows version
(dnsrv.dll) basically fires up a second executable called jabadns.exe,
which in turn does the DNS resolution via Windows calls and then returns
the results. Judging from the response when simply typing 'jabadns' at
the BASH prompt:
$ jabadns
Syntax: jabadns <read handle> <write handle><debug flag>
it appears that the dynamic library dnsrv.dll communicates with
jabadns.exe via handles. That's about all I know.
The key thing is that all of this worked just fine for months, then
suddenly stopped working around the beginning of Feb 2004. I looked all
over, and eventually I noticed that Cygwin v1.5.7 was released on
31 Jan 2004, and that I had upgraded not long after that...right around
the time this DNS resolution started failing.
Anyway, apologies for the long-winded explanation, but I am hoping something
in here clicks for the core Cygwin developers. Please note the Jabber
server I am running is the same one I compiled back in November. On my
testbed machine, I tried recompiling under the latest Cygwin DLLs and
tools, etc., but to no avail. I'm afraid I'm stumped.
Thoughts/ideas?
____________________________________________________________
ISSUE #2: MAKING JABBERD BUILD LIKE UNDER *NIX
(i.e., 'The Bigger Picture')
With this latest issue, I decided to revisit what I alluded to in my
4 Dec 2003 post: namely, ripping out the above Windows-centric version of
the dnsrv module and using the original *nix-based version. This was
possible thanks to the inclusion of the 'minires' package in Cygwin, and
getting Jabberd to compile under Cygwin as if it were any other *nix was
painless.
The only problem is, I am struggling with a perennial Win32 error 487
"couldn't allocate cygwin heap..." message, which I have definitely
traced via gdb to a fork() call within the dnsrv code.
Please understand, I do not believe the issue to be the use of fork()
per se. Jabberd's main process also calls fork(), as well as
using GNU Pth 2.0.0 (which in turn calls various spawn() functions from
what I remember). The error occurs only when that one particular fork()
call in the dynamic library dnsrv.dll is executed.
This made me wonder whether using fork() in a .DLL was a no-no.
However, a quick test program confirmed that this was not the case. I
wrote a simple main() which used dlopen() to open a simple library which
in turn called fork() and ran different commands in the parent and
child, and all worked well.
I Googled for information and found various mailing list threads where
folks have had this Win32 error 487 with various software projects.
However, the only advice ever given was of the sort "Oh, that version of
Cygwin1.dll appears to have issues. Try another." This does not help
me track down the root cause, unless in fact cygwin1.dll still has
issues in this regard.
I again ran across references to Jason Tishler's rebase tool (and
rebaseall script), so I tried a version of rebaseall that I modified
to also rebase the Jabberd DLL files. But to no avail.
No matter what I try, I get this error message.
I found indications that under Cygwin, gcc defaults to a fixed heap/stack
size, 1MB if I read things correctly. So I tried
passing arguments like
-Wl,--heap,5000000,--stack,5000000
to the linker via the gcc line in the Makefile, in an attempt to make
the default heap/stack size larger. Again, nothing changed.
At this point, I do not know if I am chasing my tail or not. When an
application produces this Win32 error 487 message, is it usually an
indication of some glitch in cygwin1.dll, or is it, as the message seems
to indicate, either a shortage of stack/heap space or, worse, some kind
of access violation where the program is attempting to access memory it
should not?
Again, any and all guidance, advice, wisdom, pointers, etc., welcome.
Thanks in advance to anyone who read this far...and especially if you
are still willing to help me. :-)