Mail Archives: cygwin/2003/05/12/04:01:50
Hello,
Attached, you will find a MUCH simplified version of
the program that shows the problem I am having. I believe
it has all the essential elements. I built the program with
make
and nothing more. When I run it, it continuously loops as
designed. I install it with cygrunsrv and it seems to
be installed, but when I start it with cygrunsrv it
always fails with an error, so I am unable to tell you
whether or not it reproduces my problem. If it does, the
program should die rather than continue to loop.
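When the program runs as a service there is no console to watch, and when stdout is not a terminal the C library normally buffers it fully, so printf() progress messages may show up late or not at all if the process dies. A minimal sketch of one way to make the reproducer's progress observable, assuming a log file path of your choosing; `open_progress_log()` and `log_progress()` are hypothetical helpers, not part of Cygwin or cygrunsrv:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: open an append-mode log that is flushed at every
 * newline, so a crash loses at most the line being written. */
FILE *open_progress_log(const char *path)
{
   FILE *fp = fopen(path, "a");

   if (fp != NULL) {
      /* Line buffering; must be set before any other I/O on the stream. */
      setvbuf(fp, NULL, _IOLBF, 0);
   }
   return fp;
}

/* Hypothetical helper: write one timestamped line; returns 0 on success. */
int log_progress(FILE *fp, const char *msg)
{
   char stamp[32];
   struct tm tm_now;
   time_t now = time(NULL);

   if (fp == NULL || msg == NULL || strchr(msg, '\n') != NULL) {
      return -1;                     /* exactly one message per line */
   }
   if (localtime_r(&now, &tm_now) == NULL ||
       strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_now) == 0) {
      stamp[0] = '\0';               /* fall back to no timestamp */
   }
   return fprintf(fp, "%s %s\n", stamp, msg) < 0 ? -1 : 0;
}
```

In the attached program, calling `setvbuf(stdout, NULL, _IOLBF, 0)` at the top of main() would give the existing printf() calls the same line-at-a-time behavior, provided stdout ends up connected to a file. If the log keeps growing with "Start and stop complete." lines, the program is still looping; if it stops after "Send sig 31" (or whatever SIGUSR2 is), it died.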
You might want to try it yourself, as you surely
understand cygrunsrv better than I do. This is
the best I can do. I would appreciate knowing whether
you are able to reproduce the problem.
Best regards,
Kern
On Mon, 2003-05-12 at 03:02, Igor Pechtchanski wrote:
> On 11 May 2003, Kern Sibbald wrote:
>
> > Hello,
> >
> > In addition to the email you sent me last Thursday, which I
> > received just a few minutes later, I just now received another
> > copy apparently destined for david DOT postill AT pobox DOT com, but
> > somehow it got routed to me the long, slow way (3 days).
> > Well, anything that goes to or through the Blue Yonder is
> > likely to be a bit slow ... :-)
> >
> > By the way, I could not run my app. with cygrunsrv. I
> > don't know why, probably because both cygrunsrv and my app
> > are trying to talk to the service manager, so for the moment
> > I give up on this.
>
> Kern,
>
> cygrunsrv expects to be the one to talk to the service manager. If your
> program also does, there's an obvious conflict of interest. I was
> suggesting making a small command-line testcase, running it with
> cygrunsrv, and seeing if it exhibits the same kind of behavior your main
> program does. If it doesn't, move code over from your main application,
> piece by piece, until the behavior is replicated (or until all of the
> main application except the service manager code is present). If you
> still can't replicate the problem, it's probably in your service manager
> interface code, which you won't need anyway with cygrunsrv (and by that
> point you would have a service that runs with cygrunsrv). If the
> behavior is replicated, look into the code that was added last -- that's
> probably your culprit. If you
> can replicate the behavior in a small example, send it to the list.
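Under this approach the test program never talks to the service manager; it only has to shut down cleanly when cygrunsrv stops it, which cygrunsrv does by sending the program a signal (SIGTERM by default, adjustable with its `--termsig` option; check `cygrunsrv --help` on your installation). A minimal sketch of a command-line main loop written with that in mind; `install_stop_handler()` and `run_until_stopped()` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

static void request_shutdown(int sig)
{
   (void)sig;
   shutdown_requested = 1;            /* async-signal-safe: only record it */
}

/* Hypothetical helper: catch the signal used to stop the service. */
int install_stop_handler(int stop_sig)
{
   struct sigaction sa;

   memset(&sa, 0, sizeof sa);
   sa.sa_handler = request_shutdown;
   sigemptyset(&sa.sa_mask);
   return sigaction(stop_sig, &sa, NULL);
}

/* Hypothetical main loop: one unit of work per tick until a stop is
 * requested or max_ticks is reached. Returns the ticks completed. */
unsigned run_until_stopped(unsigned max_ticks)
{
   const struct timespec tick = { 0, 100 * 1000 * 1000 };   /* 100 ms */
   unsigned done = 0;

   while (!shutdown_requested && done < max_ticks) {
      /* ... one iteration of the real work would go here ... */
      nanosleep(&tick, NULL);         /* returns early if a signal arrives */
      done++;
   }
   return done;
}
```

Because the program only reacts to an ordinary signal, the same binary can be run from a shell and under cygrunsrv, which is exactly the comparison being suggested.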
>
> > Best regards,
> > Kern
> >
> > PS: I sent this off list on purpose -- I suspect there may be a
> > bug in the list program, or more likely a bug at David Postill's
> > site.
>
> I don't see how this rates a private e-mail, especially to me. If there
> is a bug in the list software, the list should know about it. If there is
> a bug at David Postill's site, he should know about it. Please do not
> send private mail unless requested to do so.
> Igor
> P.S. I'm forwarding this whole e-mail to the list, as the below may be of
> interest to at least David Postill and possibly others.
>
> > Here is the email mentioned above, with headers turned on:
> >
> > ============= Copy of email just received =================
> > Return-Path: <cygwin-owner AT cygwin DOT com>
> > Received: from blueyonder.co.uk (pcow025o.blueyonder.co.uk
> >     [195.188.53.125]) by matou.sibbald.com (8.11.6/8.11.6) with
> >     ESMTP id h4BK6rf15398 for <kern AT sibbald DOT com>;
> >     Sun, 11 May 2003 22:06:56 +0200
> > Received: from mail pickup service by blueyonder.co.uk with Microsoft
> >     SMTPSVC; Sun, 11 May 2003 19:18:26 +0100
> > Received: from pcol001m.blueyonder.net ([195.188.53.104]) by
> >     blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.757.75);
> >     Fri, 9 May 2003 16:12:40 +0100
> > Received: from exim by pcol001m.blueyonder.net with relayed (Exim 4.12)
> >     id 19E974-0005xX-00 for david DOT postill AT blueyonder DOT co DOT uk;
> >     Fri, 09 May 2003 15:44:54 +0100
> > Received: from [212.24.65.71] (helo=mutt.eurobell.net) by
> >     pcol001m.blueyonder.net with smtp (Exim 4.12) id 19E973-0005xU-00
> >     for david DOT postill AT blueyonder DOT co DOT uk;
> >     Fri, 09 May 2003 15:44:41 +0100
> > Received: (qmail 13027 invoked from network); 8 May 2003 19:04:05 -0000
> > Received: from unknown (HELO kumquat.pobox.com) (64.119.218.72) by
> >     mailq1.blueyonder.co.uk with SMTP; 8 May 2003 19:04:05 -0000
> > Received: from kumquat.pobox.com (localhost.localdomain [127.0.0.1]) by
> >     kumquat.pobox.com (Postfix) with ESMTP id 986D659E98 for
> >     <david DOT postill AT blueyonder DOT co DOT uk>;
> >     Thu, 8 May 2003 15:04:01 -0400 (EDT)
> > Delivered-To: david DOT postill AT pobox DOT com
> > Received: from sources.redhat.com (sources.redhat.com [66.187.233.205])
> >     by kumquat.pobox.com (Postfix) with SMTP id 655AC3E832 for
> >     <david DOT postill AT pobox DOT com>; Thu, 8 May 2003 15:03:58 -0400 (EDT)
> > Received: (qmail 9121 invoked by alias); 8 May 2003 19:03:45 -0000
> > Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
> > Precedence: bulk
> > List-Unsubscribe:
> >     <mailto:cygwin-unsubscribe-david.postill=pobox DOT com AT cygwin DOT com>
> > List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
> > List-Archive: <http://sources.redhat.com/ml/cygwin/>
> > List-Post: <mailto:cygwin AT cygwin DOT com>
> > List-Help: <mailto:cygwin-help AT cygwin DOT com>,
> >     <http://sources.redhat.com/ml/#faqs>
> > Sender: cygwin-owner AT cygwin DOT com
> > Mail-Followup-To: cygwin AT cygwin DOT com
> > Delivered-To: mailing list cygwin AT cygwin DOT com
> > Received: (qmail 9114 invoked from network); 8 May 2003 19:03:45 -0000
> > Received: from unknown (HELO slinky.cs.nyu.edu) (128.122.20.14) by
> >     sources.redhat.com with SMTP; 8 May 2003 19:03:45 -0000
> > Received: from localhost (pechtcha AT localhost) by slinky.cs.nyu.edu
> >     (8.11.7+Sun/8.11.7) with ESMTP id h48J3fM28152;
> >     Thu, 8 May 2003 15:03:42 -0400 (EDT)
> > X-Authentication-Warning: slinky.cs.nyu.edu: pechtcha owned process
> >     doing -bs
> > Date: Thu, 8 May 2003 15:03:41 -0400 (EDT)
> > From: Igor Pechtchanski <pechtcha AT cs DOT nyu DOT edu>
> > Reply-To: cygwin AT cygwin DOT com
> > To: Kern Sibbald <kern AT sibbald DOT com>
> > Cc: cygwin AT cygwin DOT com
> > Subject: Re: pthread_signal() references illegal memory address
> > In-Reply-To: <1052391117 DOT 6139 DOT 1146 DOT camel AT rufus>
> > Message-ID: <Pine DOT GSO DOT 4 DOT 44 DOT 0305081501120 DOT 22924-100000 AT slinky DOT cs DOT nyu DOT edu>
> > Importance: Normal
> > MIME-Version: 1.0
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> > X-Annoyance-Filter-Junk-Probability: 0
> > X-Annoyance-Filter-Classification: Mail
> >
> > On 8 May 2003, Kern Sibbald wrote:
> >
> > > Hello,
> > >
> > > Please don't think I'm not interested in this if
> > > it takes a bit of time to get back to you ...
> > >
> > > See responses below:
> > >
> > >
> > > On Mon, 2003-05-05 at 19:30, Igor Pechtchanski wrote:
> > > > On 5 May 2003, Kern Sibbald wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > On Mon, 2003-05-05 at 18:38, Igor Pechtchanski wrote:
> > > > > > On 5 May 2003, Kern Sibbald wrote:
> > > > > > [snip]
> > > > > > > Anyway here is one:
> > > > > > >
> > > > > > > Running WinXP Home version.
> > > > > > >
> > > > > > > Using Cygwin 1.3.20
> > > > > > >
> > > > > > > When running my program with LocalSystem userid
> > > > > > > as a service, doing a pthread_kill(thread_id, SIGUSR2)
> > > > > > > causes some sort of memory fault referencing memory at 0x3a
> > > > > > > (or something like that because the program disappears
> > > > > > > poof).
> > > > > > >
> > > > > > > Running as a normal user works fine.
> > > > > >
> > > > > > What's the exact error message (I assume you get a popup box)?
> > > > >
> > > > > No, I get absolutely nothing. Poof and it is gone, well, the
> > > > > service manager knows it went away but not why.
> > > > >
> > > > > A friend ran the program on Win2K and he got:
> > > > >
> > > > > Instruction at 0x0041276a referenced memory at 0x3c
> > > > >
> > > > > That appears to be somewhere in the cygwin1.dll.
> > > >
> > > > Try checking the "Allow service to interact with the desktop" box,
> > > > and you should see the error popup on your system too.
> > >
> > > My service always interacts with the desktop. It is capable of
> > > doing MessageBox(), and it always has an icon in the system tray
> > > with a menu that works.
> > >
> > > I get absolutely nothing in terms of output of any sort when
> > > the program crashes -- as I said, it goes poof. This could
> > > be my own fault for trapping signals, but normally during
> > > signal handling there is a considerable amount of printout,
> > > ...
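A service has no console, so a handler that reports through printf() has nowhere visible to write, and printf() is not async-signal-safe in any case. A minimal sketch of a handler that records the signal number in a file opened beforehand, using only write(), which is on the POSIX async-signal-safe list; the log path and the names `install_signal_logger()` and `log_signal()` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int crash_log_fd = -1;         /* opened before the handler is installed */

/* Runs in signal context: no stdio, no malloc, only write(). */
static void log_signal(int sig)
{
   static const char prefix[] = "caught signal ";
   char line[sizeof prefix + 12];
   char digits[12];
   size_t len = 0;
   size_t nd = 0;
   unsigned v = (unsigned)sig;
   int saved_errno = errno;           /* write() may change errno */

   for (size_t i = 0; i + 1 < sizeof prefix; i++) {
      line[len++] = prefix[i];        /* copy prefix without its NUL */
   }
   do {                               /* decimal digits, least significant first */
      digits[nd++] = (char)('0' + v % 10);
      v /= 10;
   } while (v != 0);
   while (nd > 0) {
      line[len++] = digits[--nd];
   }
   line[len++] = '\n';
   if (crash_log_fd >= 0) {
      ssize_t rc = write(crash_log_fd, line, len);
      (void)rc;                       /* nothing safe to do about a failed write */
   }
   errno = saved_errno;
}

/* Hypothetical helper: open the log, then route sig to log_signal(). */
int install_signal_logger(const char *path, int sig)
{
   struct sigaction sa;

   crash_log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
   if (crash_log_fd < 0) {
      return -1;
   }
   memset(&sa, 0, sizeof sa);
   sa.sa_handler = log_signal;
   sigemptyset(&sa.sa_mask);
   return sigaction(sig, &sa, NULL);
}
```

For a fatal signal such as SIGSEGV one would normally follow the write with `_exit()`; the sketch only records the signal and returns.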
> > >
> > > > > > Is there a stacktrace file generated?
> > > > >
> > > > > If it is, I don't know where the system put it.
> > > >
> > > > The system should put it in the directory from which the program
> > > > is run.
> > >
> > > There is no stack dump or any other file in the directory from
> > > which the program (Bacula) executes.
> > >
> > > >
> > > > > > Did you try setting
> > > > > > "error_start:c:/cygwin/bin/dumper.exe" in your CYGWIN environment
> > > > > > variable?
> > >
> > > I doubt this would help much, but maybe I am wrong; please see below.
> > >
> > > > >
> > > > > No, if you can tell me how to set the environment variable for
> > > > > a service, I'll try it, but since it is a service, I am unlikely
> > > > > to get any output.
> > > >
> > > > "cygrunsrv --help", or "man cygrunsrv", or see /bin/ssh-host-config
> > > > for an example. You might also need the "Allow service to interact
> > > > with desktop" bit.
> > >
> > > None of the above-mentioned things exist on my system. In any case,
> > > I have no problem setting the program up as a service (it installs
> > > itself with allowing interaction with the desktop by default).
> > >
> > > > > > Did you try running the program from the command line in a
> > > > > > LocalSystem-owned shell?
> > > > >
> > > > > I ran it in an rxvt shell under my id and it does not crash.
> > > > > Tell me how to get a LocalSystem owned shell and I will try
> > > > > it. This is XP Home, so I don't have access to a lot of the
> > > > > XP security dialogs.
> > > >
> > > > "at <time> /interactive c:\cygwin\bin\bash.exe -i --login"
> > > > (<time> should be the current time plus however long you're willing
> > > > to wait, at least one minute). "at /?" for help.
> > > > [Note, this works on Win2k, don't know about XP Home].
> > >
> > > Yes, your trick works on WinXP Home too. So much for Windows
> > > security!
> > >
> > > The interesting thing is that when I run the program in
> > > an rxvt window with the bash shell under the LocalSystem account,
> > > it does NOT crash. I also ran the program in
> > > an MS-DOS shell and got the same result: it does
> > > not crash.
> > >
> > > It crashes only if it is started by the service dialog.
> > >
> > > > > > Can you provide a simple testcase that
> > > > > > reproduces your problem?
> > > > >
> > > > > Probably not as my program is some 65K+ lines of code.
> > >
> > > > You could try a simple program that calls the offending function
> > > > (after creating some threads, most likely), and see if the problem
> > > > manifests...
> > >
> > > Well, I was considering doing so, since creating a thread
> > > and sending it a signal is a 10 line program. However, this
> > > problem requires the program to run as a service, and that
> > > is a considerable amount of code.
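The thread-plus-signal core described here can be sketched on its own, with no service code, in the spirit of what stop_heartbeat_monitor() in the attached program does: ask the worker to stop, then keep sending SIGUSR2 until it notices. This is a hypothetical minimal test (the function names are illustrative); on a correct pthreads implementation `thread_signal_test()` should return 0, and running it once from a shell and once as a service is the comparison that matters:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int stop_req;                 /* main -> worker: please exit */
static atomic_int worker_up, worker_done;   /* worker -> main: progress */
static volatile sig_atomic_t usr2_seen;     /* set by the SIGUSR2 handler */

static void on_usr2(int sig)
{
   (void)sig;
   usr2_seen = 1;             /* only async-signal-safe work in the handler */
}

static void *worker(void *arg)
{
   (void)arg;
   atomic_store(&worker_up, 1);
   /* Leave only after a stop request AND a delivered signal, so the test
    * cannot pass unless pthread_kill() actually reached this thread. */
   while (!atomic_load(&stop_req) || !usr2_seen) {
      sleep(1000);            /* a caught signal makes sleep() return early */
   }
   atomic_store(&worker_done, 1);
   return NULL;
}

/* Returns 0 if pthread_kill(SIGUSR2) woke the worker and it exited. */
int thread_signal_test(void)
{
   const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
   struct sigaction sa;
   pthread_t tid;

   memset(&sa, 0, sizeof sa);
   sa.sa_handler = on_usr2;
   sigemptyset(&sa.sa_mask);
   if (sigaction(SIGUSR2, &sa, NULL) != 0) {
      return -1;
   }
   atomic_store(&stop_req, 0);
   atomic_store(&worker_up, 0);
   atomic_store(&worker_done, 0);
   usr2_seen = 0;
   if (pthread_create(&tid, NULL, worker, NULL) != 0) {
      return -1;
   }
   while (!atomic_load(&worker_up)) {
      nanosleep(&tick, NULL);
   }
   atomic_store(&stop_req, 1);
   /* Resend until the worker exits: a signal that lands just before the
    * worker enters sleep() is consumed without waking it. */
   while (!atomic_load(&worker_done)) {
      pthread_kill(tid, SIGUSR2);
      nanosleep(&tick, NULL);
   }
   pthread_join(tid, NULL);
   return usr2_seen ? 0 : -1;
}
```

Compile with `gcc -pthread`. Unlike the attached program, the worker here is joined rather than detached, so the thread ID stays valid for every pthread_kill() call.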
> >
> > Kern,
> >
> > That's what "cygrunsrv" is for! It takes *any* command-line program and
> > turns it into a service. :-D Try making a small command-line example and
> > run it as a service using cygrunsrv (you'll have to install the
> > cygrunsrv package).
> > Igor
> >
> > > > > I've solved the problem for myself by doing the "signal"
> > > > > a different way, so it is not critical for me but it cost
> > > > > about 8 hours of debugging -- primarily due to the fact that
> > > > > it seems to be dependent on whether or not it is a service.
> > > > >
> > > > > Best regards,
> > > > > Kern
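Kern does not say how he replaced the signal. One common signal-free way to wake a sleeping thread promptly is a condition variable with a timed wait; the sketch below shows that technique for a heartbeat loop, assuming a one-second heartbeat interval. `struct heartbeat`, `heartbeat_start()` and `heartbeat_stop()` are hypothetical names, not taken from Bacula:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

struct heartbeat {
   pthread_mutex_t lock;
   pthread_cond_t  wake;
   int             stop;      /* protected by lock */
   int             beats;     /* timeouts seen; protected by lock */
   pthread_t       tid;
};

static void *heartbeat_thread(void *arg)
{
   struct heartbeat *hb = arg;

   pthread_mutex_lock(&hb->lock);
   while (!hb->stop) {
      struct timespec deadline;
      clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
      deadline.tv_sec += 1;                       /* assumed 1 s interval */
      int rc = pthread_cond_timedwait(&hb->wake, &hb->lock, &deadline);
      if (rc == ETIMEDOUT && !hb->stop) {
         hb->beats++;         /* a real program would send its heartbeat here */
      }
   }
   pthread_mutex_unlock(&hb->lock);
   return NULL;
}

int heartbeat_start(struct heartbeat *hb)
{
   pthread_mutex_init(&hb->lock, NULL);
   pthread_cond_init(&hb->wake, NULL);
   hb->stop = 0;
   hb->beats = 0;
   return pthread_create(&hb->tid, NULL, heartbeat_thread, hb);
}

void heartbeat_stop(struct heartbeat *hb)
{
   pthread_mutex_lock(&hb->lock);
   hb->stop = 1;
   pthread_cond_signal(&hb->wake);   /* wakes the thread at once, no signal */
   pthread_mutex_unlock(&hb->lock);
   pthread_join(hb->tid, NULL);
   pthread_cond_destroy(&hb->wake);
   pthread_mutex_destroy(&hb->lock);
}
```

With this design the polling loop and pthread_kill() call in stop_heartbeat_monitor() are no longer needed; stopping is one locked flag update, a wake-up, and a join.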
> > > >
> > > > It's most likely dependent on the value of your CYGWIN variable or
> > > > some permissions (as the service runs as LocalSystem). Trying the
> > > > program out from a LocalSystem-owned window (see above) should give
> > > > you some idea of what's at fault.
> > >
> > > I agree with you, but my CYGWIN environment variable is not set.
> > >
> > > If you have any other ideas, I'll try them; otherwise, I'll avoid
> > > using pthread_kill() under Cygwin.
> > >
> > > Best regards,
> > > Kern
[Attachment: pthread_bug.c]
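#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <pthread.h>
#include <unistd.h>
/* Shared between the main thread and the heartbeat thread; volatile so
 * the polling loops re-read them on every pass. */
static volatile int hb_bsock;
static pthread_t heartbeat_id;
static volatile int stop;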
#ifndef _NSIG
#define BA_NSIG 100
#else
#define BA_NSIG _NSIG
#endif
static const char *sig_names[BA_NSIG+1];
typedef void (SIG_HANDLER)(int sig);
static SIG_HANDLER *exit_handler;
/*
 * Handle signals here
 */
static void signal_handler(int sig)
{
   static int already_dead = 0;

   if (already_dead) {
      _exit(1);
   }
   /* Ignore certain signals */
   if (sig == SIGCHLD || sig == SIGUSR2) {
      return;
   }
   printf("Got signal %d. Exiting.\n", sig);
   already_dead = sig;
   exit(1);
}
void init_signals(void terminate(int sig))
{
   struct sigaction sighandle;
   struct sigaction sigignore;
   struct sigaction sigdefault;

   exit_handler = terminate;

   sig_names[0] = "UNKNOWN SIGNAL";
   sig_names[SIGHUP] = "Hangup";
   sig_names[SIGINT] = "Interrupt";
   sig_names[SIGQUIT] = "Quit";
   sig_names[SIGILL] = "Illegal instruction";
   sig_names[SIGTRAP] = "Trace/Breakpoint trap";
   sig_names[SIGABRT] = "Abort";
#ifdef SIGEMT
   sig_names[SIGEMT] = "EMT instruction (Emulation Trap)";
#endif
#ifdef SIGIOT
   sig_names[SIGIOT] = "IOT trap";
#endif
   sig_names[SIGBUS] = "BUS error";
   sig_names[SIGFPE] = "Floating-point exception";
   sig_names[SIGKILL] = "Kill, unblockable";
   sig_names[SIGUSR1] = "User-defined signal 1";
   sig_names[SIGSEGV] = "Segmentation violation";
   sig_names[SIGUSR2] = "User-defined signal 2";
   sig_names[SIGPIPE] = "Broken pipe";
   sig_names[SIGALRM] = "Alarm clock";
   sig_names[SIGTERM] = "Termination";
#ifdef SIGSTKFLT
   sig_names[SIGSTKFLT] = "Stack fault";
#endif
   sig_names[SIGCHLD] = "Child status has changed";
   sig_names[SIGCONT] = "Continue";
   sig_names[SIGSTOP] = "Stop, unblockable";
   sig_names[SIGTSTP] = "Keyboard stop";
   sig_names[SIGTTIN] = "Background read from tty";
   sig_names[SIGTTOU] = "Background write to tty";
   sig_names[SIGURG] = "Urgent condition on socket";
   sig_names[SIGXCPU] = "CPU limit exceeded";
   sig_names[SIGXFSZ] = "File size limit exceeded";
   sig_names[SIGVTALRM] = "Virtual alarm clock";
   sig_names[SIGPROF] = "Profiling alarm clock";
   sig_names[SIGWINCH] = "Window size change";
   sig_names[SIGIO] = "I/O now possible";
#ifdef SIGPWR
   sig_names[SIGPWR] = "Power failure restart";
#endif
#ifdef SIGWAITING
   sig_names[SIGWAITING] = "No runnable lwp";
#endif
#ifdef SIGLWP
   sig_names[SIGLWP] = "SIGLWP special signal used by thread library";
#endif
#ifdef SIGFREEZE
   sig_names[SIGFREEZE] = "Checkpoint Freeze";
#endif
#ifdef SIGTHAW
   sig_names[SIGTHAW] = "Checkpoint Thaw";
#endif
#ifdef SIGCANCEL
   sig_names[SIGCANCEL] = "Thread Cancellation";
#endif
#ifdef SIGLOST
   sig_names[SIGLOST] = "Resource Lost (e.g. record-lock lost)";
#endif

   /* Now setup signal handlers */
   sighandle.sa_flags = 0;
   sighandle.sa_handler = signal_handler;
   sigfillset(&sighandle.sa_mask);
   sigignore.sa_flags = 0;
   sigignore.sa_handler = SIG_IGN;
   sigfillset(&sigignore.sa_mask);
   sigdefault.sa_flags = 0;
   sigdefault.sa_handler = SIG_DFL;
   sigfillset(&sigdefault.sa_mask);

   sigaction(SIGPIPE, &sigignore, NULL);
   sigaction(SIGCHLD, &sighandle, NULL);
   sigaction(SIGCONT, &sigignore, NULL);
   sigaction(SIGPROF, &sigignore, NULL);
   sigaction(SIGWINCH, &sigignore, NULL);
   sigaction(SIGIO, &sighandle, NULL);
   sigaction(SIGINT, &sigdefault, NULL);
   sigaction(SIGXCPU, &sigdefault, NULL);
   sigaction(SIGXFSZ, &sigdefault, NULL);
   sigaction(SIGHUP, &sigignore, NULL);
   sigaction(SIGQUIT, &sighandle, NULL);
   sigaction(SIGILL, &sighandle, NULL);
   sigaction(SIGTRAP, &sighandle, NULL);
   /* sigaction(SIGABRT, &sighandle, NULL); */
#ifdef SIGEMT
   sigaction(SIGEMT, &sighandle, NULL);
#endif
#ifdef SIGIOT
   /* sigaction(SIGIOT, &sighandle, NULL); used by debugger */
#endif
   sigaction(SIGBUS, &sighandle, NULL);
   sigaction(SIGFPE, &sighandle, NULL);
   sigaction(SIGKILL, &sighandle, NULL);   /* cannot be caught; fails with EINVAL */
   sigaction(SIGUSR1, &sighandle, NULL);
   sigaction(SIGSEGV, &sighandle, NULL);
   sigaction(SIGUSR2, &sighandle, NULL);
   sigaction(SIGALRM, &sighandle, NULL);
   sigaction(SIGTERM, &sighandle, NULL);
#ifdef SIGSTKFLT
   sigaction(SIGSTKFLT, &sighandle, NULL);
#endif
   sigaction(SIGSTOP, &sighandle, NULL);   /* cannot be caught; fails with EINVAL */
   sigaction(SIGTSTP, &sighandle, NULL);
   sigaction(SIGTTIN, &sighandle, NULL);
   sigaction(SIGTTOU, &sighandle, NULL);
   sigaction(SIGURG, &sighandle, NULL);
   sigaction(SIGVTALRM, &sighandle, NULL);
#ifdef SIGPWR
   sigaction(SIGPWR, &sighandle, NULL);
#endif
#ifdef SIGWAITING
   sigaction(SIGWAITING, &sighandle, NULL);
#endif
#ifdef SIGLWP
   sigaction(SIGLWP, &sighandle, NULL);
#endif
#ifdef SIGFREEZE
   sigaction(SIGFREEZE, &sighandle, NULL);
#endif
#ifdef SIGTHAW
   sigaction(SIGTHAW, &sighandle, NULL);
#endif
#ifdef SIGCANCEL
   sigaction(SIGCANCEL, &sighandle, NULL);
#endif
#ifdef SIGLOST
   sigaction(SIGLOST, &sighandle, NULL);
#endif
}
static void *sd_heartbeat_thread(void *arg)
{
   (void)arg;
   pthread_detach(pthread_self());
   hb_bsock = 1;
   printf("HB thread started.\n");
   for ( ; !stop; ) {
      sleep(1000);            /* SIGUSR2 interrupts this so stop is rechecked */
   }
   hb_bsock = 0;
   return NULL;
}

void start_heartbeat_monitor(void)
{
   stop = 0;
   hb_bsock = 0;
   pthread_create(&heartbeat_id, NULL, sd_heartbeat_thread, NULL);
}

/* Terminate the heartbeat thread. Used for both SD and DIR */
void stop_heartbeat_monitor(void)
{
   /* Wait for heartbeat thread to start */
   while (hb_bsock == 0) {
      printf("Waiting for hb thread to start.\n");
      sleep(1);
   }
   stop = 1;
   while (hb_bsock) {
      /* Cygwin 1.3.20 craps out on the following */
      printf("Send sig %d\n", SIGUSR2);
      pthread_kill(heartbeat_id, SIGUSR2);   /* make heartbeat thread go away */
      sleep(1);
   }
}
void terminate(int sig)
{
   printf("Terminate handler.\n");
   exit(1);
}

int main(int argc, char **argv)
{
   init_signals(terminate);
   for ( ;; ) {
      printf("Start...\n");
      start_heartbeat_monitor();
      sleep(1);
      printf("Stop...\n");
      stop_heartbeat_monitor();
      printf("Start and stop complete.\n");
   }
}
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/