Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com Date: Sun, 13 Nov 2005 14:19:27 +0100 From: Corinna Vinschen To: cygwin AT cygwin DOT com Subject: Re: cygrunsrv hangs forever on exec error (fix included) Message-ID: <20051113131927.GA3462@calimero.vinschen.de> Reply-To: cygwin AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com References: <43761BF8 DOT 9030603 AT t-online DOT de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <43761BF8.9030603@t-online.de> User-Agent: Mutt/1.4.2i On Nov 12 17:44, Christian Franke wrote: > Hi, > > if the exec in cygrunsrv fails or the command exits to early, cygrunsrv > will hang forever. > The service can no longer be controlled until cygrunsrv has been kill(-9)ed. > Auto restart of service does not work in this case. > > > Steps to reproduce (at least on XP SP2): > > $ cygrunsrv -I test -p /bin/true > > $ cygrunsrv -S test > cygrunsrv: Error starting a service: ... Win32 error 1053: > ... > > $ sleep 3600 #;-) > > $ cygrunsrv -S test > Service : test > Current State : Start Pending > Controls Accepted : Stop > Command : /bin/true > > cygrunsrv process has to be killed manually. > > Same result occurs if the command's exe is removed. > > > This is due to the following (IMO undocumented and "interesting";-) > behavior of the windows SCM: > > The StartServiceControlDispatcher() routine does not return unless some > thread (no necessarily service_main()) sets SERVICE_STOPPED. > > The service_main() thread started by SCM is left alone. > Exiting service_main() does nothing, in particular it does not end > StartServiceControlDispatcher(). > > But, if SERVICE_STOPPED is set, StartServiceControlDispatcher() > immediately returns, again without any care about service_main(). > Because usually now the program exit()'s and kills all threads, code > following SERVICE_STOPPED may not be executed at all. Thanks for this report and the simple testcases. The description is very helpful. I just don't really like the idea to leave the service_main function through _exit. So I rearranged the waitpid evaluation to accomplish two results, first, except in the neverexit case, always set the service status to SERVICE_STOPPED *and* do it as the last action in service_main, second, simplify the code. What bugged me was that the WIFEXITED/WIFSIGNALED parts exists twice, even though there isn't really a lot of difference between the return code from the first vs. the second waitpid. I didn't create a new cygrunsrv version for now, instead I'm sending my diff. I would like to hear what you think and if I didn't made a fatal mistake, I'll uplaod a new cygrunsrv version with this changes. Corinna * cygrunsrv.cc (service_main): Simplify waitpid return value evaluation. Always set service status to SERVICE_STOPPED, except in the neverexits case. Move the set_service_status call to be always the last action in service_main. Index: cygrunsrv.cc =================================================================== RCS file: /cvs/cygwin-apps/cygrunsrv/cygrunsrv.cc,v retrieving revision 1.28 diff -p -u -r1.28 cygrunsrv.cc --- cygrunsrv.cc 22 May 2005 16:34:55 -0000 1.28 +++ cygrunsrv.cc 13 Nov 2005 13:10:53 -0000 @@ -1515,81 +1515,78 @@ service_main (DWORD argc, LPSTR *argv) report_service_status (); int status = 1; - if (!waitpid (server_pid, &status, WNOHANG)) + int ret = waitpid (server_pid, &status, WNOHANG); + if (!ret) { /* Child has probably `execv'd properly. */ set_service_status (SERVICE_RUNNING); syslog(LOG_INFO, "`%s' service started", svcname); /* Keep repeating waitpid() command as long as it returned because of a handled signal sent to this cygrunsrv process */ - int ret; while ((ret = waitpid (server_pid, &status, 0)) == -1 && errno == EINTR) ; + } + else + /* The ret == -1 case below is only valid for the inner watpid call. */ + ret = 0; - /* If ret is -1, report errno, else process the status */ - if (ret == -1) + /* If ret is -1, report errno, else process the status */ + if (ret == -1) + { + switch (errno) { - switch (errno) - { - case ECHILD: - syslog (LOG_ERR, "service `%s' exited, & its status was lost" - " errno ECHILD", svcname); - break; - default: - syslog (LOG_ERR, "service `%s' error: waitpid() failed: " - "errno %d", svcname, errno); - } + case ECHILD: + syslog (LOG_ERR, "service `%s' exited, its status was lost" + " errno ECHILD", svcname); + break; + default: + syslog (LOG_ERR, "service `%s' error: waitpid() failed: " + "errno %d", svcname, errno); } - else if (WIFEXITED (status)) + service_main_exitval = errno; + set_service_status (SERVICE_STOPPED); + } + else if (WIFEXITED (status)) + { + unsigned char s = WEXITSTATUS (status); + if (neverexits && !shutting_down) { - unsigned char s = WEXITSTATUS (status); - if (neverexits && !shutting_down) - { - syslog (LOG_ERR, "`%s' service exited prematurely with " - "exit status: %u", svcname, s); - /* Do not report that the service is stopped so that if - recovery options are set, Windows will automatically - restart the service. */ - service_main_exitval = s; - } - else - { - syslog (LOG_INFO, "`%s' service stopped, exit status: %u", - svcname, s); - set_service_status (SERVICE_STOPPED); - service_main_exitval = 0; - } + syslog (LOG_ERR, "`%s' service exited prematurely with " + "exit status: %u", svcname, s); + /* Do not report that the service is stopped so that if + recovery options are set, Windows will automatically + restart the service. */ + service_main_exitval = s; } - else if (WIFSIGNALED (status)) + else { - /* If the signal is the one we've send, everything's ok. - Otherwise we log the signal event. */ - if (!termsig_sent || WTERMSIG (status) != termsig) - syslog (LOG_ERR, "service `%s' failed: signal %d raised", - svcname, WTERMSIG (status)); - else - syslog (LOG_INFO, "`%s' service stopped, signal %d received", - svcname, WTERMSIG (status)); - set_service_status (SERVICE_STOPPED); + syslog (LOG_INFO, "`%s' service stopped, exit status: %u", + svcname, s); service_main_exitval = 0; + set_service_status (SERVICE_STOPPED); } - else - syslog (LOG_ERR, "`%s' service stopped for an unknown reason", - svcname); - } - else if (WIFEXITED (status)) - { - /* Although we're not going to set the service status to stopped, - only allow zero exit status if neverexits is not set. */ - if (!neverexits) - service_main_exitval = WEXITSTATUS (status); - syslog_starterr ("execv", 0, WEXITSTATUS (status)); } else if (WIFSIGNALED (status)) - syslog (LOG_ERR, "starting service `%s' failed: signal %d raised", - svcname, WTERMSIG (status)); - break; + { + /* If the signal is the one we've send, everything's ok. + Otherwise we log the signal event. */ + if (!termsig_sent || WTERMSIG (status) != termsig) + syslog (LOG_ERR, "service `%s' failed: signal %d raised", + svcname, WTERMSIG (status)); + else + syslog (LOG_INFO, "`%s' service stopped, signal %d received", + svcname, WTERMSIG (status)); + service_main_exitval = 0; + set_service_status (SERVICE_STOPPED); + } + else + { + syslog (LOG_ERR, "`%s' service stopped for an unknown reason", + svcname); + service_main_exitval = 0; + set_service_status (SERVICE_STOPPED); + } } } -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Project Co-Leader cygwin AT cygwin DOT com Red Hat, Inc. -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/