Mail Archives: cygwin/2006/08/25/11:21:38

To: cygwin AT cygwin DOT com
Message-Id: <CEFD7032-DD32-4F1E-8D2F-C706BE73F470@andrew.cmu.edu>
From: Ethan Tira-Thompson <ejt AT andrew DOT cmu DOT edu>
Subject: cygserver blocking on semctl(SETVAL) call
Date: Fri, 25 Aug 2006 11:21:20 -0400

[Slightly modified from version previously sent on cygwin-developers,  
who suggest this is a better forum for discussion]

I've discovered what I believe to be an internal deadlock issue in  
cygserver.

I have a piece of code:
void SemaphoreManager::setValue(semid_t id, int x) const {
	semun params;
	params.val=x;
	cout << "SEMCTL..." << flush;
	if(semctl(semid,id,SETVAL,params)<0) {
		perror("ERROR: SemaphoreManager::setValue (semctl)");
		exit(EXIT_FAILURE);
	}
	cout << "done" << endl;
}

This is part of a function which gets called a number of times  
throughout the life of the program.  It works just fine up until one  
particular call (with x=0), which reliably blocks between the two  
cout statements.  It's not just my program either -- all IPC is  
blocked at this point, so bringing up new cygwin windows, running  
'ipcs', etc., all hang.  Once I kill any one of the processes using  
the semaphore, things get moving again and may run a bit longer, but  
they usually block again until all of my program's processes are  
killed.

My code runs fine under Linux and Mac OS X.  It's only now that we're  
nearing release that I'm testing under cygwin and finding that  
something has gone wrong in the past 9 months or so -- either  
something updated on your end, or a change in our code that's now  
tickling an issue.

The kicker to note here -- is there any reason a *SETVAL* operation  
could be blocked???  It should either go through or return an error.   
I'm fairly convinced it's *not* this particular semctl call that's  
causing the block, it just gets hung up because some *other*,  
previous, operation has hung cygserver, and it's that operation  
that's causing the trouble.

One nuisance is that when I run cygserver with -d, it doesn't block  
in the same place -- something about all that debugging output  
changes the race conditions.  In any case, I've attached the  
cygserver output leading up to a block, in hopes it means something  
to you.

Thanks for taking a look -- I'm afraid I'm stumped.  (It doesn't help  
that gdb only reports '??' for all function calls when I attach to a  
process, so I can't tell what any of my code is doing.  And yes, I do  
have -g enabled.)

Our code can be checked out from CVS, but before running you'll need  
to increase the semmns and semmsl parameters as described in step 5:
http://www.cs.cmu.edu/~tekkotsu/cygwin-install.html

After that's set up:
cvs -d :pserver:anonymous@cvs.tekkotsu.org:/cvs checkout -P Tekkotsu
cd Tekkotsu;
setenv TEKKOTSU_ROOT `pwd` || export TEKKOTSU_ROOT=`pwd`
cd project
make sim
./sim-ERS7 Speed=0

When launched, the simulator forks into four processes, using IPC to  
communicate between them.  'Speed=0' pauses our simulator so it  
shouldn't be trying to process anything.  On startup it goes through  
a series of runlevels: CONSTRUCTING, STARTING, RUNNING, STOPPING,  
DESTRUCTING, DESTRUCTED.  Passing InitialRunlevel=X on the command  
line will stop in a runlevel other than RUNNING, and then you can use  
the 'runlevel' command within the simulator to advance.  It reliably  
gets into the STARTING runlevel, but something about the RUNNING  
runlevel triggers the problem.  SemaphoreManager (from the code  
displayed above) is found in the root IPC directory.
Beware of leaked semaphore sets, by the way.  Since this problem also  
causes the signal handler to block when it tries to remove the set on  
being killed, you'll need to kill -9 the process, use 'ipcs' to check  
for any leftover sets, and then 'ipcrm' them manually between runs.  
(Actually, I find it easier to just kill and relaunch cygserver  
itself, which releases all of the blocked processes and clears  
leftover semaphores at the same time.)

-ethan

The following trace shows the 'cygserver -d' activity after entering  
the 'runlevel' command to move from STARTING to RUNNING, including  
the block that occurs in that runlevel.

[Attachment: cygserverout.txt]

cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: Try hold(2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: holding (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sem.cc, line 81: leaving (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1001: call to semop(131072, 0x22C620, 2)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Try locking mutex semid[0] (2508) (hold: 0)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Locked      mutex semid[0]/1090 (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1086: semop: semaptr=687FA0, sem_base=686798, semptr=6867C8, sem[4]=0 : op=0, flag=wait
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1086: semop: semaptr=687FA0, sem_base=686798, semptr=6867C8, sem[4]=0 : op=1, flag=wait
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1258: semop:  done
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1261: Unlocked    mutex semid[0]/1090 (owner: 2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: Try hold(2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: holding (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: Try hold(3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sem.cc, line 81: leaving (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: holding (3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 580: call to semctl(131072, 4, 12290, 0x22C4EC)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: Try hold(808)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sem.cc, line 81: leaving (3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 735: Try locking mutex semid[0] (2508) (hold: 0)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: holding (808)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1001: call to semop(131072, 0x22CAA0, 2)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 735: Locked      mutex semid[0]/1091 (2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sem.cc, line 81: leaving (808)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Try locking mutex semid[0] (3696) (hold: 2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 847: Unlocked    mutex semid[0]/1091 (owner: 2508)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1001: call to semop(131072, 0x22CA20, 2)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Locked      mutex semid[0]/1092 (3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: Try hold(2844)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Try locking mutex semid[0] (808) (hold: 3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1086: semop: semaptr=687FA0, sem_base=686798, semptr=6867C8, sem[4]=1 : op=0, flag=wait
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/process.cc, line 287: holding (2844)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1100: semop:  not zero now
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sem.cc, line 81: leaving (2844)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1123: semop:  rollback 0 through -1
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1001: call to semop(131072, 0x22C6F0, 2)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1146: semop:  good night!
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Try locking mutex semid[0] (3296) (hold: 3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Locked      mutex semid[0]/1093 (808)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/bsd_mutex.cc, line 309: Unlocked    mutex semid[0]/1092 (owner: 3696)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1086: semop: semaptr=687FA0, sem_base=686798, semptr=6867C8, sem[4]=1 : op=0, flag=wait
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1100: semop:  not zero now
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1123: semop:  rollback 0 through -1
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1146: semop:  good night!
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1031: Locked      mutex semid[0]/1094 (3296)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/bsd_mutex.cc, line 309: Unlocked    mutex semid[0]/1093 (owner: 808)
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1086: semop: semaptr=687FA0, sem_base=686798, semptr=6867C8, sem[4]=1 : op=0, flag=wait
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1100: semop:  not zero now
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1123: semop:  rollback 0 through -1
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/sysv_sem.cc, line 1146: semop:  good night!
cygserver: /netrel/src/cygwin-1.5.21-2/winsup/cygserver/bsd_mutex.cc, line 309: Unlocked    mutex semid[0]/1094 (owner: 3296)


