Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-owner AT sources DOT redhat DOT com
Subject: Re: SIGILL with pthreads and sockets
From: Robert Collins <robert DOT collins AT itdomain DOT com DOT au>
To: Wesel <wassdamo AT pacbell DOT net>
Cc: cygwin AT cygwin DOT com
In-Reply-To: <3B6CA668.86B5E8C0@pacbell.net>
References: <3B6CA668 DOT 86B5E8C0 AT pacbell DOT net>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Message-Id: <996978320.22039.93.camel@lifelesswks>
Mime-Version: 1.0
Date: 05 Aug 2001 12:28:36 +1000

On 04 Aug 2001 18:50:32 -0700, Wesel wrote:
> And bears, oh my.
> 
> I've been trying to make some very simple proxy server software (just an
> Advertisement filter) and not having much luck.  I managed to track the problem 
> down to the function select() and those for resolving host names.  When used in
> child threads, these functions somehow render the system unstable so that later,
> during an innocuous program statement, a SIGILL is raised.

Ok, pragmatic solution: just use the precompiled squid proxy server or
modify the squid source to do what you need. Onto the less pragmatic
solution...

> Anyway, here's the code that produces a SIGILL.  By commenting out the "#define
> THREADED" line, I made all errors go away (as well as any semblance of speed or
> efficiency).  Could somebody wiser in the ways of cygwin please tell me if I'm
> running up against some unforseen or little-known compiler problem?  I'm really at a
> loss why it doesn't work.

What version of cygwin are you using (uname -a)? 
 
> Some notes before the code.
> 1)  Most of the time my threads run synchronously, only one running at a time.  Code
> I put between the UNLOCK and the LOCK macros consists of blocking functions, select,
> gethostname, and such.  I repeat, most of the time the thread is LOCKed.  Outside of
> between the UNLOCK and LOCK macros, shared resources can not be accessed at the
> same time.

Good.

> 2) My mysterious lock_function is a rather lame cludge that checks to make sure I'm
> not calling pthread_mutex_lock twice in a row in a thread.

If you need a kludge to prevent double locking, there is likely a design
issue with your program. 
> 3) threadcounter is a local variable for each thread.  It is initialized to a
> constantly incrementing g_threadcounter, so every thread has a unique number
> starting from 0 which is the main thread, and continuing 1 through TEST_SET which
> are the child threads.  I could have used pthread_self(), but where's the fun in
> reading hexadecimal anyway?  :)

<shrug>.

> 4) The hosts string array is not intended to infringe upon any copyrights, being
> that it's as many URLs as I could think up in 5 minutes.  Please don't sue me,
> Disney.
> 
> File: test.cpp
> ---
> #include <arpa/inet.h>
> #include <sys/socket.h>
> #include <netdb.h>
> #include <unistd.h>
> #include <errno.h>//for all our possible error message
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> #include <pthread.h>
> #include <map>//for watching thread locks

Whats in <map> ? My ignorance is showing here.

> #define THREADED
> 
> #define TRUE 1
> 
> struct SocketInfo
> {
> 	SocketInfo() {}
> 	
> 	SocketInfo(const SocketInfo& sp)
> 	{
> 		address = sp.address;
> 		socket = sp.socket;
> 	}
> 	sockaddr_in address;
> 	int socket;
> 	const char* host;
> };
> 
> void Test4(void);
> void* Test4Thread(void*);
> 
> int main(int argc, char* argv[])
> {
> 	try
> 	{
> 		Test4();
> 	}
> 	catch(int error)
> 	{
> 		printf("Feep! %s\n",strerror(error));
> 	}
> 	catch(...)
> 	{
> 		puts("Feeperific!");
> 		throw;
> 	}
> 	
> 	return 0;
> }
>
> #ifdef THREADED
> 
> pthread_mutex_t  popcorn = PTHREAD_MUTEX_INITIALIZER;
> 
> void lock_function(int which, int inc)
> {
> 	static map<int,int> lock_test;
> 	
> 	lock_test[which] += inc;
> 	
> 	if(lock_test[which]>1)
> 	{
> 		printf("Feep!  %d thread locked itself twice!\n", which);
> 		exit(0);
> 	}
> 	
> 	if(lock_test[which]<0)
> 	{
> 		printf("Feep!  %d thread unlocked itself twice!\n", which);
> 		exit(0);
> 	}
> }
> 	
> #define LOCK { \
> 	pthread_mutex_lock(&popcorn); \
> 	lock_function(threadcounter, 1); \
> 	printf("%d> Lock\n",threadcounter); \
> }
> #define UNLOCK { \
> 	printf("%d> Unlock\n",threadcounter);\
> 	lock_function(threadcounter, -1);\
> 	pthread_mutex_unlock(&popcorn); \
> }
> 
> #else
> 
> #define LOCK
> #define UNLOCK
> 
> #endif
> 
> #define DESTPORT 80
> 
> int g_threadcounter = 0;
> int sock_size = sizeof(sockaddr_in);
> 
> 
> #define TEST_SET 15
> //Make 10 connections.
> 
> SocketInfo dest[TEST_SET];
> pthread_t t_id[TEST_SET];
> char* hosts[TEST_SET] = { 
> 	"transform.to", 
> 	"integral.org",
> 	"www.google.com",
> 	"altavista.com",
> 	"208.180.232.33",
> 	"www.gamefaqs.com",
> 	"204.71.200.74",
> 	"www.pokemon.com",
> 	"www.disney.com",
> 	"216.218.194.6",
> 	"www.ucdavis.edu",
> 	"www.cnet.com",
> 	"www.gnu.org",
> 	"www.landfield.com",
> 	"216.200.16.61"};
> 
> int yes = 1;
> 
> void* Test4thread(void* arg) {
> 	
> 	int threadcounter = ++g_threadcounter;

You have a race here. Every thread _can_ end up with the same
threadcounter. 

> 	LOCK;
> 	
> 	SocketInfo& dest = *((SocketInfo*) arg);

I may be off my rocker here, but shouldn't this be 
SocketInfo dest * = (SocketInfo *) arg;
otherwise there is no point having a globally declared dest[] anyway.
also relevant is the duplicate name that can lead to scope confusion
BTCATK.

> 	hostent* hp = NULL;
> 	
> 	try 
> 	{
> 	
> 	printf("Thread #%d starting!\n",threadcounter);
> 
> 	
> 	bzero((char*) dest.address.sin_zero, 8); 
> 	// zero the rest of the struct
> 	dest.address.sin_family = AF_INET;	
> 	// host byte order
> 	dest.address.sin_port = htons(DESTPORT);
> 	// short, network byte order
> 	
> 	
> 	UNLOCK;
> 	printf("Thread #%d resolving!\n",threadcounter);
> 	printf("Resolving %s...\n", dest.host);

You have races here: printf to console is not guaranteed threadsafe.

> 	dest.address.sin_addr.s_addr = inet_addr(dest.host);
> 	if(dest.address.sin_addr.s_addr == (unsigned)-1) 
> 	{
> 		//host is not an IP address.  Attempt to resolve...
> 		hp = gethostbyname(dest.host);
> 		if (hp)
> 		{
> 			printf("%d> Host! %s\n", threadcounter, hp->h_name);
> 			dest.address.sin_family = hp->h_addrtype;
> 			bcopy(hp->h_addr, (caddr_t)&dest.address.sin_addr, hp->h_length);
> 		}
> 		else
> 		{
> 				printf("Unknown host %s\n", dest.host);
> 				return arg;
> 		}
> 	}
> 	printf("Thread #%d done resolving.\n",threadcounter);
> 	LOCK;
> 		
> 	int right_fd = connect(dest.socket,(sockaddr*) &dest.address,sock_size);
> 	
> 	if(right_fd == -1)
> 	{
> 		puts("Destination would not connect!");
> 		printf("%s %d\n", _sys_errlist[errno], threadcounter);
> 		return arg;

Where's your UNLOCK?

> 	}
> 	
> 	fd_set sockfd;
> 	timeval timeout;
> 	timeout.tv_sec = 1;
> 	timeout.tv_usec = 0;
> 	
> 	
> 	FD_ZERO(&sockfd);
> 	FD_SET(dest.socket,&sockfd);
> 	
> 	printf("Thread #%d selecting!\n",threadcounter);
> 	UNLOCK;
> 	if(select(dest.socket+1, &sockfd, NULL, NULL, &timeout)<0)
> 	{
> 		printf("%d> Feep! %s", threadcounter, strerror(errno));
> 		return arg;
> 	}
> 	LOCK;
> 	printf("Thread #%d done selecting!\n",threadcounter);
> 	
> 	}
> 	catch(...) { puts("Feeperdeep"); }
> 	
> 	close(dest.socket);
> 	
> 	UNLOCK;
> 			
> 	return arg;
> }
> 
> void Test4(void) {
> 	
> 	setbuf(stdout,NULL);
> 	
> 	int threadcounter = g_threadcounter;

if g_threadcounter is your global tool to id threads, then this is
suspect: you need int *threadcounter = &g_threadcounter;

> 	int i = 0;
> 
> 	for(i = 0; i < TEST_SET; i++)
> 	{
> 		dest[i].host = hosts[i];
> 				
> 		if ((dest[i].socket = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
> 			perror("socket");
> 			exit(1);
> 		}
> 		
> 		if (setsockopt(dest[i].socket,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(int)) == -1) {
> 			perror("setsockopt");
> 			exit(1);
> 		}
> 	
> 	}
> 	
> 	LOCK;  //Wait for it...
> 	for(i = 0; i < TEST_SET; i++)
> 	{
> 		t_id[i] = new pthread_t;

This is wrong. pthreads is a C interface, not a C++ interface. If you
want to initialize a pthread_t variable before pthread_create, use
memset (&t_id[i], '\0', sizeof(pthread_t)); It shouldn't be causing your
bug, but it will lead to memory leaks at the very least.

> #ifdef THREADED
> 		pthread_create (t_id + i, NULL, Test4thread, (void*) (dest + i));
This is not intuitive to read: even though it is functionally correct.
I'd suggest
pthread_create (&t_id[i], NULL, Test4thread, (void *) (&dest[i]));

> #else
> 		Test4thread((void*) (dest + i));
> #endif
> 	}
> 	UNLOCK; //Go!
> 	
> 	for(i = 0; i < TEST_SET; i++)
> 	{
> #ifdef THREADED
> 		pthread_join(t_id[i],NULL);
> #endif
> 		LOCK;
> 		printf("Thread %d joined.\n",i+1);
> 		UNLOCK;
> 		delete t_id[i];

again, this is wrong, new and delete aren't valid for pthread
operations. I have _no idea_ what sideeffects you could be creating by
doing this. You are trying to free an opaque pointer. Your compiler
*should* be spitting the dummy at this.

> 	}
> 	
> 	threadcounter--;
> 		
> 	puts("This is after threads.");
> 	return;
> }
> ---
> 
> Get the picture?  Basically it creates a bunch of sockets, pairs them up with a
> host name in my SocketInfo structure, then has 10 baby threads resolve the host
> names, connect the sockets and wait for readable data.  Since I'm connecting at
> the HTTP port, there will never be readable data until I send "GET somefile.html"
> or something.  Therefore, the select functions all timeout, then the threads clean
> up and exit harmlessly.
> 
> Well maybe not so harmlessly.  After thread #6 prints "Thread #6 Done selecting"
> and UNLOCKS, a SIGILL happens.  It's always thread #6 in GDB. Thread #9 when
> not using GDB.  I don't know why.  It happens sometime while returning from the
> lock_function function, according to GDB.
> 
> I'd appreciate it if someone would tell me if this is a problem with cygwin
> itself, or with the nut on the end of the keyboard.

#0 tell us what version of cygwin this happens on.
#1 build yourself a debug version of cygwin.
#2 break point at the line before failing, and then breakpoint the
fialing system call, that will get you debugging intop the cygwin1.dll
itself.
#3 I'd do the following, if the code where mine.

1) rather than arrays of data for per thread stuff use pthread_key
2) LOCK and UNLOCK sparingly rather than mostly LOCKED. Just lock around
printf's and other global behaviour.

Rob
 
> 
> Wesel
> --
> 
> Please do not feed me Twinkies
> 
> --
> Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
> Bug reporting:         http://cygwin.com/bugs.html
> Documentation:         http://cygwin.com/docs.html
> FAQ:                   http://cygwin.com/faq/
> 


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/