Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-owner AT sources DOT redhat DOT com
Delivered-To: mailing list cygwin AT sources DOT redhat DOT com
Date: Sat, 04 Aug 2001 18:50:32 -0700
From: Wesel
Subject: SIGILL with pthreads and sockets
To: cygwin AT cygwin DOT com
Message-id: <3B6CA668.86B5E8C0@pacbell.net>
MIME-version: 1.0
X-Mailer: Mozilla 4.75 [en] (Win98; U)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
X-Accept-Language: en,ja

And bears, oh my.

I've been trying to make some very simple proxy server software (just an advertisement filter) and not having much luck. I managed to track the problem down to the function select() and the functions for resolving host names. When used in child threads, these functions somehow render the system unstable so that later, during an innocuous program statement, a SIGILL is raised.

I looked up SIGILL and it said it was caused by either a hardware problem or a problem with the compiler itself. I'm hoping desperately that it's not a hardware error, as the only thing that sucks worse than my modem is my financial standing. We're talking student level poor, folks. :) My only solace lies in a compiler error, and the generous aid of you all.

Anyway, here's the code that produces a SIGILL. By commenting out the "#define THREADED" line, I made all errors go away (as well as any semblance of speed or efficiency). Could somebody wiser in the ways of cygwin please tell me if I'm running up against some unforeseen or little-known compiler problem? I'm really at a loss as to why it doesn't work.

Some notes before the code.

1) Most of the time my threads run synchronously, only one running at a time. Code I put between the UNLOCK and the LOCK macros consists of blocking functions: select, gethostbyname, and such. I repeat, most of the time the thread is LOCKed.
Outside of the span between the UNLOCK and LOCK macros, shared resources cannot be accessed at the same time.

2) My mysterious lock_function is a rather lame kludge that checks to make sure I'm not calling pthread_mutex_lock twice in a row in a thread.

3) threadcounter is a local variable for each thread. It is initialized from a constantly incrementing g_threadcounter, so every thread has a unique number, starting from 0 for the main thread and continuing with 1 through TEST_SET for the child threads. I could have used pthread_self(), but where's the fun in reading hexadecimal anyway? :)

4) The hosts string array is not intended to infringe upon any copyrights, being that it's as many URLs as I could think up in 5 minutes. Please don't sue me, Disney.

File: test.cpp
---
// NOTE: the header names were stripped by the list archive. The list
// below is reconstructed from what the code uses, not the original list.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h> //for all our possible error message
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <pthread.h>
#include <map> //for watching thread locks

#define THREADED
#define TRUE 1

struct SocketInfo
{
    SocketInfo() {}
    SocketInfo(const SocketInfo& sp)
    {
        address = sp.address;
        socket = sp.socket;
    }
    sockaddr_in address;
    int socket;
    const char* host;
};

void Test4(void);
void* Test4Thread(void*);

int main(int argc, char* argv[])
{
    try
    {
        Test4();
    }
    catch(int error)
    {
        printf("Feep! %s\n",strerror(error));
    }
    catch(...)
    {
        puts("Feeperific!");
        throw;
    }
    return 0;
}

#ifdef THREADED
pthread_mutex_t popcorn = PTHREAD_MUTEX_INITIALIZER;

void lock_function(int which, int inc)
{
    // The template arguments were also stripped by the archive;
    // <int, int> is reconstructed from how the map is used.
    static std::map<int, int> lock_test;
    lock_test[which] += inc;
    if(lock_test[which]>1)
    {
        printf("Feep! %d thread locked itself twice!\n", which);
        exit(0);
    }
    if(lock_test[which]<0)
    {
        printf("Feep! %d thread unlocked itself twice!\n", which);
        exit(0);
    }
}

#define LOCK { \
    pthread_mutex_lock(&popcorn); \
    lock_function(threadcounter, 1); \
    printf("%d> Lock\n",threadcounter); \
}
#define UNLOCK { \
    printf("%d> Unlock\n",threadcounter);\
    lock_function(threadcounter, -1);\
    pthread_mutex_unlock(&popcorn); \
}
#else
#define LOCK
#define UNLOCK
#endif

#define DESTPORT 80

int g_threadcounter = 0;
int sock_size = sizeof(sockaddr_in);

#define TEST_SET 15 //Make 10 connections.

SocketInfo dest[TEST_SET];
pthread_t t_id[TEST_SET];
char* hosts[TEST_SET] = {
    "transform.to",
    "integral.org",
    "www.google.com",
    "altavista.com",
    "208.180.232.33",
    "www.gamefaqs.com",
    "204.71.200.74",
    "www.pokemon.com",
    "www.disney.com",
    "216.218.194.6",
    "www.ucdavis.edu",
    "www.cnet.com",
    "www.gnu.org",
    "www.landfield.com",
    "216.200.16.61"};
int yes = 1;

void* Test4thread(void* arg)
{
    int threadcounter = ++g_threadcounter;
    LOCK;
    SocketInfo& dest = *((SocketInfo*) arg);
    hostent* hp = NULL;
    try
    {
        printf("Thread #%d starting!\n",threadcounter);
        bzero((char*) dest.address.sin_zero, 8); // zero the rest of the struct
        dest.address.sin_family = AF_INET;       // host byte order
        dest.address.sin_port = htons(DESTPORT); // short, network byte order
        UNLOCK;

        printf("Thread #%d resolving!\n",threadcounter);
        printf("Resolving %s...\n", dest.host);
        dest.address.sin_addr.s_addr = inet_addr(dest.host);
        if(dest.address.sin_addr.s_addr == (unsigned)-1)
        {
            //host is not an IP address. Attempt to resolve...
            hp = gethostbyname(dest.host);
            if (hp)
            {
                printf("%d> Host! %s\n", threadcounter, hp->h_name);
                dest.address.sin_family = hp->h_addrtype;
                bcopy(hp->h_addr, (caddr_t)&dest.address.sin_addr, hp->h_length);
            }
            else
            {
                printf("Unknown host %s\n", dest.host);
                return arg;
            }
        }
        printf("Thread #%d done resolving.\n",threadcounter);

        LOCK;
        int right_fd = connect(dest.socket,(sockaddr*) &dest.address,sock_size);
        if(right_fd == -1)
        {
            puts("Destination would not connect!");
            printf("%s %d\n", _sys_errlist[errno], threadcounter);
            return arg;
        }
        fd_set sockfd;
        timeval timeout;
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;
        FD_ZERO(&sockfd);
        FD_SET(dest.socket,&sockfd);
        printf("Thread #%d selecting!\n",threadcounter);
        UNLOCK;

        if(select(dest.socket+1, &sockfd, NULL, NULL, &timeout)<0)
        {
            printf("%d> Feep! %s", threadcounter, strerror(errno));
            return arg;
        }

        LOCK;
        printf("Thread #%d done selecting!\n",threadcounter);
    }
    catch(...)
    {
        puts("Feeperdeep");
    }
    close(dest.socket);
    UNLOCK;
    return arg;
}

void Test4(void)
{
    setbuf(stdout,NULL);
    int threadcounter = g_threadcounter;
    int i = 0;
    for(i = 0; i < TEST_SET; i++)
    {
        dest[i].host = hosts[i];
        if ((dest[i].socket = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        {
            perror("socket");
            exit(1);
        }
        if (setsockopt(dest[i].socket,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(int)) == -1)
        {
            perror("setsockopt");
            exit(1);
        }
    }

    LOCK; //Wait for it...
    for(i = 0; i < TEST_SET; i++)
    {
        t_id[i] = new pthread_t;
#ifdef THREADED
        pthread_create (t_id + i, NULL, Test4thread, (void*) (dest + i));
#else
        Test4thread((void*) (dest + i));
#endif
    }
    UNLOCK; //Go!

    for(i = 0; i < TEST_SET; i++)
    {
#ifdef THREADED
        pthread_join(t_id[i],NULL);
#endif
        LOCK;
        printf("Thread %d joined.\n",i+1);
        UNLOCK;
        delete t_id[i];
    }
    threadcounter--;
    puts("This is after threads.");
    return;
}
---

Get the picture? Basically it creates a bunch of sockets, pairs each one up with a host name in my SocketInfo structure, then has 10 baby threads resolve the host names, connect the sockets and wait for readable data.
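
The wait-for-readable-data step described above can be tried in isolation. The sketch below is not part of the original test.cpp: wait_readable() and demo_socketpair() are hypothetical helpers, and a local socketpair() (assumed to be available on the platform) stands in for the HTTP connection so that no network is needed. It only illustrates select() with a timeout on one descriptor; it says nothing about the cause of the SIGILL.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (not from the original post): wait up to `seconds`
// for `fd` to become readable.
// Returns 1 if readable, 0 on timeout, -1 on error (errno set by select()).
int wait_readable(int fd, long seconds)
{
    // select() can only watch descriptors below FD_SETSIZE.
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    timeval timeout;
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;

    int ready = select(fd + 1, &readfds, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}

// Results of the two waits performed by the demo below.
struct DemoResult
{
    int before_write; // wait_readable() result while nothing has been sent
    int after_write;  // wait_readable() result once one byte is pending
};

// Hypothetical demo: a connected local socket pair plays the role of the
// client socket and the remote HTTP server.
DemoResult demo_socketpair()
{
    DemoResult r = { -1, -1 };
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return r;

    // The peer has sent nothing, so this wait runs into the timeout.
    r.before_write = wait_readable(sv[0], 1);

    // Once the peer writes a byte, the socket is reported readable.
    if (write(sv[1], "x", 1) == 1)
        r.after_write = wait_readable(sv[0], 1);

    close(sv[0]);
    close(sv[1]);
    return r;
}
```

Calling demo_socketpair() should give a timeout for the first wait and a readable socket for the second. Checking FD_ISSET after select() returns is what tells "timed out" apart from "data ready"; the test program only checks for a negative return value.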
Since I'm connecting at the HTTP port, there will never be readable data until I send "GET somefile.html" or something. Therefore, the select calls all time out, then the threads clean up and exit harmlessly.

Well, maybe not so harmlessly. After thread #6 prints "Thread #6 done selecting!" and UNLOCKs, a SIGILL happens. It's always thread #6 in GDB, and thread #9 when not using GDB. I don't know why. According to GDB, it happens sometime while returning from lock_function.

I'd appreciate it if someone would tell me if this is a problem with cygwin itself, or with the nut on the end of the keyboard.

Wesel
-- Please do not feed me Twinkies

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/