Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <424D5C64.5050706@smousseland.com>
Date: Fri, 01 Apr 2005 16:36:20 +0200
From: Vincent Dedun
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: ipc, sockets and windows sp2
References: <424D0232 DOT 5060305 AT smousseland DOT com> <20050401090414 DOT GD7415 AT cygbert DOT vinschen DOT de> <424D2B0B DOT 8000604 AT smousseland DOT com> <20050401121143 DOT GD1471 AT cygbert DOT vinschen DOT de>
In-Reply-To: <20050401121143.GD1471@cygbert.vinschen.de>
Content-Type: multipart/mixed; boundary="------------030508050500010108040108"

--------------030508050500010108040108
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Corinna Vinschen wrote :

>>So I hope you wouldn't mind I attached a short testing program you can
>>easily compile with gcc to reproduce the bug.
>>
>>
>
>Cool, that's exactly what I was asking for.  I was immediately able to
>reproduce the problem and it turned out, that on fork() the socket
>duplication from parent to child process for some reason occupied space
>in the child, which in the parent is occupied by the shared memory returned
>by shmat.
>
>Consequentially the duplication of the shared memory couldn't occupy the
>same address as in the parent.  That's a fatal error so the forked child
>terminated itself with error 487, which basically means "Invalid address".
>
>I've changed fork() so that the shared memory is duplicated before sockets
>are duplicated, which is ok because sockets don't have special requirements
>for memory addresses.  That works fine for me, but it would be good if you
>could test the next snapshot, which I just uploaded, nevertheless.
>
>It's just incredible that nobody found this problem before.
>
>

Yes, I find it incredible too: any Unix server that uses IPC (instead of
threads, for example) will want to handle several connections at a time,
and so will use these mechanisms. I doubt we're the only ones using shared
memory, sockets and multiple processes!

Anyway, BIG THANKS for resolving the problem so quickly. I rebuilt from
the Cygwin CVS and it solved my problem; my master now runs well.

However, there is still a problem, sorry ;) This time it is with
semaphores (also part of IPC). It's less important for me, as the master
can run without them, but it's better to have them.

So I updated the test case to see what happens. I added semaphore
lock/release functions that I call in the child process, so each child has
to take the lock before accepting a connection and releases it when the
connection is finished.

With one child it is OK, but starting with the second child, the semaphore
lock operation (semop() with sem_flg=SEM_UNDO and sem_op=-1) makes
cygserver hang! Then I get "lost connection to cygserver" errors from my
processes, plus some "error getting signal_arrived to server(6)" from the
cygserver process.

So, instead of waiting for the semaphore to be released (semval going back
from 0 to 1), semop() returns even though the semaphore is locked, and the
program continues as if it had acquired the lock, although another child
still holds it. Moreover, the semaphore value is decremented at each
semaphore_lock call, so it reaches -1 at the third call, whereas we want it
to be either 0 (locked) or 1 (unlocked).

It stops there because cygserver has hung; nothing more is heard from the
following children (I set 10 children in the example).

Under OS X, for example, you see the first child lock the semaphore, then
all the other children wait for it to be released (semop() blocks until
the release), and the semaphore value is 1, then 0.

I hope this helps. Thank you again for your fix.
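The expected behaviour described above (one child holds the lock, the others block in semop(), and the value only ever alternates between 1 and 0) can be sketched roughly as follows. This is a minimal illustration, not part of the attached test case: run_children, sem_change and union semarg are hypothetical names introduced here, the semaphore is created with IPC_PRIVATE instead of ftok(), and the 20 ms hold time is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

/* Hypothetical stand-in for union semun, which some systems define in
   <sys/sem.h> and others leave to the caller. */
union semarg { int val; struct semid_ds *buf; unsigned short *array; };

/* Add delta to semaphore 0; with delta = -1 this blocks while the value is 0. */
static int sem_change(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

/* Fork n children that each lock, look at the value, and release.
   Returns the number of children that misbehaved (0 is the expected
   result), or -1 if the semaphore or a child could not be created. */
int run_children(int n)
{
    union semarg arg;
    int semid, i, status, forked = 0, violations = 0;

    assert(n > 0);
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    arg.val = 1;                          /* 1 = unlocked */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            int held;
            if (sem_change(semid, -1) == -1)   /* lock, may block */
                _exit(2);
            held = semctl(semid, 0, GETVAL);   /* should be 0 while held */
            usleep(20000);                     /* pretend to serve a client */
            if (sem_change(semid, 1) == -1)    /* release */
                _exit(2);
            _exit(held == 0 ? 0 : 1);
        }
        forked++;
    }

    /* Collect the children; any non-zero exit status counts as a violation. */
    for (i = 0; i < forked; i++) {
        if (wait(&status) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            violations++;
    }
    semctl(semid, 0, IPC_RMID);
    return forked == n ? violations : -1;
}
```

Calling run_children(10) from a small main() should return 0 on a system where semop() blocks as described; a non-zero result would mean a child saw a value other than 0 while it held the lock, or that a semop() call failed.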
Vincent

PS: the same conditions as before apply to this test (Windows version;
the Cygwin DLL contains your fix_shm_after_fork update).

------------------------

--------------030508050500010108040108
Content-Type: text/plain; name="fork-ipc-sem.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="fork-ipc-sem.c"

/* The header names were stripped by the list archive; the 14 listed
   here are the ones the code below needs. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define USE_IPC
#define USE_SEM
//define BIND_AFTER_FORK

#define BUFFERLEN 256

struct database
{
  int shmid;
  int semid;
  int test1;
  int test2;
} *wdb;

/* Create (or open) the shared memory segment keyed on the executable. */
int get_shared_memory(char *path_key)
{
  key_t key;
  int shmid;
  int shmflg;
  char file[BUFFERLEN];

  snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
  if ((key = ftok(file, 'Z')) == -1)
    {
      perror("Getting key for shared memory");
      exit(1);
    }
  shmflg = IPC_CREAT|0600;
  if ((shmid = shmget(key, sizeof(struct database), shmflg)) == -1)
    {
      perror ("Getting shared memory");
      exit(1);
    }
  fprintf(stderr,"shmid: %i\n", shmid);
  return (shmid);
}

/* Create (or open) a one-semaphore set and set it to 1 (unlocked). */
int get_semaphores(char *path_key)
{
  key_t key;
  int semid;
  struct sembuf op;
  int semflg;
  char file[BUFFERLEN];

  snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
  if ((key = ftok(file, 'Z')) == -1)
    {
      perror("Getting key for semaphores");
      exit(1);
    }
  semflg = IPC_CREAT|0600;
  if ((semid = semget(key, 1, semflg)) == -1)
    {
      perror("Getting semaphores");
      exit(1);
    }
  if (semctl(semid, 0, SETVAL, 1) == -1)
    {
      perror("semctl SETVAL -> 1");
      exit(1);
    }
  if (semctl(semid, 0, GETVAL) == 0)
    {
      op.sem_num = 0;
      op.sem_op = 1;
      op.sem_flg = 0;
      if (semop(semid, &op, 1) == -1)
        {
          perror("semaphore_release");
          exit(1);
        }
    }
  fprintf(stderr,"semval: %i semid: %i\n", semctl (semid, 0, GETVAL), semid);
  return (semid);
}

void *attach_shared_memory(int shmid)
{
  void *rv; // return value

  if ((rv = shmat(shmid, 0, 0)) == (void *) -1)
    {
      perror("shmat");
      return ((void *) -1);
    }
  return (rv);
}

int detach_shared_memory(void *shmaddr)
{
  int rv; // return value

  if ((rv = shmdt(shmaddr)) == -1)
    {
      perror("shmdt");
      return (-1);
    }
  return (rv);
}

void set_signal_handlers (void)
{
  struct sigaction ignore;

  ignore.sa_handler = SIG_IGN;
  sigemptyset(&ignore.sa_mask);
  ignore.sa_flags = 0;
  sigaction(SIGHUP, &ignore, NULL); // So we keep running as a daemon
}

/* Create a TCP socket listening on the given port; returns -1 if bind fails. */
int get_socket(short port)
{
  int sfd; //socket file descriptor
  struct sockaddr_in addr;
  int opt;

  opt = 1;
  sfd = socket(PF_INET, SOCK_STREAM, 0);
  if (sfd == -1)
    {
      perror("socket");
      exit(1);
    }
  else
    {
      if (setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, (int *) &opt, sizeof(opt)) == -1)
        perror ("setsockopt");
      addr.sin_family = AF_INET;
      addr.sin_port = htons(port);
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      if (bind(sfd, (struct sockaddr *) &addr, sizeof (addr)) == -1)
        {
          perror("bind");
          sfd = -1;
        }
      else
        {
          listen (sfd, 5);
        }
    }
  return (sfd);
}

int accept_socket (int sfd, struct sockaddr_in *addr)
{
  int fd;
  socklen_t len = sizeof(struct sockaddr_in);

  if ((fd = accept(sfd, (struct sockaddr *) addr, &len)) == -1)
    {
      perror("Accepting connection\n");
      exit(1);
    }
  return (fd);
}

/* Decrement the semaphore; this is expected to block while semval is 0. */
void semaphore_lock(int semid)
{
  struct sembuf op;

  op.sem_num = 0;
  op.sem_op = -1;
  op.sem_flg = SEM_UNDO;
  fprintf(stderr,"Locking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
  if (semop(semid, &op, 1) == -1)
    {
      perror("semaphore_lock");
      printf("%i\n",errno);
      exit(0);
    }
  fprintf(stderr,"Locked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

/* Increment the semaphore so that a waiting child can proceed. */
void semaphore_release(int semid)
{
  struct sembuf op;

  fprintf(stderr,"Unlocking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
  op.sem_num = 0;
  op.sem_op = 1;
  op.sem_flg = SEM_UNDO;
  if (semop(semid, &op, 1) == -1)
    {
      perror ("semaphore_release");
      printf("%i\n",errno);
      exit(0);
    }
  fprintf(stderr,"Unlocked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

int main(int argc, char *argv[])
{
  int sfd;                      // socket file descriptor
  int csfd;                     // child sfd, the socket once accepted
  int shmid;                    // shared memory id
  int semid;                    // semaphore id
  struct sockaddr_in addr;      // Address of the remote host
  pid_t child;
  pid_t child_wait;
  int n_children;
  int rc;                       // Return code
  int i;                        // For loops

  n_children = 0;
  set_signal_handlers();

#ifdef USE_IPC
  shmid = get_shared_memory(argv[0]);
  semid = get_semaphores(argv[0]);
  if ((wdb = attach_shared_memory(shmid)) == (void *) -1)
    exit (1);
  wdb->shmid = shmid;
  wdb->semid = semid;
#endif

#ifndef BIND_AFTER_FORK
  if ((sfd = get_socket(1234)) == -1)
    exit(0);
#endif

  printf ("Waiting for connections...\n");
  while (1)
    {
      if (n_children < 10)
        {
          if ((child = fork()) == 0)
            {
#ifdef BIND_AFTER_FORK
              if ((sfd = get_socket(1234)) == -1)
                exit(0);
#endif
#ifdef USE_SEM
              semaphore_lock(wdb->semid);
#endif
              if ((csfd = accept_socket(sfd, &addr)) != -1)
                {
                  close(sfd);
                  // handle connection here
                  close(csfd);
                }
              else
                perror("Accepting connection\n");
#ifdef USE_SEM
              semaphore_release(wdb->semid);
#endif
              exit(0);
            }
          else if (child != -1)
            n_children++;
          else
            perror("Forking\n");
        }
      else
        {
          if ((child_wait = wait (&rc)) != -1)
            n_children--;
        }
    }
  exit(0);
}

--------------030508050500010108040108
Content-Type: text/plain; name="fork-ipc-sem.out"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="fork-ipc-sem.out"

shmid: 65536
semval: 1 semid: 65536
Waiting for connections...
Locking... semval: 1 semid: 65536
Locked !!! semval: 0 semid: 65536
Locking... semval: 0 semid: 65536
     13 [main] a 2468 transport_layer_pipes::connect: lost connection to cygserver, error = 2
Locked !!! semval: -1 semid: 65536
     10 [main] a 4120 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 1092 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4616 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4844 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     11 [main] a 4024 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     15 [main] a 4596 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4368 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4448 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3800 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2212 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5192 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 588 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5876 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4940 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 2304 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 6080 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1488 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4076 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 2980 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4152 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 1836 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 3660 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 5408 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 460 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5444 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1752 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 1944 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5796 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2928 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5068 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1096 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4156 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5992 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 5052 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3424 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 364 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4360 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4440 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5548 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3832 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2756 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5148 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 3880 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4356 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5836 transport_layer_pipes::connect: lost connection to cygserver, error = 2

--------------030508050500010108040108
Content-Type: text/plain; charset=us-ascii

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

--------------030508050500010108040108--
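The "semval: -1" line in the output above is the key symptom: a System V semaphore value should never go below 0. A single-process probe along the following lines may help to check that in isolation, without fork() or sockets. It is only a sketch under stated assumptions: try_lock, probe_semaphore and union semarg are hypothetical names, the semaphore is private (IPC_PRIVATE), and IPC_NOWAIT is used instead of SEM_UNDO so that the second lock attempt fails immediately rather than blocking.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical stand-in for union semun, which some systems define in
   <sys/sem.h> and others leave to the caller. */
union semarg { int val; struct semid_ds *buf; unsigned short *array; };

/* Try to take the lock without blocking.
   Returns 1 if acquired, 0 if it is already held (EAGAIN), -1 on error. */
int try_lock(int semid)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };

    assert(semid >= 0);
    if (semop(semid, &op, 1) == 0)
        return 1;
    return errno == EAGAIN ? 0 : -1;
}

/* Lock a fresh semaphore twice and inspect the result.
   Returns 0 if the first attempt succeeds, the second is refused and
   the value stays at 0; 1 if any of that does not hold; -1 on setup error. */
int probe_semaphore(void)
{
    union semarg arg;
    int semid, first, second, value;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;
    arg.val = 1;                          /* 1 = unlocked */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    first = try_lock(semid);              /* expected: 1, value becomes 0 */
    second = try_lock(semid);             /* expected: 0, value stays 0 */
    value = semctl(semid, 0, GETVAL);

    semctl(semid, 0, IPC_RMID);           /* always clean up the set */
    return (first == 1 && second == 0 && value == 0) ? 0 : 1;
}
```

A return value of 1 from probe_semaphore() would point at the semaphore implementation itself; a return value of 0 would suggest that the problem only appears with several processes or with SEM_UNDO, as in the attached test case.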