From: "Lavrentiev, Anton (NIH/NLM/NCBI) [C]"
To: "cygwin AT cygwin DOT com"
Subject: RE: Possible race in SYSV IPC (semaphores)
Date: Mon, 19 Nov 2012 21:06:15 +0000
Message-ID: <5F8AAC04F9616747BC4CC0E803D5907D012856@MLBXV09.nih.gov>

Hello again,

I can now positively confirm the race condition in cygserver with respect to the named pipe used to serialize SYSV requests through the server.

The race arises because transport_layer_pipes::accept(bool *const recoverable) (file: transport_pipes.cc) actually _creates_ the pipe when pipe_instance == 0 (ironically, transport_layer_pipes::listen() does not create any OS primitives at all!). This means that under heavy load, the cygserver threads may all finish processing their requests and close all instances of the pipe (bringing pipe_instance to 0) before any of them gets to the point of accepting a new request (that is, of re-creating the pipe).
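
To make the window concrete, here is a minimal toy model of the sequence described above. It is only a sketch and not cygserver code: the names pipe_model, model_accept, model_close, model_connect and replay_race are hypothetical, and the model assumes nothing beyond what the message states, namely that an instance is created on entry to accept(), destroyed when a thread closes it, and that a client connecting while no instance exists sees error 2.

```c
#include <assert.h>

#define ERROR_FILE_NOT_FOUND 2  /* the Windows error code quoted in the message */

/* Toy stand-in for the named pipe (hypothetical, not cygserver code):
 * the pipe name exists only while at least one instance is open. */
struct pipe_model {
    int instances;              /* plays the role of pipe_instance */
};

/* A server thread enters accept(): this is where an instance is created. */
void model_accept(struct pipe_model *p)
{
    p->instances++;
}

/* A server thread finishes its request and closes its instance. */
void model_close(struct pipe_model *p)
{
    assert(p->instances > 0);
    p->instances--;
}

/* A client tries to connect: 0 on success, error 2 if no instance exists. */
int model_connect(const struct pipe_model *p)
{
    return p->instances > 0 ? 0 : ERROR_FILE_NOT_FOUND;
}

/* Replays one unlucky interleaving with nthreads server threads and
 * returns what a client arriving inside the window would see. */
int replay_race(int nthreads)
{
    struct pipe_model p = { 0 };
    int i;

    for (i = 0; i < nthreads; i++)
        model_accept(&p);       /* every thread has created its instance */
    for (i = 0; i < nthreads; i++)
        model_close(&p);        /* every thread served a request and closed it */

    /* No thread has re-entered accept() yet, so the name is gone. */
    return model_connect(&p);
}
```

In this model replay_race(10) returns 2, while a connect attempted after even a single model_accept() succeeds. That matches the description: the server is never stuck, the pipe name is simply absent for a moment.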

For the client (user process), it looks as if the pipe does not exist during that very short window, and the following message gets printed:

Iteration 3016
      1 [main] a 4872 transport_layer_pipes::connect: lost connection to cygserver, error = 2

Note that "error = 2" is ERROR_FILE_NOT_FOUND.

The error is so subtle that it is very difficult to reproduce with the default number of threads in cygserver, 10 (although not impossible, especially on a moderately to heavily loaded system). The error is not due to the deadlocking reported earlier today: the server stays up and reaches CreateNamedPipe() just a moment too late; it is the client that suffers from the pipe's disappearance. The deadlocking of the logging cygserver appears to be yet another, separate issue.

Anton Lavrentiev
Contractor NIH/NLM/NCBI

P.S. The code to reveal the race is the very same test program posted today (stripped of the BUGx conditionals):

/* Header names reconstructed from the calls used below. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define SEMKEY 1

union semun {
    int val;                    /* value for SETVAL */
    struct semid_ds *buf;       /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;      /* array for GETALL, SETALL */
};

static void doCriticalWork(void)
{
    return;
}

int main(void)
{
    struct sembuf lock[2];
    int n, semid;

    if ((semid = semget(SEMKEY, 2, IPC_CREAT | 0666)) < 0) {
        perror("semget(IPC_CREAT)");
        return 1;
    }

    /* Take semaphore [0] once: succeed only if it is currently 0. */
    lock[0].sem_num = 0;
    lock[0].sem_op  = 0;
    lock[0].sem_flg = IPC_NOWAIT;
    lock[1].sem_num = 0;
    lock[1].sem_op  = 1;
    lock[1].sem_flg = SEM_UNDO;

    if (semop(semid, lock, 2) != 0) {
        perror("semop(LOCK[0])");
        return 1;
    }

    /* Lock and unlock semaphore [1] forever; each semop() is a round trip
       to cygserver over the named pipe. */
    for (n = 0; ; ++n) {
        static const union semun arg = { 0 };
        int error;

        printf("Iteration %d\n", n);

        lock[0].sem_num = 1;
        lock[0].sem_op  = 0;    /* precondition:  [1] == 0 */
        lock[0].sem_flg = 0;
        lock[1].sem_num = 1;
        lock[1].sem_op  = 1;    /* postcondition: [1] == 1 */
        lock[1].sem_flg = SEM_UNDO;

        if (semop(semid, lock, 2) < 0) {
            error = errno;
            fprintf(stderr, "semop(LOCK[1]): %d, %s\n", error, strerror(error));
            break;
        }

        doCriticalWork();

        lock[0].sem_num = 1;
        lock[0].sem_op  = -1;
        lock[0].sem_flg = SEM_UNDO | IPC_NOWAIT;

        if (semop(semid, lock, 1) < 0) {
            error = errno;
            fprintf(stderr, "semop(UNLOCK[1]): %d, %s\n", error, strerror(error));
            break;
        }
    }

    return 1;
}