Date: Wed, 19 Nov 2014 17:42:02 +0100 (CET)
From: Mikulas Patocka
To: cygwin AT cygwin DOT com
Subject: Instability with signals and threads

Hi

I have a program that sets a repetitive timer with setitimer and spawns several threads. The program is very unstable on Cygwin: it locks up within a few minutes.

The bug manifests itself in the following way: the signal thread calls cygheap->find_tls to find a thread to deliver the signal to.
find_tls generates an exception when scanning the threadlist, jumps to the __except block and calls threadlist[idx]->remove(INFINITE). The method threadlist[idx]->remove is called with an invalid "this" pointer (sometimes it is zero, sometimes it points to unmapped memory), generates another exception on the "initialized = 0" line and becomes stuck on this assignment.

I found out that when I modify the remove_tls method so that it always acquires the lock and removes the thread from the threadlist (change "tls_sentry here(wait)" to "tls_sentry here(INFINITE)"), the bug goes away and the multithreaded program is stable.

Alternatively, the crash can be fixed if we change "_my_tls.remove (0)" to "_my_tls.remove (INFINITE)" in thread_wrapper (though there is another _my_tls.remove (0) call in dll_entry in winsup/cygwin/init.cc, and it could trigger the same crash).

I'd like to ask: what is the reason for not waiting for the lock in remove_tls? If the lock is already held, remove_tls does nothing, but the _cygtls structure is freed anyway, so a dangling pointer is left on the thread list. Do you think that we can drop this "wait" argument and always wait for the lock in remove_tls?

Another possible bug: when find_tls exits, it drops the tls_sentry lock and returns the pointer to _cygtls. What happens if the thread owning the tls exits at this point? It seems that nothing prevents it from exiting, and that the caller of find_tls (sigpacket::process) will then work with a pointer to an invalid thread. It seems that we need to add a reference count to _cygtls to prevent it from disappearing while we are trying to send a signal to it (or keep tls_sentry::lock locked until sigpacket::process is done with the signal, though I don't know whether holding the lock for that long would cause deadlocks).
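To make the reference-count idea concrete, here is a minimal sketch in portable C++. It is not the Cygwin code: ThreadRecord and ThreadList are hypothetical stand-ins for _cygtls and the thread list guarded by the tls_sentry lock, and std::shared_ptr plays the role of the reference count. It assumes removal always waits for the lock, as proposed above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for _cygtls; the real structure is far larger.
struct ThreadRecord {
    explicit ThreadRecord(int id_) : id(id_) {}
    int id;
    std::atomic<bool> initialized{true};
};

// Hypothetical stand-in for the thread list and the lock taken by tls_sentry.
class ThreadList {
public:
    std::shared_ptr<ThreadRecord> add(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        records_.push_back(std::make_shared<ThreadRecord>(id));
        return records_.back();
    }

    // Always waits for the lock, so an exiting thread never leaves
    // a stale entry on the list.
    void remove(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = std::find_if(
            records_.begin(), records_.end(),
            [id](const std::shared_ptr<ThreadRecord>& r) { return r->id == id; });
        if (it == records_.end())
            return;
        (*it)->initialized = false;  // mark dead; memory stays valid for holders
        records_.erase(it);
    }

    // Returns an owning reference, so the record outlives the lock
    // even if its thread exits right after this call returns.
    std::shared_ptr<ThreadRecord> find(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        for (const auto& r : records_)
            if (r->id == id)
                return r;
        return nullptr;
    }

private:
    std::mutex lock_;
    std::vector<std::shared_ptr<ThreadRecord>> records_;
};
```

With this shape, a caller in the position of sigpacket::process would keep the result of find() for as long as it works on the signal and check the initialized flag before delivering. The record can be removed from the list in the meantime, but its memory is released only when the last holder lets go.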
Mikulas