Date: Fri, 21 Nov 2014 11:15:02 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Instability with signals and threads
Message-ID: <20141121101502.GA3810@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com

On Nov 20 21:22, Mikulas Patocka wrote:
> > Never mind that.  I can fix your testcase by calling _my_tls.remove with
> > INFINITE as parameter in both places.  If I drop one of them, your
> > testcase will invariably fail at one point.  With both INFINITE params
> > in place, your testcase is now running half an hour without problems.
>
> For me, this change doesn't fix the testcase, it just reduces the
> probability that it hangs.
>
> With this change, the testcase still locks up, but with a different
> stacktrace:
> thread1:
> Sleep
> _yield
> pthread::create
> sigdelayed ??
> _cygwin_exit_return ??
> _cygtls::call2
>
> thread2:
> SetEvent
> muto::release
> init_cygheap::find_tls
> _cygtls::init_thread
>
> thread3:
> WriteFile
> sig_send
> timer_thread
> cygthread::callfunc
> cygthread::stub
> _cygtls::call2
>
> thread4:
> VirtualFree
> thread_wrapper
>
> thread5:
> only ntdll stuff

Do you use a DLL built with optimization by any chance?  I wouldn't take
the backtraces too seriously in that case.  For debugging it helps a lot
to use a Cygwin DLL built without -O2.  Btw., are you testing on 32 or
64 bit?  I'm testing on 64 bit.

I can't reproduce your backtrace, but I can reproduce another one, which
is related to thread_exit.  At one point, after a couple thousand runs
through your testcase, I have a variable number of threads hanging in
thread_exit, and a timer thread which is unable to send its signal.  The
other threads all hang in thread_exit, waiting for a muto which is taken
by a thread which doesn't exist anymore.  That's a very serious downside
of the muto implementation: it is not able to recognize being abandoned.
I wonder if that shouldn't be using a real OS mutex.

As a sidenote, the snapshot doesn't work well in other scenarios either,
apparently.
Yaakov reported hangs in KDE :(

> > Thinking about it, the fact that _cygtls::remove allows a
> > non-INFINITE wait is rather strange, isn't it?  Calling remove_tls with
> > a 0 wait allows the function to return silently, without actually
> > having removed the thread from the list.  This is bound to go downhill
> > at one point and looks like a kludge to me to circumvent some potential
> > hang in another situation...
>
> Looking at CVS history, the "wait" argument was added to cygtls.cc version
> 1.2 with a comment: "Add a 'wait' argument to control how long we wait for
  ^^^
Wow.  So that's really old, more than 10 years.

> > Other than that, there's certainly some room for improvement.  Calling
> > threadlist[idx]->remove from the find_tls exception handler looks
> > extremely hairy to me.  I wonder if that should be called at all at this
> > point, or if there shouldn't rather be some "simplified" removal
> > operation which doesn't require the _cygtls pointer.  If the thread
> > doesn't exist anymore, neither does its _cygtls area.
>
> I suggest removing that exception handler altogether.  This thing can't
> ever work reliably - it could reduce the probability of crashes but not
> eliminate them.  Even if we handled the page fault correctly - what
> happens if some other thread allocates a different object at the location
> that belonged to the tls before? - then find_tls thinks that this
> different object is tls and corrupts it.

My point exactly.  AFAICS, the problem is that the cygtls area of a
thread is on the thread's own stack.  While this looks neat at first and
works fine in most scenarios, it gets destroyed by the OS as soon as the
thread exits.
So there's a chance that another thread using the cygtls area of this
thread (the signal thread for instance) may end up with pointers into
nirvana or, as you point out, space taken for completely different
tasks.

In the short term it's impossible to fix this thoroughly I guess,
because this requires a very careful overhaul of the cygtls handling.
What we need is a cygtls area which is created at thread start, but
which can be locked in memory as long as it's required by any thread.
Some synchronization is required.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
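
For illustration only, here is a rough sketch of what "created at thread start, but locked in memory as long as it's required by any thread" could look like.  It is not Cygwin code: TlsArea, TlsRegistry and area_survives_owner_exit are hypothetical names, and the sketch assumes the per-thread area lives on the heap and is kept alive by a reference count (std::shared_ptr) rather than living on the thread's stack.  A lookup returns a counted reference, so the area stays valid for the caller even after the owning thread has removed itself and exited.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a per-thread area; NOT Cygwin's _cygtls.
struct TlsArea {
    int pending_signal = 0;
};

// Hypothetical registry playing the role of the thread list.
class TlsRegistry {
public:
    // Called by a thread at start: heap-allocate its area and register it.
    std::shared_ptr<TlsArea> add(int id) {
        auto area = std::make_shared<TlsArea>();
        std::lock_guard<std::mutex> guard(lock_);
        areas_[id] = area;
        return area;
    }

    // Called by a thread at exit: drops only the registry's reference.
    void remove(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        areas_.erase(id);
    }

    // Called by other threads: the returned pointer pins the area in memory.
    // Returns an empty pointer if the thread is no longer registered.
    std::shared_ptr<TlsArea> find(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = areas_.find(id);
        if (it == areas_.end())
            return nullptr;
        return it->second;
    }

private:
    std::mutex lock_;
    std::unordered_map<int, std::shared_ptr<TlsArea>> areas_;
};

// Demo: a second thread pins the owner's area, the owner then exits, and
// the pinned area is still readable while new lookups correctly fail.
bool area_survives_owner_exit() {
    TlsRegistry registry;
    std::promise<void> registered, pinned_done;
    auto registered_f = registered.get_future();
    auto pinned_done_f = pinned_done.get_future();

    std::thread owner([&] {
        registry.add(1)->pending_signal = 7;
        registered.set_value();   // tell the other thread we are registered
        pinned_done_f.wait();     // wait until the other thread holds a pin
        registry.remove(1);       // owner unregisters and exits
    });

    registered_f.wait();
    std::shared_ptr<TlsArea> pinned = registry.find(1);
    pinned_done.set_value();
    owner.join();

    return pinned && pinned->pending_signal == 7 && !registry.find(1);
}
```

With this layout a stale lookup yields an empty pointer instead of a pointer into freed stack memory, which is the failure mode discussed above.  It does nothing about the abandoned-muto hang; that would still need a lock which can detect that its owner is gone.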