Mail Archives: cygwin/2014/11/21/05:15:26
On Nov 20 21:22, Mikulas Patocka wrote:
> > Never mind that. I can fix your testcase by calling _my_tls.remove with
> > INFINITE as parameter in both places. If I drop one of them, your
> > testcase will invariably fail at one point. With both INFINITE params
> > in place, your testcase is now running half an hour without problems.
>
> For me, this change doesn't fix the testcase, it just reduces the
> probability that it hangs.
>
> With this change, the testcase still locks up, but with a different
> stacktrace:
> thread1:
> Sleep
> _yield
> pthread::create
> sigdelayed ??
> _cygwin_exit_return ??
> _cygtls::call2
>
> thread2:
> SetEvent
> muto::release
> init_cygheap::find_tls
> _cygtls::init_thread
>
> thread3:
> WriteFile
> sig_send
> timer_thread
> cygthread::callfunc
> cygthread::stub
> _cygtls::call2
>
> thread4:
> VirtualFree
> thread_wrapper
>
> thread5:
> only ntdll stuff
Are you using a DLL built with optimization, by any chance? I wouldn't
take the backtraces too seriously in that case. For debugging it helps
a lot to use a Cygwin DLL built without -O2. Btw., are you testing
on 32 or 64 bit? I'm testing on 64 bit.
I can't reproduce your backtrace, but I can reproduce another one, which
is related to thread_exit. At some point, after a couple thousand runs
through your testcase, I end up with a timer thread which is unable to
send its signal and a variable number of other threads which all hang in
thread_exit, waiting for a muto which is held by a thread which doesn't
exist anymore. That's a very serious downside of the muto implementation
not being able to recognize that it has been abandoned. I wonder if that
shouldn't be using a real OS mutex.
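To illustrate what "recognizing being abandoned" buys you, here is a
minimal sketch. It is not Cygwin's muto code; it uses a POSIX robust
mutex, which behaves much like a Win32 mutex object reporting
WAIT_ABANDONED. The helper names lock_and_exit and lock_after_owner_died
are made up for this example. One thread takes the lock and exits
without releasing it; the next locker is told about the dead owner
instead of hanging forever.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: take the lock, then let the thread end while
// it still owns the mutex.
static void *lock_and_exit(void *arg) {
    pthread_mutex_lock(static_cast<pthread_mutex_t *>(arg));
    return nullptr;
}

// Returns what pthread_mutex_lock reports to the next locker after
// the owning thread has gone away.
int lock_after_owner_died() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_t owner;
    int err = pthread_create(&owner, nullptr, lock_and_exit, &m);
    assert(err == 0);
    (void)err;
    pthread_join(owner, nullptr);  // the owner no longer exists here

    // A plain mutex would block forever at this point. A robust one
    // hands us the lock and reports the dead owner.
    int rc = pthread_mutex_lock(&m);
    if (rc == EOWNERDEAD)
        pthread_mutex_consistent(&m);  // we vouch for the protected state
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

The waiter still has to decide whether the data guarded by the lock is
usable after the owner died; the mutex only reports that the owner is
gone.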
As a sidenote, the snapshot apparently doesn't work well in other
scenarios either. Yaakov reported hangs in KDE :(
> > Thinking about it, the fact that _cygtls::remove allows to apply a
> > non-INFINITE wait is rather strange, isn't it? Calling remove_tls with
> > a 0 wait, it allows to return the function silently, without actually
> > having removed the thread from the list. This is bound to go downhill
> > at one point and looks like a kludge to me to circumvent some potential
> > hang in another situation...
>
> Looking at CVS history, the "wait" argument was added to cygtls.cc version
> 1.2 with a comment: "Add a 'wait' argument to control how long we wait for
  ^^^
Wow. So that's really old, more than 10 years.
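The hazard of the non-INFINITE wait can be shown with a small sketch.
This is not the _cygtls::remove code; ThreadList, remove,
remove_blocking and entries_left_after_zero_wait are hypothetical names,
and a std::timed_mutex stands in for the real lock. With a zero wait and
the lock held elsewhere, the removal returns without having removed
anything, and the entry stays in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the thread list, not Cygwin's structure.
struct ThreadList {
    std::timed_mutex lock;
    std::vector<int> ids;

    // Wait at most `wait` for the lock. Returns false, leaving the
    // entry in place, if the lock could not be taken in time.
    bool remove(int id, std::chrono::milliseconds wait) {
        std::unique_lock<std::timed_mutex> guard(lock, wait);
        if (!guard.owns_lock())
            return false;
        erase(id);
        return true;
    }

    // Counterpart of an INFINITE wait: blocks until the lock is free.
    void remove_blocking(int id) {
        std::lock_guard<std::timed_mutex> guard(lock);
        erase(id);
    }

private:
    void erase(int id) {
        ids.erase(std::remove(ids.begin(), ids.end(), id), ids.end());
    }
};

// Attempts a zero-wait removal while another thread holds the lock
// and returns how many entries are left afterwards.
std::size_t entries_left_after_zero_wait() {
    ThreadList list;
    list.ids = {1, 2};

    std::promise<void> locked, release;
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();

    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> guard(list.lock);
        locked.set_value();  // tell the main thread the lock is taken
        release_f.wait();    // keep holding it until told to let go
    });

    locked_f.wait();
    bool removed = list.remove(2, std::chrono::milliseconds(0));
    assert(!removed);  // the lock was held, so nothing was removed
    (void)removed;

    release.set_value();
    holder.join();
    return list.ids.size();
}
```

A caller which ignores the return value cannot tell the two outcomes
apart, which is exactly the silent failure described above.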
> > Other than that, there's certainly some room for improvement. Calling
> > threadlist[idx]->remove from the find_tls exception handler looks
> > extremely hairy to me. I wonder if that should be called at all at this
> > point, or if there shouldn't be better some "simplified" removal
> > operation which doesn't require the _cygtls pointer. If the thread
> > doesn't exist anymore, neither does its _cygtls area.
>
> I suggest to remove that exception handler at all. This thing can't ever
> work reliably - it could reduce probability of crashes but not eliminate
> them. Even if we handled the page fault correctly - what happens if some
> other thread allocates a different object at the location that belonged to
> the tls before? - then find_tls thinks that this different object is tls
> and corrupts it.
My point exactly. AFAICS, the problem is that the cygtls area of a
thread is on the thread's own stack. While this looks neat at first
glance, and works fine in most scenarios, the problem is that it gets
destroyed by the OS as soon as the thread exits. So there's a chance
that another thread using the cygtls area of this thread (the signal
thread for instance) may end up with pointers into nirvana or, as you
point out, into space reused for completely different tasks.
In the short term it's impossible to fix this thoroughly I guess,
because this requires a very careful overhaul of the cygtls handling.
What we need is a cygtls area which is created at thread start, but
which can be locked in memory as long as it's required by any thread.
Some synchronization is required.
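Just to sketch the idea, and by no means a design for the real thing:
the area could live off the thread's stack and be reference counted, so
it stays valid as long as any thread still holds on to it. TlsArea,
TlsRegistry, create and pin are hypothetical names, and std::shared_ptr
stands in for whatever locking scheme the real code would need.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical per-thread area, allocated off the thread's stack.
struct TlsArea {
    int thread_id = 0;
    int pending_signals = 0;
};

class TlsRegistry {
    std::mutex lock_;
    std::map<int, std::weak_ptr<TlsArea>> areas_;

public:
    // Called at thread start. The thread keeps the returned owner
    // for as long as it runs.
    std::shared_ptr<TlsArea> create(int id) {
        auto area = std::make_shared<TlsArea>();
        area->thread_id = id;
        std::lock_guard<std::mutex> guard(lock_);
        assert(areas_[id].expired());  // no live area under this id yet
        areas_[id] = area;
        return area;
    }

    // Called by another thread, e.g. the signal thread. The returned
    // pointer keeps the area alive until it is dropped; the result is
    // null if the area is already gone.
    std::shared_ptr<TlsArea> pin(int id) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = areas_.find(id);
        if (it == areas_.end())
            return nullptr;
        std::shared_ptr<TlsArea> area = it->second.lock();
        if (!area)
            areas_.erase(it);  // stale entry, the thread has exited
        return area;
    }
};
```

A thread which pinned the area before the owner exited keeps a valid
pointer; a thread which comes too late gets a null pointer rather than
an address into memory that may have been reused.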
Corinna
-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat