Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Delivered-To: mailing list cygwin AT cygwin DOT com
From: "Dave Korn"
Subject: Oh dear, pthreads and stdio still not mt-safe :-(
Date: Wed, 7 Jan 2004 17:47:31 -0000
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0000_01C3D546.5355BA20"

------=_NextPart_000_0000_01C3D546.5355BA20
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi everyone (and Arash in particular!),

Re: my earlier message [at http://sources.redhat.com/ml/cygwin/2004-01/msg00072.html ]

Well, I thought the latest snapshot had solved my problem with stdio getting messed up by threads, but there's still a bug in there somewhere.

Recall my original testcase: main spawns two threads, each of which does nothing apart from spit out single chars to stdout at regular intervals. The foreground then waits for a crlf from stdin (using scanf) and sets a global flag that causes the threads to exit.

Well, with the latest cygwin dll snapshot, that testcase works fine - or in any case, it stopped giving SEGVs. But it turns out to be an accident of the fixed and deterministic timing relationships between the output from the various threads. When I modified the testcase ever so slightly, to randomly vary the delay between spitting out chars in the spawned threads, it breaks again. It doesn't crash, but the output gets mangled and corrupted.

Here's an example: mttest1.exe is my original testcase, with intervals of 313ms and 257ms in the threads that print 'W' and 'R' respectively; in mttest2.exe, both threads output chars at random intervals between 250ms and 506ms.
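A side note on the delay arithmetic, offered as a sketch and not as a change to the attached testcase: mttest1.c computes the delay as 250 + (rand_r (&seed) * 256 / RAND_MAX). Assuming RAND_MAX is close to INT_MAX (not verified here for this newlib snapshot), the product can overflow an int before the division happens. The helper below, under the hypothetical name scale_delay_ms, does the same scaling in a wider type so the result stays in the 250..506 range:

```c
#include <stdlib.h>   /* RAND_MAX */

/* Hypothetical helper, not part of the attached testcase.
   Maps r in [0, RAND_MAX] onto a delay in [250, 506] ms.
   The multiply is done in long long so that it cannot overflow
   even when RAND_MAX equals INT_MAX. */
int
scale_delay_ms (int r)
{
  return 250 + (int) ((long long) r * 256 / RAND_MAX);
}
```

In the thread loops this would be called as thread_delay (scale_delay_ms (rand_r (&randseed1))).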
Now watch them run:

---snip---
dk AT mace /test/mt-test/test1> ./mttest1
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
RWRWRWRWRRW
****AFTER SCANF
RLeaving thread func2!
WLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest1
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
RWRWRWRWRRWRWRWRWRWRRWRWRWRWRRWRWRWRWRWRRWRWRWRWRWR
****AFTER SCANF
RLeaving thread func2!
WLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---

OK, that's the non-random version working fine. Now here's the one with random delays. Note that as the code doesn't seed the prng, the sequence is the same every time. Note also that rand_r isn't exported in cygwin.din, so I had to snarf the code directly from the cygwin/newlib source... :) I believe this omission may be accidental, n'est-ce pas?

---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....WRWRWRWRRRRWRWWRWRRW...
****AFTER SCANFs tf2...
RLeaving thread func2!WLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
WRWRWRWRRRRWRWWR
****AFTER SCANF
WRLeaving thread func2!
Leaving thread func2!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
WRWRWRWRRRRWRWWR
****AFTER SCANF
WRLeaving thread func2!
Leaving thread func2!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
WRWRWRWRRRRWRWWRWR
****AFTER SCANF
RLeaving thread func2!
WLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
WRWRWRWRRRRWRWWR
****AFTER SCANF
WRLeaving thread func2!
Leaving thread func2!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---
dk AT mace /test/mt-test/test1> ./mttest2.exe
Press return/enter to terminate.....Thread #1 enters tf1...
Thread #2 enters tf2...
WRWRWRWR
****AFTER SCANF
RLeaving thread func2!
LLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
---snip---

As you can see, during the first run it outputs a spurious cursor-up character and starts overwriting earlier output. In the second run the "Leaving thread func1" message gets dropped and the "Leaving thread func2" message is duplicated - or perhaps just the '1' gets overwritten by a spurious '2'. In the third and fifth runs the same problem occurs as in the second run; it's completely reproducible by pressing CR just as the WR characters line up under the word 'enters' in the "Thread #2 enters..." message. Only the fourth run appears correct. In the sixth run, there's a duplicated L at the start of the "Leaving thread func1" message: probably one of the 'W' chars got turned into an L.

And oh dear, even more. I've just seen my original testcase fail, so I guess even that one wasn't completely fixed after all. Notice the spurious cursor-up again:

---snip---
dk AT mace /test/mt-test/test1> ./mttest1.exe
Press return/enter to terminate.....RWread #1 enters tf1...
****AFTER SCANFs tf2...
RLeaving thread func2!
WLeaving thread func1!
exit ok
exit 2 ok
thread exticodes $ 0x1 $ 0x2
dk AT mace /test/mt-test/test1> ./mttest1.exe
Press return/enter to terminate.....RWRWRWR#1 enters tf1...
****AFTER SCANFs tf2...
WLeaving thread func1!
exit ok
RLeaving thread func2!
exit 2 ok
thread exticodes $ 0x1 $ 0x2
dk AT mace /test/mt-test/test1>
---snip---

This post is long enough as it is, so I've omitted my cygcheck output for the moment, since it's the same as the one in my last post (url at top of this post), apart from the fact I'm now using the snapshot cygwin1-20040103.dll.

Anyway, that all seems to show there's definitely still a bug in there. I'm pretty sure it's not down to me having accidentally used any non-mt-safe functions or otherwise not obeyed the C and POSIX specs, and I hope the reproducibility of these results might help whoever's working in that area track down the problem, but I'm pretty stumped myself. Any clues / hints / suggestions gratefully received.

cheers,
DaveK

------=_NextPart_000_0000_01C3D546.5355BA20
Content-Type: application/octet-stream; name="makefile"
Content-Disposition: attachment; filename="makefile"

# Configurable flags for compilation
CFLAGS?=-O0 -g -D_MT -D_REENTRANT
LFLAGS?=-lm -lpthread
ALLFLAGS?=-Wall

all: mttest1.exe mttest2.exe

mttest1.exe: mttest1.c
	gcc $(CFLAGS) $(LFLAGS) $(ALLFLAGS) mttest1.c -o mttest1.exe

mttest2.exe: mttest1.c
	gcc -DRANDOM $(CFLAGS) $(LFLAGS) $(ALLFLAGS) mttest1.c -o mttest2.exe

------=_NextPart_000_0000_01C3D546.5355BA20
Content-Type: text/plain; name="mttest1.c"
Content-Disposition: attachment; filename="mttest1.c"

/* The header names were lost in the list archive; these are the ones
   the code below needs. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

/* Set by main once the user presses return; polled by both threads. */
static volatile int thread_exit = 0;

int thread_delay (unsigned int time_ms)
{
  usleep (time_ms * 1000);
  return 0;
}

/* Local copy of rand_r, taken from the cygwin/newlib source because
   the dll does not export it. */
int rand_r (unsigned int *seed)
{
  long k;
  long s = (long)(*seed);
  if (s == 0)
    s = 0x12345987;
  k = s / 127773;
  s = 16807 * (s - k * 127773) - 2836 * k;
  if (s < 0)
    s += 2147483647;
  (*seed) = (unsigned int)s;
  return (int)(s & RAND_MAX);
}

/* Prints 'W' at intervals until thread_exit is set. */
void * thread_func1 (void *arg)
{
  int n;
#ifdef RANDOM
  unsigned int randseed1 = 0xf00dface;
#endif

  n = (int)arg;
  fprintf (stdout, "Thread #%d enters tf1...\n", n);
  fflush (stdout);
  while (1)
  {
#ifdef RANDOM
    thread_delay (250 + (rand_r (&randseed1) * 256 / RAND_MAX));
#else
    thread_delay (313);
#endif
    fprintf (stdout, "W");
    fflush (stdout);
    if (thread_exit)
      break;
  }
  fprintf (stdout, "Leaving thread func1!\n");
  fflush (stdout);
  return (void *) 1UL;
}

/* Prints 'R' at intervals until thread_exit is set. */
void * thread_func2 (void *arg)
{
  int n;
#ifdef RANDOM
  unsigned int randseed2 = 0xf00dface;
#endif

  n = (int)arg;
  fprintf (stdout, "Thread #%d enters tf2...\n", n);
  while (1)
  {
#ifdef RANDOM
    thread_delay (250 + (rand_r (&randseed2) * 256 / RAND_MAX));
#else
    thread_delay (257);
#endif
    fprintf (stdout, "R");
    fflush (stdout);
    if (thread_exit)
      break;
  }
  fprintf (stdout, "Leaving thread func2!\n");
  fflush (stdout);
  return (void *) 2UL;
}

int main (int argc, const char **argv)
{
  int rv;
  pthread_t thr1, thr2;
  void *tr1, *tr2;

  // spawn two threads.....
  // (pthread functions return the error number rather than setting
  // errno, so rv is what gets reported)
  rv = pthread_create (&thr1, NULL, thread_func1, (void *)1UL);
  if (rv)
    fprintf (stderr, "err %d create thr1\n", rv);
  rv = pthread_create (&thr2, NULL, thread_func2, (void *)2UL);
  if (rv)
    fprintf (stderr, "err %d create thr2\n", rv);
  fflush (stderr);

  // Only actually run if both threads started ok!
  if (!rv) while (1)
  {
    // Control thread: wait for user input and execute it
    char dummy[8];

    fprintf (stdout, "Press return/enter to terminate.....");
    fflush (stdout);
    // fflush (stdin);
    // scanf ("%*[^\r\n]%1[\r\n]", &dummy[0]);
    scanf ("%1c", &dummy[0]);
    fprintf (stderr, "****AFTER SCANF\n");
    // never loop... this is just a test after all....
    break;
  }

  thread_exit = 1;
  rv = pthread_join (thr1, &tr1);
  if (rv)
    fprintf (stderr, "pthr join errno1 %d\n", rv);
  else
    fprintf (stderr, "exit ok\n");
  rv = pthread_join (thr2, &tr2);
  if (rv)
    fprintf (stderr, "pthr join errno1 %d\n", rv);
  else
    fprintf (stderr, "exit 2 ok\n");
  fprintf (stderr, "thread exticodes $%8p $%8p\n", tr1, tr2);
  return 0;
}

------=_NextPart_000_0000_01C3D546.5355BA20--
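
One way to narrow down where the corruption comes from, sketched here under assumptions and not taken from the attached testcase: route every write through a single mutex-protected helper. If the mangling disappears, that would point at the locking inside stdio; if it persists, something other than concurrent writes (for instance the scanf in the main thread) would be worth a look. The names out_lock and emit are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical names; neither appears in the attached mttest1.c. */
static pthread_mutex_t out_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write text to fp and flush it while holding out_lock, so that only
   one thread at a time is inside stdio for output.
   Returns 0 on success, -1 if the write or the flush failed. */
int
emit (FILE *fp, const char *text)
{
  int rc = 0;

  pthread_mutex_lock (&out_lock);
  if (fputs (text, fp) == EOF || fflush (fp) == EOF)
    rc = -1;
  pthread_mutex_unlock (&out_lock);
  return rc;
}
```

The threads would then call emit (stdout, "W") in place of each fprintf/fflush pair, and main would use emit for its prompt and for the stderr messages.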