From: buers AT gmx DOT de (Dieter Buerssner)
Newsgroups: comp.os.msdos.djgpp
Subject: [long] gcc performance and possible bug
Date: 6 Mar 2000 22:25:39 GMT
Lines: 152
Message-ID: <8a1b91$33j7m$1@fu-berlin.de>
NNTP-Posting-Host: u-214.frankfurt3.ipdial.viaginterkom.de (62.180.18.214)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: fu-berlin.de 952381539 3263734 62.180.18.214 (16 [17104])
X-Posting-Agent: Hamster/1.3.13.0
User-Agent: Xnews/03.02.04
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

With the attached code, I get very weird performance results.
I tested the code with gcc 2952 and binutils 295 (djgpp203),
with gcc 2952 and binutils 281 (djgpp202) and with gcc 260 and
binutils 251 (djgpp1x), under plain DOS and in a WIN98 DOS window,
with the compiler options -O, -O2 and -O3.

In the following tables, the first number in each column is for
function mwc32, the second number is for function mwc32c.

usec/call (plain DOS)
                 -O             -O2             -O3
djgpp203    0.027 0.027     0.023 0.193     0.030 0.030
djgpp202    0.027 0.224     0.026 0.224     0.030 0.029
djgpp1x     0.070 0.250     0.080 0.236     0.053 0.239

usec/call (WIN98)
                 -O             -O2             -O3
djgpp203    0.027 0.027     0.023 0.197     0.030 0.030
djgpp202    0.027 0.227     0.027 0.227     0.030 0.030
djgpp1x     0.070 0.250     0.081 0.240     0.053 0.244

Note that there is sometimes almost an order of magnitude
difference between the performance of mwc32 and mwc32c. The only
difference between these functions is the type of the variable mul
(static unsigned long vs. static const unsigned long). Whenever
there is a significant performance difference, mwc32c is the
slower one.

I tested djgpp203 more thoroughly. In this case, -O and -O3 seem
to result in the same performance for both functions. But with
minor changes in the source code, I also got this order of
magnitude difference with -O and -O3.

On linux, with gcc 2952 and binutils 295, I consistently get
0.027 usec/call for both mwc32 and mwc32c.

This code also seems to trigger a bug in gcc 2952. Please look at
the following sample output:

D:\RAND>gcc -O -Wall mwc32tst.c

D:\RAND>a
     mwc32: s=3051870873, used 3.626 CPU seconds 0.02702 usec/call
    mwc32c: s=3051870873, used 3.571 CPU seconds 0.02661 usec/call

D:\RAND>gcc -O2 -Wall mwc32tst.c

D:\RAND>a
    (null): s=3051870873, used 3.077 CPU seconds 0.02292 usec/call
    (null): s=3051870873, used 25.934 CPU seconds 0.19322 usec/call
    ^^^^^^

With -O3, everything works again. I also get the (null) under
linux. I do not get the (null) when compiling with gcc260.

All of this was tested on an AMD K6-2.

Can you reproduce my weird results? Is there some stupid bug in
my code?

Regards,
Dieter

/* mwc32tst.c */
#include <stdio.h>
#include <time.h>

unsigned long speed_loop(unsigned long (*tr)(void), unsigned long n)
{
  unsigned long s;
  s = 0;
  do
    s += tr();
  while (--n != 0);
  return s;
}

/* test the speed of function tr, take function call
   and loop overhead into account */
void speed(unsigned long (*tr)(void), unsigned long (*dummy)(void),
           unsigned long n, const char *description)
{
  clock_t anf, anfdum;
  unsigned long s;

  /* time the dummy function first, to measure the overhead */
  anfdum = clock();
  speed_loop(dummy, n);
  anfdum = clock() - anfdum;

  /* time the function under test and subtract the overhead */
  anf = clock();
  s = speed_loop(tr, n);
  anf = clock() - anf;
  anf -= anfdum;

  printf("%10s: s=%lu, used %.3f CPU seconds %.5f usec/call\n",
         description, s, (double)anf/CLOCKS_PER_SEC,
         1e6/n*(double)anf/CLOCKS_PER_SEC);
}

#define CALLS (1UL << 27) /* Tune this as appropriate */

/* avoid inlining of these functions */
unsigned long dum_rand(void);
unsigned long mwc32(void);
unsigned long mwc32c(void);

int main(void)
{
  speed(mwc32, dum_rand, CALLS, "mwc32");
  speed(mwc32c, dum_rand, CALLS, "mwc32c");
  return 0;
}

/* dummy function, for comparison */
unsigned long dum_rand(void)
{
  return 0UL;
}

typedef unsigned long long ul64;

/* Two implementations of the multiply with carry RNG.
   The only difference is the type of mul */

static ul64 zseed = ((ul64)0x12345678UL<<32) | 0x87654321UL;

unsigned long mwc32(void)
{
  unsigned long l1, l2;
  ul64 res;
  static unsigned long mul=999996864UL;

  l1 = (unsigned long)(zseed & 0xffffffffUL);  /* low 32 bits */
  l2 = zseed>>32;                              /* carry (high 32 bits) */
  res = l2+l1*(ul64)mul;
  zseed = res;
  return (unsigned long)(res & 0xffffffffUL);
}

static ul64 zseedc = ((ul64)0x12345678UL<<32) | 0x87654321UL;

unsigned long mwc32c(void)
{
  unsigned long l1, l2;
  ul64 res;
  static const unsigned long mul=999996864UL;

  l1 = (unsigned long)(zseedc & 0xffffffffUL); /* low 32 bits */
  l2 = zseedc>>32;                             /* carry (high 32 bits) */
  res = l2+l1*(ul64)mul;
  zseedc = res;
  return (unsigned long)(res & 0xffffffffUL);
}
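
For readers who want to check the arithmetic of the multiply with carry step independently of the width of unsigned long, the step can be sketched with fixed-width types. This is only a sketch under stated assumptions: it relies on the C99 header stdint.h, which the compilers tested in the post may not provide, and the names mwc32_step and mwc32_variants_agree are hypothetical helpers, not part of mwc32tst.c. It only checks that a const and a non-const multiplier give the same sequence; it does not reproduce or explain the timing difference or the (null) output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the original post): one multiply with
   carry step. The state keeps the carry in its high 32 bits and the
   current value x in its low 32 bits, as zseed does in mwc32(). */
uint32_t mwc32_step(uint64_t *state, uint32_t mul)
{
    uint32_t x = (uint32_t)(*state & 0xffffffffu);
    uint32_t carry = (uint32_t)(*state >> 32);
    uint64_t res = (uint64_t)x * mul + carry;  /* cannot overflow 64 bits */

    *state = res;
    return (uint32_t)(res & 0xffffffffu);
}

/* Hypothetical helper: run n steps with a non-const and a const
   multiplier from the same seed. Returns 1 if every output and the
   final states agree, 0 otherwise. */
int mwc32_variants_agree(unsigned long n)
{
    static uint32_t mul = 999996864u;
    static const uint32_t mulc = 999996864u;
    uint64_t a = ((uint64_t)0x12345678u << 32) | 0x87654321u;
    uint64_t b = a;

    assert(n > 0);  /* a zero-step comparison would be meaningless */
    while (n-- != 0) {
        if (mwc32_step(&a, mul) != mwc32_step(&b, mulc))
            return 0;
    }
    return a == b;
}
```

Calling mwc32_variants_agree(1000) from a small main() should return 1 on any conforming compiler, since the two multipliers differ only in their const qualifier. A result of 0 would point at miscompilation rather than at the generator itself.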