X-pop3-spooler: POP3MAIL 2.1.0 b 3 961213 -bs- Delivered-To: pcg AT goof DOT com Message-ID: <35081A31.16F7D911@pfh.research.philips.com> Date: Thu, 12 Mar 1998 18:24:01 +0100 From: Thomas Koehler Organization: Philips X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.32 i686) MIME-Version: 1.0 To: Marc Lehmann Subject: Re: paranoia & extra precision [was -fno-float-store in pgcc] References: <199803111756 DOT AA192209001 AT typhoon DOT rose DOT hp DOT com> <19980312074656 DOT 14562 AT cerebro DOT laendle> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: Marc Lehmann Status: RO X-Status: A Content-Length: 3106 Lines: 110 Marc Lehmann wrote: > > [cc sent to beastium-list] > > The recent postings about paranoia and fpu result mismatches with > optimized/unoptimized compilations need some clarifications, I hope to > provide it now ;) > > > Marc - > > > > What do you make of the following code. PGCC produces different > > results when optimizing then when not optimization. I was > > told it has to do with -fno-float-store, but pgcc doesn't appear > > the x86 chips are not really ieee compliant. that's not too serious, as I'll > explain: > > > y = 1.0; > > x = 1.0 + y; > > oldx = 1.0; > > do > > { > > y /= 2.0; > > oldx = x; > > x = 1.0 + y; > > } while ( x < oldx ); > > gcc options value of y > -O0 5.551115e-17 > -O 2.710505e-20 > -O -ffloat-store 5.551115e-17 > > let's analyze that loop: > > when will it stop? it will stop when x !< x + y, where y is 1.0, 0.5, 0.25 > etc... (continously halved). "double"'s have 53 bit's mantissa (only 52 are > stored). 2^53 ~ 10^15, i.e. we will have roughly 16 digits precision. > > This is why the loop stops at y = 5*10^-17. This is compliant to the ieee > rule that intermediate values have to be at the same precision as the > result. > > Now, when optimizing, gcc tries to keep values in the floating point > registers. But other than, say, the m68k fpu, the x86 fpu has no notion of > different data types, all fpu registers have the same type, long double (80 > bits extended format). 80 bits extended format => 64 bits mantissa => ~19 > digits precision. > > This is why, in the second example, you get the 2*10^-20. This is fast, more > accurate than needed, but not IEEE compliant. > > When you specify -ffloat-store, you force gcc to store _every_ _intermediate_ > _value_ in memory, as to truncate the value to appropriate precision (just > like in an unoptimized program). This is why we, again, get the correct > answer with "-O -ffloat-store". > > As all of you correctly guessed, this is helplessly slow, but the only way > to force correct behaviour. There seems to be annother way, but involes the unducumented functions __getfpucw and __setfpucw. By them the fpu-behaviour can be changed, including internal rounding an SIGFPE generation. See /usr/include/i386/fpu_control.h for a short information. The following code works at least for gcc272 an may work for pgcc as well: #include #include extern unsigned short __getfpucw ( ); void main ( ) { double y, x, oldx; unsigned short u; u = __getfpucw (); u &= 0xFEFF; (void) __setfpucw ( u ); y = 1.0; x = 1.0 + y; oldx = 1.0; do { y /= 2.0; oldx = x; x = 1.0 + y; } while ( x < oldx ); printf ("%g\n", y); } btw. in constrast to the info in fpu_control.h, floating point exceptions are disabled by default. Thomas -- Thomas Koehler Philips Research Laboratories Division Technical Systems Roentgenstrasse 24-26 D-22335 Hamburg Germany phone: +40/5078-2103 e-mail: T DOT Koehler AT PFH DOT Research DOT Philips DOT com