Mail Archives: pgcc/1998/03/12/17:22:20
Marc Lehmann wrote:
>
> [cc sent to beastium-list]
>
> The recent postings about paranoia and fpu result mismatches with
> optimized/unoptimized compilations need some clarifications, I hope to
> provide it now ;)
>
> > Marc -
> >
> > What do you make of the following code. PGCC produces different
> > results when optimizing then when not optimization. I was
> > told it has to do with -fno-float-store, but pgcc doesn't appear
>
> the x86 chips are not really ieee compliant. that's not too serious, as I'll
> explain:
>
> > y = 1.0;
> > x = 1.0 + y;
> > oldx = 1.0;
> > do
> > {
> > y /= 2.0;
> > oldx = x;
> > x = 1.0 + y;
> > } while ( x < oldx );
>
> gcc options value of y
> -O0 5.551115e-17
> -O 2.710505e-20
> -O -ffloat-store 5.551115e-17
>
> let's analyze that loop:
>
> when will it stop? it will stop when x !< x + y, where y is 1.0, 0.5, 0.25
> etc... (continously halved). "double"'s have 53 bit's mantissa (only 52 are
> stored). 2^53 ~ 10^15, i.e. we will have roughly 16 digits precision.
>
> This is why the loop stops at y = 5*10^-17. This is compliant to the ieee
> rule that intermediate values have to be at the same precision as the
> result.
>
> Now, when optimizing, gcc tries to keep values in the floating point
> registers. But other than, say, the m68k fpu, the x86 fpu has no notion of
> different data types, all fpu registers have the same type, long double (80
> bits extended format). 80 bits extended format => 64 bits mantissa => ~19
> digits precision.
>
> This is why, in the second example, you get the 2*10^-20. This is fast, more
> accurate than needed, but not IEEE compliant.
>
> When you specify -ffloat-store, you force gcc to store _every_ _intermediate_
> _value_ in memory, as to truncate the value to appropriate precision (just
> like in an unoptimized program). This is why we, again, get the correct
> answer with "-O -ffloat-store".
>
> As all of you correctly guessed, this is helplessly slow, but the only way
> to force correct behaviour.
There seems to be annother way, but involes the unducumented functions
__getfpucw and __setfpucw. By them the fpu-behaviour can be changed,
including internal rounding an SIGFPE generation. See
/usr/include/i386/fpu_control.h for a short information. The following
code works at least for gcc272 an may work for pgcc as well:
#include <stdio.h>
#include <i386/fpu_control.h>
extern unsigned short __getfpucw ( );
void main ( )
{ double y, x, oldx;
unsigned short u;
u = __getfpucw ();
u &= 0xFEFF;
(void) __setfpucw ( u );
y = 1.0;
x = 1.0 + y;
oldx = 1.0;
do
{
y /= 2.0;
oldx = x;
x = 1.0 + y;
} while ( x < oldx );
printf ("%g\n", y);
}
btw. in constrast to the info in fpu_control.h, floating point
exceptions are disabled by default.
Thomas
--
Thomas Koehler
Philips Research Laboratories
Division Technical Systems
Roentgenstrasse 24-26
D-22335 Hamburg
Germany
phone: +40/5078-2103
e-mail: T DOT Koehler AT PFH DOT Research DOT Philips DOT com
- Raw text -