Mail Archives: pgcc/1998/03/12/16:52:25

X-pop3-spooler: POP3MAIL 2.1.0 b 3 961213 -bs-
Delivered-To: pcg AT goof DOT com
Message-ID: <19980312074656.14562@cerebro.laendle>
Date: Thu, 12 Mar 1998 07:46:56 +0100
From: Marc Lehmann <pcg AT goof DOT com>
To: andrewc AT rosemail DOT rose DOT hp DOT com
Cc: beastium <beastium-list AT Desk DOT nl>
Subject: paranoia & extra precision [was -fno-float-store in pgcc]
References: <199803111756 DOT AA192209001 AT typhoon DOT rose DOT hp DOT com>
Mime-Version: 1.0
X-Mailer: Mutt 0.88
In-Reply-To: <199803111756.AA192209001@typhoon.rose.hp.com>; from Andrew Crabtree on Wed, Mar 11, 1998 at 09:56:40AM -0800
X-Operating-System: Linux version 2.1.85 (root AT cerebro) (gcc version pgcc-2.91.06 980129 (gcc-2.8.0 release))
Status: RO
Lines: 71

[cc sent to beastium-list]

The recent postings about paranoia and FPU result mismatches between
optimized and unoptimized compilations need some clarification, which I hope
to provide now ;)

> Marc - 
> 
> What do you make of the following code.   PGCC produces different
> results when optimizing then when not optimization.  I was
> told it has to do with -fno-float-store, but pgcc doesn't appear

The x86 chips, as gcc uses them, are not really IEEE compliant. That's not
too serious, as I'll explain:

>      y = 1.0;
>      x = 1.0 + y;
>      oldx = 1.0;
>      do
>      {
>          y /= 2.0;
>          oldx = x;
>          x = 1.0 + y;
>      } while ( x < oldx );

gcc options             value of y
-O0                     5.551115e-17
-O                      2.710505e-20
-O -ffloat-store        5.551115e-17
Let's analyze that loop:

When will it stop? It stops when the new x = 1.0 + y is no longer smaller
than the previous one, where y is 1.0, 0.5, 0.25 etc. (continuously halved),
i.e. when y has become too small to change 1.0. A "double" has a 53 bit
mantissa (only 52 bits are stored). 2^53 ~ 9*10^15, i.e. we have roughly 16
digits of precision.

This is why the loop stops at y = 5.5*10^-17 (which is 2^-54). This complies
with the IEEE rule that intermediate values have to be at the same precision
as the result.
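
The arithmetic behind that stopping point can be sketched as follows,
assuming the default round-to-nearest-even mode. With a p bit mantissa,
1.0 + 2^-p is exactly halfway between two representable values and rounds
back to 1.0, which is still smaller than the previous x; one halving later x
is 1.0 again and the loop ends, so the final y is 2^-(p+1). The helper names
below are hypothetical.

```c
#include <float.h>

/* Returns 2^-n by repeated halving (exact in binary floating point
   as long as the result stays in the normal range). */
double pow2_neg(int n)
{
    double v = 1.0;

    while (n-- > 0)
        v /= 2.0;
    return v;
}

/* Predicted final y of the loop when every value is rounded to a
   mantissa of mantissa_bits bits: 2^-(mantissa_bits + 1). */
double predicted_final_y(int mantissa_bits)
{
    return pow2_neg(mantissa_bits + 1);
}

/* The same prediction for this platform's "double". */
double predicted_final_y_double(void)
{
    return predicted_final_y(DBL_MANT_DIG);
}
```

With 53 bits this gives 2^-54, the value in the -O0 row of the table. With
64 bits it gives 2^-65, the value in the -O row.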

Now, when optimizing, gcc tries to keep values in the floating point
registers. But unlike, say, the m68k FPU, the x86 FPU has no notion of
different data types: all FPU registers have the same type, long double (80
bit extended format). 80 bit extended format => 64 bit mantissa => ~19
digits of precision.

This is why, in the second example, you get 2.7*10^-20 (which is 2^-65).
This is fast and more accurate than needed, but not IEEE compliant.
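
To see the register precision on purpose instead of by accident, the same
loop can be written with long double. This is only a sketch, and it assumes
that long double is the 80 bit extended format, as it is for gcc on x86;
other compilers and targets map long double to something else, and the
result changes accordingly. The function names are hypothetical.

```c
#include <float.h>

/* The loop from the quoted code with every variable declared as
   long double, so nothing is ever rounded to "double". */
long double final_y_extended(void)
{
    long double x, y, oldx;

    y = 1.0L;
    x = 1.0L + y;
    oldx = 1.0L;
    do {
        y /= 2.0L;
        oldx = x;
        x = 1.0L + y;
    } while (x < oldx);

    return y;
}

/* Mantissa width of long double on this platform
   (64 for the x86 extended format). */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}
```

Print the result with printf("%Le\n", final_y_extended()). Where the
assumption about long double holds, it should match the -O row of the table.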

When you specify -ffloat-store, you force gcc to store floating point
variables in memory instead of keeping them in registers, so that each value
is rounded to the appropriate precision (just like in an unoptimized
program). This is why we, again, get the correct answer with
"-O -ffloat-store".
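
If only a few variables matter, a narrower alternative to the flag is to
declare them volatile, which makes the compiler write every assignment to
memory and read it back. Note that this is a different technique from
-ffloat-store, shown here only as a sketch; the function name is
hypothetical.

```c
/* The loop from the quoted code with volatile variables: every
   assignment is stored to memory as a "double", so the value is
   rounded to 53 bits even if the FPU computed it in extended
   precision. */
double final_y_stored(void)
{
    volatile double x, y, oldx;

    y = 1.0;
    x = 1.0 + y;
    oldx = 1.0;
    do {
        y /= 2.0;
        oldx = x;
        x = 1.0 + y;
    } while (x < oldx);

    return y;
}
```

This variant should give the 5.551115e-17 result at any optimization level,
at the cost of a memory round trip per assignment inside this one function.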

As all of you correctly guessed, this is hopelessly slow, but it is the only
way to force correct behaviour. In most cases this is of no concern, since
programs generally appreciate the extra precision they get. It only matters
to programs that specifically need the exact (less precise) representation,
like paranoia, which gets really confused when "double" seems to have
different precision depending on the surrounding code.

Any questions left? Don't hesitate - Ask!

      -----==-                                              |
      ----==-- _                                            |
      ---==---(_)__  __ ____  __       Marc Lehmann       +--
      --==---/ / _ \/ // /\ \/ /       pcg AT goof DOT com       |e|
      -=====/_/_//_/\_,_/ /_/\_\                          --+
    The choice of a GNU generation                        |
                                                          |
