X-pop3-spooler: POP3MAIL 2.1.0 b 3 961213 -bs-
Delivered-To: pcg AT goof DOT com
Message-ID: <19980312074656.14562@cerebro.laendle>
Date: Thu, 12 Mar 1998 07:46:56 +0100
From: Marc Lehmann
To: andrewc AT rosemail DOT rose DOT hp DOT com
Cc: beastium
Subject: paranoia & extra precision [was -fno-float-store in pgcc]
References: <199803111756 DOT AA192209001 AT typhoon DOT rose DOT hp DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.88
In-Reply-To: <199803111756.AA192209001@typhoon.rose.hp.com>; from Andrew Crabtree on Wed, Mar 11, 1998 at 09:56:40AM -0800
X-Operating-System: Linux version 2.1.85 (root AT cerebro) (gcc version pgcc-2.91.06 980129 (gcc-2.8.0 release))
Status: RO
Content-Length: 2747
Lines: 71

[cc sent to beastium-list]

The recent postings about paranoia and FPU result mismatches between
optimized and unoptimized compilations need some clarification; I hope
to provide it now ;)

> Marc -
>
> What do you make of the following code.  PGCC produces different
> results when optimizing than when not optimizing.  I was
> told it has to do with -fno-float-store, but pgcc doesn't appear

The x86 chips are not really IEEE compliant. That's not too serious, as
I'll explain:

>      y = 1.0;
>      x = 1.0 + y;
>      oldx = 1.0;
>      do
>      {
>        y /= 2.0;
>        oldx = x;
>        x = 1.0 + y;
>      } while ( x < oldx );

   gcc options          value of y
   -O0                  5.551115e-17
   -O                   2.710505e-20
   -O -ffloat-store     5.551115e-17

Let's analyze that loop: when will it stop? It stops when x is no longer
less than oldx, i.e. when 1.0 + y can no longer be distinguished from the
previous 1.0 + 2y in the working precision; y runs through 1.0, 0.5, 0.25
etc. (continuously halved).

"double"s have a 53-bit mantissa (only 52 bits are stored).
2^53 ~ 9*10^15, i.e. we get roughly 16 digits of precision. This is why
the loop stops at y = 5*10^-17. This complies with the IEEE rule that
intermediate values have to be computed at the same precision as the
result.

Now, when optimizing, gcc tries to keep values in the floating point
registers. But unlike, say, the m68k FPU, the x86 FPU has no notion of
different data types: all FPU registers have the same type, long double
(80-bit extended format).

80-bit extended format => 64-bit mantissa => ~19 digits of precision.
This is why, in the second case, you get the 2*10^-20. This is fast, and
more accurate than needed, but not IEEE compliant.

When you specify -ffloat-store, you force gcc to store _every_
_intermediate_ _value_ in memory, so as to round the value to the
appropriate precision (just as in an unoptimized program). This is why
we, again, get the correct answer with "-O -ffloat-store". As all of you
correctly guessed, this is hopelessly slow, but it is the only way to
force correct behaviour.

In most cases this is of no concern, since programs generally appreciate
the extra precision they get. It only matters to programs that
specifically need the exact (less precise) representation, like paranoia,
which gets really confused when "double" seems to have a different
precision depending on the surrounding code.

Any questions left? Don't hesitate - Ask!

      -----==-                                          |
      ----==-- _                                        |
      ---==---(_)__  __ ____  __    Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /    pcg AT goof DOT com |e|
      -=====/_/_//_/\_,_/ /_/\_\                      --+
    The choice of a GNU generation                      |
                                                        |
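P.S.: for anyone who wants to reproduce this, below is a small
self-contained C sketch (my own illustration, not code from Andrew's
mail; the helper names are made up). It runs the same halving loop once
for plain "double" and once for "long double" and prints the mantissa
sizes from <float.h>, assuming an IEEE double and the x86 80-bit
extended type. The volatile qualifier forces the stores of x and oldx
through memory, which is roughly the effect -ffloat-store has on the
whole program, so the double loop should stop near 5.6e-17 even with
-O, while the long double loop should stop near 2.7e-20:

#include <stdio.h>
#include <float.h>

static double eps_double(void)
{
    /* volatile forces every store of x and oldx out of the 80-bit
       FPU registers into 64-bit memory slots, much like what
       -ffloat-store does for the whole program. */
    volatile double x, oldx;
    double y = 1.0;
    x = 1.0 + y;
    oldx = 1.0;
    do {
        y /= 2.0;
        oldx = x;
        x = 1.0 + y;
    } while (x < oldx);
    return y;
}

static long double eps_long_double(void)
{
    /* long double is the register format itself, so no extra
       rounding happens whether or not values stay in registers. */
    long double x, oldx, y = 1.0L;
    x = 1.0L + y;
    oldx = 1.0L;
    do {
        y /= 2.0L;
        oldx = x;
        x = 1.0L + y;
    } while (x < oldx);
    return y;
}

int main(void)
{
    printf("double:      %2d mantissa bits, loop stops at y = %e\n",
           DBL_MANT_DIG, eps_double());
    printf("long double: %2d mantissa bits, loop stops at y = %Le\n",
           LDBL_MANT_DIG, eps_long_double());
    return 0;
}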