Xref: news2.mv.net comp.lang.c:54956 comp.lang.c++:70553 comp.os.msdos.djgpp:759 gnu.g++.help:4488 gnu.gcc.help:5692
From: bde AT zeta DOT org DOT au (Bruce Evans)
Newsgroups: comp.lang.c,comp.lang.c++,gnu.gcc.help,gnu.g++.help,comp.os.msdos.djgpp
Subject: Re: float != float and floats as return types
Date: 1 Feb 1996 23:38:13 +1100
Organization: Kralizec Dialup Unix
Lines: 92
Message-ID: <4eqc7l$ugh@godzilla.zeta.org.au>
References: <4ej9lb$mpc AT fu-berlin DOT de> <4elnjj$er4 AT server2 DOT rz DOT uni-leipzig DOT de>
NNTP-Posting-Host: godzilla.zeta.org.au
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp

In article <4elnjj$er4 AT server2 DOT rz DOT uni-leipzig DOT de>,
Steffen Winterfeldt <wfeldt AT physik DOT uni-leipzig DOT de> wrote:
>Axel Thimm (axl AT zedat DOT fu-berlin DOT de) wrote:
>: Hello,
>: I am getting confused, about how C/C++ manage float binary operations,
>: in particular multiplication. The next C++ example gives me surprising
>: results:
>: 	*** cut here: begin file t_prec.cc
>: 	#include <iostream.h>
>: 	#include <iomanip.h>
>: 	#include <math.h>
>: 	float quad( float );
>: 	int main() {
>: 	  for( int i=0; i<10; ++i ) {
>: 	    float a, b, c;
>: 	    a = i/13.123123;
>: 	    b = a*a;
>: 	    c = quad(a);
>: 	    cout << (b - c) << '\t';
>: 	    cout << (b - a*a) << '\t';
>: 	    cout << (c - quad(a)) << '\n';
>: 	  }
>: 	  return 0;
>: 	}
>: 	float quad( float x ) { return x*x; }
>: [...]
>
>(c - quad(a)) is not zero, because quad's return value is in a floating point
>register and so has higher precision than c.

Wrong.

gcc's machine description for the i386 bogusly says that the result of a
(float * float) calculation has float precision.  This is only true if
the ambient precision is 24 bits or if -msoft-float is used.  Because of
this, gcc omits the conversions that it would do if it had the correct
precision (type) information.  It clips the extra precision for
assignments from double variables or long double variables to float
variables and for returning double or long double values from functions
that return float, even if the value is originally in a floating point
register and could end up in the same register.  The ANSI standard is
ambiguous about whether these conversions must be done.  Anyway, gcc
sometimes skips them not because it chooses the most convenient
interpretation of the standard, but because of the bogus machine
description.

(c - quad(a)) is nonzero because gcc decided to spill exactly one of the
operands to memory.  Because it has the wrong precision (type) information,
it spills to a float variable.  The spilled operand ends up correct and
the unspilled value ends up wrong.

>BTW, with higher optimization (say, -O3), even the second column becomes zero.

This is wrong too.  The second column should never be zero if the
ambient precision is double or long double and the assignment to b
converts to float precision.  The higher optimization level probably
allows the value of b to be kept in a register so that it doesn't happen
to get clipped to float precision for the wrong reasons.

My standard example of the problem is:
---
#include <assert.h>
#include <float.h>
#include <stdio.h>

int main(void)
{
    float volatile a, b, c;

    /*
     * i386.md has a bogus addsf3 pattern (the result has long double
     * precision, not float precision, unless the ambient precision is
     * float), so some casts are elided.  All the variables have to be
     * volatile so that the calculations don't get done at compile time.
     * The compile-time calculations are correct and the whole test
     * would get optimized away.
     */
    a = 1.0;
    b = FLT_EPSILON / 4.0;
    c = a + b;
    assert(c == (float) (a + b));
    return 0;
}
---

If the ambient precision is 64 bits (the full long double mantissa), as
it is under Linux, then this problem also afflicts double precision vs.
long double precision.

Fixing this problem for the i386 would have little effect other than slowing
down most floating point code :-(.
-- 
Bruce Evans  bde AT zeta DOT org DOT au