From: "John Carter"
Organization: Dpt Water Affairs & Forestry (IWQS)
To: djgpp AT delorie DOT com
Date: Thu, 1 Feb 1996 08:16:57 +0200
Subject: Re: binary representation of floats
Message-ID: <20E61D84C99@dwaf-hri.pwv.gov.za>

Greetings,

>What I want to do is to get the binary representation of a float
>or double (the way these numbers are stored in memory).

From my little Intel i486 programmer's reference manual....

The i486 processor represents real numbers of the form :-

    (-1)^s 2^E (b_0 . b_1 b_2 b_3 .. b_(p-1))

where :-
    s   is the sign bit and = 0 or 1
    E   = any integer between Emin and Emax inclusive
    b_i = 0 or 1
    p   = the number of bits of precision.

The i486 stores real numbers in a three-field binary format that
resembles scientific or exponential notation. The format consists of
the following fields:

  * The number's significant digits are held in the significand field,
    b_0 . b_1 b_2 b_3 ... b_(p-1)
  * The exponent field e = E + bias locates the binary point within the
    significant digits.
  * 1 bit sign field.

The i486 usually carries the digits of the significand in normalized
form. This means that, except for the value 0, the significand
contains an integer bit and fraction bits like so :-

    1.fff...f

Because the number is normalized, the integer bit is always one, so the
i486 does NOT actually store this one in the single and double
precision formats. (However, it is physically always present in the
extended format.)

In order to simplify the comparing of real numbers, the i486 stores the
exponents in a biased form. This means that a constant is added to the
true exponent. The bias varies according to the number format, and is
chosen to force the biased exponent to always be positive. This allows
two real numbers of the same format and sign to be compared as if they
were unsigned binary integers. A number's true exponent is found by
subtracting the bias from the stored exponent.

While the number is in the FPU it is always in extended precision; only
when stored in memory is it changed to single or double.
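To see the three fields of a float as it sits in memory, something along these lines should do. It is only a sketch, and it assumes that float is the Intel 32-bit single format (1 sign bit, 8 exponent bits with bias 127, 23 stored fraction bits) and that the compiler provides <stdint.h>. The names float_bits, split_float, bits_to_string and struct float_fields are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision number (hypothetical struct). */
struct float_fields {
    unsigned sign;        /* 1 bit:  0 = positive, 1 = negative        */
    unsigned biased_exp;  /* 8 bits: e = E + 127                       */
    uint32_t fraction;    /* 23 stored bits; the integer bit is implied */
};

/* Copy the bytes of a float into a 32-bit unsigned integer.
   memcpy is used instead of a pointer cast to stay within the
   C aliasing rules. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);  /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split the bit pattern into sign, biased exponent and fraction. */
struct float_fields split_float(float f)
{
    uint32_t u = float_bits(f);
    struct float_fields r;
    r.sign       = (unsigned)(u >> 31);           /* bit 31      */
    r.biased_exp = (unsigned)((u >> 23) & 0xFFu); /* bits 30..23 */
    r.fraction   = u & 0x7FFFFFu;                 /* bits 22..0  */
    return r;
}

/* Write the 32 bits as '0'/'1' characters, most significant bit
   first. The caller supplies a buffer of at least 33 bytes. */
void bits_to_string(uint32_t u, char out[33])
{
    int i;
    for (i = 0; i < 32; i++)
        out[i] = (u & (UINT32_C(1) << (31 - i))) ? '1' : '0';
    out[32] = '\0';
}
```

With these definitions, split_float(1.0f) gives sign 0, biased exponent 127 (true exponent 0) and fraction 0, while split_float(-0.15625f) gives sign 1, biased exponent 124 (true exponent -3) and fraction 0x200000. A double could be handled the same way with a 64-bit integer, an 11-bit exponent mask and a 52-bit fraction mask.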
                        |            Format
    Parameter           | Single | Double | Extended
    --------------------+--------+--------+---------
    Width (bits)        |    32  |    64  |     80
    p (bits)            |    24  |    53  |     64
    Exponent Width      |     8  |    11  |     15
    Emax                |  +127  | +1023  | +16383
    Emin                |  -126  | -1022  | -16382
    Exp Bias            |  +127  | +1023  | +16383

The order in the formats is always sign as the most significant bit
(bit 31 for single / 63 for double...), then the biased exponent, then
the significand. The least significant bit of the significand is always
bit 0.

As far as I know, DJGPP maps
    float  to Intel 32 bit single precision,
    double to Intel 64 bit double precision.

Calculations are done on the FPU in the 80 bit precision whether you
are dealing with floats or doubles. The results are rounded to the
narrower format whenever they are stored in memory. (You may find that
you get better precision out of your calculations if you optimize with
-O3, as more intermediate values should stay in FPU registers and there
would be fewer such conversions.)

As far as I know, GCC can't store extended precision numbers. (How
about a "long double" type folks?)

GCC does have a nice long long int, which seems to be a software
emulated 64 bit int. I don't think this uses the Intel FPU 64 bit long
int, but I'm open to correction on that one. Maybe a wee bit of
disassembly will tell us.

John Carter
Institute for Water Quality Studies.
Department of Water Affairs.
Internet : ece AT dwaf-hri DOT pwv DOT gov DOT za
Phone    : 27-12-808-0374x194
Fax      : 27-12-808-0338
[Host for Afwater list server]
Founder of the Council for Unnatural Scientists.