Message-ID: <000001c2a927$b7477f10$0b80b6c7@amr.corp.intel.com>
From: "tprinceusa"
To: "Mikhail Teterin"
References: <200212211447 DOT gBLElNsH041540 AT corbulon DOT video-collage DOT com>
Subject: Re: poor performance -- is Cygwin to blame?..
Date: Sat, 21 Dec 2002 10:22:20 -0800
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0015_01C2A8DA.D878AF80"

------=_NextPart_000_0015_01C2A8DA.D878AF80
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

----- Original Message -----
From: "Mikhail Teterin"
To: "Timothy C Prince"
Sent: Saturday, December 21, 2002 6:47 AM
Subject: Re: poor performance -- is Cygwin to blame?..

> > In my experience with MPI programs, comparing cygwin and linux,
> > message passing takes longer under cygwin, but the time may be made up
> > elsewhere, if the compilation is truly similar.
> >
> > You mention that considerable time is spent in log(), pow(), exp()
> > but leave us guessing how you implemented them.
>
> I did not implement them. They are from whatever -lm means on Cygwin. I
> use them to compute my own formula repeatedly for hundreds of different
> vectors.
>
> > Then you imply that you think cygwin, rather than your math functions,
> > is the speed determining factor, without giving us a means to judge.
>
> They are not mine. There must be a misunderstanding...
>
> > The glibc versions of these functions are much faster than the newlib
> > versions, particularly if you permit the use of .
> > Neither approach the potential of pentium4, but the simplest way to
> > speed them up on cygwin is to employ something like ,
> > and to provide your own pow() (or to use a compiler and library which
> > targets pentium4).
>
> Can this be done with just CFLAGS? I really don't want to pollute my
> code with ``#ifdef CYGWIN''... Thank you,
>

You could simply add #include just ahead of the final #endif in
whichever file is active, and supply a mathinline.h (example attached) in the
include search path. You could add guards so that the is invoked in
accordance with command line flags. See the glibc example for which in-line
functions are made to depend on -ffast-math. I have corrected some extreme
value cases in my version, so that it may not be as risky as the full glibc
-ffast-math version.

------=_NextPart_000_0015_01C2A8DA.D878AF80
Content-Type: application/octet-stream; name="mathinline.h"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="mathinline.h"

/* Inline math functions for i387.
   Copyright (C) 1995, 1996, 1997, 1998 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by John C. Bowman , 1995.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.
   If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#ifndef __INC_MATH_INLINE
#define __INC_MATH_INLINE
#ifndef __CONCAT
#define __CONCAT(a,b) a##b
#endif
#ifndef __USE_MISC
#define __USE_MISC
#endif
#ifndef __OPTIMIZE__
#define __OPTIMIZE__
#endif

#ifdef __cplusplus
# define __MATH_INLINE __inline
#else
# define __MATH_INLINE extern __inline
#endif


#if defined __USE_ISOC9X && defined __GNUC__ && __GNUC__ >= 2
/* ISO C 9X defines some macros to perform unordered comparisons.  The
   ix87 FPU supports this with special opcodes and we should use them.
   These must not be inline functions since we have to be able to handle
   all floating-point types.  */
# ifdef __i686__
/* For the PentiumPro and more recent processors we can provide
   better code.  */
#  define isgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; seta %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })
#  define isgreaterequal(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setae %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })

#  define isless(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; seta %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st"); \
        __result; })

#  define islessequal(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setae %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st"); \
        __result; })

#  define islessgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setne %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })

#  define isunordered(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setp %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })
# else
/* This is the dumb, portable code for i386 and above.  */
#  define isgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x45, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isgreaterequal(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x05, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isless(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x45, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st", "st(1)"); \
        __result; })

#  define islessequal(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x05, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st", "st(1)"); \
        __result; })

#  define islessgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x44, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isunordered(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; sahf; setp %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })
# endif /* __i686__ */

/* Test for negative number.  Used in the signbit() macro.
   */
__MATH_INLINE int
__signbitf (float __x)
{
  union { float __f; int __i; } __u = { __f: __x }; return __u.__i < 0;
}
__MATH_INLINE int
__signbit (double __x)
{
  union { double __d; int __i[2]; } __u = { __d: __x }; return __u.__i[1] < 0;
}
__MATH_INLINE int
__signbitl (long double __x)
{
  union { long double __l; int __i[3]; } __u = { __l: __x };
  return (__u.__i[2] & 0x8000) != 0;
}
#endif


/* The gcc, version 2.7 or below, has problems with all this inlining
   code.  So disable it for this version of the compiler.  */
#if defined __GNUC__ && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ > 7))

#if ((!defined __NO_MATH_INLINES || defined __LIBC_INTERNAL_MATH_INLINES) \
     && defined __OPTIMIZE__)

/* A macro to define float, double, and long double versions of various
   math functions for the ix87 FPU.  FUNC is the function name (which will
   be suffixed with f and l for the float and long double version,
   respectively).  OP is the name of the FPU operation.  */

#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathop(func, op) \
  __inline_mathop_ (double, func, op) \
  __inline_mathop_ (float, __CONCAT(func,f), op) \
  __inline_mathop_ (long double, __CONCAT(func,l), op)
#else
# define __inline_mathop(func, op) \
  __inline_mathop_ (double, func, op)
#endif

#define __inline_mathop_(float_type, func, op) \
  __inline_mathop_decl_ (float_type, func, op, "0" (__x))


#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathop_decl(func, op, params...) \
  __inline_mathop_decl_ (double, func, op, params) \
  __inline_mathop_decl_ (float, __CONCAT(func,f), op, params) \
  __inline_mathop_decl_ (long double, __CONCAT(func,l), op, params)
#else
# define __inline_mathop_decl(func, op, params...) \
  __inline_mathop_decl_ (double, func, op, params)
#endif

#define __inline_mathop_decl_(float_type, func, op, params...) \
  __MATH_INLINE float_type func (float_type); \
  __MATH_INLINE float_type func (float_type __x) \
  { \
    float_type __result; \
    __asm (op : "=t" (__result) : params); \
    return __result; \
  }


#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathcode(func, arg, code) \
  __inline_mathcode_ (double, func, arg, code) \
  __inline_mathcode_ (float, __CONCAT(func,f), arg, code) \
  __inline_mathcode_ (long double, __CONCAT(func,l), arg, code)
# define __inline_mathcode2(func, arg1, arg2, code) \
  __inline_mathcode2_ (double, func, arg1, arg2, code) \
  __inline_mathcode2_ (float, __CONCAT(func,f), arg1, arg2, code) \
  __inline_mathcode2_ (long double, __CONCAT(func,l), arg1, arg2, code)
# define __inline_mathcode3(func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (double, func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (float, __CONCAT(func,f), arg1, arg2, arg3, code) \
  __inline_mathcode3_ (long double, __CONCAT(func,l), arg1, arg2, arg3, code)
#else
# define __inline_mathcode(func, arg, code) \
  __inline_mathcode_ (double, func, (arg), code)
# define __inline_mathcode2(func, arg1, arg2, code) \
  __inline_mathcode2_ (double, func, arg1, arg2, code)
# define __inline_mathcode3(func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (double, func, arg1, arg2, arg3, code)
#endif

#define __inline_mathcode_(float_type, func, arg, code) \
  __MATH_INLINE float_type func (float_type); \
  __MATH_INLINE float_type func (float_type arg) \
  { \
    code; \
  }

#define __inline_mathcode2_(float_type, func, arg1, arg2, code) \
  __MATH_INLINE float_type func (float_type, float_type); \
  __MATH_INLINE float_type func (float_type arg1, float_type arg2) \
  { \
    code; \
  }

#define __inline_mathcode3_(float_type, func, arg1, arg2, arg3, code) \
  __MATH_INLINE float_type func (float_type, float_type, float_type); \
  __MATH_INLINE float_type func (float_type arg1, float_type arg2, \
                                 float_type arg3) \
  { \
    code; \
  }
#endif


#if !defined __NO_MATH_INLINES && defined __OPTIMIZE__
/* Miscellaneous functions */

__inline_mathcode (__sgn, __x, \
  return __x == 0.0 ? 0.0 : (__x > 0.0 ? 1.0 : -1.0))

__inline_mathcode (__pow2, __x, \
  long double __value; \
  long double __exponent; \
  __extension__ long long int __p = (long long int) __x; \
  if (__x == (long double) __p) \
    { \
      __asm \
        ("fscale" \
         : "=t" (__value) : "0" (1.0), "u" (__x)); \
      return __value; \
    } \
  __asm \
    ("fld %%st(0)\n\t" \
     "frndint # int(x)\n\t" \
     "fxch\n\t" \
     "fsub %%st(1) # fract(x)\n\t" \
     "f2xm1 # 2^(fract(x)) - 1\n\t" \
     : "=t" (__value), "=u" (__exponent) : "0" (__x)); \
  __value += 1.0; \
  __asm \
    ("fscale" \
     : "=t" (__value) : "0" (__value), "u" (__exponent)); \
  return __value)

#define __sincos_code \
  long double __cosr; \
  long double __sinr; \
  __asm \
    ("fsincos\n\t" \
     : "=t" (__cosr), "=u" (__sinr) : "0" (__x)); \
  *__sinx = __sinr; \
  *__cosx = __cosr

__MATH_INLINE void __sincos (double __x, double *__sinx, double *__cosx);
__MATH_INLINE void
__sincos (double __x, double *__sinx, double *__cosx)
{
  __sincos_code;
}

__MATH_INLINE void __sincosf (float __x, float *__sinx, float *__cosx);
__MATH_INLINE void
__sincosf (float __x, float *__sinx, float *__cosx)
{
  __sincos_code;
}

__MATH_INLINE void __sincosl (long double __x, long double *__sinx,
                              long double *__cosx);
__MATH_INLINE void
__sincosl (long double __x, long double *__sinx, long double *__cosx)
{
  __sincos_code;
}
/* Optimized inline implementation, sometimes with reduced precision
   and/or argument range.  */

__inline_mathop(rint, "frndint")

#define __expm1_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value), "u" (__exponent)); \
  __asm ("fscale" : "=t" (__temp) : "0" (1.0), "u" (__exponent)); \
  return (__temp - 1) + __value
__inline_mathcode_ (long double, __expm1l, __x, __expm1_code)

/* generate (exp(x)-1)/2 for use in sinh()
 * result ought to be finite up to |x| = 11357.2 but it seems to be unstable
 * beyond the limit applied below, producing NaN in division later */
#define __expm1h_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  if(__x > 11356.8) __x = 11356.8; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value), "u" (__exponent-1));\
  __asm ("fscale" : "=t" (__temp) : "0" (1.0), "u" (__exponent-1)); \
  return (__temp - .5) + __value
__inline_mathcode_ (long double, __expm1h, __x, __expm1h_code)


/* extra precision range reduction saves 11 bits in elefunt expl() */
#define __exp_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value+1), "u" (__exponent)); \
  return __value
__inline_mathcode_ (long double, __expl, __x, __exp_code)
__inline_mathcode (exp, __x, if(__x > 20768)__x = 20768;return __expl(__x))

#define __exp_h_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  if(__x > 20768) __x = 20768; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value+1), "u" (__exponent-1)); \
  return __value
__inline_mathcode_ (long double, __exph, __x, __exp_h_code)


__inline_mathcode (tan, __x, \
  long double __value; \
  long double __value2; /* unused */ \
  __asm \
    ("fptan" \
     : "=t" (__value2), "=u" (__value) : "0" (__x)); \
  return __value)


#define __atan2_code \
  long double __value; \
  __asm \
    ("fpatan\n\t" \
     : "=t" (__value) : "0" (__x), "u" (__y) : "st(1)"); \
  return __value
__inline_mathcode2 (atan2, __y, __x, __atan2_code)
__inline_mathcode2_ (long double, __atan2l, __y, __x, __atan2_code)
__inline_mathcode (atan, __y, return __atan2l(__y,1.0))


__inline_mathcode2 (fmod, __x, __y, \
  long double __value; \
  __asm \
    ("1: fprem\n\t" \
     "fnstsw %%ax\n\t" \
     "sahf\n\t" \
     "jp 1b" \
     : "=t" (__value) : "0" (__x), "u" (__y) : "ax", "cc"); \
  return __value)


#if 0
/* while this version is faster and more accurate than the one in glibc-2.1.2,
 * there is no speed gain over separate compilation, and there is a loss of 9
 * significant bits.
 */
__inline_mathcode2 (pow, __x, __y, \
  long double __value; \
  long double __exponent; \
  __extension__ long long int __p = (long long int) __y; \
  long double __r; \
  if (__x == 0.0 && __y > 0.0) \
    return __x; \
  if(__y == __p) \
    __r = 1; \
  else{ \
    __asm \
      ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__y - __p) : "st(1)"); \
    __value -= (__exponent = rintl(__value)); \
    __asm ( "f2xm1 # 2^(fract(y * log2(x))) - 1\n\t" \
      : "=t" (__value) : "0" (__value)); \
    __asm ("fscale" \
      : "=t" (__r) : "0" (__value + 1), "u" (__exponent)); \
  } \
  if (__p < 0) \
    { \
      __p = -__p; \
      __x = 1.0 / __x; \
    } \
  if (__p & 1) \
    __r *= __x; \
  for ( ; __p > 1; __r *= __x) \
    do \
      __x *= __x; \
    while(!((__p >>= 1) & 1)); \
  return __r)
#endif



#if defined __GNUC__ && (__GNUC__ > 2 || __GNUC__ == 2 && __GNUC_MINOR__ >= 8)
__inline_mathcode_ (double, fabs, __x, return __builtin_fabs (__x))
__inline_mathcode_ (float, fabsf, __x, return __builtin_fabsf (__x))
__inline_mathcode_ (long double, fabsl, __x, return __builtin_fabsl (__x))
__inline_mathcode_ (long double, __fabsl, __x, return __builtin_fabsl (__x))
__inline_mathcode_ (long double, __sqrtl, __x, return __builtin_sqrtl (__x))
#else
__inline_mathop (fabs, "fabs")
__inline_mathop_ ( double, __fabsl, "fabs")
__inline_mathop_ (long double, __sqrtl, "fsqrt")
#endif

/* The argument range of this inline version is reduced.  */
__inline_mathop (sin, "fsin")
/* The argument range of this inline version is reduced.
   */
__inline_mathop (cos, "fcos")

/* this and all other uses of fyl2x have been corrected to include clobber */
#define __log_code \
  long double __value; \
  __asm ("fldln2": "=t" (__value)); \
  __asm ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__value) : "st(1)" ); \
  return __value

__inline_mathcode (log, __x, __log_code)

#define __log10_code \
  long double __value; \
  __asm ("fldlg2": "=t" (__value)); \
  __asm ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__value) : "st(1)" ); \
  return __value

__inline_mathcode (log10, __x, __log10_code)

/* sqrt(1-x*x) is good only on the non-IEEE fused multiply-accumulate arithmetic
 * this saves 9 bits in elefunt asinl() but still misses 20% of achieveable
 * correctly rounded results without a net gain in speed over a library
 * function following the suggestions of Plauger.  */
#if 0
__inline_mathcode (asin, __x, return __atan2l (__x, __sqrtl ((1-__x)*(1+__x))))
__inline_mathcode (acos, __x, return __atan2l (__sqrtl ((1-__x)*(1+__x)), __x))
#endif
__inline_mathcode_ (long double, __sgn1l, __x, return __x >= 0.0 ? 1.0 : -1.0)

/* i286 compatibility is not retained.  */
__inline_mathcode (sinh, __x, \
  long double __exm1 = __expm1h (__x); \
  return __exm1*.5 / (__exm1 + .5) + __exm1;)

__inline_mathcode (cosh, __x, \
  long double __ex = __exph (__fabsl(__x)); \
  return __ex + .25 / __ex);

/* this corrects the sign returned when x == 0 and improves accuracy and speed.
 * Limits in expl() above solve problem of NaN returned for large |__x|.
 * Use Chebyshev economized polynomial for small |__x| to improve accuracy
 * and speed, at the expense of code size.
 */
__inline_mathcode (tanh, __x, \
  if(__fabsl(__x) <= .34657){ \
    long double __x2 = __x * __x; \
    long double __x4 = __x2*__x2; \
    return __x + __x2*__x*( \
      -0.3333333333333333333028L \
      +__x2*(0.133333333333333321200L \
      +__x2*-0.5396825396825207695E-01L \
      +__x4*(0.218694885360028124E-01L \
      +__x2*-0.88632355226515778E-02 \
      +__x4*(0.3592127817609080E-02 \
      +__x2*-0.14558300258105E-02) \
      +__x4*__x4*(0.5899693119329E-03 \
      +__x2*-0.238614526828E-03 \
      +__x4*(0.9399418484E-04 \
      +__x2*-0.294863013E-04)))));} \
  return 1 - 2 / (expl(__x + __x) + 1))


__inline_mathcode (floor, __x, \
  long double __value; \
  unsigned short int __cw; \
  unsigned short int __cwtmp; \
  __asm ("fnstcw %0" : "=m" (__cw)); \
  __cwtmp = (__cw & 0xf3ff) | 0x0400; /* rounding down */ \
  __asm ("fldcw %0" : : "m" (__cwtmp)); \
  __asm ("frndint" : "=t" (__value) : "0" (__x)); \
  __asm ("fldcw %0" : : "m" (__cw)); \
  return __value)

__inline_mathcode (ceil, __x, \
  long double __value; \
  unsigned short int __cw; \
  unsigned short int __cwtmp; \
  __asm ("fnstcw %0" : "=m" (__cw)); \
  __cwtmp = (__cw & 0xf3ff) | 0x0800; /* rounding up */ \
  __asm ("fldcw %0" : : "m" (__cwtmp)); \
  __asm ("frndint" : "=t" (__value) : "0" (__x)); \
  __asm ("fldcw %0" : : "m" (__cw)); \
  return __value)

#define __ldexp_code \
  long double __value; \
  __asm \
    ("fscale" \
     : "=t" (__value) : "0" (__x), "u" ((long double) __y)); \
  return __value

__MATH_INLINE double ldexp (double __x, int __y);
__MATH_INLINE double
ldexp (double __x, int __y)
{
  __ldexp_code;
}


/* Optimized versions for some non-standardized functions.  */
#if defined __USE_ISOC9X || defined __USE_MISC

__inline_mathcode (expm1, __x, __expm1_code)

/* We cannot rely on M_SQRT being defined.
   So we do it for ourself
   here.  */
# define __M_SQRT2 1.41421356237309504880L /* sqrt(2) */

__inline_mathcode (log1p, __x, \
  long double __value; \
  if (__fabsl (__x) >= 1.0 - 0.5 * __M_SQRT2) \
    return logl (1.0 + __x); \
  __asm ("fldln2":"=t" (__value)); \
  __asm ("fyl2xp1" : "=t" (__value) : "0" (__x),"u" (__value) : "st(1)"); \
  return __value)


/* The argument range of the inline version of asinhl is slightly reduced.  */
__inline_mathcode (asinh, __x, \
  long double __y = __fabsl (__x); \
  __y = log1pl (__x * __x / (__sqrtl (__x * __x + 1.0) + 1.0) + __y); \
  return __x >= 0 ? __y : -__y)

__inline_mathcode (acosh, __x, \
  return logl (__x + __sqrtl (__x - 1.0) * __sqrtl (__x + 1.0)))

__inline_mathcode (atanh, __x, \
  return 0.5 * log1pl (2 * ( __x / (1.0 + __fabsl (__x)))))

/* The argument range of the inline version of hypotl is slightly reduced.  */
__inline_mathcode2 (hypot, __x, __y, return __sqrtl (__x * __x + __y * __y))

__inline_mathcode(logb, __x, \
  long double __value; \
  long double __junk; \
  __asm \
    ("fxtract\n\t" \
     : "=t" (__junk), "=u" (__value) : "0" (__x)); \
  return __value)

#endif

#ifdef __USE_ISOC9X
__inline_mathcode (log2, __x, \
  long double __value; \
  __asm__ ("fyl2x" : "=t" (__value) : "0" (__x), "u" (1.0) : "st(1)"); \
  return __value)

__MATH_INLINE float ldexpf (float __x, int __y);
__MATH_INLINE float
ldexpf (float __x, int __y)
{
  __ldexp_code;
}

__MATH_INLINE long double ldexpl (long double __x, int __y);
__MATH_INLINE long double
ldexpl (long double __x, int __y)
{
  __ldexp_code;
}

__inline_mathcode3 (fma, __x, __y, __z, return (__x * __y) + __z)

#define __lrint_code \
  long int __lrintres; \
  __asm__ \
    ("fistpl %0" \
     : "=m" (__lrintres) : "t" (__x) : "st"); \
  return __lrintres
__MATH_INLINE long int
lrintf (float __x)
{
  __lrint_code;
}
__MATH_INLINE long int
lrint (double __x)
{
  __lrint_code;
}
__MATH_INLINE long int
lrintl (long double __x)
{
  __lrint_code;
}
#undef __lrint_code

#define __llrint_code \
  long long int __llrintres; \
  __asm__ \
    ("fistpll %0" \
     : "=m" (__llrintres) : "t" (__x) : "st"); \
  return __llrintres
__MATH_INLINE long long int
llrintf (float __x)
{
  __llrint_code;
}
__MATH_INLINE long long int
llrint (double __x)
{
  __llrint_code;
}
__MATH_INLINE long long int
llrintl (long double __x)
{
  __llrint_code;
}
#undef __llrint_code

#endif


#ifdef __USE_MISC

__inline_mathcode2 (drem, __x, __y, \
  double __value; \
  int __clobbered; \
  __asm \
    ("1: fprem1\n\t" \
     "fstsw %%ax\n\t" \
     "sahf\n\t" \
     "jp 1b" \
     : "=t" (__value), "=&a" (__clobbered) : "0" (__x), "u" (__y) : "cc"); \
  return __value)


/* This function is used in the `isfinite' macro.  */
__MATH_INLINE int __finite (double __x) __attribute__ ((__const__));
__MATH_INLINE int
__finite (double __x)
{
  return (__extension__
          (((((union { double __d; int __i[2]; }) {__d: __x}).__i[1]
             | 0x800fffff) + 1) >> 31));
}

/* Miscellaneous functions */

__inline_mathcode (__coshm1, __x, \
  long double __exm1 = __expm1l (__fabsl (__x)); \
  return 0.5 * (__exm1 / (__exm1 + 1.0)) * __exm1)

__inline_mathcode (__acosh1p, __x, \
  return log1pl (__x + __sqrtl (__x) * __sqrtl (__x + 2.0)))

#endif /* __USE_MISC */

/* Undefine some of the large macros which are not used anymore.
   */
#undef __expm1_code
#undef __exp_code
#undef __atan2_code
#undef __sincos_code

#endif /* __NO_MATH_INLINES */


/* This code is used internally in the GNU libc.  */
#ifdef __LIBC_INTERNAL_MATH_INLINES
__inline_mathcode2 (__ieee754_atan2, __y, __x,
                    long double __value;
                    __asm ("fpatan\n\t"
                           : "=t" (__value)
                           : "0" (__x), "u" (__y) : "st(1)");
                    return __value;)
#endif

#endif /* __GNUC__ */
#endif

------=_NextPart_000_0015_01C2A8DA.D878AF80--