Mail Archives: cygwin/2002/12/21/14:33:19

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <000001c2a927$b7477f10$0b80b6c7@amr.corp.intel.com>
From: "tprinceusa" <tprinceusa AT netzero DOT net>
To: "Mikhail Teterin" <mi AT corbulon DOT video-collage DOT com>
Cc: <cygwin AT cygwin DOT com>
References: <200212211447 DOT gBLElNsH041540 AT corbulon DOT video-collage DOT com>
Subject: Re: poor performance -- is Cygwin to blame?..
Date: Sat, 21 Dec 2002 10:22:20 -0800
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000


----- Original Message -----
From: "Mikhail Teterin" <mi AT corbulon DOT video-collage DOT com>
To: "Timothy C Prince" <tprince AT myrealbox DOT com>
Cc: <cygwin AT cygwin DOT com>
Sent: Saturday, December 21, 2002 6:47 AM
Subject: Re: poor performance -- is Cygwin to blame?..


> > In my experience with MPI programs, comparing cygwin and linux,
> > message passing takes longer under cygwin, but the time may be made up
> > elsewhere, if the compilation is truly similar.
> >
> > You mention that considerable time is spent in log(), pow(), exp()
> > but leave us guessing how you implemented them.
>
> I did not implement them. They are from whatever -lm means on Cygwin. I
> use them to compute my own formula repeatedly for hundreds of different
> vectors.
>
> > Then you imply that you think cygwin, rather than your math functions,
> > is the speed determining factor, without giving us a means to judge.
>
> They are not mine. There must be a misunderstanding...
>
> > The glibc versions of these functions are much faster than the newlib
> > versions, particularly if you permit the use of <mathinline.h>.
> > Neither approaches the potential of pentium4, but the simplest way to
> > speed them up on cygwin is to employ something like <mathinline.h>,
> > and to provide your own pow() (or to use a compiler and library which
> > targets pentium4).
>
> Can this be done with just CFLAGS? I really don't want to pollute my
> code with ``#ifdef CYGWIN''... Thank you,
>
You could simply add

    #include <mathinline.h>

just ahead of the final #endif in whichever <math.h> is active, and put a
mathinline.h (example attached) in the include search path.  You could also
add guards so that <mathinline.h> is included only when particular
command-line flags are given; see the glibc version for which inline
functions are made to depend on -ffast-math.  I have corrected some
extreme-value cases in my version, so it may be less risky than the full
glibc -ffast-math version.
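A minimal sketch of such a guard follows. It assumes GCC on a 32-bit x87 target, and relies on GCC predefining __i386__ there, __OPTIMIZE__ when optimizing, and __FAST_MATH__ under -ffast-math; -D__NO_MATH_INLINES switches the inlines off again. The marker macro LOCAL_MATH_INLINES_ACTIVE is hypothetical, added only so the effect of the flags can be checked.

```c
/* Hypothetical guard, placed just ahead of the final #endif of the active
   <math.h>.  Assumes GCC on a 32-bit x87 target and the attached
   mathinline.h on the include path.  On any other target, or without both
   -O and -ffast-math, nothing changes and the library functions are used. */
#if defined __GNUC__ && defined __i386__ && defined __OPTIMIZE__ \
    && defined __FAST_MATH__ && !defined __NO_MATH_INLINES
# include <mathinline.h>
# define LOCAL_MATH_INLINES_ACTIVE 1   /* hypothetical marker macro */
#else
# define LOCAL_MATH_INLINES_ACTIVE 0
#endif
```

With a guard like this the choice moves onto the command line, e.g. `gcc -O2 -ffast-math -I<dir containing mathinline.h> prog.c -lm`, and no `#ifdef CYGWIN` is needed in the application source.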

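The attached exp() reduces its argument with ln 2 split into a short high part (0.693147178739309310913, whose product with a modest integer is exact) and a small low part (1.82063599850414622404E-09), and it clamps very large arguments before scaling. The sketch below shows the same range reduction portably in double precision. In place of the x87 f2xm1/fscale instructions it swaps in a truncated Taylor series for e^r and builds 2^n from the exponent bits; the names sketch_exp and pow2i are hypothetical, and it is meant to illustrate the technique, not to match the header's accuracy or speed.

```c
#include <math.h>     /* INFINITY only; no libm functions are called */
#include <stdint.h>
#include <string.h>

/* Same split of ln 2 as the attached header: LN2_HI has few enough
   significant bits that n * LN2_HI is exact for |n| up to about 1100. */
static const double LN2_HI = 0.693147178739309310913;
static const double LN2_LO = 1.82063599850414622404E-09;
static const double LOG2E  = 1.4426950408889634074;

/* 2^n for n in [-1022, 1023], built directly from the exponent field. */
static double pow2i(int n)
{
    uint64_t bits = (uint64_t)(n + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

double sketch_exp(double x)
{
    if (x != x) return x;               /* NaN propagates */
    if (x > 709.79) return INFINITY;    /* result would overflow */
    if (x < -746.0) return 0.0;         /* result would round to zero */

    /* n = nearest integer to x / ln 2 */
    double t = x * LOG2E;
    int n = (int)(t < 0 ? t - 0.5 : t + 0.5);

    /* Two-step reduction r = x - n ln 2, so |r| <= ln2/2 (plus rounding) */
    double r = (x - n * LN2_HI) - n * LN2_LO;

    /* e^r by Horner evaluation of the Taylor series through r^17/17! */
    double p = 1.0;
    for (int k = 17; k >= 1; --k)
        p = 1.0 + p * r / k;

    /* Scale by 2^n, splitting the power so each factor is a normal double */
    if (n > 1023) { p *= 2.0; n -= 1; }
    if (n < -1022) { p *= pow2i(n + 1022); n = -1022; }
    return p * pow2i(n);
}
```

The split matters because n * LN2_HI is computed without rounding, so the only rounding in r comes from the much smaller n * LN2_LO term; with a single ln 2 constant, the error in n * ln 2 grows with n and passes straight into the result.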

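On providing your own pow(): the disabled pow() in the attached header handles the integer part of the exponent by repeated squaring and uses the x87 (fyl2x, f2xm1, fscale) only for the fractional part. When the exponents in a formula are integers, that first part can be written portably. A minimal sketch, where powi is a hypothetical name; unlike the attached code, it takes the reciprocal once at the end instead of inverting x first.

```c
/* x^n for an integer exponent by binary exponentiation (square and
   multiply): about 2*log2|n| multiplications instead of a log/exp pair. */
double powi(double x, long long n)
{
    /* Magnitude of n as unsigned, which is safe even for LLONG_MIN. */
    unsigned long long m = n < 0 ? 0ULL - (unsigned long long)n
                                 : (unsigned long long)n;
    double result = 1.0;

    while (m != 0) {
        if (m & 1ULL)      /* this bit of the exponent is set */
            result *= x;
        x *= x;            /* x, x^2, x^4, x^8, ... */
        m >>= 1;
    }
    return n < 0 ? 1.0 / result : result;
}
```

This only replaces the integer-exponent case, and since each squaring roughly doubles the relative error already in x, the rounding error can grow with |n|.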
[Attachment: mathinline.h]

/* Inline math functions for i387.
   Copyright (C) 1995, 1996, 1997, 1998 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by John C. Bowman <bowman AT math DOT ualberta DOT ca>, 1995.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#ifndef __INC_MATH_INLINE
#define __INC_MATH_INLINE
#ifndef __CONCAT
#define __CONCAT(a,b) a##b
#endif
#ifndef __USE_MISC
#define __USE_MISC
#endif
#ifndef __OPTIMIZE__
#define __OPTIMIZE__
#endif

#ifdef __cplusplus
# define __MATH_INLINE __inline
#else
# define __MATH_INLINE extern __inline
#endif


#if defined __USE_ISOC9X && defined __GNUC__ && __GNUC__ >= 2
/* ISO C 9X defines some macros to perform unordered comparisons.  The
   ix87 FPU supports this with special opcodes and we should use them.
   These must not be inline functions since we have to be able to handle
   all floating-point types.  */
# ifdef __i686__
/* For the PentiumPro and more recent processors we can provide
   better code.  */
#  define isgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; seta %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })
#  define isgreaterequal(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setae %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })

#  define isless(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; seta %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st"); \
        __result; })

#  define islessequal(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setae %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st"); \
        __result; })

#  define islessgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setne %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })

#  define isunordered(x, y) \
     ({ char __result; \
        __asm__ ("fucomip %%st(1), %%st; setp %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st"); \
        __result; })
# else
/* This is the dumb, portable code for i386 and above.  */
#  define isgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x45, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isgreaterequal(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x05, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isless(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x45, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st", "st(1)"); \
        __result; })

#  define islessequal(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x05, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (x), "t" (y) : "cc", "st", "st(1)"); \
        __result; })

#  define islessgreater(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; testb $0x44, %%ah; setz %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })

#  define isunordered(x, y) \
     ({ char __result; \
        __asm__ ("fucompp; fnstsw; sahf; setp %%al" \
                 : "=a" (__result) : "u" (y), "t" (x) : "cc", "st", "st(1)"); \
        __result; })
# endif /* __i686__ */

/* Test for negative number.  Used in the signbit() macro.  */
__MATH_INLINE int
__signbitf (float __x)
{
  union { float __f; int __i; } __u = { __f: __x }; return __u.__i < 0;
}
__MATH_INLINE int
__signbit (double __x)
{
  union { double __d; int __i[2]; } __u = { __d: __x }; return __u.__i[1] < 0;
}
__MATH_INLINE int
__signbitl (long double __x)
{
  union { long double __l; int __i[3]; } __u = { __l: __x };
  return (__u.__i[2] & 0x8000) != 0;
}
#endif


/* The gcc, version 2.7 or below, has problems with all this inlining
   code.  So disable it for this version of the compiler.  */
#if defined __GNUC__ && (__GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ > 7))

#if ((!defined __NO_MATH_INLINES || defined __LIBC_INTERNAL_MATH_INLINES) \
     && defined __OPTIMIZE__)

/* A macro to define float, double, and long double versions of various
   math functions for the ix87 FPU.  FUNC is the function name (which will
   be suffixed with f and l for the float and long double version,
   respectively).  OP is the name of the FPU operation.  */

#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathop(func, op) \
  __inline_mathop_ (double, func, op) \
  __inline_mathop_ (float, __CONCAT(func,f), op) \
  __inline_mathop_ (long double, __CONCAT(func,l), op)
#else
# define __inline_mathop(func, op) \
  __inline_mathop_ (double, func, op)
#endif

#define __inline_mathop_(float_type, func, op) \
  __inline_mathop_decl_ (float_type, func, op, "0" (__x))


#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathop_decl(func, op, params...) \
  __inline_mathop_decl_ (double, func, op, params) \
  __inline_mathop_decl_ (float, __CONCAT(func,f), op, params) \
  __inline_mathop_decl_ (long double, __CONCAT(func,l), op, params)
#else
# define __inline_mathop_decl(func, op, params...) \
  __inline_mathop_decl_ (double, func, op, params)
#endif

#define __inline_mathop_decl_(float_type, func, op, params...) \
  __MATH_INLINE float_type func (float_type); \
  __MATH_INLINE float_type func (float_type __x) \
  { \
    float_type __result; \
    __asm (op : "=t" (__result) : params); \
    return __result; \
  }


#if defined __USE_MISC || defined __USE_ISOC9X
# define __inline_mathcode(func, arg, code) \
  __inline_mathcode_ (double, func, arg, code) \
  __inline_mathcode_ (float, __CONCAT(func,f), arg, code) \
  __inline_mathcode_ (long double, __CONCAT(func,l), arg, code)
# define __inline_mathcode2(func, arg1, arg2, code) \
  __inline_mathcode2_ (double, func, arg1, arg2, code) \
  __inline_mathcode2_ (float, __CONCAT(func,f), arg1, arg2, code) \
  __inline_mathcode2_ (long double, __CONCAT(func,l), arg1, arg2, code)
# define __inline_mathcode3(func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (double, func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (float, __CONCAT(func,f), arg1, arg2, arg3, code) \
  __inline_mathcode3_ (long double, __CONCAT(func,l), arg1, arg2, arg3, code)
#else
# define __inline_mathcode(func, arg, code) \
  __inline_mathcode_ (double, func, (arg), code)
# define __inline_mathcode2(func, arg1, arg2, code) \
  __inline_mathcode2_ (double, func, arg1, arg2, code)
# define __inline_mathcode3(func, arg1, arg2, arg3, code) \
  __inline_mathcode3_ (double, func, arg1, arg2, arg3, code)
#endif

#define __inline_mathcode_(float_type, func, arg, code) \
  __MATH_INLINE float_type func (float_type); \
  __MATH_INLINE float_type func (float_type arg) \
  { \
    code; \
  }

#define __inline_mathcode2_(float_type, func, arg1, arg2, code) \
  __MATH_INLINE float_type func (float_type, float_type); \
  __MATH_INLINE float_type func (float_type arg1, float_type arg2) \
  { \
    code; \
  }

#define __inline_mathcode3_(float_type, func, arg1, arg2, arg3, code) \
  __MATH_INLINE float_type func (float_type, float_type, float_type); \
  __MATH_INLINE float_type func (float_type arg1, float_type arg2, \
                                 float_type arg3) \
  { \
    code; \
  }
#endif


#if !defined __NO_MATH_INLINES && defined __OPTIMIZE__
/* Miscellaneous functions */

__inline_mathcode (__sgn, __x, \
  return __x == 0.0 ? 0.0 : (__x > 0.0 ? 1.0 : -1.0))

__inline_mathcode (__pow2, __x, \
  long double __value; \
  long double __exponent; \
  __extension__ long long int __p = (long long int) __x; \
  if (__x == (long double) __p) \
    { \
      __asm \
        ("fscale" \
         : "=t" (__value) : "0" (1.0), "u" (__x)); \
      return __value; \
    } \
  __asm \
    ("fld %%st(0)\n\t" \
     "frndint           # int(x)\n\t" \
     "fxch\n\t" \
     "fsub %%st(1)      # fract(x)\n\t" \
     "f2xm1             # 2^(fract(x)) - 1\n\t" \
     : "=t" (__value), "=u" (__exponent) : "0" (__x)); \
  __value += 1.0; \
  __asm \
    ("fscale" \
     : "=t" (__value) : "0" (__value), "u" (__exponent)); \
  return __value)

#define __sincos_code \
  long double __cosr; \
  long double __sinr; \
  __asm \
    ("fsincos\n\t" \
     : "=t" (__cosr), "=u" (__sinr) : "0" (__x)); \
  *__sinx = __sinr; \
  *__cosx = __cosr

__MATH_INLINE void __sincos (double __x, double *__sinx, double *__cosx);
__MATH_INLINE void
__sincos (double __x, double *__sinx, double *__cosx)
{
  __sincos_code;
}

__MATH_INLINE void __sincosf (float __x, float *__sinx, float *__cosx);
__MATH_INLINE void
__sincosf (float __x, float *__sinx, float *__cosx)
{
  __sincos_code;
}

__MATH_INLINE void __sincosl (long double __x, long double *__sinx,
                              long double *__cosx);
__MATH_INLINE void
__sincosl (long double __x, long double *__sinx, long double *__cosx)
{
  __sincos_code;
}


/* Optimized inline implementation, sometimes with reduced precision
   and/or argument range.  */

__inline_mathop(rint, "frndint")

#define __expm1_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value), "u" (__exponent)); \
  __asm ("fscale" : "=t" (__temp) : "0" (1.0), "u" (__exponent)); \
  return (__temp - 1) + __value
__inline_mathcode_ (long double, __expm1l, __x, __expm1_code)

/* generate (exp(x)-1)/2 for use in sinh()
 * result ought to be finite up to |x| = 11357.2 but it seems to be unstable
 * beyond the limit applied below, producing NaN in division later */
#define __expm1h_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  if(__x > 11356.8) __x = 11356.8; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value), "u" (__exponent-1)); \
  __asm ("fscale" : "=t" (__temp) : "0" (1.0), "u" (__exponent-1)); \
  return (__temp - .5) + __value
__inline_mathcode_ (long double, __expm1h, __x, __expm1h_code)


/* extra precision range reduction saves 11 bits in elefunt expl() */
#define __exp_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value+1), "u" (__exponent)); \
  return __value
__inline_mathcode_ (long double, __expl, __x, __exp_code)
__inline_mathcode (exp, __x, if(__x > 20768)__x = 20768;return __expl(__x))

#define __exp_h_code \
  long double __value; \
  long double __exponent; \
  long double __temp; \
  if(__x > 20768) __x = 20768; \
  __asm ("fldl2e" : "=t" (__temp) ); \
  __x -= .693147178739309310913*(__exponent = rintl(__x * __temp)); \
  __x = (__x-1.82063599850414622404E-09*__exponent)*__temp; \
  __asm ("f2xm1" : "=t" (__value) : "0" (__x)); \
  __asm ("fscale" : "=t" (__value) : "0" (__value+1), "u" (__exponent-1)); \
  return __value
__inline_mathcode_ (long double, __exph, __x, __exp_h_code)


__inline_mathcode (tan, __x, \
  long double __value; \
  long double __value2; /* unused */ \
  __asm \
    ("fptan" \
     : "=t" (__value2), "=u" (__value) : "0" (__x)); \
  return __value)


#define __atan2_code \
  long double __value; \
  __asm \
    ("fpatan\n\t" \
     : "=t" (__value) : "0" (__x), "u" (__y) : "st(1)"); \
  return __value
__inline_mathcode2 (atan2, __y, __x, __atan2_code)
__inline_mathcode2_ (long double, __atan2l, __y, __x, __atan2_code)
__inline_mathcode (atan, __y, return __atan2l(__y,1.0))


__inline_mathcode2 (fmod, __x, __y, \
  long double __value; \
  __asm \
    ("1: fprem\n\t" \
     "fnstsw %%ax\n\t" \
     "sahf\n\t" \
     "jp 1b" \
     : "=t" (__value) : "0" (__x), "u" (__y) : "ax", "cc"); \
  return __value)


#if 0
/* while this version is faster and more accurate than the one in glibc-2.1.2,
 * there is no speed gain over separate compilation, and there is a loss of 9
 * significant bits.  */
__inline_mathcode2 (pow, __x, __y, \
  long double __value; \
  long double __exponent; \
  __extension__ long long int __p = (long long int) __y; \
  long double __r; \
  if (__x == 0.0 && __y > 0.0) \
    return __x; \
  if(__y == __p) \
    __r = 1; \
  else{ \
    __asm \
      ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__y - __p) : "st(1)"); \
    __value -= (__exponent = rintl(__value)); \
    __asm ( "f2xm1          # 2^(fract(y * log2(x))) - 1\n\t" \
       : "=t" (__value) : "0" (__value)); \
    __asm ("fscale" \
       : "=t" (__r) : "0" (__value + 1), "u" (__exponent)); \
    } \
  if (__p < 0) \
    { \
      __p = -__p; \
      __x = 1.0 / __x; \
    } \
  if (__p & 1) \
    __r *= __x; \
  for ( ; __p > 1; __r *= __x) \
    do \
      __x *= __x; \
    while(!((__p >>= 1) & 1)); \
  return __r)
#endif



#if defined __GNUC__ && (__GNUC__ > 2 || __GNUC__ == 2 && __GNUC_MINOR__ >= 8)
__inline_mathcode_ (double, fabs, __x, return __builtin_fabs (__x))
__inline_mathcode_ (float, fabsf, __x, return __builtin_fabsf (__x))
__inline_mathcode_ (long double, fabsl, __x, return __builtin_fabsl (__x))
__inline_mathcode_ (long double, __fabsl, __x, return __builtin_fabsl (__x))
__inline_mathcode_ (long double, __sqrtl, __x, return __builtin_sqrtl (__x))
#else
__inline_mathop (fabs, "fabs")
__inline_mathop_ (long double, __fabsl, "fabs")
__inline_mathop_ (long double, __sqrtl, "fsqrt")
#endif

/* The argument range of this inline version is reduced.  */
__inline_mathop (sin, "fsin")
/* The argument range of this inline version is reduced.  */
__inline_mathop (cos, "fcos")

/* this and all other uses of fyl2x have been corrected to include clobber */
#define __log_code \
  long double __value; \
  __asm ("fldln2": "=t" (__value)); \
  __asm ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__value) : "st(1)" ); \
  return __value

__inline_mathcode (log, __x, __log_code)

#define __log10_code \
  long double __value; \
  __asm ("fldlg2": "=t" (__value)); \
  __asm ("fyl2x" : "=t" (__value) : "0" (__x), "u" (__value) : "st(1)" ); \
  return __value

__inline_mathcode (log10, __x, __log10_code)

/* sqrt(1-x*x) is good only on the non-IEEE fused multiply-accumulate arithmetic
 * this saves 9 bits in elefunt asinl() but still misses 20% of achievable
 * correctly rounded results without a net gain in speed over a library
 * function following the suggestions of Plauger. */
#if 0
__inline_mathcode (asin, __x, return __atan2l (__x, __sqrtl ((1-__x)*(1+__x))))
__inline_mathcode (acos, __x, return __atan2l (__sqrtl ((1-__x)*(1+__x)), __x))
#endif
__inline_mathcode_ (long double, __sgn1l, __x, return __x >= 0.0 ? 1.0 : -1.0)

/* i286 compatibility is not retained. */
__inline_mathcode (sinh, __x, \
  long double __exm1 = __expm1h (__x); \
  return __exm1*.5 / (__exm1 + .5) + __exm1;)

__inline_mathcode (cosh, __x, \
  long double __ex = __exph (__fabsl(__x)); \
  return __ex + .25 / __ex);

/* this corrects the sign returned when x == 0 and improves accuracy and speed.
 * Limits in expl() above solve problem of NaN returned for large |__x|.
 * Use Chebyshev economized polynomial for small |__x| to improve accuracy
 * and speed, at the expense of code size. */
__inline_mathcode (tanh, __x, \
  if(__fabsl(__x) <= .34657){ \
    long double __x2 = __x * __x; \
    long double __x4 = __x2*__x2; \
    return __x + __x2*__x*( \
       -0.3333333333333333333028L \
       +__x2*(0.133333333333333321200L \
       +__x2*-0.5396825396825207695E-01L \
       +__x4*(0.218694885360028124E-01L \
       +__x2*-0.88632355226515778E-02 \
       +__x4*(0.3592127817609080E-02 \
       +__x2*-0.14558300258105E-02) \
       +__x4*__x4*(0.5899693119329E-03 \
       +__x2*-0.238614526828E-03 \
       +__x4*(0.9399418484E-04 \
       +__x2*-0.294863013E-04)))));} \
  return 1 - 2 / (expl(__x + __x) + 1))


__inline_mathcode (floor, __x, \
  long double __value; \
  unsigned short int __cw; \
  unsigned short int __cwtmp; \
  __asm ("fnstcw %0" : "=m" (__cw)); \
  __cwtmp = (__cw & 0xf3ff) | 0x0400; /* rounding down */ \
  __asm ("fldcw %0" : : "m" (__cwtmp)); \
  __asm ("frndint" : "=t" (__value) : "0" (__x)); \
  __asm ("fldcw %0" : : "m" (__cw)); \
  return __value)

__inline_mathcode (ceil, __x, \
  long double __value; \
  unsigned short int __cw; \
  unsigned short int __cwtmp; \
  __asm ("fnstcw %0" : "=m" (__cw)); \
  __cwtmp = (__cw & 0xf3ff) | 0x0800; /* rounding up */ \
  __asm ("fldcw %0" : : "m" (__cwtmp)); \
  __asm ("frndint" : "=t" (__value) : "0" (__x)); \
  __asm ("fldcw %0" : : "m" (__cw)); \
  return __value)

#define __ldexp_code \
  long double __value; \
  __asm \
    ("fscale" \
     : "=t" (__value) : "0" (__x), "u" ((long double) __y)); \
  return __value

__MATH_INLINE double ldexp (double __x, int __y);
__MATH_INLINE double
ldexp (double __x, int __y)
{
  __ldexp_code;
}


/* Optimized versions for some non-standardized functions.  */
#if defined __USE_ISOC9X || defined __USE_MISC

__inline_mathcode (expm1, __x, __expm1_code)

/* We cannot rely on M_SQRT being defined.  So we do it for ourself
   here.  */
# define __M_SQRT2 1.41421356237309504880L /* sqrt(2) */

__inline_mathcode (log1p, __x, \
  long double __value; \
  if (__fabsl (__x) >= 1.0 - 0.5 * __M_SQRT2) \
    return logl (1.0 + __x); \
  __asm ("fldln2":"=t" (__value)); \
  __asm ("fyl2xp1" : "=t" (__value) : "0" (__x),"u" (__value) : "st(1)"); \
  return __value)


/* The argument range of the inline version of asinhl is slightly reduced.  */
__inline_mathcode (asinh, __x, \
  long double __y = __fabsl (__x); \
  __y = log1pl (__x * __x / (__sqrtl (__x * __x + 1.0) + 1.0) + __y); \
  return __x >= 0 ? __y : -__y)

__inline_mathcode (acosh, __x, \
  return logl (__x + __sqrtl (__x - 1.0) * __sqrtl (__x + 1.0)))

/* atanh(x) = 0.5 * log((1+x)/(1-x)) = 0.5 * log1p(2x/(1-x)) */
__inline_mathcode (atanh, __x, \
  return 0.5 * log1pl (2 * (__x / (1.0 - __x))))

/* The argument range of the inline version of hypotl is slightly reduced.  */
__inline_mathcode2 (hypot, __x, __y, return __sqrtl (__x * __x + __y * __y))

__inline_mathcode(logb, __x, \
  long double __value; \
  long double __junk; \
  __asm \
    ("fxtract\n\t" \
     : "=t" (__junk), "=u" (__value) : "0" (__x)); \
  return __value)

#endif

#ifdef __USE_ISOC9X
__inline_mathcode (log2, __x, \
  long double __value; \
  __asm__ ("fyl2x" : "=t" (__value) : "0" (__x), "u" (1.0) : "st(1)"); \
  return __value)

__MATH_INLINE float ldexpf (float __x, int __y);
__MATH_INLINE float
ldexpf (float __x, int __y)
{
  __ldexp_code;
}

__MATH_INLINE long double ldexpl (long double __x, int __y);
__MATH_INLINE long double
ldexpl (long double __x, int __y)
{
  __ldexp_code;
}

__inline_mathcode3 (fma, __x, __y, __z, return (__x * __y) + __z)

#define __lrint_code \
  long int __lrintres; \
  __asm__ \
    ("fistpl %0" \
     : "=m" (__lrintres) : "t" (__x) : "st"); \
  return __lrintres
__MATH_INLINE long int
lrintf (float __x)
{
  __lrint_code;
}
__MATH_INLINE long int
lrint (double __x)
{
  __lrint_code;
}
__MATH_INLINE long int
lrintl (long double __x)
{
  __lrint_code;
}
#undef __lrint_code

#define __llrint_code \
  long long int __llrintres; \
  __asm__ \
    ("fistpll %0" \
     : "=m" (__llrintres) : "t" (__x) : "st"); \
  return __llrintres
__MATH_INLINE long long int
llrintf (float __x)
{
  __llrint_code;
}
__MATH_INLINE long long int
llrint (double __x)
{
  __llrint_code;
}
__MATH_INLINE long long int
llrintl (long double __x)
{
  __llrint_code;
}
#undef __llrint_code

#endif


#ifdef __USE_MISC

__inline_mathcode2 (drem, __x, __y, \
  double __value; \
  int __clobbered; \
  __asm \
    ("1: fprem1\n\t" \
     "fstsw %%ax\n\t" \
     "sahf\n\t" \
     "jp 1b" \
     : "=t" (__value), "=&a" (__clobbered) : "0" (__x), "u" (__y) : "cc"); \
  return __value)


/* This function is used in the `isfinite' macro.  */
__MATH_INLINE int __finite (double __x) __attribute__ ((__const__));
__MATH_INLINE int
__finite (double __x)
{
  return (__extension__
          (((((union { double __d; int __i[2]; }) {__d: __x}).__i[1]
             | 0x800fffff) + 1) >> 31));
}

/* Miscellaneous functions */

__inline_mathcode (__coshm1, __x, \
  long double __exm1 = __expm1l (__fabsl (__x)); \
  return 0.5 * (__exm1 / (__exm1 + 1.0)) * __exm1)

__inline_mathcode (__acosh1p, __x, \
  return log1pl (__x + __sqrtl (__x) * __sqrtl (__x + 2.0)))

#endif /* __USE_MISC  */

/* Undefine some of the large macros which are not used anymore.  */
#undef __expm1_code
#undef __exp_code
#undef __atan2_code
#undef __sincos_code

#endif /* __NO_MATH_INLINES  */


/* This code is used internally in the GNU libc.  */
#ifdef __LIBC_INTERNAL_MATH_INLINES
__inline_mathcode2 (__ieee754_atan2, __y, __x,
                    long double __value;
                    __asm ("fpatan\n\t"
                           : "=t" (__value)
                           : "0" (__x), "u" (__y) : "st(1)");
                    return __value;)
#endif

#endif /* __GNUC__  */
#endif



--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/