Mail Archives: djgpp-workers/2003/11/01/07:25:22

X-Authentication-Warning: delorie.com: mail set sender to djgpp-workers-bounces using -f
From: <ams AT ludd DOT luth DOT se>
Message-Id: <200311011224.hA1COZGn000429@speedy.ludd.luth.se>
Subject: isfinite optimisation
To: DJGPP-WORKERS <djgpp-workers AT delorie DOT com>
Date: Sat, 1 Nov 2003 13:24:35 +0100 (CET)
X-Mailer: ELM [version 2.4ME+ PL78 (25)]
MIME-Version: 1.0
X-MailScanner: Found to be clean
Reply-To: djgpp-workers AT delorie DOT com

Hello.

I've thought of a small optimisation of isfinite().

The current code is (from include/math.h):

#define FP_INFINITE     0
#define FP_NAN          1
#define FP_NORMAL       2
#define FP_SUBNORMAL    3
#define FP_ZERO         4
/* Extended with Unnormals (for long doubles). */
#define FP_UNNORMAL     1024

#define fpclassify(x) ((sizeof(x)==sizeof(float))? __fpclassifyf(x) : \
                       (sizeof(x)==sizeof(double))? __fpclassifyd(x) : \
                       __fpclassifyld(x))

#define isfinite(x)     (fpclassify(x)==FP_NORMAL || \
                         fpclassify(x)==FP_SUBNORMAL || \
                         fpclassify(x)==FP_ZERO)

With this, gcc (2.95.3 and 3.3.1) generates code like this:

	call	__fpclassifyf
	<adjust esp>
	cmpl	$2, %eax
	je	L4
	cmpl	$3, %eax
	je	L4
	cmpl	$4, %eax
	je	L4

I.e., three cmp and three je instructions.


I propose:

#define FP_INFINITE     0x00000001
#define FP_NAN          0x00000002
#define FP_NORMAL       0x00000004
#define FP_SUBNORMAL    0x00000008
#define FP_ZERO         0x00000010
/* Extended with Unnormals (for long doubles). */
#define FP_UNNORMAL     0x00010000

#define fpclassify(x) ((sizeof(x)==sizeof(float))? __fpclassifyf(x) : \
                       (sizeof(x)==sizeof(double))? __fpclassifyd(x) : \
                       __fpclassifyld(x))

#define isfinite(x)   ((fpclassify(x) & (FP_NORMAL|FP_SUBNORMAL|FP_ZERO)) != 0)


This will make gcc generate this:

	call	__fpclassifyf
	<adjust esp>
	testb	$28, %al
	je	L3

I.e., one test and one je.

The second version is much better. In addition, a coder who needs to
test for some other combination of number classes can use the same idea.
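
As a rough sketch of that last point (not the actual libc code): the
EX_ names and ex_classify() below are made up for illustration, and
ex_classify() merely stands in for __fpclassifyd() by mapping the C99
fpclassify() result onto one bit per class. Any combination of classes
then becomes a single mask test, and the mask for isfinite() works out
to 0x04|0x08|0x10 = 28, which is the constant in the testb above.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical one-bit-per-class values, mirroring the proposal. */
#define EX_FP_INFINITE  0x01
#define EX_FP_NAN       0x02
#define EX_FP_NORMAL    0x04
#define EX_FP_SUBNORMAL 0x08
#define EX_FP_ZERO      0x10

/* Stand-in for __fpclassifyd(): map the C99 class to exactly one bit. */
static int ex_classify(double x)
{
    switch (fpclassify(x)) {
    case FP_INFINITE:  return EX_FP_INFINITE;
    case FP_NAN:       return EX_FP_NAN;
    case FP_NORMAL:    return EX_FP_NORMAL;
    case FP_SUBNORMAL: return EX_FP_SUBNORMAL;
    case FP_ZERO:      return EX_FP_ZERO;
    default:           return 0;   /* implementation-defined class */
    }
}

/* Each predicate is one classify call plus one mask test. */
#define ex_isfinite(x) \
    ((ex_classify(x) & (EX_FP_NORMAL | EX_FP_SUBNORMAL | EX_FP_ZERO)) != 0)
#define ex_is_nan_or_inf(x) \
    ((ex_classify(x) & (EX_FP_NAN | EX_FP_INFINITE)) != 0)
#define ex_is_zero_or_subnormal(x) \
    ((ex_classify(x) & (EX_FP_ZERO | EX_FP_SUBNORMAL)) != 0)

/* Small self-check of the predicates above. */
static void ex_selfcheck(void)
{
    assert(ex_isfinite(1.0));
    assert(ex_isfinite(0.0));
    assert(!ex_isfinite(INFINITY));
    assert(ex_is_nan_or_inf(NAN));
    assert(!ex_is_nan_or_inf(1.0));
    assert(ex_is_zero_or_subnormal(0.0));
    assert(!ex_is_zero_or_subnormal(1.0));
}
```

Note that the masks only work because every class owns a distinct bit;
with the current consecutive values (0, 1, 2, ...) they would overlap.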


If nobody complains I'll commit this in a week.


Right,

						MartinS
