Date: Wed, 11 Feb 1998 16:06:12 +0100 (MET)
From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
To: Vik Heyndrickx <Vik DOT Heyndrickx AT rug DOT ac DOT be>
cc: DJ Delorie <dj AT delorie DOT com>, djgpp-workers AT delorie DOT com
Subject: Re: char != unsigned char... sometimes, sigh
In-Reply-To: <34E18CB9.5D9E@rug.ac.be>
Message-ID: <Pine.LNX.3.93.980211151419.1113D-100000@acp3bf>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk

On Wed, 11 Feb 1998, Vik Heyndrickx wrote:

> Hans-Bernhard Broeker wrote:
> > On Tue, 10 Feb 1998, Vik Heyndrickx wrote:
[...]
> > Which brings me to an idea: why should we provide is* _macros_ at all?  As
> > the DJGPP libc is for gcc, anyway, why not use 'extern inline' functions
> > instead? That'd save all the special casing and whatnot, and automatically
> > turn it into a function call whenever gcc thinks that's the better idea...
> 
> Why would this save us the special casing?

Because the compiler knows the *prototype* of the extern inline function,
it actually knows that this inline function expects an *int* as its
argument, and will automatically insert the conversion for us whenever
that's necessary. By the same token, it knows that the function returns
int. Both of these may also be helpful in generating compiler warnings for
some kinds of incorrect use of these functions that I can't imagine right
now.
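
To illustrate the point about the prototype, here is a minimal sketch. The
names my_isupper, my_ctype_flags and MY_ISUPPER are hypothetical stand-ins
for the real libc symbols, the table contents assume an ASCII-like character
set, and 'static inline' is used instead of 'extern inline' only so that the
snippet compiles on its own in one file:

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical flag bit, standing in for __dj_ISUPPER. */
#define MY_ISUPPER 0x0200

/* 257 entries: slot 0 is for EOF (-1), slots 1..256 for values 0..255. */
static unsigned short my_ctype_flags[257];

/* Fill in the upper-case letters (assumes 'A'..'Z' are contiguous). */
static void my_ctype_init(void)
{
    int c;
    for (c = 'A'; c <= 'Z'; c++)
        my_ctype_flags[c + 1] = MY_ISUPPER;
}

/* The prototype tells the compiler that the argument is an int and the
   result is an int, so any integer argument is converted at the call. */
static inline int my_isupper(int c)
{
    return my_ctype_flags[c + 1] & MY_ISUPPER;
}
```

After my_ctype_init(), a call like my_isupper(uc) with an unsigned char uc
needs no explicit cast at the call site. Note that the conversion alone does
not make a negative plain char a valid index; the caller still has to pass
EOF or a value in the unsigned char range.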

> > I.e., why not put *this* in inlines/ctype.ha:
> > 
> >         extern int isupper(int c);
> >         extern inline int isupper(int c) {
> >           return __dj_ctype_flags[c+1]&__dj_ISUPPER;
> >         }
> 
> The macro's I offered were also acceptable. Every library I've seen so
> far (bo*nd, djgpp, FreeBSD, gnulibc) does offer a set of macro's.

That may well be so, but does that mean we have to follow what is actually
just a tradition, dating from the days when 'inline' functions weren't
available yet in most compilers? After all, as the gcc docs state, 'an
inline function is as fast as a macro', but it also has the benefit of
type safety, and of automatic demotion to a function call if the compiler
thinks that's a better idea in a given situation.

But I didn't mean to imply your macro implementation wasn't "acceptable",
anyway. I just wanted to point out a possible alternative. And your
implementation using the ({ }) syntax is already rather close to an inline
function, so why not take that final step as well?
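
As a rough side-by-side of the two approaches (not your actual macro, which
I'm not reproducing here), a GNU C statement-expression macro and the
corresponding inline function might look like this. MY_ISDIGIT, my_flags and
both function-like names are made up for the example, and ({ }) is a gcc
extension, so this assumes gcc or a compatible compiler:

```c
/* Hypothetical flag bit and 257-entry table (slot 0 is for EOF). */
#define MY_ISDIGIT 0x0004
static unsigned short my_flags[257];

/* Mark '0'..'9' (the C standard guarantees these are contiguous). */
static void my_flags_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++)
        my_flags[c + 1] = MY_ISDIGIT;
}

/* Macro version: the temporary makes sure the argument is evaluated
   exactly once and is converted to int, much like a function parameter. */
#define MY_ISDIGIT_MACRO(c) \
    ({ int my_c_ = (c); my_flags[my_c_ + 1] & MY_ISDIGIT; })

/* Inline function version: same body, with the temporary replaced by
   the parameter itself. */
static inline int my_isdigit_inline(int c)
{
    return my_flags[c + 1] & MY_ISDIGIT;
}
```

Both forms evaluate an argument such as *p++ only once, so the remaining
difference is mostly that the function has a prototype and an address.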

> > > Note that ANY change you will make to these macro's will turn them less
> > > efficient. I know where I am talking about.
> > 
> > Could you unclose some of that knowledge? Like: why should your solution
> > with two temporary variables be more efficient than a simple
> > 
> >         ((int)(__dj_ctype_flags[((int)(t))+1]&__dj_ISUPPER))
> 
> Because of the gnu c compiler optimization strategies. According to the
> manual, you can't just assign enough to temporary variables. If you have
> a look at the produced asm code, you will see that this is correct. 

OK. For now, I'll have to take your word on this. But I may investigate
myself some time... 

> > And how could a comparison ('!=0' at the end of your code) be more
> > efficient than a simple cast to int? The is* functions aren't required to
> > return only 1 or 0, after all, so there's no real need to translate '!=0'
> > to 1.
> 
> My proposal never yields worse code, and in case the user wants to
> assign the result from this macro to e.g. a char (for memory
> conservation) this will always work, something that cannot be said from
> your example. In case that I have two choices which offer equivalent
> efficient code, I always choose the safer solution.

You're still trying to fix what deserves to be (and stay) broken code
behind the user's back. By definition, the only thing a user may assume
about the return value of any of the is*() calls is that it's either zero
for 'false' or non-zero for 'true'. And the value that it returns is an
*int*, so assigning it to a char converts it to the narrower type: the
result may lose the set bit, and for a signed char an out-of-range value
gives an implementation-defined result.
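
To make the narrowing problem concrete, here is a small sketch. MY_FLAG is a
hypothetical flag value chosen to sit above bit 7, and the example assumes
8-bit chars:

```c
/* Hypothetical flag bit above bit 7, as can occur with 16-bit flag words. */
#define MY_FLAG 0x0100

/* Returns the raw masked flag: zero for false, some non-zero int for true,
   which is all the is*() functions promise. */
static int raw_test(int flags)
{
    return flags & MY_FLAG;
}

/* Normalised variant: always exactly 0 or 1. */
static int bool_test(int flags)
{
    return (flags & MY_FLAG) != 0;
}
```

With these, raw_test(0x0100) is non-zero as an int, but storing it in an
unsigned char gives 0, while bool_test(0x0100) survives the same narrowing
as 1. That is exactly the kind of user code the '!=0' would paper over.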

Let me cite the Linux implementation, as an example:

/* Some broken codes assume isxxxxx () return a char.*/
#ifndef _BROKEN_CTYPE
#define _BROKEN_CTYPE 1
#endif
/* ... */
#if _BROKEN_CTYPE
#define __isctype(c, type) \
  ((__ctype_b[(int) (c)] & (unsigned short int) type) != 0)
#else
#define __isctype(c, type) \
  (__ctype_b[(int) (c)] & (unsigned short int) type)
#endif

As you can see, they explicitly call the kind of code you want to support
"broken", and only because of that brokenness do they provide an
alternative implementation that adds '!=0' to the code. 

BTW: I have checked briefly, and it actually seems to make no difference
whether there's a "!=0" at the end or not, at least as long as I use is*()
calls in logical expressions directly (where an implied hidden '!=0' is
added by the compiler anyway). 
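
A sketch of the kind of usage I mean, with made-up helper names and an
ASCII-style range check standing in for the table lookup:

```c
#include <stddef.h>

/* Hypothetical flag value returned for "true". */
#define MY_UPPER 0x0100

/* Raw variant: returns the flag itself, or 0. */
static int is_upper_raw(int c)
{
    return (c >= 'A' && c <= 'Z') ? MY_UPPER : 0;
}

/* Variant with the explicit "!=0": returns 1 or 0. */
static int is_upper_bool(int c)
{
    return is_upper_raw(c) != 0;
}

/* In an if () condition the compiler tests against zero anyway, so both
   variants select the same characters. */
static size_t count_raw(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (is_upper_raw((unsigned char)*s))
            n++;
    return n;
}

static size_t count_bool(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (is_upper_bool((unsigned char)*s))
            n++;
    return n;
}
```

Both counting functions give the same result for any string; whether the
generated code is also identical depends on the compiler and flags, which I
have only checked briefly.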

Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.