Mail Archives: djgpp-workers/1998/02/06/03:00:49

Sender: vheyndri AT rug DOT ac DOT be
Message-Id: <34DAC2A1.7E06@rug.ac.be>
Date: Fri, 06 Feb 1998 08:58:25 +0100
From: Vik Heyndrickx <Vik DOT Heyndrickx AT rug DOT ac DOT be>
Mime-Version: 1.0
To: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
Cc: DJ Delorie <dj AT delorie DOT com>, djgpp-workers AT delorie DOT com
Subject: Re: char != unsigned char... sometimes, sigh
References: <Pine DOT SUN DOT 3 DOT 91 DOT 980205114743 DOT 28596C-100000 AT is>

Eli Zaretskii wrote:
> On Wed, 4 Feb 1998, DJ Delorie wrote:
> 
> > OK, then, how do we fix it? Is there ever a case where the is*()
> > functions/macros *care* if it's EOF or 0xff?  

Probably not, but the (char)0xff character might be used in some
locales; I really don't know.

> > The only ones I know of
> > are tolower/toupper, which return 0 for EOF (funny, toupper/tolower
> > return *unsigned* char!).  If we change that to return 0xff for EOF,
> > then it won't matter if EOF==0xff, and we can just mask the value
> > we're given with 0xff and be done with it (not even add 1).

Perfectly possible, but some people (not me) have working code that
relies on djgpp-specific internals: it writes to the character
information table in order to support different extended character sets.

> You could always treat EOF as a special case in toupper/tolower.

That is IMHO the best solution if we don't change the default signedness
of char. However, it complicates the object code, and since the is*
macros are used very often, libc would get slower.
 
> If ANDing with 0xff is not a requirement, we could say something like
> this:
> 
>  static unsigned short ctype_flags[] = {
>    ... /* put here what's now in __dj_ctype_flags[] */
>  };
> 
>  unsigned short *__dj_ctype_flags = &ctype_flags[1];
> 
> and then define the macros like this:
> 
>  #define isalnum(c)  (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)
> 
> which should work with EOF and 0xff alike.
> 
> Is anything wrong with this way?

Yes, very wrong:
The ANSI standard requires that the is* macros work for any int value in
the range of "unsigned char" and for EOF. The ANDing was there primarily
to map values in the signed char range into the unsigned char range.
Unfortunately, it does the same to EOF: -1 & 0xff is 0xff, so EOF becomes
indistinguishable from character 0xff.
When the is* macros are defined as

#define isalnum(c)  (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)

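then they fail for values that are in the range of unsigned char but not
in the range of signed char (0x80 to 0xff). Passed as a plain, signed
char, such a value is negative, so you get a negative index into the
array: 0xff becomes -1 and silently reads the EOF entry, while 0x80 to
0xfe read before the start of the array.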

BTW, I have numerous solutions in mind (including all those mentioned so
far), but they all depend on whether "char" is signed or unsigned by
default, and the best (i.e. fastest?) solution requires that "char"
default to "unsigned char".

-- 
 \ Vik /-_-_-_-_-_-_/   
  \___/ Heyndrickx /          
   \ /-_-_-_-_-_-_/



  Copyright © 2019   by DJ Delorie     Updated Jul 2019