Mail Archives: djgpp-workers/1998/02/06/03:00:49
Eli Zaretskii wrote:
> On Wed, 4 Feb 1998, DJ Delorie wrote:
>
> > OK, then, how do we fix it? Is there ever a case where the is*()
> > functions/macros *care* if it's EOF or 0xff?
Probably not, but the (char)0xff character might be used in some
locales; I don't really know.
> > The only ones I know of
> > are tolower/toupper, which return 0 for EOF (funny, toupper/tolower
> > return *unsigned* char!). If we change that to return 0xff for EOF,
> > then it won't matter if EOF==0xff, and we can just mask the value
> > we're given with 0xff and be done with it (not even add 1).
Perfectly possible, but some people (not me) have working code that
relies on DJGPP implementation-specific behavior: it writes to the
character information table in order to support different extended
character sets.
> You could always treat EOF as a special case in toupper/tolower.
That is IMHO the best solution if we don't change the default of char.
However, this complicates the object code, and since the is* macros are
used quite often, it would make libc slower.
> If ANDing with 0xff is not a requirement, we could say something like
> this:
>
> static unsigned short ctype_flags[] = {
> ... /* put here what's now in __dj_ctype_flags[] */
> };
>
> unsigned short *__dj_ctype_flags = &ctype_flags[1];
>
> and then define the macros like this:
>
> #define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)
>
> which should work with EOF and 0xff alike.
>
> Is anything wrong with this way?
Yes, very wrong:
The ANSI standard requires that the is* macros work for any int value in
the range of "unsigned char" and for EOF. The ANDing was primarily used
to turn the values in the signed char range into the unsigned char
range. Unfortunately, it does the same to EOF, which then can no longer
be told apart from 0xff.
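To make the masking problem concrete, here is a minimal sketch. The
helper name masked_index is made up for illustration, and the sketch
assumes that EOF is -1 and that char is signed, so that a character
such as 0xe9 arrives as -23.

```c
/* Hypothetical helper, not part of libc: the table index obtained by
   masking the argument with 0xff, as discussed above. */
static int masked_index(int c)
{
    return c & 0xff;
}

/* masked_index(-23)  yields 0xe9: a sign-extended char is repaired.
   masked_index(-1)   yields 0xff: EOF (assumed to be -1) lands on the
                      same slot as the character 0xff.
   masked_index(0xff) yields 0xff as well. */
```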
When you use the is* macros defined as
#define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)
they do not work for values that are in the range of unsigned char but
outside the range of signed char (0x80..0xff) when char is signed:
those values arrive as negative numbers, so you get a negative index
into the array.
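A rough sketch of why the unmasked lookup fails, under the assumption
that EOF is -1 (which the &ctype_flags[1] offset in the proposal also
implies) and that the underlying table has 257 entries. TABLE_SIZE,
underlying_slot and slot_in_bounds are names invented for this example
only.

```c
/* Assumed layout of the underlying array: slot 0 is for EOF,
   slots 1..256 are for the characters 0x00..0xff. */
enum { TABLE_SIZE = 257 };

/* Slot of the underlying array that __dj_ctype_flags[(int)(c)] would
   touch when __dj_ctype_flags points at element 1 of that array. */
static int underlying_slot(int c)
{
    return c + 1;
}

/* Nonzero if the lookup for c stays inside the underlying array. */
static int slot_in_bounds(int c)
{
    int slot = underlying_slot(c);
    return slot >= 0 && slot < TABLE_SIZE;
}

/* EOF (-1) and 0x00..0xff are in bounds, but a signed char holding
   0xe9 is passed as -23 and would read slot -22, before the array. */
```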
BTW I have numerous solutions in mind (like all mentioned so far), but
they all depend on the default of "char", and the best (=most
performant?) solution requires that "char" defaults to "unsigned char".
--
\ Vik /-_-_-_-_-_-_/
\___/ Heyndrickx /
\ /-_-_-_-_-_-_/