Sender: vheyndri AT rug DOT ac DOT be
Message-Id: <34DAC2A1.7E06@rug.ac.be>
Date: Fri, 06 Feb 1998 08:58:25 +0100
From: Vik Heyndrickx
Mime-Version: 1.0
To: Eli Zaretskii
Cc: DJ Delorie, djgpp-workers AT delorie DOT com
Subject: Re: char != unsigned char... sometimes, sigh
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Precedence: bulk

Eli Zaretskii wrote:
> On Wed, 4 Feb 1998, DJ Delorie wrote:
>
> > OK, then, how do we fix it?  Is there ever a case where the is*()
> > functions/macros *care* if it's EOF or 0xff?

Probably not, but the (char)0xff character might be used in some
locales; I don't really know.

> The only ones I know of
> > are tolower/toupper, which return 0 for EOF (funny, toupper/tolower
> > return *unsigned* char!).  If we change that to return 0xff for EOF,
> > then it won't matter if EOF==0xff, and we can just mask the value
> > we're given with 0xff and be done with it (not even add 1).

Perfectly possible, but some people (not me) have working code that
relies on DJGPP implementation-specific behavior: it writes to the
character information table in order to support different extended
character sets.

> You could always treat EOF as a special case in toupper/tolower.

That is IMHO the best solution if we don't change the default of char.
However, it complicates the object code, and since the is* macros are
used quite often, it would make libc slower.

> If ANDing with 0xff is not a requirement, we could say something like
> this:
>
>   static unsigned short ctype_flags[] = {
>     ... /* put here what's now in __dj_ctype_flags[] */
>   };
>
>   unsigned short *__dj_ctype_flags = &ctype_flags[1];
>
> and then define the macros like this:
>
>   #define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)
>
> which should work with EOF and 0xff alike.
>
> Is anything wrong with this way?

Yes, very wrong: the ANSI standard requires that the is* macros work for
any int value in the range of "unsigned char" and for EOF.  The ANDing
was primarily used to turn values in the signed char range into the
unsigned char range.  Unfortunately, it does the same to EOF.

When you define the is* macros as

  #define isalnum(c) (__dj_ctype_flags[(int)(c)] & __dj_ISALNUM)

they do not work for values that are in the range of unsigned char but
not in the range of signed char, because a signed char holding such a
value gives a negative index into the array.

BTW, I have numerous solutions in mind (like all those mentioned so
far), but they all depend on the default of "char", and the best (= most
performant?) solution requires that "char" defaults to "unsigned char".

-- 
 \ Vik /-_-_-_-_-_-_/
  \___/ Heyndrickx /
   \ /-_-_-_-_-_-_/
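
To make the two failure modes above concrete, here is a minimal sketch
of the index arithmetic only. The helper names are hypothetical and are
not part of the DJGPP libc; the sketch assumes EOF == -1, an 8-bit char
and two's complement ints, and it deliberately never dereferences a
table, so nothing is read out of bounds.

```c
#include <assert.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical helper names; an illustration, not DJGPP libc code.
   Assumptions: EOF == -1, 8-bit char, two's complement int. */

/* Proposed macro: the table pointer is offset by one entry, so the
   index is the argument itself and only -1 (EOF) through 255 are
   valid. */
static int offset_index(int c)
{
    return c;
}

static int offset_index_in_bounds(int c)
{
    int i = offset_index(c);
    return i >= -1 && i <= 255;
}

/* Masking scheme: the index is always 0..255, but EOF and 0xff end up
   at the same table entry. */
static int masked_index(int c)
{
    return c & 0xff;
}

static void sketch_self_check(void)
{
    /* Bit pattern 0xE9 stored in a signed 8-bit char has the value -23. */
    signed char high_char = -23;

    assert(offset_index_in_bounds(EOF));
    assert(offset_index_in_bounds(0xff));
    assert(!offset_index_in_bounds(high_char)); /* promoted to int -23 */

    assert(masked_index(high_char) == 0xE9);    /* masking repairs it... */
    assert(masked_index(EOF) == masked_index(0xff)); /* ...but EOF collides */
}
```

Calling sketch_self_check() passes under the stated assumptions: with
the offset table, a signed char holding a high value indexes before the
start of the table, while with masking EOF becomes indistinguishable
from 0xff.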