Sender: vheyndri AT rug DOT ac DOT be
Message-Id: <34E00C53.175A@rug.ac.be>
Date: Tue, 10 Feb 1998 09:14:11 +0100
From: Vik Heyndrickx
Mime-Version: 1.0
To: DJ Delorie
Cc: djgpp-workers AT delorie DOT com
Subject: Re: char != unsigned char... sometimes, sigh (long)
References: <199802092259 DOT RAA14332 AT delorie DOT com>
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Precedence: bulk

As a very last attempt to make you all reconsider changing the default
of 'char' from 'signed' to 'unsigned', I have tried to summarize all
points of view.

Pros:
-----
- EOF is an element of the 'signed char' range, which means that no
  matter what trickery is applied, only 256 distinct values can be
  represented, of which EOF is one. As a consequence, locales that
  define a real character for the value (char)255 (i.e. EOF) cannot be
  supported by ANY is* macro, no matter how cleverly implemented. Such
  locales really exist. The 'unsigned char' range is completely
  disjoint from EOF, which makes it much easier to implement an
  optimally performing library.
- Many users do not expect that 'α' (the Greek letter alpha, with
  extended ASCII value 224 in DOS CP 437) is not equal to 224. IMHO,
  this is strongly counterintuitive. It leads to unexpected (and never
  wanted) output from printf("%d", ...) and printf("%u", ...)
  statements.
- The is* macros expect, as defined in the ANSI standard, to be passed
  a value in the 'unsigned char' range or EOF. Although supporting
  broken programs is not really a requirement, the number of problems
  that may arise is much larger if 'char' defaults to 'signed', because
  many users are not aware of this idiosyncrasy and prefer to use
  'char' over 'unsigned char'. Even if we put a warning about this in a
  bright flashing box, it will still be ignored.
- All DOS compilers that I know about (not many) use 'unsigned char' by
  default.
  SGI uses 'unsigned char' ;)
- A char can be used as an array subscript, especially in translation
  tables. Most of the time (99%) the user does not expect that this
  value can be negative.
- (Implicit) conversion from an unsigned type to a signed type usually
  goes as expected for values not exceeding half of the type's range,
  something that cannot be said of the other direction. This is of
  course not specific to 'char', but it certainly applies.

Anti-pros:
----------
- If the user decides to live dangerously, let him deal with the
  consequences of that.
  Anti-anti-pro:
  - As implementors we should protect our users, up to a certain
    level, against their own ignorance }:-|. You don't give a gun to a
    five-year-old. (no, I don't think djgpp users are criminals, nor do
    I assess their intellectual skills to be those of a five-year-old
    (except the five-year-old boys and girls that use djgpp (Yes, I'm
    also starting to think that I fall in the second category
    (Pfff...))))

Anti-cons:
----------
- As the GNU compiler supports both kinds of platform, changing the
  compiler's default should be really easy.
- The GNU utilities are supposed to run on both kinds of architecture.
  This means that porting should be no problem at all.
- Despite what I said above, most uses of 'char' could not care less
  whether it is 'signed' or 'unsigned'. Changing it will most probably
  have no large consequences.
- The user can always explicitly specify 'signed' or 'unsigned' if that
  is what he wants.
- If the user wants his program to behave in an implementation-specific
  way, he can always specify "-funsigned-char" or "-fsigned-char" on
  the command line.

Cons:
-----
- The current djgpp compiler & libc library use 'signed char'. Changing
  this may break existing code, although the chance of that is
  relatively small: since it is implementation-defined whether 'char'
  is 'signed' or 'unsigned', the only code liable to break is code that
  was written non-portably.
- Making such a major change (I have my doubts about 'major') is out of
  place in a minor release update.
  Solution:
  - Postpone this modification to the next 'major' release.
  - or, make it a major release (IMHO a lot more than a few bug fixes
    have already gone into 2.02alpha).
- Some pre-ANSI compilers only support 'char' and no 'signed char'.
  Only non-negative values could then be stored, and older broken
  programs may expect that negative values also fit in a 'char'.
  - IMO, maintaining compatibility with obsolete compilers is not an
    issue.
  - 'unsigned char' stores values up to 255, whereas 'signed char' only
    goes up to 127. This can ALSO be an advantage.

I can only make suggestions...

-- 
 \ Vik /-_-_-_-_-_-_/
  \___/ Heyndrickx /
   \ /-_-_-_-_-_-_/
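
For anyone who wants to see the points about EOF, printf, the is* macros
and array subscripts in code, here is a minimal C sketch. It is an
illustration only, not anything from djgpp's libc: the function names
are made up, it assumes EOF is -1 (the standard only requires a negative
int), and it assumes the compiler wraps out-of-range values when
converting to 'signed char', as gcc does. It uses explicit 'signed char'
and 'unsigned char' so the results do not depend on the compiler's
default.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: the int value a byte has once it is stored in a
 * 'signed char'. For bytes above 127 the conversion is
 * implementation-defined; gcc wraps, so 224 becomes 224 - 256. */
int as_signed_char(unsigned char byte)
{
    return (signed char)byte;
}

/* Hypothetical helper: the int value of the same byte stored in an
 * 'unsigned char'. It is always in the range 0..255. */
int as_unsigned_char(unsigned char byte)
{
    return byte;
}

/* Nonzero if the byte, stored in a 'signed char', can no longer be told
 * apart from EOF (assumed to be -1 here). */
int collides_with_eof(unsigned char byte)
{
    return as_signed_char(byte) == EOF;
}

/* The is* functions want a value in the 'unsigned char' range or EOF,
 * so a plain 'char' is cast before the call. */
int is_alpha_byte(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Translation-table lookup: casting the subscript keeps it in 0..255
 * even where plain 'char' is signed. */
unsigned char translate(const unsigned char table[256], char c)
{
    assert(table != NULL);
    return table[(unsigned char)c];
}
```

Under those assumptions, as_signed_char(224) yields -32 while
as_unsigned_char(224) yields 224, and collides_with_eof(255) is true.
The casts in is_alpha_byte() and translate() are what a portable program
has to write by hand as long as plain 'char' may be signed.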