Mail Archives: djgpp-workers/1998/02/10/03:16:44

Sender: vheyndri AT rug DOT ac DOT be
Message-Id: <34E00C53.175A@rug.ac.be>
Date: Tue, 10 Feb 1998 09:14:11 +0100
From: Vik Heyndrickx <Vik DOT Heyndrickx AT rug DOT ac DOT be>
Mime-Version: 1.0
To: DJ Delorie <dj AT delorie DOT com>
Cc: djgpp-workers AT delorie DOT com
Subject: Re: char != unsigned char... sometimes, sigh (long)
References: <199802092259 DOT RAA14332 AT delorie DOT com>

As a very last attempt to make you all reconsider changing the
default of 'char' from 'signed' to 'unsigned', I have tried to
summarize all points of view.

Pros:
-----
- EOF is an element of the 'signed char' range, which means that no
matter what trickery is applied, only 256 distinct values can be
represented, of which EOF is one. As a consequence, locales that
define a real character for the value (char)255 (i.e. EOF) cannot be
supported by ANY is* macros, no matter how cleverly implemented. Such
locales really exist. The 'unsigned char' range is completely
distinct from EOF, which makes it much easier to implement an
optimally performing library.
- many users do not expect that 'α' (the Greek letter alpha, with
extended ASCII value 224 in DOS CP 437) is not equal to 224. IMHO, this
is strongly counterintuitive. It triggers unexpected (and never wanted)
output in printf("%d", ...) and printf("%u", ...) statements.
- the is* macros expect, as defined in the ANSI standard, a value in
the unsigned char range or EOF. Although supporting broken programs is
not really a requirement, the potential number of problems that may
arise if 'char' defaults to 'signed' is much larger, because many users
are not aware of this idiosyncrasy and prefer to use 'char' over
'unsigned char'. Even if we put a warning about this in a bright
flashing box, it will still be ignored.
- All DOS compilers that I know about (not many) use 'unsigned char' by
default. SGI uses 'unsigned char' ;)
- A char can be used as an array subscript, especially in translation
tables. Most of the time (99%) the user does not expect that this value
can be negative.
- (implicit) conversion from an unsigned type to a signed type usually
goes as expected for values not exceeding half of the type's range,
which cannot be said of the conversion the other way around. This is of
course not specific to 'char', but it certainly applies.
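
The EOF, is* and array-subscript points above can be illustrated with a
small C sketch. The helper names (as_signed_arg, as_unsigned_arg,
safe_toupper, translate) are made up for this illustration and are not
part of djgpp's libc. The sketch also assumes a two's complement target
on which converting 255 to 'signed char' yields -1 and on which EOF is
-1; both are implementation details, not guarantees of the standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the int an is* macro receives when a byte that
   was stored in a 'signed char' is passed without a cast.
   Converting values above 127 to signed char is implementation-defined;
   on two's complement targets 255 becomes -1 and 224 becomes -32. */
static int as_signed_arg(unsigned char byte)
{
    return (signed char)byte;
}

/* Hypothetical helper: the int an is* macro receives when the byte is
   kept as 'unsigned char'. The result is always in 0..255, so it can
   never collide with a negative EOF. */
static int as_unsigned_arg(unsigned char byte)
{
    return byte;
}

/* Hypothetical wrapper: cast to unsigned char before calling toupper(),
   so the argument is in the range the ANSI standard requires even when
   plain 'char' is signed. */
static int safe_toupper(char c)
{
    return toupper((unsigned char)c);
}

/* Translation-table lookup: indexing through unsigned char keeps the
   subscript in 0..255 whatever the signedness of plain 'char'. */
static unsigned char translate(const unsigned char table[256], char c)
{
    assert(table != NULL);
    return table[(unsigned char)c];
}
```

Under the stated assumptions, as_signed_arg(255) compares equal to EOF,
while as_unsigned_arg(255) stays 255; the casts in safe_toupper() and
translate() are what a careful user has to write today when 'char'
defaults to 'signed'.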
Anti-pros:
----------
- If the user decides to live dangerously, let him deal with the
consequences of that.
Anti-Anti-Pro:
- As implementors we should protect our users up to a certain level
against their own ignorance }:-|. You don't give a gun to a
five-year-old. (no, I don't think djgpp users are criminals, nor do I
assess their intellectual skills to be those of a five-year-old (except
the five-year-old boys and girls that use djgpp (Yes, I'm also starting
to think that I fall in the second category (Pfff...))))
Anti-cons:
----------
- As the GNU compiler supports platforms of both kinds, changing the
compiler's default should be really easy.
- The GNU utilities are supposed to run on both kinds of architecture.
This means that porting should be no problem at all.
- Despite what I said above, most uses of 'char' couldn't care less
whether it is 'signed' or 'unsigned'. Changing it will most probably
have no large consequences.
- The user can always explicitly specify 'signed' or 'unsigned' if that
is what he wants.
- If the user wants his program to behave in an implementation-specific
way, he can always specify "-funsigned-char" or "-fsigned-char" on the
command line.
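
A minimal sketch of the last two points, in C. The function names are
invented for this illustration. It relies only on <limits.h>: CHAR_MIN
is 0 when plain 'char' is unsigned and equals SCHAR_MIN when it is
signed, so the same source shows which default (or which of
"-funsigned-char" / "-fsigned-char") is in effect for a given build.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain 'char' is signed in this build, 0 if unsigned. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Plain 'char' always has the same range as exactly one of the two
   explicit types; this check must hold on every conforming compiler. */
static void check_char_limits(void)
{
    if (plain_char_is_signed())
        assert(CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX);
    else
        assert(CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX);
}

/* Explicitly qualified types widen the same way whatever the default
   or the command-line flag. */
static int widen_signed(signed char c)     { return c; }
static int widen_unsigned(unsigned char c) { return c; }

/* Plain 'char' widens like one or the other, depending on the build. */
static int widen_plain(char c)             { return c; }
```

Only widen_plain() changes its result for bytes above 127 when the
default is switched; code that spells out 'signed char' or
'unsigned char' is unaffected.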
Cons:
-----
- The current djgpp compiler & libc library use 'signed char'. Changing
this may break existing code, although the chance of that is relatively
small: since it is implementation-defined whether 'char' is 'signed' or
'unsigned', the only code liable to break is code that was written
non-portably.
- Making such a major change (I have my doubts about 'major') is out of
place in a minor release update.
Solution:
  - Postpone this modification to the next 'major' release.
  - or, make this a major release (IMHO, a lot more than a few bug
fixes have already gone into 2.02alpha).
- Some pre-ANSI compilers support only 'char' and have no 'signed char'.
With an unsigned 'char', only non-negative values can be stored, and
older broken programs may expect that negative values also fit in a
'char'.
  - IMO, maintaining compatibility with obsolete compilers is not an
issue.
  - unsigned char stores values up to 255, whereas signed char only goes
up to 127. This can ALSO be an advantage.

I can only make suggestions...

-- 
 \ Vik /-_-_-_-_-_-_/   
  \___/ Heyndrickx /          
   \ /-_-_-_-_-_-_/



