Mail Archives: djgpp-workers/2004/02/25/10:48:38

Date: Wed, 25 Feb 2004 09:49:09 -0600
From: Eric Rudd <rudd AT cyberoptics DOT com>
Subject: Re: strtoul bug (was Re: Fibonacci number)
In-reply-to: <uvflvelri.fsf@elta.co.il>
To: Eli Zaretskii <eliz AT elta DOT co DOT il>
Cc: djgpp-workers AT delorie DOT com
Message-id: <403CC3F5.80909@cyberoptics.com>
Organization: CyberOptics
MIME-version: 1.0
X-Accept-Language: en,pdf
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.5) Gecko/20030925
References: <d7c3a0b2 DOT 0402220759 DOT 34d6435d AT posting DOT google DOT com>
<4038E8CA DOT 6491815E AT virginia DOT edu> <4039DD96 DOT 3F36F3B7 AT yahoo DOT com>
<200402231458 DOT i1NEwKwm020904 AT envy DOT delorie DOT com> <403A301C DOT E8F5FE65 AT yahoo DOT com>
<200402231751 DOT i1NHp5lv022894 AT envy DOT delorie DOT com> <403A5F55 DOT 7E608910 AT yahoo DOT com>
<200402232034 DOT i1NKYqrt024366 AT envy DOT delorie DOT com> <403ADB4A DOT ED9F77C2 AT yahoo DOT com>
<403BD3E2 DOT 90602 AT cyberoptics DOT com> <uvflvelri DOT fsf AT elta DOT co DOT il>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

Eli Zaretskii wrote:

>I'm not sure the current behavior is wrong.  A case in point is a
>line like below from a somewhat buggy C program:
>
>     unsigned foo = -5;
>
>What do we expect a typical C compiler to produce?  I think we expect
>it to treat this as
>
>     unsigned foo = (unsigned)-5;
>  
>
Yes, that's exactly what I'd expect, too.  Of course, it's bad style to 
code like this, but I think that there's nothing in the standard that 
prohibits it, and, in fact, the standard type conversion rules would 
cause it to behave exactly as you have shown.
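
The conversion rule mentioned above can be checked with a minimal sketch. The helper name minus_five_as_unsigned is made up for illustration; the only assumption is a conforming C compiler, where converting a negative int to unsigned adds UINT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns what `unsigned foo = -5;` actually stores. */
static unsigned minus_five_as_unsigned(void)
{
    unsigned foo = -5;                      /* implicit conversion from int */
    unsigned by_cast = (unsigned)-5;        /* the explicit form quoted above */
    unsigned by_rule = UINT_MAX - 5u + 1u;  /* -5 plus (UINT_MAX + 1) */

    /* All three spellings should agree under the standard conversion rule. */
    assert(foo == by_cast);
    assert(foo == by_rule);
    return foo;
}
```

With a 32-bit unsigned this is the familiar "modulo 2^32" result.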

>which produces what the current strtoul does.  The reason this is
>relevant is that a compiler (or any other similar processor) could
>use strtoul to convert the string to an unsigned value.
>  
>
I think that would be contrary to the standard, since -5 has a signed 
type, and therefore strtol() should be used instead.  Now, if someone wrote

   unsigned foo = -5u;

then I think there should be a compiler error message, because the "u" 
indicates an unsigned constant and -5 is not a proper unsigned 
constant.  Unfortunately, none of the compilers I have tried (gcc, 
Borland, Watcom) agree with me on that point.

>Another case in point is
>
>	int bar = strtoul("-5", NULL, 0);
>
>Don't we expect `bar' to be assigned the value -5?
>  
>
Based on my understanding of the standard, since -5 is out of the range 
of unsigned long, strtoul() should then return ULONG_MAX and set 
errno to ERANGE.  The behavior you suggest would be sensible, but there 
are certain difficulties with this approach (see below).

>So in my view, setting errno is okay, but the value should not
>blindly be ULONG_MAX.  I think ULONG_MAX is for the cases where we
>try to convert a string whose numerical value cannot be reasonably
>represented as a 32-bit number, like "11111111111111111111111111111"
>or some such.
>
I think the key here is the precise meaning of "32-bit number".  I'd 
argue that for strtoul(), this means "unsigned long".  For instance, 
what should strtoul return for "-2147483648"?  How about "-2147483649"?  
At what point does the value of this string get "out of range"?

Similarly, I'd expect strtol("4294967295", NULL, 0) to return LONG_MAX 
and set errno, rather than quietly return -1.  I haven't checked to see 
what it actually does.
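
A small probe along these lines could be used to check a given library. It is only a sketch: struct probe and both helper names are invented here. Because "4294967295" fits in a long wherever long is 64 bits, the second helper passes the decimal spelling of ULONG_MAX instead, which is out of range for long on any platform.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record of one strtol() call: the result and the errno it left. */
struct probe {
    long value;
    int err;
};

/* Hypothetical helper: run strtol() with errno cleared first. */
static struct probe probe_strtol(const char *text)
{
    struct probe p;

    assert(text != NULL);
    errno = 0;
    p.value = strtol(text, NULL, 0);
    p.err = errno;
    return p;
}

/* Hypothetical helper: probe strtol() with the decimal digits of ULONG_MAX,
   which equals "4294967295" when long is 32 bits wide. */
static struct probe probe_strtol_ulong_max(void)
{
    char text[64];

    snprintf(text, sizeof text, "%lu", ULONG_MAX);
    return probe_strtol(text);
}
```

If the library follows the standard's range rule, probe_strtol_ulong_max() reports LONG_MAX with errno set to ERANGE; a quiet -1 would indicate the wrap-around behavior instead.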

I think the current "modulo 2^32" behavior is sensible, but nonetheless 
contrary to the standard.  It's actually pretty difficult to make this 
work consistently for very long inputs as in your example above, because 
of the need for extended-precision arithmetic; it's easier to signal an 
overflow for any inputs of more than, say, ten digits.

>FWIW, the GNU/Linux documentation of strtoul explicitly says that
>ULONG_MAX is returned and errno is set only if the original value
>would overflow.  Curiously, it also says:
>
>       The  strtoul()  function  returns either the result of the
>       conversion or, if there was  a  leading  minus  sign,  the
>       negation of the result of the conversion
>
>Does this mean it will return 5 in the last example above?  Can
>someone try this on a GNU/Linux box?
>
There is similar language in the standard, and it takes a pretty close 
reading to see that there is no absolute value taken.  The key is that 
the "conversion" the standard mentions is of the "subject sequence", 
which can be "optionally preceded by a plus or minus sign".  Therefore, 
the subject sequence itself does not include the sign, and the action is 
as follows: the digits are converted, the sign gets applied as one would 
expect, then the result is compared to the limits for the integer type 
of the result, and an error return is made if the constant is out of 
range for that integer type.

Let's see what the experts in comp.std.c have to say about this.

-Eric
