Date: Wed, 25 Feb 2004 09:49:09 -0600
From: Eric Rudd
Organization: CyberOptics
Subject: Re: strtoul bug (was Re: Fibonacci number)
To: Eli Zaretskii
Cc: djgpp-workers AT delorie DOT com
Message-id: <403CC3F5.80909@cyberoptics.com>

Eli Zaretskii wrote:

>I'm not sure the current behavior is wrong.  A case in point is a
>line like below from a somewhat buggy C program:
>
>   unsigned foo = -5;
>
>What do we expect a typical C compiler to produce?  I think we expect
>it to treat this as
>
>   unsigned foo = (unsigned)-5;

Yes, that's exactly what I'd expect, too.  Of course, it's bad style to
code like this, but I think that there's nothing in the standard that
prohibits it, and, in fact, the standard type conversion rules would
cause it to behave exactly as you have shown.

>which produces what the current strtoul does.
>The reason this is relevant is that a compiler (or any other similar
>processor) could use strtoul to convert the string to an unsigned
>value.

I think that would be contrary to the standard, since -5 is a signed
type, and therefore strtol() should be used instead.  Now, if someone
wrote

   unsigned foo = -5u;

then I think there should be a compiler error message, because the "u"
indicates an unsigned constant and -5 is not a proper unsigned constant.
Unfortunately, none of the compilers I have tried (gcc, Borland, Watcom)
agree with me on that point.

>Another case in point is
>
>   int bar = strtoul("-5", NULL, 0);
>
>Don't we expect `bar' to be assigned the value -5?

Based on my understanding of the standard, since -5 is out of the range
of unsigned longs, strtoul() should then return ULONG_MAX and set errno.
The behavior you suggest would be sensible, but there are certain
difficulties with this approach (see below).

>So in my view, setting errno is okay, but the value should not
>blindly be ULONG_MAX.  I think ULONG_MAX is for the cases where we
>try to convert a string whose numerical value cannot be reasonably
>represented as a 32-bit number, like "11111111111111111111111111111"
>or some such.

I think the key here is the precise meaning of "32-bit number".  I'd
argue that for strtoul(), this means "unsigned long".  For instance,
what should strtoul return for "-2147483648"?  How about "-2147483649"?
At what point does the value of this string get "out of range"?

Similarly, I'd expect strtol("4294967295", NULL, 0) to return LONG_MAX
and set errno, rather than quietly return -1.  I haven't checked to see
what it actually does.

I think the current "modulo 2^32" behavior is sensible, but nonetheless
contrary to the standard.  It's actually pretty difficult to make this
work consistently for very long inputs as in your example above, because
of the need for extended-precision arithmetic; it's easier to signal an
overflow for any inputs of more than, say, ten digits.

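To make the strict reading concrete, here is a minimal sketch that
assumes decimal input only.  The name strict_strtoul10 is hypothetical
and is not part of DJGPP or any C library, and this is not what the
current strtoul does.  It treats any nonzero negative value as out of
range, and it suggests one way to catch overflow on very long inputs:
test before each multiply rather than use a wider type.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (assumption, not a library function):
   decimal-only conversion under the strict reading, where a
   negative nonzero value is reported as out of range. */
unsigned long strict_strtoul10(const char *s, char **endptr)
{
    const char *p = s;
    unsigned long acc = 0;
    int negative = 0, overflow = 0, any = 0;

    assert(s != NULL);

    /* Skip leading white space, then an optional sign. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '+' || *p == '-')
        negative = (*p++ == '-');

    for (; isdigit((unsigned char)*p); p++) {
        unsigned long digit = (unsigned long)(*p - '0');
        any = 1;
        /* acc * 10 + digit fits only if acc <= (ULONG_MAX - digit) / 10,
           so the check is done before multiplying. */
        if (acc > (ULONG_MAX - digit) / 10)
            overflow = 1;          /* sticky; keep consuming digits */
        else
            acc = acc * 10 + digit;
    }

    /* With no digits at all, report that nothing was consumed. */
    if (endptr != NULL)
        *endptr = (char *)(any ? p : s);

    /* Out of range: too large, or negative and nonzero. */
    if (overflow || (negative && acc != 0)) {
        errno = ERANGE;
        return ULONG_MAX;
    }
    return acc;
}
```

Under these assumptions, "-5" and the 29-digit string quoted above both
give ULONG_MAX with errno set to ERANGE, while "5" and "-0" convert
normally.  As with the real functions, the caller has to clear errno
before the call to tell an error from a legitimate ULONG_MAX.
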
>FWIW, the GNU/Linux documentation of strtoul explicitly says that
>ULONG_MAX is returned and errno is set only if the original value
>would overflow.  Curiously, it also says:
>
>       The strtoul() function returns either the result of the
>       conversion or, if there was a leading minus sign, the
>       negation of the result of the conversion
>
>Does this mean it will return 5 in the last example above?  Can
>someone try this on a GNU/Linux box?

There is similar language in the standard, and it takes a pretty close
reading to see that there is no absolute value taken.  The key is that
the "conversion" the standard mentions is of the "subject sequence",
which can be "optionally preceded by a plus or minus sign".  Therefore,
the subject sequence itself does not include the sign, and the action is
as follows: the digits are converted, the sign gets applied as one would
expect, then the result is compared to the limits for the integer type
of the result, and an error return is made if the constant is out of
range for that integer type.

Let's see what the experts in comp.std.c have to say about this.

-Eric