Mail Archives: djgpp-workers/2000/10/02/06:32:35

X-Authentication-Warning: acp3bf.physik.rwth-aachen.de: broeker owned process doing -bs
Date: Mon, 2 Oct 2000 12:31:27 +0200 (MET DST)
From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
X-Sender: broeker AT acp3bf
To: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
cc: djgpp-workers AT delorie DOT com
Subject: Re: scanf and invalid FP fields
In-Reply-To: <8296-Fri29Sep2000234330+0300-eliz@is.elta.co.il>
Message-ID: <Pine.LNX.4.10.10010021215090.31649-100000@acp3bf>
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

On Fri, 29 Sep 2000, Eli Zaretskii wrote:

> > Date: Fri, 29 Sep 2000 15:03:18 +0200 (MET DST)
> > From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
> > 
> > +++ quote (7.9.6.2, paragraph [#9], sentence 2, in the C9x draft)
> > An input item is defined as the longest sequence of input characters which
> > does not exceed any specified field width and which is,
> > *or*is*a*prefix*of*, a matching input sequence.
> > +++ quote ends
> 
> Thanks for the footwork, but I don't see how this makes the example
> correct; can you explain?

If we were to strictly follow what the standard says, the above sentence
would mandate that the string "100ergs" be split into the 'input items'
"100e" and "rgs", not "100" and "ergs", because "100e" is longer than
"100" but still a prefix of a matching input sequence (such as "100e10").
So the input item seen by the "%lf" scanf format would be "100e" here.
That item would then, correctly, be detected as not matching the format,
i.e. it is a matching failure, and scanf() would have to return zero,
since no successful match was found.
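To make the quoted rule concrete, here is a minimal C sketch of how a "%lf"-style conversion could delimit its input item under the C99 wording. It is an illustration only, not DJGPP's actual scanf code: fp_step, fp_input_item and the state names are hypothetical, and the recognizer deliberately covers only decimal constants (no hexadecimal, infinity or NaN forms) and assumes leading white space has already been skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* States of a simplified (hypothetical) recognizer for decimal constants:
   [sign] (digits [. [digits]] | . digits) [(e|E) [sign] digits]
   Hexadecimal, infinity and NaN forms are deliberately left out. */
enum fp_state {
    ST_START,      /* nothing read yet */
    ST_SIGN,       /* leading sign read */
    ST_INT,        /* one or more integer digits read (accepting) */
    ST_DOT,        /* '.' read with no digits before it */
    ST_FRAC,       /* "digits." or digits after the '.' (accepting) */
    ST_EXP,        /* exponent marker 'e' or 'E' read */
    ST_EXP_SIGN,   /* sign after the exponent marker */
    ST_EXP_DIGITS  /* one or more exponent digits (accepting) */
};

/* Returns the state reached by appending c, or -1 if the text read so far
   followed by c is no longer a prefix of any valid constant. */
static int fp_step(enum fp_state st, int c)
{
    bool digit = isdigit((unsigned char)c) != 0;

    switch (st) {
    case ST_START:
        if (c == '+' || c == '-')
            return ST_SIGN;
        /* fall through */
    case ST_SIGN:
        if (digit)
            return ST_INT;
        return c == '.' ? ST_DOT : -1;
    case ST_INT:
        if (digit)
            return ST_INT;
        if (c == '.')
            return ST_FRAC;
        return (c == 'e' || c == 'E') ? ST_EXP : -1;
    case ST_DOT:
        return digit ? ST_FRAC : -1;
    case ST_FRAC:
        if (digit)
            return ST_FRAC;
        return (c == 'e' || c == 'E') ? ST_EXP : -1;
    case ST_EXP:
        if (c == '+' || c == '-')
            return ST_EXP_SIGN;
        /* fall through */
    case ST_EXP_SIGN:
    case ST_EXP_DIGITS:
        return digit ? ST_EXP_DIGITS : -1;
    }
    return -1;
}

/* Length of the input item at the start of s: the longest prefix that is,
   or is a prefix of, a matching input sequence.  Only s[n], one character
   past the current prefix, is ever examined.  *complete is set to true when
   the item is a full constant (the conversion succeeds) and to false when
   it is only a prefix of one (a matching failure). */
size_t fp_input_item(const char *s, bool *complete)
{
    enum fp_state st = ST_START;
    size_t n = 0;

    assert(s != NULL && complete != NULL);
    for (;;) {
        int next = fp_step(st, s[n]);   /* s[n] is the lookahead character */
        if (next < 0)
            break;                      /* lookahead is not part of the item */
        st = (enum fp_state)next;
        n++;
    }
    *complete = (st == ST_INT || st == ST_FRAC || st == ST_EXP_DIGITS);
    return n;
}
```

With this sketch, "100ergs" yields the 4-character item "100e" with *complete set to false, i.e. a matching failure, while "100 ergs" yields the complete item "100". Because every shortening of a valid prefix is itself a valid prefix, stopping at the first character that cannot extend the text gives the longest such item.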

The rationale behind this is hinted at by the footnote: this rule allows
fscanf() to get away with never reading ahead by more than one character.
To tokenize "100ergs" as "100"/"ergs", it would have to read until it has
"100er", notice that this doesn't fit the pattern of an FP constant, and
ungetc() both the 'e' and the 'r'. But ungetc() is only guaranteed to be
able to handle a single character of pushback.
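The lookahead argument can be checked by counting how many characters each strategy would have to hand back to the stream. This is a hypothetical sketch: step, accepting, pushback_backtracking and pushback_c99 are illustrative names, the end of the string stands in for EOF, and the grammar is cut down to digits with an optional exponent, which is just enough for "100ergs".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny recognizer for digits [(e|E) [+|-] digits]
   (no sign, decimal point, hex, infinity or NaN).  States: 0 start,
   1 integer digits, 2 exponent marker, 3 exponent sign, 4 exponent digits.
   Returns the next state, or -1 if c cannot extend the current prefix. */
static int step(int st, int c)
{
    int digit = isdigit((unsigned char)c);

    switch (st) {
    case 0:
        return digit ? 1 : -1;
    case 1:
        if (digit)
            return 1;
        return (c == 'e' || c == 'E') ? 2 : -1;
    case 2:
        if (c == '+' || c == '-')
            return 3;
        return digit ? 4 : -1;
    case 3:
    case 4:
        return digit ? 4 : -1;
    }
    return -1;
}

/* States in which the text read so far is a complete constant. */
static int accepting(int st)
{
    return st == 1 || st == 4;
}

/* Existing-practice strategy: read while the text is still a valid prefix,
   including the one character that breaks it, then back off to the end of
   the longest complete constant.  Returns how many characters would have
   to be pushed back with ungetc(). */
size_t pushback_backtracking(const char *s)
{
    int st = 0;
    size_t read = 0, last_complete = 0;

    assert(s != NULL);
    while (s[read] != '\0') {
        int next = step(st, s[read]);
        read++;                       /* this character has been consumed */
        if (next < 0)
            break;
        st = next;
        if (accepting(st))
            last_complete = read;
    }
    return read - last_complete;
}

/* C99 strategy: the input item ends at the first character that cannot
   extend it, so only that single lookahead character is pushed back. */
size_t pushback_c99(const char *s)
{
    int st = 0;
    size_t read = 0;

    assert(s != NULL);
    while (s[read] != '\0') {
        int next = step(st, s[read]);
        read++;
        if (next < 0)
            return 1;                 /* give back the lookahead only */
        st = next;
    }
    return 0;                         /* end of input: nothing to give back */
}
```

For "100ergs" the backtracking strategy needs two characters of pushback ('e' and 'r'), more than ungetc() is guaranteed to accept, while the C99 rule never needs more than one.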

Of course, *our* ungetc() can handle more than a single character's worth
of backlog, but that's an implementation-specific extension of the
standard. The result of an fscanf() should not usually depend on such
implementation-defined details.

> In any case, the behavior of other libc's we saw in this thread also
> looks consistent with the Standard, so I think we should behave like
> the majority of libraries and break "100ergs" into 100.0 and "ergs".

I'm not really sure. The problem with this is that the status of this
detail has changed between the C89 and C99 standard versions. In C89,
this was a non-normative example with no visible rule in the normative
parts of the standard to support it. In C99 the actual rules were
clarified, and now the behaviour of the example is justified by the
standard.

On the other hand, none of the systems I know of has adopted C99
compliance yet; they are all still C89 implementations. This means all
the testing was done on C89 libraries.
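A probe along the lines of the library tests discussed in this thread might look like the sketch below; probe_100ergs is a hypothetical helper, not part of any library. Under the C99 reading the call should return 0 and leave both outputs cleared, while libraries following the existing practice described above return 2, with 100.0 and "ergs". Other outcomes are possible on a given libc, so the result is worth printing rather than assuming.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: runs the "100ergs" case through the local C
   library's sscanf() and returns its result.  value and rest are cleared
   first so the caller can see exactly what, if anything, was assigned.
   rest must have room for at least 16 characters (the %15s width plus
   the terminating null character). */
int probe_100ergs(double *value, char *rest)
{
    assert(value != NULL && rest != NULL);
    *value = 0.0;
    rest[0] = '\0';
    return sscanf("100ergs", "%lf%15s", value, rest);
}
```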

I think the real question is what we consider more important, right now:
C99 compliance, or existing practice. The two obviously disagree. It
remains to be seen whether existing practice will change anytime soon...
We just might win the cup for being the first to do it ;-)

Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.


