Date: Mon, 2 Oct 2000 12:31:27 +0200 (MET DST)
From: Hans-Bernhard Broeker
To: Eli Zaretskii
cc: djgpp-workers AT delorie DOT com
Subject: Re: scanf and invalid FP fields
In-Reply-To: <8296-Fri29Sep2000234330+0300-eliz@is.elta.co.il>
Reply-To: djgpp-workers AT delorie DOT com

On Fri, 29 Sep 2000, Eli Zaretskii wrote:

> > Date: Fri, 29 Sep 2000 15:03:18 +0200 (MET DST)
> > From: Hans-Bernhard Broeker
> >
> > +++ quote (7.9.6.2, paragraph [#9], sentence 2, in the C9x draft)
> > An input item is defined as the longest sequence of input characters which
> > does not exceed any specified field width and which is,
> > *or*is*a*prefix*of*, a matching input sequence.
> > +++ quote ends
>
> Thanks for the footwork, but I don't see how this makes the example
> correct; can you explain?

If we were to strictly follow what the standard says, the above
sentence would mandate that the string "100ergs" be split into the
'input items' "100e" and "rgs", not "100" and "ergs", because "100e"
is longer than "100" but is still a prefix of a matching input
sequence (like "100e10", say). So the input item seen by the "%lf"
scanf format would be "100e" here. This would then, correctly, be
detected as not matching the format, i.e. it's a matching failure -->
scanf() would have to return zero, because no successful match was
found.

The rationale behind this is hinted at by the footnote: this rule
allows 'fscanf' to get away with never reading ahead by more than 1
character.
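For anyone who wants to see which way their own libc goes, here is a
minimal sketch of a test. The helper names probe_scanf() and
describe() are made up for this example; only sscanf() is a real
library call. The outcome for "100ergs" differs between libraries, so
no expected output is given for it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any libc): feed `input` to
   sscanf() with a "%lf" conversion followed by "%31s" and return the
   number of conversions that succeeded.  *value and rest are only
   meaningful for the conversions that were counted. */
int probe_scanf(const char *input, double *value, char rest[32])
{
    *value = 0.0;
    rest[0] = '\0';
    return sscanf(input, "%lf%31s", value, rest);
}

/* Hypothetical helper: describe the return value of probe_scanf()
   in the terms used in this thread. */
const char *describe(int converted)
{
    switch (converted) {
    case 0:  return "matching failure on %lf (the C9x reading)";
    case 1:  return "%lf converted, nothing left for %s";
    case 2:  return "input was split into a number and a string";
    default: return "input failure (EOF)";
    }
}
```

Calling probe_scanf("100ergs", &value, rest) from a small main() and
printing describe() of the result, together with value and rest, shows
whether the library reports a matching failure or splits the input,
and in the latter case where it made the split. An input such as
"100e10ergs" is uncontroversial and should convert both items.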
To 'tokenize' "100ergs" as "100"/"ergs", fscanf() would have to read
until it has "100er", notice that this doesn't fit the pattern of an
FP constant, and ungetc() both the 'e' and the 'r' --- but ungetc()
isn't guaranteed to be able to handle more than a single putback
character. Of course, *our* ungetc() can handle more than a single
character's worth of backlog, but that's an implementation-specific
extension of the standard. The result of an fscanf() should not
usually depend on such implementation-defined details.

> In any case, the behavior of other libc's we saw in this thread also
> looks consistent with the Standard, so I think we should behave like
> the majority of libraries and break "100ergs" into 100.0 and "ergs".

I'm not really sure. The problem with this is that the status of this
nitbit has changed between the C89 and C99 versions of the standard.
In C89, this was a non-normative example with no visible rule in the
normative parts of the standard to support it. In C99 the actual rules
were clarified, and now the behaviour of the example is justified by
the standard.

On the other hand, none of the systems I know have taken on C99
compliance yet --- they're all still C89 implementations. This means
all the testing was done on C89 libraries.

I think the real question is what we consider more important right
now: C99 compliance, or existing practice. The two obviously disagree.
It remains to be seen whether existing practice will change anytime
soon... We just might win the cup for being the first to do it ;-)

Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.