Mail Archives: djgpp-workers/1998/03/24/07:47:26
Eli Zaretskii wrote:
>
> On Mon, 23 Mar 1998, Vik Heyndrickx wrote:
>
> > Isn't the REAL problem here that the ^Z should never get returned by a
> > read operation from a character device/file.
>
> This can be done (and is done by DJGPP's libc) only if the file is read
> in text mode. Binary reads cannot ignore and/or filter data, or they
> will betray their users.
>
> The case in point is precisely one of those when the file is read in
> binary mode, but written in text mode, because writing the console in
> binary mode has nasty side-effects (which I described). In this case, I
> don't see how can the ^Z be filtered during input.
Can't libc be extended so that it autodetects the type of a file's
contents, i.e. text or binary data, e.g. by means of the following
criterion:
If a file contains any characters that are not isascii() characters
before the first CR LF is read, then the file is binary. If no CR LF is
encountered within the first 162 characters, then it is also binary. If
^Z is the last character of the file (at the real EOF position), or if
it is followed by only ^Z characters up to the real EOF position, then
those ^Z characters are not returned (for a text file, that is).
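To make the criterion a bit more concrete, here is a rough sketch in C.
It assumes the whole file image is already in memory, and every name in
it (guess_content_type, text_length_without_ctrl_z, DETECT_WINDOW) is
made up for illustration; nothing like this exists in libc today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum content_type { CONTENT_TEXT, CONTENT_BINARY };

#define DETECT_WINDOW 162   /* the limit named in the criterion */
#define CTRL_Z        0x1A

/* Classify the file image held in buf[0..len-1]. */
enum content_type guess_content_type(const unsigned char *buf, size_t len)
{
    size_t limit = len < DETECT_WINDOW ? len : DETECT_WINDOW;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < limit; i++) {
        if (buf[i] > 0x7F)           /* same test as !isascii(buf[i]) */
            return CONTENT_BINARY;   /* non-ASCII before the first CR LF */
        if (buf[i] == '\r' && i + 1 < limit && buf[i + 1] == '\n')
            return CONTENT_TEXT;     /* CR LF found inside the window */
    }
    return CONTENT_BINARY;           /* no CR LF inside the window */
}

/* For a text file: length with the run of ^Z at the real EOF removed.
   A ^Z followed by anything other than ^Z is left alone. */
size_t text_length_without_ctrl_z(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0 && buf[len - 1] == CTRL_Z)
        len--;
    return len;
}
```

Note that a short file without any CR LF comes out as binary here, which
is one of the places where the criterion would need fine-tuning.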
Of course this criterion is too rough and needs fine-tuning. It seems
an achievable goal, though putting this into the almost monolithic libc
file functions will probably be too difficult. Note that this may even
be achievable on the fly: when the file has just been opened, it is in
an undetermined state. As data is read in and passed along to the user,
this state can change to a determinate one, i.e. either text or binary.
As long as the state is undetermined, the data read so far does not
contain any characters that would be handled differently in text vs.
binary mode. In the worst case this algorithm needs a one-byte
read-ahead (but that is the same for a normal CR LF -> NL translation,
although I'm not sure how this gets done in the libc functions; it
might simply discard the CRs).
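The on-the-fly variant could look roughly like the sketch below: a small
state machine that is fed one byte at a time and stays undetermined
until a deciding byte shows up. The struct and function names are
hypothetical, and the prev_cr flag stands in for the one-byte read-ahead.

```c
#include <assert.h>
#include <stddef.h>

#define DETECT_WINDOW 162   /* same limit as in the criterion */

/* Hypothetical names, for illustration only. */
enum file_state { STATE_UNDETERMINED, STATE_TEXT, STATE_BINARY };

struct detector {
    enum file_state state;
    size_t seen;      /* bytes examined so far */
    int prev_cr;      /* nonzero if the previous byte was CR */
};

void detector_init(struct detector *d)
{
    assert(d != NULL);
    d->state = STATE_UNDETERMINED;
    d->seen = 0;
    d->prev_cr = 0;
}

/* Feed the next byte read from the file; returns the state after it.
   Once the state is determinate it never changes again. */
enum file_state detector_feed(struct detector *d, unsigned char c)
{
    assert(d != NULL);
    if (d->state != STATE_UNDETERMINED)
        return d->state;

    d->seen++;
    if (c > 0x7F)
        d->state = STATE_BINARY;          /* non-ASCII before CR LF */
    else if (d->prev_cr && c == '\n')
        d->state = STATE_TEXT;            /* first CR LF completed */
    else if (d->seen >= DETECT_WINDOW)
        d->state = STATE_BINARY;          /* window used up, no CR LF */

    d->prev_cr = (c == '\r');
    return d->state;
}
```

A read function would call detector_feed() for each byte while the state
is undetermined and switch to the text or binary path as soon as the
answer is known.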
If a user does a binary open of a text file, then that is his own fault,
and he has to deal with the consequences (yes, I know, that means that
whoever ports a package has to do this).
> > IMO the way text-data is
> > stored should be entirely transparent to the user program (AFAIK POSIX
> > requires this), this means that the read functions should do CR/LF to NL
> > and ^Z to EOF translations. AFAIK this is enough to ensure that ^Z never
> > gets passed to the write functions.
>
> This is all so, but only for text-mode reads. Binary reads don't change
> the file's data at all.
The point I am trying to make is that the problem originates in data
being read in binary mode and then written in text mode. It is not
libc's task to make the write functions accept binary data as if it
were text; rather, the read functions should operate on text data when
it IS text data. You are trying to solve the problem where it shows up;
I am trying to point you to its origin, and that is nearly always a
better place to deal with a problem.
> > IMO, there are only two cases: text files/devices and binary
> > files/devices. I don't see any use for making a distinction between
> > cooked-mode devices and files (I almost wrote cooked devices :-) )
>
> DOS doesn't have a notion of ``binary'' vs ``text'' devices.
(raw vs. cooked) == (binary vs. text) != (device vs. file)
> > IMO, a ^Z (at any place in the output data) should turn a file in EOF
> > mode, and let write and family ignore any further output to that file
> > (until the EOF indicator gets reset).
>
> This has several problems. First, you need to search every buffer for ^Z
> characters, which is expensive in functions like `_write' which don't
> usually examine every byte (we could have a 64KB transfer buffer).
This ^Z checking should happen at the same level as the CR LF -> NL
translation, i.e. only for text files.
> Second, what do you return as the number of bytes written to the caller?
Yes, I know that looks like a problem. Just return the number of
characters that preceded this ^Z, i.e. 0 for a buffer of data starting
with ^Z. That has a few problems of its own, as you mentioned yourself,
but note well that here we are not at the origin of our problem, which
means trouble by default.
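As a sketch of what I mean, assuming the ^Z check sits in the same loop
as the NL -> CR LF translation for text-mode writes (the function name
and its interface are invented here, not taken from libc):

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 0x1A

/* Hypothetical helper, for illustration only.
   Copies src[0..len-1] to dst, turning NL into CR LF, and stops at the
   first ^Z. dst must have room for 2 * len bytes.
   Returns the number of source characters that preceded the ^Z (or len
   if there was none); *out_len receives the number of bytes placed in
   dst, and *hit_eof is set to 1 if a ^Z was seen, 0 otherwise. */
size_t text_translate_for_write(const char *src, size_t len,
                                char *dst, size_t *out_len, int *hit_eof)
{
    size_t i, o = 0;

    assert(src != NULL && dst != NULL);
    assert(out_len != NULL && hit_eof != NULL);

    *hit_eof = 0;
    for (i = 0; i < len; i++) {
        if (src[i] == CTRL_Z) {      /* file goes into EOF mode here */
            *hit_eof = 1;
            break;
        }
        if (src[i] == '\n')
            dst[o++] = '\r';         /* NL -> CR LF */
        dst[o++] = src[i];
    }
    *out_len = o;
    return i;                        /* 0 if the buffer starts with ^Z */
}
```

The caller would report the return value as the number of bytes written
and, once *hit_eof is set, ignore further output until the EOF indicator
gets reset.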
--
\ Vik /-_-_-_-_-_-_/
\___/ Heyndrickx /
\ /-_-_-_-_-_-_/