From: tiberius AT braemarinc DOT com (Gary R. Van Sickle)
Subject: RE: Why text=binary mounts
10 Jan 1998 05:50:34 -0800
Message-ID: <01BD1D1B.99116B20.tiberius.cygnus.gnu-win32@braemarinc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: "gnu-win32 AT cygnus DOT com"

>> I'm not sure how to do it though. One could just change the text mode.
>> That would be o.k. for me but I'm not sure everybody would be happy with
>> that. Another thought would be to invent another mode like "extended
>> text mode" e.g. with an fopen() specifier "T", an open() flag O_ETEXT
>> and an iostream mode ios::etext that could implement this. That way one
>> could port tools to this mode by simply adding the flags just like you
>> port binary tools now by adding O_BINARY, "b" or ios::bin. Does anybody
>> else have an opinion on that problem?

The real problem here is that files as they exist on disk don't have
'modes', they have formats. Adding 'modes' to a system that really doesn't
work already will only make the situation worse.

What I think is really needed is a Text Access Library (TAL) that sits
*on top* of a *binary* stdio file and reads and writes lines from UNIX,
DOS, Mac, maybe HTML, etc., etc., text files. Instead of
fopen(???, "rt"), you'd use the library and then *not care* what the text
file format is, only that it contains lines of text.

This TAL would become part of the standard C library (or the GNU library
at least, which would make it a de facto standard), all the tools that
deal with text would use it, and eventually the "t" functionality of stdio
would be deprecated and the problem would be solved.

I volunteer to write this library if someone else volunteers to get the
GNU tools to use it. I propose the following features:

1. Written in portable ANSI C (no K&R compilers need apply)
2. Provides all the fscanf, fprintf, etc. (i.e. line reading and writing)
   functionality for text-containing files only
3. Provides some extended, cool features TBD
4. Reads and writes at least UNIX, DOS, and Mac, with maybe HTML, etc.
   formats coming later
5. Operates roughly like this: opens any supported format for reading,
   and all of them behave the same (i.e. 'read line', 'read next char',
   etc. all retrieve the same text regardless of format); writes in the
   format selected by the programmer (i.e. the fopen equivalent would
   require a format specifier if a file is opened for writing)

Gary R. Van Sickle (tiberius AT braemarinc DOT com)
Electrical Design Engineer
Braemar Inc.
11481 Rupp Dr.
Burnsville, MN 55337
(612) 890-5135 Ext. 144
Fax: (612) 882-6550

-----Original Message-----
From: Benjamin Riefenstahl [SMTP:benny AT crocodial DOT de]
Sent: Friday, January 09, 1998 6:51 AM
To: gnu-win32 AT cygnus DOT com
Subject: Re: Why text=binary mounts

Hi All,

I'm new here so please forgive me if I'm missing something. I also do not
yet have a lot of experience with gnu-win32. I do have some experience
with porting C and C++ and with the rules of these languages and how they
affect porting. So this post that I'm replying to got my attention.

marcus AT bighorn DOT dr DOT lucent DOT com wrote:
> This is true as long as you are considering text files only. The problem
> comes in when you also want to deal with binary files. On Unix systems,
> of course, there is no difference in operations on either, so most Unix
> programs open all files using the same open() or fopen() calls. On systems
> that differentiate between these files, it is important to add O_BINARY or
> O_TEXT to the second argument of open(), and "b" for binary files to the
> second argument of fopen(). This tells the underlying routines whether to
> apply any translation to the file.

So far I agree.

> If nothing is specified, the OS must
> choose whether or not to make translations, and that is where the text=/!=
> binary mounting comes in, as this specifies the default mode.

No. At least for fopen() there is no choice.
If you don't specify "b" you get text mode and that's that. An application
that opens a binary file without the "b" has a bug. I don't think that
fiddling with this (like "binary" mounts) actually helps. Fix the buggy
source code instead; that seems to me bound to be *much* more efficient in
terms of developer and user time spent on the problem.

BTW on DOS-like systems (DOS, Windows, OS/2) the RTL does the translation,
not the OS. The OS just sets the guidelines for how text should be
represented and of course the OS tools enforce these guidelines.

> Now, there are some difficulties in this implementation. First, since there
> is no "t" that can be passed to fopen(), it is impossible to tell if a call
> to fopen() wants a text mode open, or the default (blame POSIX/ANSI for that,
> I guess).

See above. The default is unambiguously specified as text mode by the
ISO C language standard.

> ... However, if there exist Unix programs that call fopen() without
> the "b" for binary files (since it isn't needed on Unix and was added to the
> standard much later than the program may have been written), then these
> programs won't run correctly without some additional porting effort.

I'd prefer to invest a little time in porting the code instead of
investing a lot of time in users tweaking their system.

> The
> same goes for programs that call open() without the O_BINARY bit set in the
> second argument when opening binary files.

Since open() is a Unix call and Unix doesn't have the distinction between
text and binary, it can be argued that the rules for Unix compatibility
libraries can be made whatever one wants. It has been common practice
though - and with good reason - to go by the same rules as C and C++ go
with fopen() and iostreams: The default is text mode and you need the
extra O_BINARY flag to get binary mode. It is done this way in all
compilers that I know.
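
As a minimal sketch of the porting fix argued for above, assuming a
POSIX-style <fcntl.h> (as on Unix and cygwin32): binary files are opened
with "rb" or O_BINARY explicitly, and O_BINARY is defined as 0 on systems
that lack it so the same source builds everywhere. The helper names
binary_file_size() and open_binary() are hypothetical and exist only for
illustration.

```c
#include <stdio.h>
#include <fcntl.h>

/* O_BINARY is missing on most Unix systems, where text and binary
   modes are identical. Defining it as 0 there makes the flag a no-op. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: count the bytes of a file through stdio.
   The "b" requests binary mode, so no line-end translation is applied.
   Returns -1 if the file cannot be opened. */
long binary_file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Hypothetical helper: the same idea for the Unix-style open() call.
   Returns a file descriptor, or -1 on error. */
int open_binary(const char *path)
{
    return open(path, O_RDONLY | O_BINARY);
}
```

With this in place, a file containing the six bytes 'a', CR, LF, 'b',
0x1A, 'c' should be reported as six bytes long regardless of how the
underlying directory is mounted.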

> To compound this, there are times when it is extremely difficult to impossible
> to tell if a file should be opened as text or binary. For instance, should
> TAR open the files that it is writing to an archive as binary or text files?
> How can it determine which to use?

Some applications have a design problem here. AFAIK most ports that are
designed for this allow the user to specify that all operations are to be
done in binary, which is what I always prefer. I can always convert DOS
text files to Unix text and back again. I cannot convert a garbled binary
file back to its original form.

> Sure, it's fun to play with cygwin32, but to me it doesn't seem reasonable to
> try to develop it as a Linux replacement. I think that if it is to be truly
> useful, cygwin32 must encourage interoperating with the native world that it
> exists in. Part of that is running well in a text!=binary mounted world.
> Sure, that means that porting programs to Cygwin32 means that you have to
> instill an awareness of binary vs. text files, and that does mean more work
> to port the programs, but it also produces more useful programs as well.

Here we agree again ;-)

Let me add another nit to the problem. I am actually using not only Unix
and DOS but also Mac files. This means another variation in line ends:
Unix uses <LF>, DOS uses <CR><LF> and Macs use <CR>. In my world these are
the prominent formats and most of my tools (editors, compilers and other
commercial tools) agree with that.

In DOS the translation for text mode works rather simply: On input all
<CR><LF> combinations are replaced by <LF> and on output all <LF> are
replaced with <CR><LF>. This means not only that DOS files are read
correctly but also that Unix files are automatically read correctly. The
coincidence is rather useful, because in most simple tools one rarely
ever needs to translate explicitly from Unix to DOS; most DOS tools get
along with Unix files just fine.

For my own programs I often implement an extension to this behaviour.
Instead of treating only <LF> and <CR><LF> as line ends I also treat a
single <CR> the same. This means I lose the ability to use single <CR>s
for formatting, but then the only files thus formatted that I have are
those intended directly for a line printer. OTOH as I said I often have
Mac files and with this arrangement these are read correctly.

For my own programs this is done easily enough but when porting tools
from Unix it's a lot more difficult. Porting Unix tools to this mode would
be a lot easier if this behaviour could be somehow included in the RTL
itself (like ordinary text mode is now).

I'm not sure how to do it though. One could just change the text mode.
That would be o.k. for me but I'm not sure everybody would be happy with
that. Another thought would be to invent another mode like "extended text
mode" e.g. with an fopen() specifier "T", an open() flag O_ETEXT and an
iostream mode ios::etext that could implement this. That way one could
port tools to this mode by simply adding the flags just like you port
binary tools now by adding O_BINARY, "b" or ios::bin. Does anybody else
have an opinion on that problem?

so long, benny
======================================
Benjamin Riefenstahl (benny AT crocodial DOT de)
Crocodial Communications EntwicklungsGmbH
Ophagen 16a, D-20257 Hamburg, Germany

-
For help on using this list (especially unsubscribing), send a message to
"gnu-win32-request AT cygnus DOT com" with one line of text: "help".