Date: Mon, 22 Jul 2002 15:06:05 +0800
From: Greg Matheson
To: cygwin AT cygwin DOT com
Subject: perl & \n (was: perl-5.8.0 breaks code "working" on 5.6.1 over "\n")
Message-ID: <20020722150604.A85877@ms.chinmin.edu.tw>
In-Reply-To: <20020719105932.A11717@ms.chinmin.edu.tw>

This is a follow-up to my email about the differences between the way
perl-5.8.0 and perl-5.6.1 handle \n. It is a cygwin problem, but also
one about portable code.

I am going to write about some experiments. A lot of this is for my own
benefit, trying to understand \n on cygwin.

The code I am interested in compares the length of a string containing
\n with the number of bytes in the file the string came from.

According to perldoc perlcygwin:

  o   Text/Binary

      When a file is opened it is in either text or binary mode. In
      text mode a file is subject to CR/LF/Ctrl-Z translations. With
      Cygwin, the default mode for an open() is determined by the mode
      of the mount that underlies the file. Perl provides a binmode()
      function to set binary mode on files that otherwise would be
      treated as text. sysopen() with the "O_TEXT" flag sets text mode
      on files that otherwise would be treated as binary.

Here is code which shows the problem. -s returns the size of the file
on disk in bytes; length returns the number of characters in the
string, which for this plain ASCII data is also its number of bytes.
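Before the script itself, a minimal sketch of the translation being measured may help. It is not one of the tests reported here. It assumes perl 5.8.0 or later for the PerlIO :crlf and :raw layers, and the file name crlf-demo.txt is hypothetical. The sketch writes and reads through :crlf explicitly, then rereads the same file through :raw to show the bytes actually stored. On a plain Unix filesystem, the string should come back as 8 characters while the file holds 10 bytes, one extra \r for each \n. Its behaviour on a cygwin text-mode mount, where line endings may also be translated beneath PerlIO, has not been checked.

```perl
#!/usr/bin/perl
# Sketch only: assumes perl 5.8.0+ (PerlIO layers). "crlf-demo.txt" is a
# hypothetical scratch file, deleted at the end.
use strict;
use warnings;

my $file = "crlf-demo.txt";
my $text = "123\n567\n";

# Write through the :crlf layer, which turns each "\n" into "\r\n".
open my $out, '>:crlf', $file or die "Cannot write $file: $!";
print $out $text;
close $out or die "Cannot close $file: $!";

# Read back through :crlf, which turns each "\r\n" back into "\n".
open my $in, '<:crlf', $file or die "Cannot read $file: $!";
my $string = do { local $/; <$in> };
close $in;

# Reread through :raw (no translation) to see the bytes on disk.
open my $raw, '<:raw', $file or die "Cannot read $file: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

my $size     = -s $file;               # size on disk, in bytes
my $newlines = ($string =~ tr/\n//);   # counts "\n" without changing $string
unlink $file;

print "Length of string is: ", length($string), "\n";
print "Size of file is: $size\n";
print "Newlines in string: $newlines\n";
print "Bytes on disk (hex): ", unpack('H*', $bytes), "\n";
```

The design choice being illustrated is naming the layer on both the write and the read, rather than relying on the mount or platform default.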
#!/usr/bin/perl
use Fcntl;    # exports O_WRONLY, O_CREAT and O_TEXT for the sysopen line

# Uncomment open line 1 or open line 2; line 3 is used together with line 2.
# sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);   # open line 1
# open O, ">file.txt";                                # open line 2
# binmode O, ':raw';                                  # open line 3

print O "123\n567\n";
close O;

open I, "file.txt";
while ( <I> ) {
    $string .= $_;
}
close I;

print "String is: $string\n";
print "Length of string is: " . length( $string ) . "\n";
# Parentheses needed: -s binds more loosely than "." on its right.
print "Size of file is: " . (-s "file.txt") . "\n";

For Win32 systems:

  This is perl, v5.6.1 built for MSWin32-x86-multi-thread

With just open O, ">file.txt" uncommented:

  String is: 123
  567
  Length of string is: 8
  Size of file is: 10

In a standard DOS text file, \n is written as CRLF, so
-s "file.txt" != length $string.

If binmode O, ':raw' is also uncommented:

  String is: 123
  567
  Length of string is: 8
  Size of file is: 8

Only the \cJ (LF) is written to disk, with no \cM (CR). But then Notepad
and other Windows applications can't display the line breaks properly.

This was for comparison. Of course, on Unix binmode doesn't make any
difference: both the string length and the file size are 8.

Now for cygwin, with perl, v5.6.1 built for cygwin-multi. I am going to
try the combinations suggested in perldoc perlcygwin.

With just the 'open O, ">file.txt";' line uncommented:

  sizetest.pl
  String is: 123
  567
  Length of string is: 10
  Size of file is: 8

That's strange: it's the opposite of Win32, where the string is 8 bytes
and the file is 10. I think the underlying mount here is in text mode. I
think that on cygwin, perl's -s test must know what the underlying mode
is, which it doesn't on Win32.

Now with 'binmode O, ':raw';' also uncommented, to force a binary-mode
write on "file.txt":

  sizetest.pl
  String is: 123
  567
  Length of string is: 8
  Size of file is: 8

Only \cJ is being written to disk.

Now with 'sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);' to force
text mode:

  String is: 123
  567
  Length of string is: 8
  Size of file is: 8

Here the read seems to have been aware that the file was in text mode.

But in this same case, with the O_TEXT flag, if the file has no 'txt'
extension, the result is:
  String is: 123
  567
  Length of string is: 8
  Size of file is: 10

The result of the -s test, at least for 5.6.1 and at least for a forced
text-mode write, seems to depend on whether the file has a 'txt'
extension. The name has no effect when I force a binary-mode write, or
with the default-mode write.

Now with the new perl-5.8.0 there is PerlIO, which apparently replaces
the C stdio library, and there are disciplines, which expand the binmode
possibilities beyond :raw and :crlf.

With just the 'open O, ">file.txt";' default-write line uncommented:

  String is: 123
  567
  Length of string is: 8
  Size of file is: 10

This is the opposite of 5.6.1, at least with an underlying text-mode
mount, and the same as Win32 with 5.6.1. The underlying mode is DOS
text, I am pretty sure.

Again with 'binmode O, ':raw';' uncommented to force a binary-mode write
on "file.txt":

  String is: 123
  567
  Length of string is: 8
  Size of file is: 8

This is the same as 5.6.1.

And with 'sysopen(O, "file.txt", O_WRONLY|O_CREAT|O_TEXT);' to force
text mode:

  String is: 123
  567
  Length of string is: 8
  Size of file is: 8

This is the same as 5.6.1. And now, with 5.8.0, whether the file has an
extension or not doesn't matter for any of these write methods.

Well, that's the end of my experiments. They are not complete: I didn't
try default binary-mode mounts. It seems the most reliable (or portable,
or something) approach is a forced text-mode write to a file with a
'txt' extension. Or is that jumping to conclusions?

In any case, if the aim is to write portable code that also works on
cygwin, whether you are using 5.6.1 or 5.8.0, what do we suggest to
developers? I don't think you want to force a binary-file mail message
format on cygwin users, because that would mean they couldn't use
Windows applications. I also don't think developers are keen on
sprinkling binmode all through their IO routines, especially when it
won't help solve the Win32 problems.

-- 
Greg Matheson          You can't get there from here.
Chinmin College        Taiwan Penpals Archive