Message-ID: <45093972.7080606@byu.net>
Date: Thu, 14 Sep 2006 05:13:54 -0600
From: Eric Blake
To: cygwin AT cygwin DOT com
Subject: Re: bash-3.1-7 BUG
In-Reply-To: <20060914020737.GC24899@trixie.casa.cgf.cx>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Christopher Faylor on 9/13/2006 8:07 PM:
>
> I doubt that Eric will want to deal with the fallout of having bash not
> understand \r\n line endings but, if he does, it would be his decision
> and, again, I would support it 100%. I am very eager to see things like
> configure scripts work faster and if we have to drop a few scared or
> lazy people along the way to accomplish that goal, that's fine with me.
> I have no problem at all with being a part of a smaller community which
> doesn't need to use notepad to edit their bash scripts.

Here's the difference between 3.1-7 and 3.1-8:

diff -u bash-3.1/input.c bash-3.1/input.c
--- bash-3.1/input.c 2006-09-08 16:58:58.703125000 -0600
+++ bash-3.1/input.c 2006-09-14 04:13:11.359375000 -0600
@@ -166,6 +166,10 @@
   bp->b_used = bp->b_inputp = bp->b_flag = 0;
   if (bufsize == 1)
     bp->b_flag |= B_UNBUFF;
+#ifdef __CYGWIN__
+  if ((fcntl (fd, F_GETFL) & O_TEXT) != 0)
+    bp->b_flag |= B_TEXT;
+#endif
   return (bp);
 }
 
@@ -442,6 +446,25 @@
 {
   ssize_t nr;
 
+#ifdef __CYGWIN__
+  /* lseek'ing on text files is problematic; lseek reports the true
+     file offset, but read collapses \r\n and returns a character
+     count. We cannot reliably seek backwards if nr is smaller than
+     the seek offset encountered during the read, and must instead
+     treat the stream as unbuffered. */
+  if ((bp->b_flag & (B_TEXT | B_UNBUFF)) == B_TEXT)
+    {
+      off_t offset = lseek (bp->b_fd, 0, SEEK_CUR);
+      nr = zread (bp->b_fd, bp->b_buffer, bp->b_size);
+      if (nr > 0 && nr < lseek (bp->b_fd, 0, SEEK_CUR) - offset)
+        {
+          lseek (bp->b_fd, offset, SEEK_SET);
+          bp->b_flag |= B_UNBUFF;
+          nr = zread (bp->b_fd, bp->b_buffer, bp->b_size = 1);
+        }
+    }
+  else
+#endif
   nr = zread (bp->b_fd, bp->b_buffer, bp->b_size);
   if (nr <= 0)
     {
@@ -454,15 +477,6 @@
       return (EOF);
     }
 
-#if defined (__CYGWIN__)
-  /* If on cygwin, translate \r\n to \n. */
-  if (nr >= 2 && bp->b_buffer[nr - 2] == '\r' && bp->b_buffer[nr - 1] == '\n')
-    {
-      bp->b_buffer[nr - 2] = '\n';
-      nr--;
-    }
-#endif
-
   bp->b_used = nr;
   bp->b_inputp = 0;
   return (bp->b_buffer[bp->b_inputp++] & 0xFF);
only in patch2:
unchanged:
--- bash-3.1-orig/input.h 2002-01-30 07:11:47.000000000 -0700
+++ bash-3.1/input.h 2006-09-14 03:29:05.484375000 -0600
@@ -47,6 +47,7 @@
 #define B_ERROR 0x02
 #define B_UNBUFF 0x04
 #define B_WASBASHINPUT 0x08
+#define B_TEXT 0x10 /* Text stream, when O_BINARY is nonzero */
 
 /* A buffered stream. Like a FILE *, but with our own buffering and
    synchronization. Look in input.c for the implementation.
*/

My thoughts on the matter are that if you use binary mounts (and I
highly recommend them), then every character in your file is important.
Bash on Linux does not ignore \r, and POSIX does not allow bash to ignore
\r by default (although you can set IFS to include \r as a whitespace
character to ignore), so neither should bash on a binary cygwin file.

If you use text mounts, then this patch is smart enough to buffer data up
until the point that a \r\n pair is converted by the text mode file into
a single character, at which point the lseek optimization breaks down and
the text mode file is subsequently processed a byte at a time.

If you need DOS line endings, use a text mount. If you need speed, use
UNIX line endings on a binary mount, although even UNIX line endings on a
text mount will be faster than DOS line endings.

Case closed, since I'm the maintainer, and I really don't want to bother
with anything larger than the above patch (I also plan on submitting the
above patch upstream, where it is less likely to be accepted if it is
larger).

-- 
Life is short - so eat dessert first!
Eric Blake             ebb9 AT byu DOT net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFCTlx84KuGfSFAYARArO1AKDE7x39iX74iMoG8Sr8In2V+HgKgwCdGoNd
LCtH7JfK+6MNue1KjRlbMvE=
=KYu0
-----END PGP SIGNATURE-----
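
For readers who want to see the text-mount fallback from the message above in isolation, here is a minimal sketch that simulates it in memory. It is not the bash or Cygwin code: `text_file`, `text_read`, `stream`, and `fill` are hypothetical names invented for illustration, and `text_read` only imitates how a text-mode read collapses \r\n while the true offset keeps advancing. The check in `fill` is meant to mirror the comparison in the patch (characters returned versus offset advanced), under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a text-mode file descriptor.  `pos` plays the
   role of the true byte offset that lseek would report. */
struct text_file {
  const char *data;
  size_t len;
  size_t pos;
};

/* Imitates read() on a text-mode file: stores up to `size` characters in
   `buf`, collapsing each \r\n pair into a single \n.  The return value is
   a character count, which can be smaller than the advance of f->pos. */
static size_t text_read(struct text_file *f, char *buf, size_t size)
{
  size_t n = 0;
  while (n < size && f->pos < f->len) {
    char c = f->data[f->pos++];
    if (c == '\r' && f->pos < f->len && f->data[f->pos] == '\n')
      c = f->data[f->pos++];          /* drop the \r, keep the \n */
    buf[n++] = c;
  }
  return n;
}

/* Hypothetical buffered stream, loosely modeled on the fields the patch
   touches (buffer, read size, unbuffered flag). */
struct stream {
  struct text_file *file;
  char buf[64];
  size_t size;        /* current read size; becomes 1 after a collapse */
  int unbuffered;
};

/* Read one buffer.  If fewer characters came back than the offset
   advanced, a \r\n pair was collapsed, so seeking back by a character
   count is no longer reliable: rewind, and read one byte at a time from
   now on. */
static size_t fill(struct stream *s)
{
  assert(s->size >= 1 && s->size <= sizeof s->buf);
  size_t start = s->file->pos;
  size_t nr = text_read(s->file, s->buf, s->size);
  if (!s->unbuffered && nr > 0 && nr < s->file->pos - start) {
    s->file->pos = start;             /* undo the large read */
    s->unbuffered = 1;
    s->size = 1;
    nr = text_read(s->file, s->buf, s->size);
  }
  return nr;
}
```

In this sketch, a script with only UNIX line endings is consumed in one full-size read and the stream stays buffered, while a script containing \r\n triggers the rewind on the first read and is then delivered one character per call, which matches the speed trade-off described in the message.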