From: Reini Urban
To: cygwin AT cygwin DOT com
Cc: pp
Date: Tue, 7 Jun 2011 15:40:16 +0200
Subject: Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
References: <4DD36619 DOT 1010401 AT hima DOT com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=0016e64ddf3e3d56d904a51f5c4c

--0016e64ddf3e3d56d904a51f5c4c
Content-Type: text/plain; charset=ISO-8859-1

2011/5/24 Reini Urban:
> 2011/5/18 Sven Severus:
>> let me report a strange behaviour with Cygwin Perl (I'm using cygwin1.dll
>> 1.7.9-1, full installation 2 weeks ago).
>>
>> File foo.h is an ordinary text file, all lines are terminated with DOS
>> style line endings (hex: 0d 0a).
>> It is located in a directory with textmode mounting in cygwin.
>> One CR LF sequence of foo.h is split by a 4096 byte boundary within
>> the file: "od -c -Ax foo.h" shows a CR (='\r') at byte offset 4095
>> (0xfff)
>> and a LF (='\n') at offset 4096 (0x1000):
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \n   /   /  \r  \n   /   /  \r  \n
>> 001009
>>
>> Now I issued the command "perl -pe 's/12345/54321/' foo.h >foomod.h"
>> to produce foomod.h, located in the same directory as foo.h, thus with
>> textmode mounting too.
>> When I examined the result, I noticed that foomod.h was one byte bigger
>> than foo.h. I expected identical size, and "od -c -Ax foomod.h" reports:
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \r  \n   /   /  \r  \n   /   /  \r  \n
>> 00100a
>>
>> Oops! The original CR LF sequence starting at offset 4095 (0xfff)
>> became a three character sequence CR CR LF! The CR is duplicated!
>>
>> In other files created by Perl with output redirection I observed this
>> behaviour with every line ending that is split by a 4096 byte
>> boundary (even multiple times in one output file). Line endings not
>> split by a 4096 byte boundary do not show this behaviour.
>>
>> The behaviour does not occur when the destination file is located
>> in a directory with binmode mounting. It does not occur either when
>> I use sed instead of Perl ("sed -e 's/12345/54321/' foo.h >foomod.h"),
>> so I think the problem is specific to Cygwin Perl, not to Cygwin in
>> general.
>>
>> Is this a bug of the output buffering mechanism of Cygwin Perl?
>> Or am I doing anything wrong?
>> Any answer is highly appreciated. Thanks in advance.
>
> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
> The last char of the buffer is not stored when checking the first char
> of the new buffer.
> I think first we have to provide a sample test case to perl core.

I could not reproduce it in perl core with the PerlIO :crlf layer; see
the attached test.
I'm investigating cygwin buffer edge-case handling now.
-- 
Reini Urban

--0016e64ddf3e3d56d904a51f5c4c
Content-Type: application/octet-stream; name="crlf-bufedge.patch"
Content-Disposition: attachment; filename="crlf-bufedge.patch"
X-Attachment-Id: f_gomwcmw20

difforig t/io/crlf.t

diff -u t/io/crlf.t.orig t/io/crlf.t
--- t/io/crlf.t.orig	2011-03-28 21:59:51.729376900 +0200
+++ t/io/crlf.t	2011-06-07 15:34:07.808130000 +0200
@@ -10,10 +10,10 @@
 use Config;
 
 
-my $file = tempfile();
+my $file = "xx"; #tempfile();
 
 {
-    plan(tests => 16);
+    plan(tests => 20);
     ok(open(FOO,">:crlf",$file));
     ok(print FOO 'a'.((('a' x 14).qq{\n}) x 2000) || close(FOO));
     ok(open(FOO,"<:crlf",$file));
@@ -70,6 +70,22 @@
 	    unlike($foo, qr/\x0d\x0d/);
 	}
     }
+
+    # [perl 58xxxx] 4096 bufsize edge-case: \r<bufend>\n not detected
+    # => \r<bufend>\r\n
+    open(FOO,">:crlf",$file);
+    print FOO ('.' x 4095).qq{\n};
+    close(FOO);
+    ok (-s $file == 4097);
+    open(FOO,"<:crlf",$file);
+
+    { local $/; $text = <FOO> }
+    is(count_chars($text, "\015\012"), 0);
+    is(count_chars($text, "\n"), 1);
+    open(FOO, ">:crlf", "$file");
+    print FOO $text;
+    close FOO;
+    ok (-s $file == 4097);
 }
 
 sub count_chars {

--0016e64ddf3e3d56d904a51f5c4c--
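
The hypothesis in the reply above ("the last char of the buffer is not stored when checking the first char of the new buffer") can be modelled in a few lines. This is only a sketch of that hypothesis, written in Python for brevity; it is not the PerlIO or Cygwin code, and the function name text_mode_write, the carry_state flag, and the fixed 4096-byte chunking are assumptions made for illustration.

```python
CHUNK = 4096  # buffer size taken from the report; assumed to be where output is split


def text_mode_write(data: bytes, carry_state: bool, chunk: int = CHUNK) -> bytes:
    """Hypothetical LF -> CRLF translation applied one buffer at a time.

    A lone LF gets a CR prepended; an LF already preceded by CR is left alone.
    With carry_state=False the "previous byte" is forgotten at every buffer
    boundary, which is the flaw the reply speculates about.
    """
    out = bytearray()
    prev = None
    for start in range(0, len(data), chunk):
        if not carry_state:
            prev = None  # forget the last byte of the previous buffer
        for byte in data[start:start + chunk]:
            if byte == 0x0A and prev != 0x0D:
                out.append(0x0D)  # insert CR before a bare LF
            out.append(byte)
            prev = byte
    return bytes(out)


if __name__ == "__main__":
    # CR at offset 4095 (0xfff) and LF at offset 4096 (0x1000), as in foo.h.
    data = b"/" * 4095 + b"\r\n//\r\n"
    broken = text_mode_write(data, carry_state=False)
    fixed = text_mode_write(data, carry_state=True)
    print(len(data), len(broken), len(fixed))
    print(broken[4095:4098], fixed[4095:4097])
```

With the state forgotten at the boundary, the output is one byte longer than the input and contains CR CR LF at offset 4095, which matches the symptom reported for foomod.h. Carrying the last byte across buffers leaves the data unchanged. Whether the real bug sits in PerlIO or in Cygwin's textmode write path is exactly what the thread leaves open.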