Mail Archives: cygwin/2011/06/07/09:40:42

In-Reply-To: <BANLkTino_ZBUjCzbrmLezRJqjz2Y+04DCQ@mail.gmail.com>
References: <4DD36619 DOT 1010401 AT hima DOT com> <BANLkTino_ZBUjCzbrmLezRJqjz2Y+04DCQ AT mail DOT gmail DOT com>
Date: Tue, 7 Jun 2011 15:40:16 +0200
Message-ID: <BANLkTinyfwi1Fkah76BAYX9YJwucsbUdLw@mail.gmail.com>
Subject: Re: 1.7.9: Problem with line endings of Perl output redirected to a file with textmode mounting
From: Reini Urban <rurban AT x-ray DOT at>
To: cygwin AT cygwin DOT com
Cc: pp <perl5-porters AT perl DOT org>

2011/5/24 Reini Urban:
> 2011/5/18 Sven Severus:
>> let me report a strange behaviour with Cygwin Perl (I'm using cygwin1.dll
>> 1.7.9-1, full installation 2 weeks ago).
>>
>> File foo.h is an ordinary text file, all lines are terminated with DOS
>> style line endings <cr> <lf> (hex: 0d 0a).
>> It is located in a directory with textmode mounting in cygwin.
>> One <cr> <lf> sequence of foo.h is split by a 4096 byte boundary within
>> the file: "od -c -Ax foo.h" shows a <cr> (='\r') at byte offset 4095
>> (0xfff) and a <lf> (='\n') at offset 4096 (0x1000):
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \n   /   /  \r  \n   /   /  \r  \n
>> 001009
>>
>> Now I issued the command "perl -pe 's/12345/54321/' foo.h >foomod.h"
>> to produce foomod.h, located in the same directory as foo.h, thus with
>> textmode mounting too.
>> When I examined the result, I noticed that foomod.h was one byte bigger
>> than foo.h. I expected identical size, and "od -c -Ax foomod.h" reports:
>> ...
>> 000ff0   /   /   /   /   /   /  \r  \n   /   /   X   X   X   X   X  \r
>> 001000  \r  \n   /   /  \r  \n   /   /  \r  \n
>> 00100a
>>
>> Oops! The original <cr> <lf> sequence starting at offset 4095 (0xfff)
>> became a three-character sequence <cr> <cr> <lf>! The <cr> is duplicated!
>>
>> In other files created by Perl with output redirection I observed this
>> behaviour with every <cr> <lf> line ending that is split by a 4096 byte
>> boundary (even multiple times in one output file). Line endings not
>> split by a 4096 byte boundary do not show this behaviour.
>>
>> The behaviour does not occur when the destination file is located
>> in a directory with binmode mounting. Nor does it occur when
>> I use sed instead of Perl ("sed -e 's/12345/54321/' foo.h >foomod.h"),
>> so I think the problem is specific to Cygwin Perl, not to Cygwin in
>> general.
>>
>> Is this a bug of the output buffering mechanism of Cygwin Perl?
>> Or am I doing anything wrong?
>> Any answer is highly appreciated. Thanks in advance.
>
> Yes, this looks like a PerlIO buffering bug for MSWin32 and cygwin.
> The last char of the buffer is not stored when checking the first char
> of the new buffer.
> I think first we have to provide a sample test case to perl core.
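
The buffering hypothesis quoted above can be illustrated outside Perl. The following Python sketch is only a model of that hypothesis, not PerlIO or Cygwin code: the names translate_naive and translate_carry are made up for this illustration, and the 4096-byte chunk size is taken from the report. Both functions translate LF to CRLF chunk by chunk and skip any LF that already follows a CR. The naive variant forgets the last byte of the previous chunk; the other variant carries it over.

```python
CHUNK = 4096  # assumption: buffer size taken from the report


def translate_naive(data: bytes, chunk: int = CHUNK) -> bytes:
    """LF -> CRLF per chunk, forgetting the previous chunk's last byte."""
    out = bytearray()
    for start in range(0, len(data), chunk):
        prev = None  # modelled bug: state is reset at every chunk
        for byte in data[start:start + chunk]:
            if byte == 0x0A and prev != 0x0D:
                out.append(0x0D)
            out.append(byte)
            prev = byte
    return bytes(out)


def translate_carry(data: bytes, chunk: int = CHUNK) -> bytes:
    """Same translation, but the last byte seen survives the chunk edge."""
    out = bytearray()
    prev = None  # carried across chunks
    for start in range(0, len(data), chunk):
        for byte in data[start:start + chunk]:
            if byte == 0x0A and prev != 0x0D:
                out.append(0x0D)
            out.append(byte)
            prev = byte
    return bytes(out)


# <cr> at offset 4095 (0xfff), <lf> at offset 4096 (0x1000), as in foo.h
sample = b"X" * 4095 + b"\r\n" + b"//\r\n"
print(len(sample), len(translate_naive(sample)), len(translate_carry(sample)))
```

With this input the naive variant writes <cr> <cr> <lf> at the chunk edge and the output grows by one byte, which matches the od output in the report. The variant that carries the last byte returns the input unchanged.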

I could not reproduce it in perl core with the PerlIO :crlf layer, see
attached test.
I'm investigating cygwin buffer edge-case handling now.
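
To check a redirected output file for the symptom reported in this thread, a small scanner is enough. This is a sketch in Python and not part of the attached patch; doubled_cr_offsets and scan_file are hypothetical helper names. The scanner returns the offset of the first <cr> of every <cr> <cr> <lf> sequence, so the result can be compared with the offsets printed by "od -c -Ax".

```python
def doubled_cr_offsets(data: bytes) -> list:
    """Return the offset of the first CR of every CR CR LF sequence."""
    offsets = []
    pos = data.find(b"\r\r\n")
    while pos != -1:
        offsets.append(pos)
        pos = data.find(b"\r\r\n", pos + 1)
    return offsets


def scan_file(path: str) -> list:
    """Scan a file in binary mode so that no line-ending translation occurs."""
    with open(path, "rb") as handle:
        return doubled_cr_offsets(handle.read())


# Bytes shaped like the reported foomod.h: the duplicated <cr> follows offset 0xfff
foomod = b"/" * 4095 + b"\r\r\n//\r\n"
for offset in doubled_cr_offsets(foomod):
    print(hex(offset))
```

The file must be opened in binary mode, otherwise the runtime's own newline handling may hide the extra <cr>.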

-- 
Reini Urban

Content-Disposition: attachment; filename="crlf-bufedge.patch"

difforig t/io/crlf.t

diff -u t/io/crlf.t.orig t/io/crlf.t
--- t/io/crlf.t.orig	2011-03-28 21:59:51.729376900 +0200
+++ t/io/crlf.t	2011-06-07 15:34:07.808130000 +0200
@@ -10,10 +10,10 @@
 use Config;
 
 
-my $file = tempfile();
+my $file = "xx"; #tempfile();
 
 {
-    plan(tests => 16);
+    plan(tests => 20);
     ok(open(FOO,">:crlf",$file));
     ok(print FOO 'a'.((('a' x 14).qq{\n}) x 2000) || close(FOO));
     ok(open(FOO,"<:crlf",$file));
@@ -70,6 +70,22 @@
 	    unlike($foo, qr/\x0d\x0d/);
 	}
     }
+
+    # [perl 58xxxx] 4096 bufsize edge-case: \r<bufend>\n not detected
+    # => \r<bufend>\r\n
+    open(FOO,">:crlf",$file);
+    print FOO ('.' x 4095).qq{\n};
+    close(FOO);
+    ok (-s $file == 4097);
+    open(FOO,"<:crlf",$file);
+
+    { local $/; $text = <FOO> }
+    is(count_chars($text, "\015\012"), 0);
+    is(count_chars($text, "\n"), 1);
+    open(FOO, ">:crlf", "$file");
+    print FOO $text;
+    close FOO;
+    ok (-s $file == 4097);
 }
 
 sub count_chars {


