Mail Archives: cygwin/2018/02/27/11:57:05

Subject: Re: gawk Regression: CR characters are not stripped on Windows
To: cygwin AT cygwin DOT com, bug-gawk AT gnu DOT org, Eli Zaretskii <eliz AT gnu DOT org>
References: <CAGHpTB+bfbts=fOBSQPN7c-NDh8FTXR+EauhDhiVrqbgawcYoA AT mail DOT gmail DOT com>
From: Eric Blake <eblake AT redhat DOT com>
Message-ID: <619440c1-0480-41a8-ddc0-216b31f3efd9@redhat.com>
Date: Tue, 27 Feb 2018 10:56:50 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
MIME-Version: 1.0
In-Reply-To: <CAGHpTB+bfbts=fOBSQPN7c-NDh8FTXR+EauhDhiVrqbgawcYoA@mail.gmail.com>
X-IsSubscribed: yes

[urrgh - Cygwin's list policy in supplying reply-to makes it difficult 
to reply-to-all]

On 02/27/2018 01:22 AM, Orgad Shaneh wrote:
> Hi,
> 
> Cross-posting per Eli Zaretskii's request.
> 
> CR characters used to be automatically stripped on Windows (MSYS2 and
> Cygwin environments). This is broken in 4.2.0.

You should not think of Cygwin as a Windows environment, but as a 
Linux-alike environment.  gawk on Linux does not automatically strip 
CRs, therefore gawk on Cygwin should not automatically strip CRs.

What MSYS2 does is different, and that environment is entitled to use 
patches to make interoperability with native Windows programs nicer, at 
the expense of being less like Linux.

Furthermore, the change in Cygwin predates the gawk 4.2.0 release, and 
was intentionally made in a coordinated release in Feb 2017 alongside 
sed and grep:

https://sourceware.org/ml/cygwin/2017-02/msg00152.html
https://sourceware.org/ml/cygwin/2017-02/msg00188.html
https://sourceware.org/ml/cygwin/2017-02/msg00189.html

following on from discussions about bash after ShellShock:

https://sourceware.org/ml/cygwin/2016-08/msg00097.html

Changing gawk back to automatically strip CRs on Cygwin would be a 
regression.

> As Eli said, this change was deliberate. But this has several drawbacks.
> 
> 1. The gawk info page states that:
> 
>> Under MS-Windows, 'gawk' (and many other text programs) silently
>> translates end-of-line '\r\n' to '\n' on input and '\n' to '\r\n' on
>> output.
> 
> and on Feb 8 the following section was added:
> 
>> Recent versions of Cygwin open all files in binary mode.  This means
>> that you should use 'RS = "\r?\n"' in order to be able to handle
>> standard MS-Windows text files with carriage-return plus line-feed line
>> endings.

Or mount your Windows text files under a text mount in Cygwin (so that 
such files already have \r stripped), or add steps to your pipelines to 
strip CR before handing the data to gawk.
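
As a rough sketch of the last option (the strip_cr name and the sample 
input are made up for illustration, and tr -d deletes every CR byte, not 
only the ones at end of line, so it is only suitable for text data):

```shell
# Hypothetical helper: delete every CR byte from stdin.
# Assumption: the input is text, so no CR is meaningful data.
strip_cr() {
    tr -d '\r'
}

# Sample CRLF input stands in for a Windows text file.
# awk then sees plain LF-terminated records.
printf 'alpha beta\r\ngamma delta\r\n' | strip_cr | awk '{ print $2 }'
```

Without the strip_cr step, the second field of each record would still 
carry a trailing CR.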

> 
> This breaks compatibility between different gawk versions. What were
> the reasons for this change in cygwin, and why was it pushed upstream?

See the discussion in Feb 2017 for rationale, but the executive summary 
is that Cygwin attempts to emulate Linux, and silent corruption of binary 
files was deemed worse than having to strip CR explicitly when 
dealing with Windows text output.

> 
> 2. Git and other tools automatically convert text files to CRLF on
> Windows.

Not Cygwin git.  The problems you are encountering are more likely to 
happen when you mix and match tools from disparate environments, rather 
than when you use all tools from the same source.

> This means that any awk script that runs on both platforms
> must use RS = "\r?\n".

or strip the CR by any other means.  But the same is true of any script 
that must run on both Windows and Linux.
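
One such means, sketched here under the assumption that a CR directly 
before the newline is never wanted, is to drop it inside the awk program 
itself.  The second_field wrapper is a hypothetical name used only for 
this illustration:

```shell
# Hypothetical wrapper: print the second field of each record,
# whether the input uses LF or CRLF line endings.
second_field() {
    # sub() with no target edits $0, which makes awk re-split the
    # fields, so $2 no longer ends in CR.
    awk '{ sub(/\r$/, ""); print $2 }'
}

printf 'x y\r\n' | second_field   # CRLF input
printf 'x y\n'   | second_field   # LF input
```

Both invocations should produce the same output, since the sub() call 
is a no-op on records that have no trailing CR.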

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

