Mail Archives: cygwin/2017/08/14/06:36:47

From: Vermessung AVT - Wolfgang Rieger <w DOT rieger AT avt DOT at>
To: "cygwin AT cygwin DOT com" <cygwin AT cygwin DOT com>
Subject: RE: gawk 4.1.4: CR separate char for CRLF files
Date: Mon, 14 Aug 2017 10:36:23 +0000
Message-ID: <AB495CE313664A489959F8DEF45069A901ABFC2D4B@EXSRV01.avt-imst.local>

On Wed, 9 Aug 2017 10:38 +0000, Jannick wrote:

--- snip ---
> Now I can see the following *easy* solutions to the very situation here (input only for now):
>
> 1 - Inserting the BEGIN section as you suggested into more than 1k scripts (not feasible due to additional regression test workload) 
>
> 2 - Calling 'gawk -vRS=\r\n -vORS=\r\n' instead of 'gawk' (hack to undo the additional complexity of the latest gawk, wrapper needed)
>
> 3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed again)
>
> 4 - Using another compiled version of gawk which does *not* disable the out-of-the-box gawk feature to swallow CRs (cf., e.g., http://git.savannah.gnu.org/cgit/gawk.git/tree/awkgram.y#n3543), i.e.
> without the artificial obstacle of now having to know the EOL type of the input file before running gawk.
>
>> It works in all my cases. The only disadvantage: you have to know what kind
>
>... plus the disadvantage to systematically amend all the scripts instead of having an external solution 
>
>> of files you want to handle in the awk script. The same awk script 
>> will not
>> work for DOS files as well as for linux files.
>
>... another issue originated by the change and which didn't exist before.
>
>> Best
>> 
>> Roger
>
> Please don't get me wrong, but this raises a real issue here and I am not sure which rationale other than 'let's get more of the Linux-feel' drove the decision.
>
> All the best,
> J. 
--- snip ---
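
For reference, options 2 and 3 in the quoted list both amount to putting a wrapper in front of the unmodified scripts. The following is only a minimal sketch of that idea, assuming a POSIX-style shell with awk on the PATH (not the cmd.exe setup described further down). The function name awk_anyeol is hypothetical and not part of gawk or Cygwin, and the sketch only covers input arriving on stdin, not file arguments.

```shell
# Hypothetical wrapper (the name is an assumption for this sketch only).
# It drops one trailing CR per input line, then runs the caller's awk
# program unchanged on the cleaned stream.
awk_anyeol() {
    awk '{ sub(/\r$/, ""); print }' | awk "$@"
}

# CR/LF input: without the pre-filter, $2 would be "yes\r" and never match.
count=$(printf 'a yes\r\nb no\r\nc yes\r\n' |
    awk_anyeol '$2 == "yes" { n++ } END { print n + 0 }')
echo "$count"   # prints 2
```

The awk program passed to the wrapper stays exactly as it was written for LF input, which is the point of an external solution.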

Another solution which we have been using for many years now, though it might not be feasible for you:

We very rarely update Cygwin. We have been using it for some 15+ years now, running tools like gawk (hundreds of scripts), head, tail, sort, etc. from shell scripts under cmd.exe (no Unix shells involved). I soon realized that Cygwin upgrades can cause trouble with existing scripts, so we only update when we really need to (e.g. important new functionality, the shift from 32 to 64 bit, possibly new Windows versions, or bugs we needed fixed).

I have attentively followed the past discussions about the CR/LF behaviour changes and decided not to update in the near future, because that would cause massive problems with many hundreds of scripts. I am hoping that gawk will change again at some point.

What is Unix-like, OS-like, or POSIX-like behaviour in that context? You could argue that gawk should interpret line endings the way the underlying OS does (i.e., gawk reads LF on Unix and CR/LF on Windows), or that it should interpret line endings Unix-style regardless of the underlying OS. That's a developer's decision in my opinion.

But since gawk wrote no CRs to pipes or redirected output even in previous versions, we already had the situation that gawk had to accept *both* kinds of input, LF with or without a preceding CR. That has worked largely fine so far, since fortunately most Windows and other application software we use accepts both record formats (we had issues with upgrades of other vendors' software that no longer accepted pure LF, but that concerned only a very small number of scripts). The new approach in Cygwin seems to break this, so we have not upgraded Cygwin since then (we currently use gawk 4.1.3).

Of course the reason for that really annoying CR/LF thing is the arrogance and ignorance of MS, which has cost innumerable wasted developer hours when I think of the endless discussions and changes in Cygwin; but MS is the one who defines the standards because of its sheer market power, so we have to deal with it, whether we like it or not. I'd definitely prefer to use Unix for its powerful tools, but most of the software we use is simply not available for Unix, and MS does not provide gawk etc. So we have to deal with that CR/LF issue pragmatically rather than, say, philosophically: we need to run our scripts with as few changes as possible. That's why we upgrade Cygwin as seldom as possible. It is a "living system", yes, which is great on the one hand - but can be annoying in everyday practice.

In my opinion there should be at least an option for gawk to accept both LF and CR/LF line endings equally, preferably with a system variable so that there is no need to change the command line call of gawk at all. That's what I vote for.
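
To illustrate what "accept both equally" would mean in practice, here is a small sketch, assuming a POSIX-style shell with awk and printf available. It is not the option asked for above; it emulates the behaviour inside the awk program by removing a trailing CR from each record, and the variable names are made up for the example.

```shell
# Sketch: one awk program that gives the same result for LF and CR/LF input.
# The first rule strips a trailing CR (if any) before fields are used;
# the second rule is the "real" script body.
strip_cr='{ sub(/\r$/, "") } { print $2 }'

unix_out=$(printf 'a b\nc d\n' | awk "$strip_cr")
dos_out=$(printf 'a b\r\nc d\r\n' | awk "$strip_cr")

if [ "$unix_out" = "$dos_out" ]; then
    echo "same output for LF and CR/LF input"
fi
```

Because sub() modifies $0, awk re-splits the record, so the last field no longer carries a stray CR. The drawback is the one already raised in this thread: the extra rule has to be added to every script.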

Kind regards,
Wolfgang






