Mail Archives: cygwin/2006/03/30/15:33:34

X-Spam-Check-By: sourceware.org
Message-ID: <442C408B.3080409@carter.to>
Date: Thu, 30 Mar 2006 14:33:15 -0600
From: David Carter <carter AT carter DOT to>
User-Agent: Thunderbird 1.5 (Windows/20051201)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: problems with gawk 3.1.5-3 hanging -- more info
References: <442C25D0 DOT 7030605 AT pondol DOT com> <442C3197 DOT 7090309 AT pondol DOT com> <20060330200757 DOT GO20907 AT calimero DOT vinschen DOT de>
In-Reply-To: <20060330200757.GO20907@calimero.vinschen.de>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Corinna Vinschen wrote:

> O_TEXT is correct because gawk is a text tool in the first place and
> it should treat input lines identical, regardless if they have DOS
> or UNIX lineendings.

Hi Corinna, thanks for the prompt reply.

If I understand you correctly, the fix in -3 has to do with converting 
DOS-style CRLFs to LFs. This appears to be the issue. The output from 
rsync (on all platforms--windows/unix/POSIX/whatever) contains CR 
characters (0x0d) by themselves. This is what accounts for the output of 
rsync "overwriting" itself when you run it alone from a bash prompt.

Here's a snippet of hexdump output from rsync:

$ rsync -Pv /cygdrive/c/backup2 10.0.0.204:~ | xxd
0000000: 6261 636b 7570 320a 2020 2020 2020 2020  backup2.
0000010: 2037 3030 2020 2030 2520 2020 2030 2e30   700   0%    0.0
0000020: 306b 422f 7320 2020 2030 3a30 303a 3030  0kB/s    0:00:00
0000030: 0d20 2020 2020 3133 3736 3137 3620 2020  .     1376176
0000040: 3025 2020 2020 312e 3238 4d42 2f73 2020  0%    1.28MB/s
0000050: 2020 303a 3133 3a33 350d 2020 2020 2032    0:13:35.     2

You can see the 0d all by itself at address 0000030, and again at 0000059.
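
The two offsets can be double-checked mechanically. This is a minimal sketch, assuming the hex columns of the six xxd rows above are pasted in as-is; the helper name lone_cr_offsets is made up for illustration and is not part of rsync, xxd, or gawk.

```python
from typing import List

# Hex columns of the six xxd rows shown above (ASCII column dropped).
SAMPLE = bytes.fromhex("""
    6261 636b 7570 320a 2020 2020 2020 2020
    2037 3030 2020 2030 2520 2020 2030 2e30
    306b 422f 7320 2020 2030 3a30 303a 3030
    0d20 2020 2020 3133 3736 3137 3620 2020
    3025 2020 2020 312e 3238 4d42 2f73 2020
    2020 303a 3133 3a33 350d 2020 2020 2032
""")


def lone_cr_offsets(data: bytes) -> List[int]:
    """Return the offsets of every CR (0x0d) not immediately followed by LF."""
    return [
        i
        for i, byte in enumerate(data)
        if byte == 0x0D and data[i + 1:i + 2] != b"\n"
    ]


if __name__ == "__main__":
    for offset in lone_cr_offsets(SAMPLE):
        print(f"{offset:07x}")  # prints 0000030, then 0000059
```

A CR that is the last byte of the data is also counted as lone, since no LF follows it.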

It appears to me that, because the file is opened as O_TEXT, gawk is 
hanging while it waits for the LF that should follow the CR (and which 
never comes). Does this sound likely to you?
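
To make the hypothesis concrete: any streaming CRLF-to-LF translation needs one byte of lookahead, because a CR at the end of a chunk cannot be classified until the next byte arrives. The sketch below only illustrates that general point. It is not Cygwin's or gawk's actual text-mode code, and the class name CrlfTranslator is hypothetical.

```python
class CrlfTranslator:
    """Translate CRLF to LF across chunk boundaries; lone CRs pass through."""

    def __init__(self) -> None:
        self._pending_cr = False  # True if the previous chunk ended in CR

    def feed(self, chunk: bytes) -> bytes:
        if self._pending_cr:
            # Re-attach the CR that was held back from the previous chunk.
            chunk = b"\r" + chunk
            self._pending_cr = False
        if chunk.endswith(b"\r"):
            # Cannot tell yet whether this CR starts a CRLF pair: hold it
            # back until more data (or end of input) arrives.
            chunk = chunk[:-1]
            self._pending_cr = True
        return chunk.replace(b"\r\n", b"\n")

    def flush(self) -> bytes:
        """Call at end of input to release a held-back CR, if any."""
        out = b"\r" if self._pending_cr else b""
        self._pending_cr = False
        return out


if __name__ == "__main__":
    t = CrlfTranslator()
    print(t.feed(b"0:13:35\r"))  # the trailing CR is withheld for now
    print(t.feed(b"     2"))     # the CR is released once more data arrives
    print(t.flush())
```

A translator of this shape delays a trailing CR only until the next read returns. Whether something comparable explains the hang is exactly the open question.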

> I can't tell why it fails for you, because I can't reproduce this
> locally.  

I'm working on a short script that reproduces the problem for all 
parties; I'll post it here when I have it. Or would you rather I send it 
directly to you?

Also, I took a look at some of the source for other utilities that work 
with text input; these included tail, head, cat, and sed. I don't see 
any of those utilities opening up the input file the way you are in 
gawk, and in fact a look at the ChangeLog for coreutils hints that they 
used setmode at one time and have since removed it (why, I don't know). 
Comments abound like this in the ChangeLog:

ChangeLog:      * src/cat.c (main): Avoid setmode; use POSIX-specified 
routines instead.

My thinking was, "gawk should probably open files the same way sed 
does," but maybe my thinking is in error on this point. Your thoughts?

> As for the O_BINARY mode, in theory there's a way to
> accomplish that without rebuilding gawk by setting the BINMODE
> variable:
> 
>   gawk -v BINMODE=r [...]
> 
> Unfortunately it turns out that this doesn't work because gawk fails
> to call the setmode function in this case on Cygwin.  I'll upload a
> patched gawk soon.  If you want to apply it by yourself, try this:
>  (snip...)

This is a suitable workaround for me, but I would like to humbly submit 
that gawk shouldn't hang regardless of the input given to it. If the 
input isn't acceptable, perhaps it should error to stderr or some such 
and exit. Your thoughts?

Again, I'll come up with a short shell script that reproduces the issue 
for you, and hopefully together we can come up with an agreeable solution.
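
Separately from that script, a hang can at least be detected mechanically by running the command under a timeout. The following is a minimal sketch of such a harness, not the promised reproduction. The helper name run_with_timeout and the sample input are assumptions for illustration, and feeding a fixed buffer through a pipe may well not trigger the behaviour seen with live rsync output.

```python
import subprocess
import sys


def run_with_timeout(cmd, data, timeout=5.0):
    """Feed `data` to `cmd` on stdin; return (hung, stdout_bytes).

    `hung` is True if the command did not exit within `timeout` seconds,
    in which case subprocess.run() kills it before raising TimeoutExpired.
    """
    try:
        proc = subprocess.run(
            cmd, input=data, stdout=subprocess.PIPE, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        return True, exc.stdout or b""
    return False, proc.stdout


if __name__ == "__main__":
    # Progress-style input: records separated by lone CRs, no final LF.
    data = b"backup2\n   700   0%\r   1376176   0%\r"
    # Hypothetical usage against gawk (requires gawk on PATH):
    #   hung, out = run_with_timeout(["gawk", "{ print }"], data)
    # Stand-in so the sketch runs anywhere: a child that echoes stdin.
    echo = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
    ]
    hung, out = run_with_timeout(echo, data)
    print("hung" if hung else "finished", len(out), "bytes")
```

Reporting a timeout as a result instead of blocking forever keeps the check usable in an automated test.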

Regards;

David Carter

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
