Message-ID: <442C408B.3080409@carter.to>
Date: Thu, 30 Mar 2006 14:33:15 -0600
From: David Carter
To: cygwin AT cygwin DOT com
Subject: Re: problems with gawk 3.1.5-3 hanging -- more info
References: <442C25D0 DOT 7030605 AT pondol DOT com> <442C3197 DOT 7090309 AT pondol DOT com> <20060330200757 DOT GO20907 AT calimero DOT vinschen DOT de>
In-Reply-To: <20060330200757.GO20907@calimero.vinschen.de>

Corinna Vinschen wrote:

> O_TEXT is correct because gawk is a text tool in the first place and
> it should treat input lines identical, regardless if they have DOS
> or UNIX lineendings.

Hi Corinna, thanks for the prompt reply.

If I understand you correctly, the fix in -3 has to do with converting
DOS-style CRLFs to LFs. This appears to be the issue. The output from
rsync (on all platforms--windows/unix/POSIX/whatever) contains CR
characters (0x0d) by themselves. This is what accounts for the output of
rsync "overwriting" itself when you run it alone from a bash prompt.

Here's a snippet of hexdump output from rsync:

$ rsync -Pv /cygdrive/c/backup2 10.0.0.204:~ | xxd
0000000: 6261 636b 7570 320a 2020 2020 2020 2020  backup2.
0000010: 2037 3030 2020 2030 2520 2020 2030 2e30   700   0%    0.0
0000020: 306b 422f 7320 2020 2030 3a30 303a 3030  0kB/s    0:00:00
0000030: 0d20 2020 2020 3133 3736 3137 3620 2020  .     1376176
0000040: 3025 2020 2020 312e 3238 4d42 2f73 2020  0%    1.28MB/s
0000050: 2020 303a 3133 3a33 350d 2020 2020 2032    0:13:35.     2

You can see the 0d all by itself at address 0000030, and again at
0000059.
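
The pattern described above can be checked mechanically. The following is
a minimal Python sketch, not anything taken from rsync or gawk: the helper
name bare_cr_offsets is hypothetical, and the bytes are copied from the
xxd dump above. It reports the offset of every CR (0x0d) that is not
immediately followed by an LF (0x0a).

```python
# Bytes copied from the xxd dump above (offsets 0x00 through 0x5f).
DUMP = bytes.fromhex(
    "6261 636b 7570 320a 2020 2020 2020 2020"
    "2037 3030 2020 2030 2520 2020 2030 2e30"
    "306b 422f 7320 2020 2030 3a30 303a 3030"
    "0d20 2020 2020 3133 3736 3137 3620 2020"
    "3025 2020 2020 312e 3238 4d42 2f73 2020"
    "2020 303a 3133 3a33 350d 2020 2020 2032"
)


def bare_cr_offsets(data: bytes) -> list[int]:
    """Return offsets of CR bytes that are not followed by LF.

    Hypothetical helper for illustration only. A CR at the very end of
    the data counts as bare, since no LF follows it.
    """
    return [
        i
        for i, byte in enumerate(data)
        if byte == 0x0D and data[i + 1 : i + 2] != b"\n"
    ]


if __name__ == "__main__":
    # Prints ['0x30', '0x59'] for the dump above.
    print([hex(offset) for offset in bare_cr_offsets(DUMP)])
```

To check a full capture rather than this excerpt, the same function can be
applied to bytes read from a file opened in binary mode ("rb"), so that no
line-ending translation happens before the scan.
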
It appears to me that, because the file is opened as O_TEXT, gawk hangs
waiting for an LF character to follow the CR (which never comes). Does
this sound likely to you?

> I can't tell why it fails for you, because I can't reproduce this
> locally.

I'm working on a short script that reproduces the problem for all
parties; I'll post it here when I have it. Or would you rather I send it
directly to you?

Also, I took a look at some of the source for other utilities that work
with text input; these included tail, head, cat, and sed. I don't see any
of those utilities opening the input file the way gawk now does, and in
fact the ChangeLog for coreutils hints that they used setmode at one time
and have since removed it (why, I don't know). Comments like this abound
in the ChangeLog:

ChangeLog:
    * src/cat.c (main): Avoid setmode; use POSIX-specified routines instead.

My thinking was, "gawk should probably open files the same way sed does,"
but maybe my thinking is in error on this point. Your thoughts?

> As for the O_BINARY mode, in theory there's a way to
> accomplish that without rebuilding gawk by setting the BINMODE
> variable:
>
>   gawk -v BINMODE=r [...]
>
> Unfortunately it turns out that this doesn't work because gawk fails
> to call the setmode function in this case on Cygwin. I'll upload a
> patched gawk soon. If you want to apply it by yourself, try this:
> (snip...)

This is a suitable workaround for me, but I would like to humbly submit
that gawk shouldn't hang regardless of the input given to it. If the
input isn't acceptable, perhaps it should print an error to stderr or
some such and exit. Your thoughts?

Again, I'll come up with a short shell script that reproduces the issue
for you, and hopefully together we can come up with an agreeable
solution.

Regards;
David Carter