Mail Archives: cygwin/2010/04/23/21:05:43

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=0.7 required=5.0 tests=BAYES_20,TVD_RCVD_IP
X-Spam-Check-By: sourceware.org
Date: Fri, 23 Apr 2010 18:03:11 -0700 (PDT)
From: "Peter A. Castro" <doctor AT fruitbat DOT org>
To: Yutaka Amanai <yasai-itame1942 AT jade DOT plala DOT or DOT jp>
cc: cygwin AT cygwin DOT com
Subject: Re: zsh 4.3.9-1: text-mode stdin problem (breaking base64)
In-Reply-To: <4BCEBF96.3030201@jade.plala.or.jp>
Message-ID: <alpine.LNX.2.00.1004231627020.4090@gremlin.fruitbat.org>
References: <4BCEBF96 DOT 3030201 AT jade DOT plala DOT or DOT jp>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Wed, 21 Apr 2010, Yutaka Amanai wrote:

> 2010/04/21 2:12 Peter A. Castro wrote:
>> Greetings, Yutaka,
>
> Greetings, Peter. Thank you for your reply.

Greetings, again, Yutaka,

>> The text-mode "hack" was created to solve a basic problem that zsh has
>> with running scripts, in general, on Windows. Much of the code assumes
>> that scripts have a single-character line terminator (eg: LF). So do
>> many text-based programs and filters. Windows "native" line termination
>> is (still) CRLF and zsh code does not deal well with this.
>>
>> Cygwin's text-mode munches the CR from the stream input leaving the LF
>> which works well in 99% of the usage cases. Without it, Zsh treats the
>> CR as part of the input line and tries to parse it as such leading to
>> "Bad Things"(tm) happening. The same thing would be true of data read
>> via the shell and passed to other programs as stdin. There's also some
>> size calculations that only work with a single-character line terminator
>> (at least in zsh code).
>
> Could you give me a simple test case that fails without
> cygwin_premain0()? I set my filesystems as text-mode and tried to find
> such cases, but I couldn't.

It's been a while since I've looked at this, but the problem was mostly
with binary-mode mounts, not text-mode mounts.  The problem was that,
say, you had your root mounted as text-mode, but your /tmp mounted as
binary-mode.  Zsh (and other utilities) create temp files fairly often
and feed those as input to themselves or other programs.  Or, reverse
the case (root mounted binary and /tmp mounted text).

{f}open() in Cygwin is context sensitive to the filesystem mount mode.
This leads to such situations as calling fopen("/tmp/foo","r") and
expecting it to read "text" lines, but "/tmp" is mounted binary and file
"foo" contains CRLFs because it was created by a Windows program or
editor.  So, when you read the lines you will get the CR as well as the
LF, when you really only want the LF.  Whereas if "/tmp" were mounted
text, the CR would be stripped off as part of text processing.
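
As a rough illustration of the difference described above, here is a
minimal sketch of the CR stripping that a text-mode read performs and a
binary-mode read does not.  The helper name crlf_to_lf is hypothetical
(it is not part of Cygwin or Zsh), and the real text-mode translation
happens inside the Cygwin DLL, not in application code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Cygwin or Zsh function.
 * Copies n bytes from in to out, collapsing every CR LF pair into a
 * single LF (roughly what a text-mode read does).  A CR that is not
 * followed by LF is kept.  out must have room for n bytes.
 * Returns the number of bytes written to out. */
size_t crlf_to_lf(const char *in, size_t n, char *out)
{
    size_t w = 0;

    assert(in != NULL && out != NULL);
    for (size_t r = 0; r < n; r++) {
        if (in[r] == '\r' && r + 1 < n && in[r + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        out[w++] = in[r];
    }
    return w;
}
```

Given the six bytes "a\r\nb\r\n", this writes the four bytes "a\nb\n".
A program reading the same file from a binary-mode mount sees all six
bytes and has to do this kind of cleanup itself.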

> I thought about two cases:
> * If you don't use CRLF scripts at all and mount all your filesystems as
>  binary-mode, there should be no problem (without premain hack).

In a pure Cygwin ecosystem that might work.  However, many Cygwin users
have to interact with non-Cygwin created data and files.  If you ask the
good users on this mailing list you might find that people have any
combination of file systems mounted for their particular needs.

> * If you use CRLF scripts and mount all your filesystem as text-mode,
>  there should be no problem (without premain hack).

But, now, you won't get binary data from the files using a naked "open()"
as so many typically coded apps do.

> Is it right?

If you could keep things strictly black-and-white like that, yes, in
theory these could work.  Well, the first one would be preferable as
opposed to the second one.  But the problem is that most Cygwin users
don't operate in such a strict environment.

>> A while back I looked at making changes to somehow accommodate CRLF, but
>> there are many places in the code that would require some heavy changes
>> (some of which I'm still not certain would be correct) and would make it
>> difficult to maintain. I doubt that Zsh base would accept such changes
>> either as they would be an intrusive hack for Windows only support. By
>> contrast the premain hack was elegant and global.
>>
>> I could have simply told people that they had to run scripts from a
>> non-text-mode mount, that their /tmp had to also be on a non-text-mode
>> mount and all data the scripts explicitly read from were also on a
>> non-text-mode mount AND all scripts (and input data) must be non-CRLF.
>> Think that would fly? Me neither. That was the basis for this "fix" in
>> the first place.
>
> I don't know well about zsh code, but I think it will be hard to do the
> hack without cygwin_premain0(), as you said. But, how about bash? bash
> seems not to have such hacks, but it seems to work well. And I think
> it's confusing that bash and zsh treat stdin as different mode.

Have a look at Bash code some time.  I recall seeing some O_TEXT options
being set in the various {f}open()'s that it does.  Again, I looked at
doing the same in Zsh code, but after some initial experiments it proved
that there were too many dependencies and assumptions about the
carriage-control of "text" files to make it work quickly.

>> And how is base64's deficiency a zsh problem? Stdin/Stdout are "text"
>> handles, which implies possible data manipulation along those lines.
>> There's no guarantee that they would pass binary data.
>>
>> I believe that programs reading from stdin are supposed to assume the
>> text-mode semantic for the handles and behave accordingly. You've
>> mentioned "cat" and "gzip" doing that very thing. Think there might be a
>> reason for that?
>
> Indeed, it's theoretically right that any programs which perform binary
> I/O should set stdin/stdout as binary mode for portability. But
> practically, it will be a heavy work to check that all programs on our
> system follow the rule, and I think the check can't be perfect. I'd

Regardless of how much work it might be, it's a matter of "due
diligence".  When you find something that doesn't behave appropriately,
report it to the maintainers.

And, in that vein, yes, I acknowledge there are issues with Zsh in this
area.  The premain is one "solution" that works for most cases.  You
appear to have found one case that doesn't work as expected
(congratulations!).  But, as I said, that particular case appears to be
more a matter of what the Stdin handle should be treated as, and of
programs working with it appropriately.
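
A minimal sketch of what "working appropriately" could look like in a
program that reads binary data from stdin: switch the handle to binary
mode before the first read.  The helper name set_binary_mode is
hypothetical, and I am assuming Cygwin's setmode() from <io.h>; this is
not the actual code of cat, gzip or base64.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __CYGWIN__
#include <fcntl.h>
#include <io.h>     /* setmode() */
#endif

/* Hypothetical helper, not taken from any existing program.
 * Puts the stream's descriptor into binary mode on Cygwin so that no
 * CR/LF translation is applied; elsewhere it does nothing.
 * Returns 0 on success, -1 on failure. */
int set_binary_mode(FILE *fp)
{
    assert(fp != NULL);
#ifdef __CYGWIN__
    return setmode(fileno(fp), O_BINARY) == -1 ? -1 : 0;
#else
    return 0;       /* no text/binary distinction to worry about */
#endif
}
```

Calling set_binary_mode(stdin) once at startup, before any input is
read, makes the program independent of whatever mode the parent shell
left the handle in.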

This problem is still under consideration.  Having more than one type of
filesystem mode is part of the equation and attempting to treat that
correctly is somewhat difficult in Zsh.

> rather keep all my scripts as LF than break my data by some programs
> like base64, so I will continue to use the customized zsh.

If that works for you, great.  That's why the source is available.
I do hope to get back to this issue at some point.
Thanks for pointing it out.

> PS: for base64, I will report the problem to bug-coreutils list later.

Excellent.  I have a good understanding of what you are testing so I can
have another look at how Zsh does its file handles and maybe code a
proper fix.

-- 
Peter A. Castro <doctor AT fruitbat DOT org> or <Peter DOT Castro AT oracle DOT com>
 	"Cats are just autistic Dogs" -- Dr. Tony Attwood

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
