X-Spam-Check-By: sourceware.org Message-ID: <45401C68.7080706@byu.net> Date: Wed, 25 Oct 2006 20:24:40 -0600 From: Eric Blake User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Thunderbird/1.5.0.7 Mnenhy/0.7.4.666 MIME-Version: 1.0 To: cygwin AT cygwin DOT com Subject: Re: igncr vs text mode mounts, performance vs compatibility References: <1160655422743 DOT antti DOT nospam DOT 1605718 DOT wGO_WJ9D1NlId3tB-z6Qig AT luukku DOT com> <20061012123406 DOT GA30908 AT trixie DOT casa DOT cgf DOT cx> <452EA386 DOT 9010201 AT qualcomm DOT com> <20061012212011 DOT GA8535 AT trixie DOT casa DOT cgf DOT cx> <452EFDDB DOT 1010301 AT qualcomm DOT com> <452F8719 DOT 9060300 AT cygwin DOT com> <4536BC88 DOT 3030003 AT qualcomm DOT com> <4536C922 DOT 4090807 AT qualcomm DOT com> <45376CBF DOT 1030104 AT byu DOT net> <453EA9C1 DOT 8060402 AT byu DOT net> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Lewis Hyatt on 10/25/2006 1:48 PM: > Below is a patch to shell.c which implements my suggestion (It is just the > output of diff -u, is that OK? The patch is relative to the 3.2-4 version.) I > just modified the open_shell_script() function; now after it reads in 80 > characters to make sure the script isn't binary, it also scans that buffer for > the first \n it finds and if it is preceded by \r, then it enables igncr. If > necessary it reads more characters, 80 bytes at a time. In most cases the > performance decrease for this check will be entirely negligible, particularly > since the first line of the script is almost always going to just be the > #!/bin/bash line anyway. Wow - someone besides me actually wrote a patch! What a welcome relief in this thread. > > The only downside I can see to this is that bash will behave differently > depending whether it is reading from a file or from a pipe. But that is already > true in other cases as well, for instance the check for binary files is not > performed when reading from a pipe. It would be possible to extend the \r\n > check to piped input as well, although it would be somewhat more annoying to > implement. When reading from a pipe, the presence or absence of \r is determined by CYGWIN=binmode (default) or CYGWIN=nobinmode (which is less tested, and has been documented as breaking things like 'tar xjf foo.tar.bz2'). In bash-3.2-4, if igncr is set, then even using CYGWIN=binmode and piping in \r will ignore the \r, whereas you are correct that your patch would not autodetect the \r line endings. For what I can think of, to check whether the first line of piped input ends in \r\n, you would have to add a flag that you are reading from a pipe and have not yet seen a line ending, then check that flag for every read in input.c; which adds overhead to every byte read, rather than just the first line. You can't afford to rewind, or read in advance, because you don't know what is feeding the other end of the pipe. > > If you think this overall concept is a good idea, I am happy to take everyone's > suggestions and implement them. It's a bit of a struggle for me to write C code > instead of C++, but I can give it a shot. There are a few other things to think about with your initial attempt. Right now, igncr is an all or nothing setting, can be inherited into child processes via SHELLOPTS, but can only be changed by explicit user action. Your patch only ever turns it on without user intervention, not off. You are changing the global igncr variable even though you described your goal as a per-file setting; so if bash reads two files in a row, first with \r\n, and the second with \n only, the second inherits the fact that the first decided to turn on igncr. At which point, why not just make igncr the default to begin with? So it sounds like the effort to make a per-file decision work correctly is getting bigger and bigger, with less and less perceivable benefits. > > -Lewis > > --------- > > --- shell.c 2006-10-25 15:40:53.625000000 -0400 > +++ shell_patched.c 2006-10-25 15:41:02.093750000 -0400 > @@ -1335,6 +1335,7 @@ > open_shell_script (script_name) > char *script_name; > { > + extern int igncr; My uses of igncr in 3.2-4 are protected by #if __CYGWIN__, so that my patches will still compile on other platforms; you would need to do likewise to avoid a link error. Actually, when I propose an upstream patch, I will probably do it in the form of ./configure --enable-igncr, then protect all uses of igncr by IGNCR, so that the presence of igncr is no longer cygwin-dependent, but can still be compiled out when desiring strict POSIX conformance. > int fd, e, fd_is_tty; > char *filename, *path_filename, *t; > char sample[80]; > @@ -1426,8 +1427,32 @@ > internal_error ("%s: cannot execute binary file", filename); > exit (EX_BINARY_FILE); > } > + > + /* Now scan the first line of the file, if it ends with \r\n, then > + enable the igncr shell option. */ > + > + while(sample_len>1) > + { > + int i; > + for(i=0; i!=sample_len; ++i) if(sample[i]=='\n') It seems a bit redundant to loop over the sample once in check_binary_file, then again in your patch; I wonder if it is worth patching check_binary_file instead? Your code formatting is not consistent with nearby code; while this is not a serious drawback, it makes it harder for me to propose your code to the upstream maintainer, should I choose to accept something like it. > + { > + if(i>0 && sample[i-1]=='\r') igncr=1; > + break; > + } > + if(i + > + /* else need to read more data */ > + sample_len = read(fd, sample, sizeof(sample)); > + if(sample_len<0) > + { > + file_error(filename); > + exit(EX_NOEXEC); I'm not sure I like this - before your patch, bash can execute the first part of a file, even if a read error would occur before encountering a newline but after 80 characters; with your patch, you keep on plowing through the file, looking for a newline, so you are possibly rejecting a file that used to be partially executable. On the other hand, read errors are generally rare, so this corner case is probably not worth me fretting about. > + } > + } > + > /* Now rewind the file back to the beginning. */ > lseek (fd, 0L, 0); Just because the bash maintainer writes (or more likely, inherited) bad code doesn't give you the excuse to use a magic number 0 instead of the symbolic name SEEK_SET as the third argument to lseek. One last point - your patch has no way to bypass automatic filtering. I would feel more comfortable if there were a way for users to avoid automatic filtering (yes, maybe only masochists would use that option, but at least it would no longer be silently forced on users). I wonder if a better approach would be to propose a patch to the cygwin base-files maintainer to set igncr in the default /etc/profile. I tend to place a high value on standardized behavior, because following standards increases portability. I am more than willing to make scripts on text mounts work smoothly regardless of line ending, because text mounts are already outside the realm of POSIX (POSIX requires fopen("r") and fopen("rb") to be identical, which is not the case on text mounts), so there are no standards preventing me from ignoring \r. But on binary mounts, the cygwin philosophy is to emulate Linux, which in turn is slowly converging on POSIX compliance. The fact that bash is used as /bin/sh makes it very hard for me to turn on a non-standard behavior unless it is explicitly requested by a configuration file such as /etc/profile, because I write shell scripts that I expect to behave identically across multiple platforms that claim some degree of POSIX conformance. There is also the possibility of making bash turn igncr on by default, but /bin/sh leaving it off, since only /bin/sh is specified by POSIX; but that also gives me the willies thinking about people who will complain why their script doesn't work when they change from #!/bin/bash to #!/bin/sh. - -- Life is short - so eat dessert first! Eric Blake ebb9 AT byu DOT net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.1 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFQBxn84KuGfSFAYARArOCAKCjQMJbii2pMSKaz9k78ootxnVM/gCghNfR whq++azHUYHn0crwPQAZlfs= =9Ga1 -----END PGP SIGNATURE----- -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/