delorie.com/archives/browse.cgi   search  
Mail Archives: cygwin/2006/10/25/15:48:44

X-Spam-Check-By: sourceware.org
To: cygwin AT cygwin DOT com
From: Lewis Hyatt <lhyatt AT princeton DOT edu>
Subject: Re: igncr vs text mode mounts, performance vs compatibility
Date: Wed, 25 Oct 2006 19:48:13 +0000 (UTC)
Lines: 101
Message-ID: <loom.20061025T210057-749@post.gmane.org>
References: <1160655422743 DOT antti DOT nospam DOT 1605718 DOT wGO_WJ9D1NlId3tB-z6Qig AT luukku DOT com> <20061012123406 DOT GA30908 AT trixie DOT casa DOT cgf DOT cx> <452EA386 DOT 9010201 AT qualcomm DOT com> <20061012212011 DOT GA8535 AT trixie DOT casa DOT cgf DOT cx> <452EFDDB DOT 1010301 AT qualcomm DOT com> <452F8719 DOT 9060300 AT cygwin DOT com> <4536BC88 DOT 3030003 AT qualcomm DOT com> <4536C922 DOT 4090807 AT qualcomm DOT com> <45376CBF DOT 1030104 AT byu DOT net> <loom DOT 20061024T205240-509 AT post DOT gmane DOT org> <453EA9C1 DOT 8060402 AT byu DOT net>
Mime-Version: 1.0
User-Agent: Loom/3.14 (http://gmane.org/)
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

> Propose a patch, and I will consider it.  In my opinion, it was much
> easier to do igncr as an all or none option than it was to parse the first
> line and discard \r on a per-file basis, not to mention that all-or-none
> is easily configurable so that those of us who WANT literal \r as required
> by POSIX can do so.  To date, I have been the only person proposing
> patches in the area of \r\n handling, rather than just ideas (which is
> probably why I am the maintainer of the cygwin bash port).  And I have no
> personal interest in redoing this patch to a more complex algorithm at
> this time.

I can understand your point of view, I'm sure it is a huge pain to deal with
this and then have to answer all the questions and complaints from people who
don't know why their scripts broke.

Given the number of support requests on the mailing list, it definitely seems
worthwhile to have this issue handled more transparently. The most compelling
argument for not translating \r\n, as I understand it, is that searching for the
\r slows down processing of large scripts, such as configure, when it is not
necessary. On the other hand, most user-created scripts that do contain \r\n
line endings are not large enough for the overhead to be problematic. For that
reason, I think it makes the most sense to enable or disable igncr on a per-file
basis. The other obvious alternative would be to have igncr turned on by
default, but then users would have to explicitly disable it to get the
performance enhancement when running configure scripts, and most would simply
not know they needed to do it.

Below is a patch to shell.c which implements my suggestion (It is just the
output of diff -u, is that OK? The patch is relative to the 3.2-4 version.) I
just modified the open_shell_script() function; now after it reads in 80
characters to make sure the script isn't binary, it also scans that buffer for
the first \n it finds and if it is preceded by \r, then it enables igncr. If
necessary it reads more characters, 80 bytes at a time. In most cases the
performance decrease for this check will be entirely negligible, particularly
since the first line of the script is almost always going to just be the
#!/bin/bash line anyway.

The only downside I can see to this is that bash will behave differently
depending whether it is reading from a file or from a pipe. But that is already
true in other cases as well, for instance the check for binary files is not
performed when reading from a pipe. It would be possible to extend the \r\n
check to piped input as well, although it would be somewhat more annoying to
implement.

If you think this overall concept is a good idea, I am happy to take everyone's
suggestions and implement them. It's a bit of a struggle for me to write C code
instead of C++, but I can give it a shot.

-Lewis

---------

--- shell.c	2006-10-25 15:40:53.625000000 -0400
+++ shell_patched.c	2006-10-25 15:41:02.093750000 -0400
@@ -1335,6 +1335,7 @@
 open_shell_script (script_name)
      char *script_name;
 {
+  extern int igncr;
   int fd, e, fd_is_tty;
   char *filename, *path_filename, *t;
   char sample[80];
@@ -1426,8 +1427,32 @@
 	  internal_error ("%s: cannot execute binary file", filename);
 	  exit (EX_BINARY_FILE);
 	}
+	
+	/* Now scan the first line of the file, if it ends with \r\n, then
+	   enable the igncr shell option. */
+	   	
+	while(sample_len>1)
+	{
+		int i;				
+		for(i=0; i!=sample_len; ++i) if(sample[i]=='\n')
+		{
+			if(i>0 && sample[i-1]=='\r') igncr=1;
+			break;
+		}		
+		if(i<sample_len) break;
+				
+		/* else need to read more data */
+		sample_len = read(fd, sample, sizeof(sample));
+		if(sample_len<0)
+		{
+			file_error(filename);
+			exit(EX_NOEXEC);
+		}					
+	}
+	
       /* Now rewind the file back to the beginning. */
       lseek (fd, 0L, 0);
+      
     }
 
   /* Open the script.  But try to move the file descriptor to a randomly

------------







--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019