Mail Archives: cygwin/2006/10/25/15:48:44
> Propose a patch, and I will consider it. In my opinion, it was much
> easier to do igncr as an all or none option than it was to parse the first
> line and discard \r on a per-file basis, not to mention that all-or-none
> is easily configurable so that those of us who WANT literal \r as required
> by POSIX can do so. To date, I have been the only person proposing
> patches in the area of \r\n handling, rather than just ideas (which is
> probably why I am the maintainer of the cygwin bash port). And I have no
> personal interest in redoing this patch to a more complex algorithm at
> this time.
I can understand your point of view, I'm sure it is a huge pain to deal with
this and then have to answer all the questions and complaints from people who
don't know why their scripts broke.
Given the number of support requests on the mailing list, it definitely seems
worthwhile to have this issue handled more transparently. The most compelling
argument for not translating \r\n, as I understand it, is that searching for the
\r slows down processing of large scripts, such as configure, when it is not
necessary. On the other hand, most user-created scripts that do contain \r\n
line endings are not large enough for the overhead to be problematic. For that
reason, I think it makes the most sense to enable or disable igncr on a per-file
basis. The other obvious alternative would be to have igncr turned on by
default, but then users would have to explicitly disable it to get the
performance enhancement when running configure scripts, and most would simply
not know they needed to do it.
Below is a patch to shell.c which implements my suggestion (It is just the
output of diff -u, is that OK? The patch is relative to the 3.2-4 version.) I
just modified the open_shell_script() function; now after it reads in 80
characters to make sure the script isn't binary, it also scans that buffer for
the first \n it finds and if it is preceded by \r, then it enables igncr. If
necessary it reads more characters, 80 bytes at a time. In most cases the
performance decrease for this check will be entirely negligible, particularly
since the first line of the script is almost always going to just be the
#!/bin/bash line anyway.
The only downside I can see to this is that bash will behave differently
depending whether it is reading from a file or from a pipe. But that is already
true in other cases as well, for instance the check for binary files is not
performed when reading from a pipe. It would be possible to extend the \r\n
check to piped input as well, although it would be somewhat more annoying to
implement.
If you think this overall concept is a good idea, I am happy to take everyone's
suggestions and implement them. It's a bit of a struggle for me to write C code
instead of C++, but I can give it a shot.
-Lewis
---------
--- shell.c 2006-10-25 15:40:53.625000000 -0400
+++ shell_patched.c 2006-10-25 15:41:02.093750000 -0400
@@ -1335,6 +1335,7 @@
open_shell_script (script_name)
char *script_name;
{
+ extern int igncr;
int fd, e, fd_is_tty;
char *filename, *path_filename, *t;
char sample[80];
@@ -1426,8 +1427,32 @@
internal_error ("%s: cannot execute binary file", filename);
exit (EX_BINARY_FILE);
}
+
+ /* Now scan the first line of the file, if it ends with \r\n, then
+ enable the igncr shell option. */
+
+ while(sample_len>1)
+ {
+ int i;
+ for(i=0; i!=sample_len; ++i) if(sample[i]=='\n')
+ {
+ if(i>0 && sample[i-1]=='\r') igncr=1;
+ break;
+ }
+ if(i<sample_len) break;
+
+ /* else need to read more data */
+ sample_len = read(fd, sample, sizeof(sample));
+ if(sample_len<0)
+ {
+ file_error(filename);
+ exit(EX_NOEXEC);
+ }
+ }
+
/* Now rewind the file back to the beginning. */
lseek (fd, 0L, 0);
+
}
/* Open the script. But try to move the file descriptor to a randomly
------------
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
- Raw text -