Mail Archives: cygwin/2006/10/18/19:45:44

Message-ID: <4536BC88.3030003@qualcomm.com>
Date: Wed, 18 Oct 2006 16:45:12 -0700
From: Rob Walker <rwalker AT qualcomm DOT com>
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: shopt igncr not working
References: <1160655422743 DOT antti DOT nospam DOT 1605718 DOT wGO_WJ9D1NlId3tB-z6Qig AT luukku DOT com> <20061012123406 DOT GA30908 AT trixie DOT casa DOT cgf DOT cx> <452EA386 DOT 9010201 AT qualcomm DOT com> <20061012212011 DOT GA8535 AT trixie DOT casa DOT cgf DOT cx> <452EFDDB DOT 1010301 AT qualcomm DOT com> <452F8719 DOT 9060300 AT cygwin DOT com>
In-Reply-To: <452F8719.9060300@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>

Larry Hall (Cygwin) wrote:
> On 10/12/2006, Rob Walker wrote:
>> If you're referring to the performance gain realized, I think it 
>> could have been accomplished (if not as trivially) without breaking 
>> CRLF handling.  This seems to be indicated in other posts, ones that 
>> talk about reworking line parsing.
>
>
> I believe the response to this is <http://cygwin.com/acronyms/#PTC>.
> In other words, if your belief is strong enough and you have the
> knowledge to back up that belief, you just need the persistence to
> follow through on all that to show everyone your concrete ideas.
> Since we've had a lot of opinionated discussions on topics like this
> from the uninformed or those who lack the conviction to actually
> submit a patch to back up their point of view, it's important to
> realize here that patches speak louder than words (hm, PSLTW -
> acronym alert? ;-) )
>
>
>> Actually, though, I was asking about a bigger-picture strategy.  One 
>> that appears to be steering Cygwin away from interoperability of the 
>> past, towards a more rigid interpretation of what Cygwin's suitable 
>> uses are.  Do you have a set of guiding principles you consult when 
>> deciding the fate of Cygwin?  Who do you consider Cygwin's customers 
>> to be? 
>
>
> The basic strategy is that in cases where decisions have to be made
> between supporting Linux-like behavior or Windows conventions, err on
> the side of Linux.  Since the tools are meant to support the Linux way
> of doing things, it's important they do.  Otherwise people who are
> looking for and expecting this behavior are left out.

Are you saying that these people expect bash to treat CRLF as if the CR 
were non-whitespace?  Can you give me an example where this would be a 
useful feature?
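
The behavior in question can be reproduced with a small sketch. It assumes a bash where igncr is unset or unavailable (for example, a stock non-Cygwin bash); the temporary file and the variable names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical demo: a two-line script saved with CRLF line endings.
demo=$(mktemp)
printf 'x=hello\r\necho "${#x}"\r\n' > "$demo"

# Without igncr, the CR after "hello" is stored as part of the value,
# so the reported length is one more than the visible word.
result=$(bash "$demo" | tr -d '\r')
echo "length of x: $result"   # prints "length of x: 6" under the assumption above

rm -f "$demo"
```

The CR is invisible in most editors, which is why the extra character tends to surface only as a confusing error later on.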

> They are the ones these tools are built to support.  That said,
> various Windows ways and conventions are supported by default and
> when they don't conflict with the above.  But when there is a
> conflict, Linux-like behavior is the goal.

I guess you're saying (in this case) that the performance benefit of 
barfing on CRLF outweighs the usefulness of bash's invisible handling of 
CRLF?

To test this assertion, I benchmarked bash (3.1-9).  The script I used 
to test is essentially empty, with nothing but the shebang, a call to 
shopt, and 50k empty lines.  I chose empty lines to keep bash's other 
complexities out of the picture.  All I wanted to measure is how long 
it takes bash to parse lines.
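
A minimal sketch of how a test script like this could be generated in both line-ending flavors. The helper name make_test_script, its arguments, and the trailing `#` after the shopt call (used here to keep a CR out of the option name) are assumptions for illustration, not the original script.

```shell
#!/bin/bash
# Hypothetical helper: write a benchmark script with the given line ending.
#   $1 = output path
#   $2 = "lf" or "crlf"
#   $3 = number of empty lines (defaults to 50000)
make_test_script() {
    local out=$1 style=$2 count=${3:-50000} eol='\n' i
    [ "$style" = crlf ] && eol='\r\n'
    {
        printf "%s$eol" '#!/bin/bash'
        # igncr is the Cygwin-specific bash option under discussion.
        # The trailing '#' is an assumption: it puts the CR inside a
        # comment so the option name itself stays clean.
        printf "%s$eol" 'shopt -s igncr;#'
        # The body of the script: nothing but empty lines.
        for ((i = 0; i < count; i++)); do
            printf "$eol"
        done
    } > "$out"
    chmod +x "$out"
}
```

With the helper defined, `make_test_script test.sh crlf` followed by `time ./test.sh` would give one row of a table like the one in this post; swapping `-s` for `-u` in the generated file covers the "clear" rows.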

Here are my results:

-----------------------------------------------------
 line ending  | mount mode | igncr | time ./test.sh
-----------------------------------------------------
              |            |       | real    0m4.219s
 CRLF         |  text      |  set  | user    0m0.983s
              |            |       | sys     0m3.202s
-----------------------------------------------------
              |            |       | real    0m4.312s
 CRLF         |  text      | clear | user    0m1.062s
              |            |       | sys     0m3.265s
-----------------------------------------------------
              |            |       | real    0m2.109s
  LF          |  text      |  set  | user    0m0.608s
              |            |       | sys     0m1.499s
-----------------------------------------------------
              |            |       | real    0m2.125s
  LF          |  text      | clear | user    0m0.592s
              |            |       | sys     0m1.546s
-----------------------------------------------------
              |            |       | real    0m2.125s
 CRLF         |  bin       |  set  | user    0m0.546s
              |            |       | sys     0m1.530s
-----------------------------------------------------
              |            |       |
 CRLF         |  bin       | clear | Whoops!
              |            |       |
-----------------------------------------------------
              |            |       | real    0m2.188s
  LF          |  bin       |  set  | user    0m0.608s
              |            |       | sys     0m1.546s
-----------------------------------------------------
              |            |       | real    0m2.141s
  LF          |  bin       | clear | user    0m0.640s
              |            |       | sys     0m1.515s
-----------------------------------------------------

My conclusions:

1) CRLF vs. LF line endings have essentially no effect on the 
performance of this version of bash, even on a test where bash is doing 
nothing but handling linefeeds.
2) Ignoring CR on a binmode mount has no performance penalty over a 
clean LF-only file.  In fact, the margin of error in this test was 
higher than the performance penalty.
3) CRLF on a text mode mount is really, really bad: it roughly doubles 
the run time.  This isn't bash's fault (note that most of the extra time 
is spent in sys, about 3.2s versus 1.5s on binary mounts, while user 
time rises only from about 0.6s to 1.0s), and so to me looks like a 
non-solution to the problem of bash not handling CRLF; to say nothing 
of the other issues with text mode mounts.
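
The comparison behind these conclusions can be checked with a little arithmetic on the table. The sketch below takes the CRLF rows with igncr set (text mount versus bin mount); the helper names to_seconds and ratio are made up for illustration.

```shell
#!/bin/bash
# Convert a `time`-style value such as 0m4.219s into plain seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN { split(t, p, /[ms]/); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# Ratio of two `time`-style values, to two decimal places.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# CRLF with igncr set: text mount vs. bin mount.
ratio 0m4.219s 0m2.125s   # real, prints 1.99
ratio 0m3.202s 0m1.530s   # sys,  prints 2.09
ratio 0m0.983s 0m0.546s   # user, prints 1.80
```

In absolute terms, about 1.7s of the roughly 2.1s added on the text mount is sys time.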

Looks like making igncr the default in Cygwin is a no-cost solution in 
terms of performance, and a big win for compatibility.

Has anyone else done anything like this?  Any flaws in my analysis?

Thanks for reading.

-Rob




