From: Eric Blake
Date: Thu, 19 Oct 2006 06:17:03 -0600
To: cygwin AT cygwin DOT com
Subject: Re: igncr vs text mode mounts, performance vs compatibility
Message-ID: <45376CBF.1030104@byu.net>

According to Rob Walker on 10/18/2006 6:38 PM:
> I looked into my scripts a little harder, have better results, some new
> conclusions:

Rob, please avoid http://cygwin.com/acronyms/#TOFU. Thanks for calculating
some timings.

>
> -----------------------------------------------------
> line ending | mount mode | igncr | "user" time
> -----------------------------------------------------
> CRLF        | text       | set   | 1.0114s

Here, both cygwin and bash are checking for \r (obviously, the bash check
won't find any, because the text mount has already translated CRLF to LF),
and bash is forced to read the file one byte at a time.
> -----------------------------------------------------
> CRLF        | text       | clear | 0.984s

Slightly faster; bash is still forced to read one byte at a time, but it
is not wasting effort checking for \r. This matches the bash-3.1-6
behavior, regardless of mount point.

> -----------------------------------------------------
> LF          | text       | set   | 0.56995s
> -----------------------------------------------------
> LF          | text       | clear | 0.5653s

For these two, bash can now read a buffer at a time. OK, so bash's check
for \r is in the noise compared to the speed penalty for slower file
reads.

> -----------------------------------------------------
> CRLF        | bin        | set   | 0.59435s

When bash must filter \r, the cost is still noticeable, making it even
slower than a text mount that need not filter \r.

> -----------------------------------------------------
> CRLF        | bin        | clear | whoops!
> -----------------------------------------------------
> LF          | bin        | set   | 0.5545s
> -----------------------------------------------------
> LF          | bin        | clear | 0.5576s

Indeed, as I predicted, LF-only files on binary mounts are as fast as you
can get; the minor difference here on igncr is probably due to
statistical variance.

>
> In the bin mode section (the Cygwin recommended mount mode): note here
> that there's an approx 7% penalty between the most accommodating case
> (CRLF on a binmode mount with igncr set) and the most restrictive case
> (LF only on a bin mode mount with igncr clear). Less than 10% penalty
> on this perverse benchmark (handling _nothing_ but linefeeds) seems like
> a small price for compatibility.

But there's also the issue of POSIX compatibility - ignoring \r is not
POSIX compatible. And any speed penalty that is noticeable in a
benchmark, however slight, is worth addressing, even if it is in the
noise for real-life cases; if everyone took the attitude that their patch
was only 10% worse in the worst case, we'd have some slow programs.
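The percentages argued over here can be recomputed directly from the quoted table. This is only a worked-arithmetic aid added for illustration: the `timings` dictionary and `penalty` helper are hypothetical names, and the numbers are simply Rob's reported user times (the failed CRLF/bin/clear run is left out).

```python
# User times in seconds, copied from the quoted table. The CRLF / bin / clear
# run failed ("whoops!"), so it has no timing.
timings = {
    ("CRLF", "text", "set"): 1.0114,
    ("CRLF", "text", "clear"): 0.984,
    ("LF", "text", "set"): 0.56995,
    ("LF", "text", "clear"): 0.5653,
    ("CRLF", "bin", "set"): 0.59435,
    ("LF", "bin", "set"): 0.5545,
    ("LF", "bin", "clear"): 0.5576,
}


def penalty(slow, fast):
    """Percent by which the `slow` configuration's time exceeds the `fast` one's."""
    return 100.0 * (timings[slow] / timings[fast] - 1.0)


# Rob's comparison: most accommodating vs. most restrictive binary-mount case.
bin_penalty = penalty(("CRLF", "bin", "set"), ("LF", "bin", "clear"))
# Filtering \r on a binary mount vs. an LF file on a text mount (no filtering).
vs_text_lf = penalty(("CRLF", "bin", "set"), ("LF", "text", "clear"))
# Byte-at-a-time reads (CRLF on text mount) vs. buffered reads (LF on text mount).
text_crlf_vs_lf = penalty(("CRLF", "text", "clear"), ("LF", "text", "clear"))

print(f"bin mount, CRLF+igncr vs LF:     {bin_penalty:.1f}%")
print(f"CRLF+igncr on bin vs LF on text: {vs_text_lf:.1f}%")
print(f"CRLF vs LF on a text mount:      {text_crlf_vs_lf:.1f}%")
```

This gives roughly 6.6% for Rob's binary-mount comparison (his "approx 7%"), about 5.1% for CRLF with igncr on a binary mount versus LF on a text mount, and about 74% for byte-at-a-time reading of CRLF on a text mount versus buffered reading of LF.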
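To make concrete why ignoring \r is not just a speed question, here is a minimal Python model of how a script's lines look to the shell with and without igncr. It is an assumption-laden sketch, not bash's actual C implementation: `split_script_lines` and `heredoc_body` are hypothetical helpers, and the model assumes igncr drops only a \r that immediately precedes \n (the real option's exact rules may differ). With igncr clear, as on Linux, that \r is an ordinary character and survives into here-doc output; with igncr set, it is silently removed.

```python
def split_script_lines(data: bytes, igncr: bool) -> list[bytes]:
    """Split raw script bytes into lines on LF (hypothetical helper).

    Assumed model: with igncr set, a CR immediately before LF is dropped;
    with igncr clear (POSIX behavior), CR is an ordinary character.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a final LF terminates the last line; it does not start a new one
    if igncr:
        lines = [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]
    return lines


def heredoc_body(lines: list[bytes], delimiter: bytes = b"EOF") -> bytes:
    """Return the bytes a here-doc would emit, given the script's lines.

    Assumes line 0 is the command that opens the here-doc (hypothetical helper).
    """
    end = lines.index(delimiter, 1)  # the delimiter line must match exactly
    return b"".join(line + b"\n" for line in lines[1:end])


# LF-terminated script whose here-doc body deliberately carries a CR per line.
script = b"cat > out.txt <<'EOF'\nfirst\r\nsecond\r\nEOF\n"

for igncr in (False, True):
    body = heredoc_body(split_script_lines(script, igncr))
    print(f"igncr={'set' if igncr else 'clear'}: {body!r}")
```

With igncr clear the body keeps its CRLF pairs; with igncr set the CRs vanish. The flip side is the failure mode that fills the mailing list: if the whole script were saved with CRLF endings and igncr were clear, the delimiter line would read `EOF\r` and never match `EOF` (in this model, `heredoc_body` raises `ValueError`).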
On the other hand, the volume of complaints on the mailing list is a
tangible consideration, although much harder to measure objectively. If I
make igncr the default (or more likely, make it depend on the state of
POSIXLY_CORRECT), I will noticeably save myself time by not having to
plow through so many emails from clueless users wondering why their CRLF
scripts don't work on cygwin, even though those same scripts won't work
on Linux either.

>>
>> Are you saying that these people expect bash to treat CRLF as if the
>> CR were non-whitespace? Can you give me an example where this would
>> be a useful feature?

It may not be a well-used feature, but I won't go so far as to call it
useless. One possible use: a script written with \n line endings that
intentionally generates an output file with \r\n line endings (this
sounds like something sharutils might want to do). On Linux, literal \r
characters in a here-doc are written to the output file, so it stands to
reason that someone might want to do the same thing on cygwin when using
a binary mount. Since cygwin's goal is to provide a Linux emulation, I
don't see any reason to artificially limit cygwin by making bash always
ignore \r; rather, I think it is only safe to ignore \r when explicitly
told to do so (either by a text mount, or by using igncr).

-- 
Life is short - so eat dessert first!

Eric Blake             ebb9 AT byu DOT net