Mail Archives: cygwin/2006/09/13/10:33:47

X-Spam-Check-By: sourceware.org
To: cygwin AT cygwin DOT com
From: Eric Blake <ebb9 AT byu DOT net>
Subject: Re: bash-3.1-7 BUG
Date: Wed, 13 Sep 2006 14:31:22 +0000 (UTC)
Lines: 34
Message-ID: <loom.20060913T160909-692@post.gmane.org>
References: <091320060438 DOT 11140 DOT 45078B490008FD8600002B8422007610640A050E040D0C079D0A AT comcast DOT net> <20060913052510 DOT GB1256 AT trixie DOT casa DOT cgf DOT cx>
Mime-Version: 1.0
User-Agent: Loom/3.14 (http://gmane.org/)
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Christopher Faylor <cgf-no-personal-reply-please <at> cygwin.com> writes:

> Is bash assuming that it can read N characters and then subtract M
> characters from the current position to get back to the beginning of a
> line?  If so, hmm.  I guess this explains why it was reading a byte at a
> time before.  It must be counting characters rather than calling lseek
> to figure out where it is.

Yes, indeed, and it seems like reasonable semantics to expect as well 
(never mind that it means that text mode on a seekable file involves a lot more 
processing, to consistently present the user with character count instead of 
byte offset).  When a file is seekable, bash reads a buffer at a time for 
speed, but then must reseek to the offset where it last processed input before 
invoking any subprocesses, since POSIX requires that seekable files be left in 
a consistent state when swapping between multiple handles to the same 
underlying file description (even if the multiple handles exist in separate 
processes).  When using stdio (such as fread and fseek), this works due to code 
in newlib (see __SCLE in stdio.h).  But bash uses low-level Unix I/O, and does 
not benefit from newlib's approach.  In a binary mount, seeking backwards by 
the character offset from where bash has processed to the end of the buffer it 
has read just works.  It is only in a text mount where having lseek report the 
binary offset within the file, rather than the character offset, is causing 
problems.  So I will probably end up reinstating a form of the previous #ifdef 
__CYGWIN__ check for is_seekable in bash 3.1-8 to check whether a file is in 
text mode, in which case it is non-seekable; that is certainly a faster 
solution than waiting for cygwin to make a change for lseek on a text file to 
consistently use a character offset.  But I intend that on binary files, \r\n 
line endings will treat the \r as part of the line, so at least binary mounts 
won't suffer from the speed impact of treating a file as unseekable the way 
bash 3.1-6 does.
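
To make the read-ahead-then-reseek pattern concrete, here is a minimal sketch in C. It is not bash's actual source; the names consume_line_and_resync and demo_offset_after_first_line are made up for illustration, and the sketch assumes one byte returned by read() corresponds to one byte of file offset, which is exactly the assumption that holds on a binary mount and fails on a text mount.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not bash's code: read one buffer from fd, consume
   the first line, then seek back so that fd's offset sits just past that
   line.  Returns the line length (including '\n'), 0 at EOF, -1 on error. */
ssize_t
consume_line_and_resync (int fd, char *buf, size_t cap)
{
  assert (cap > 0);

  ssize_t nread = read (fd, buf, cap);
  if (nread <= 0)
    return nread;

  /* Count the bytes that belong to the first line. */
  char *nl = memchr (buf, '\n', (size_t) nread);
  ssize_t used = nl ? (nl - buf) + 1 : nread;

  /* Rewind by the unconsumed remainder.  On a text mount, read() hands
     back "\n" where the file holds "\r\n", so nread - used is a character
     count that is smaller than the byte distance lseek() works in, and
     the offset ends up in the wrong place. */
  if (lseek (fd, -(off_t) (nread - used), SEEK_CUR) == (off_t) -1)
    return -1;

  return used;
}

/* Demo: write two lines to a temp file, consume the first, and report
   where the descriptor's offset ended up (or -1 on failure). */
off_t
demo_offset_after_first_line (void)
{
  char path[] = "/tmp/resync-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                /* the file goes away once fd is closed */

  static const char script[] = "echo a\necho b\n";
  const ssize_t len = (ssize_t) (sizeof script - 1);
  off_t pos = -1;
  char buf[64];

  if (write (fd, script, (size_t) len) == len
      && lseek (fd, 0, SEEK_SET) == 0
      && consume_line_and_resync (fd, buf, sizeof buf) > 0)
    pos = lseek (fd, 0, SEEK_CUR);

  close (fd);
  return pos;
}
```

With \n-only line endings, the demo leaves the offset at 7, immediately after "echo a\n", so a subprocess inheriting the descriptor would start reading at "echo b".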

-- 
Eric Blake



