Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT sources DOT redhat DOT com Delivered-To: mailing list cygwin AT sources DOT redhat DOT com Message-ID: <3BD59914.780FD423@syntrex.com> Date: Tue, 23 Oct 2001 18:21:40 +0200 From: Pavel Tsekov X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: cygwin AT cygwin DOT com Subject: Re: 1.3.3-2: fseek fails on multiples of 1024 (binary mode) References: <20011021225629 DOT 17478 DOT qmail AT foolabs DOT com> <20011022150648 DOT B10383 AT redhat DOT com> <3BD556FE DOT 46350849 AT syntrex DOT com> <20011023141012 DOT G31068 AT cygbert DOT vinschen DOT de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Upon further investigation this problem seems to be much bigger than I've initially thought and there is no easy patch to it (at least I think so) so I'll describe what've found about it. I started a patch though and will would like to know if you like my approach or will suggest another one. So here is what I've found: 1. The described problem appears only when you have buffering enabled. 2. It basically has to do with a intended optimisation which involves not reading the same buffered region twice, if the fseek call requests a offset which is in the buffered region. This optiomisation is performed only if the file is opened in READONLY mode. 3. The problem is to happen only if you fseek backwards from the current file position and the difference between the current file position and the new position is not greater than the buffer size used to buffer reads. A requirement for the problem to appear is to have two or more fseeks between no other operation between them. 4. This problem is likely to affect all BSD flavour OSes - the newlib stdio code is derived from BSD. I looked at the fseek code in OpenBSD and FreeBSD and I think it has the error. Now let me point you to the code location which introduces the problem: --> Target is where do we wont to go if (whence == SEEK_SET) target = offset; else { if (_fstat_r (ptr, fp->_file, &st)) goto dumb; target = st.st_size + offset; } --> What is actually done here up to the next --> mark is that fseek tries ot determine the position where the last read occured. fp->_r is number if bytes left in the buffer to be read, n = fp->_p - fp->_bf._base is the bytes from the buffer already read. n + fp->_r is the size of the buffer. By decrementing the current file positon with n + fp->_r the code expects to find the position where the last read occured. if (!havepos) { if (fp->_flags & __SOFF) curoff = fp->_offset; else { curoff = (*seekfn) (fp->_cookie, 0L, SEEK_CUR); if (curoff == POS_ERR) goto dumb; } curoff -= fp->_r; if (HASUB (fp)) curoff -= fp->_ur; } /* * Compute the number of bytes in the input buffer (pretending * that any ungetc() input has been discarded). Adjust current * offset backwards by this count so that it represents the * file offset for the first byte in the current input buffer. */ if (HASUB (fp)) { curoff += fp->_r; /* kill off ungetc */ n = fp->_up - fp->_bf._base; curoff -= n; n += fp->_ur; } else { n = fp->_p - fp->_bf._base; curoff -= n; n += fp->_r; } --> Here we try to determine if the position we request is in the buffered region. If we determnine erronosly that the requested position is in this buffer we got the error which started this thread. /* * If the target offset is within the current buffer, * simply adjust the pointers, clear EOF, undo ungetc(), * and return. (If the buffer was modified, we have to * skip this; see fgetline.c.) */ if ((fp->_flags & __SMOD) == 0 && target >= curoff && target < curoff + n) { register int o = target - curoff; fp->_p = fp->_bf._base + o; fp->_r = n - o; if (HASUB (fp)) FREEUB (fp); fp->_flags &= ~__SEOF; return 0; } Let me show you with some digits whats going on: We have 2048 bytes file. We have bufsize 1024. We fread 1024 bytes - this fills exactly the internal buffer thus 'n' will become 1024, fp->_r will become 0. We request then file position 0 from the end of file. ftell reports 2048. Now we want to go back to 1024 and read 8 bytes. fseek tries to optiomize since we are in read only mode ... it does this curoff = 2048 (current position) curoff -= r; 2048 - 0 = 2048 curoff -= n; 2048 - 1024 = 1024 Determines where last read occured n + r = 1024 + 0; Determines the upper limit of the internal buffer now the final check target >= curoff && target < curoff + n 1024 >= 1024 && 1024 < 1024 + 1024 is TRUE we read from the old buffer which is error. I think to fix the situation so this optimisation really works the best thing is to add a new member to FILE which is the last known read position and replace curoff in the last check with this member. However this requires much more work than just a simple patch - So do you think there is another (better) way to do this. Or perhaps just remove this optimisation at all ? Which is easy and will require chabges just to fseek. Thanks. Corinna Vinschen wrote: > > On Tue, Oct 23, 2001 at 01:39:42PM +0200, Pavel Tsekov wrote: > > I think I found the problem - is there anyone who can > > test a patch ? > > Just send it to the cygwin-patches mailing list, please. > > Thanks, > Corinna > -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Bug reporting: http://cygwin.com/bugs.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/