Mail Archives: cygwin/2001/10/23/12:50:40
Again, these observations should go to the mailing list which is
responsible for maintaining the code that you've analyzed:
newlib AT sources DOT redhat DOT com .
I've redirected this discussion there.
Thanks for your in-depth analysis of the problem.
cgf
On Tue, Oct 23, 2001 at 06:21:40PM +0200, Pavel Tsekov wrote:
>Upon further investigation this problem seems to be much bigger than
>I've initially thought and there is no easy patch to it (at least I
>think so) so I'll describe what've found about it. I started a patch
>though and will would like to know if you like my approach or will
>suggest another one.
>
>So here is what I've found:
>
>1. The described problem appears only when you have buffering enabled.
>
>2. It basically has to do with a intended optimisation which involves
>not reading the same buffered region twice, if the fseek call requests
>a offset which is in the buffered region. This optiomisation is
>performed only if the file is opened in READONLY mode.
>
>3. The problem is to happen only if you fseek backwards from the
>current file position and the difference between the current file
>position and the new position is not greater than the buffer size used
>to buffer reads. A requirement for the problem to appear is to have
>two or more fseeks between no other operation between them.
>
>4. This problem is likely to affect all BSD flavour OSes - the newlib
>stdio code is derived from BSD. I looked at the fseek code in OpenBSD
>and FreeBSD and I think it has the error.
>
>Now let me point you to the code location which introduces the problem:
>
>--> Target is where do we wont to go
>
> if (whence == SEEK_SET)
> target = offset;
> else
> {
> if (_fstat_r (ptr, fp->_file, &st))
> goto dumb;
> target = st.st_size + offset;
> }
>
>--> What is actually done here up to the next --> mark is that fseek
> tries ot determine the position where the last read occured. fp->_r is
> number if bytes left in the buffer to be read, n = fp->_p -
> fp->_bf._base is the bytes from the buffer already read. n + fp->_r is
> the size of the buffer. By decrementing the current file positon with
> n + fp->_r the code expects to find the position where the last read
> occured.
>
> if (!havepos)
> {
> if (fp->_flags & __SOFF)
> curoff = fp->_offset;
> else
> {
> curoff = (*seekfn) (fp->_cookie, 0L, SEEK_CUR);
> if (curoff == POS_ERR)
> goto dumb;
> }
> curoff -= fp->_r;
> if (HASUB (fp))
> curoff -= fp->_ur;
> }
>
> /*
> * Compute the number of bytes in the input buffer (pretending
> * that any ungetc() input has been discarded). Adjust current
> * offset backwards by this count so that it represents the
> * file offset for the first byte in the current input buffer.
> */
>
> if (HASUB (fp))
> {
> curoff += fp->_r; /* kill off ungetc */
> n = fp->_up - fp->_bf._base;
> curoff -= n;
> n += fp->_ur;
> }
> else
> {
> n = fp->_p - fp->_bf._base;
> curoff -= n;
> n += fp->_r;
> }
>
>--> Here we try to determine if the position we request
> is in the buffered region. If we determnine erronosly
> that the requested position is in this buffer we got the
> error which started this thread.
>
> /*
> * If the target offset is within the current buffer,
> * simply adjust the pointers, clear EOF, undo ungetc(),
> * and return. (If the buffer was modified, we have to
> * skip this; see fgetline.c.)
> */
>
> if ((fp->_flags & __SMOD) == 0 &&
> target >= curoff && target < curoff + n)
> {
> register int o = target - curoff;
>
> fp->_p = fp->_bf._base + o;
> fp->_r = n - o;
> if (HASUB (fp))
> FREEUB (fp);
> fp->_flags &= ~__SEOF;
> return 0;
> }
>
>Let me show you with some digits whats going on:
>
>We have 2048 bytes file. We have bufsize 1024. We fread 1024 bytes -
>this fills exactly the internal buffer thus 'n' will become 1024,
>fp->_r will become 0. We request then file position 0 from the end of
>file. ftell reports 2048. Now we want to go back to 1024 and read 8
>bytes. fseek tries to optiomize since we are in read only mode ... it
>does this
>
>curoff = 2048 (current position)
>
>curoff -= r; 2048 - 0 = 2048
>curoff -= n; 2048 - 1024 = 1024 Determines where last read occured
>
>n + r = 1024 + 0; Determines the upper limit of the internal buffer
>
>now the final check
>
>target >= curoff && target < curoff + n
>1024 >= 1024 && 1024 < 1024 + 1024 is TRUE
>
>we read from the old buffer which is error.
>
>I think to fix the situation so this optimisation really works the best
>thing is to add a new member to FILE which is the last known read
>position and replace curoff in the last check with this member.
>However this requires much more work than just a simple patch - So do
>you think there is another (better) way to do this. Or perhaps just
>remove this optimisation at all ? Which is easy and will require
>chabges just to fseek.
>
>Corinna Vinschen wrote:
>>
>>On Tue, Oct 23, 2001 at 01:39:42PM +0200, Pavel Tsekov wrote:
>>>I think I found the problem - is there anyone who can test a patch ?
>>
>>Just send it to the cygwin-patches mailing list, please.
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
- Raw text -