Mail Archives: djgpp-workers/2001/10/09/19:01:44

Sender: rich AT phekda DOT freeserve DOT co DOT uk
Message-ID: <3BC38161.C1FBC8F1@phekda.freeserve.co.uk>
Date: Tue, 09 Oct 2001 23:59:45 +0100
From: Richard Dawe <rich AT phekda DOT freeserve DOT co DOT uk>
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.19 i586)
X-Accept-Language: de,fr
MIME-Version: 1.0
To: djgpp-workers AT delorie DOT com
Subject: Re: Resend: DJGPP and files > 2GB
References: <200110092052 DOT WAA16564 AT father DOT ludd DOT luth DOT se>
Reply-To: djgpp-workers AT delorie DOT com

Hello.

Martin Strömberg wrote:
> 
> According to Richard Dawe:
> > An example of the problem is the handling of the st_size member of the
> > stat structure. st_size is an off_t. This means that the maximum size
> > that can be represented is 2GB - 1. Currently stat will return a large
> > negative number, since off_t is signed (which caused ls to output a
> > bogus size). If st_size cannot represent the file's size, stat is
> > supposed to return -1 and set errno to EOVERFLOW.
> >
> > I don't think the library's current behaviour is good - who knows what
> > effect silently returning bogus sizes for files >= 2GB will have?
> 
> As the possible values are -2^31 to 2^31-1, there's no ambiguity (if
> you're aware that a negative value != -1 is really a big positive
> one).

Sure, there's no ambiguity for stat alone. But if you report the file size
as > 2GB in stat, then you may not be able to manipulate some portions of
the file. E.g. you may want to use relative seeks to get to the top 2GB of
a file, but you can't, because off_t is a signed value and cannot be used
to represent a seek to > +2GB of the current position. How would you
interpret the seek to > +2GB, when you can't represent > +2GB, because
negative values are used for backwards seeking?
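A minimal sketch of the relative-seek limit, assuming a signed 32-bit
off_t and file positions that fit in an unsigned 32-bit value. The helper
name relative_seek_fits is hypothetical, not part of any library:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if moving from position cur to position
   target can be expressed as a signed 32-bit relative offset, else 0. */
int relative_seek_fits(uint32_t cur, uint32_t target)
{
    /* Compute the delta in 64 bits so it cannot wrap. */
    int64_t delta = (int64_t)target - (int64_t)cur;
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With this check, a seek from offset 0 to the 3GB mark is rejected, while a
short backwards seek from the 3GB mark is fine.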

I think the idea is that the file's size should be representable as a
positive number in off_t, so that there's no doubt about its
interpretation, i.e. no casts are needed to use it as a file size.
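As a rough illustration of the EOVERFLOW behaviour described above, here
is a sketch that assumes the real file size arrives as an unsigned 32-bit
value and that off_t is a signed 32-bit type. The names off32_t and
size_to_off32 are made up for this example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a signed 32-bit off_t. */
typedef int32_t off32_t;

/* Hypothetical helper: store real_size in *out if it is representable as
   a positive off32_t. Returns 0 on success, or -1 with errno set to
   EOVERFLOW if the size is 2GB or more. */
int size_to_off32(uint32_t real_size, off32_t *out)
{
    assert(out != NULL);
    if (real_size > (uint32_t)INT32_MAX) {
        errno = EOVERFLOW;   /* refuse rather than return a negative size */
        return -1;
    }
    *out = (off32_t)real_size;
    return 0;
}
```

A caller such as ls would then see an error for a 2GB file instead of a
large negative size.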

> But I'm not against any improvements as that result might confuse
> people.

Or lead to ugly hacks in the code, like the fix to ls?

> How does the type offset_t fit into the ways of addressing the
> problem (if you know)?

(I presume you mean off_t rather than offset_t.)

I just read the LFS document in more detail. It appears each file
descriptor has an offset maximum stored with it. Operations that would go
beyond this offset maximum fail and set errno to EOVERFLOW. Storing the
offset maximum in the file descriptor data allows the descriptor to be
used with objects compiled with different off_t sizes.

Offset maximum would be 2GB - 1 for us.
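The offset-maximum check could look something like the sketch below. The
struct fd_entry and the function fd_advance are hypothetical and are not
taken from DJGPP or the LFS document. The sketch assumes pos never exceeds
offset_max on entry:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-descriptor record. */
struct fd_entry {
    int64_t pos;         /* current file offset */
    int64_t offset_max;  /* largest offset this descriptor may reach */
};

/* Advance the offset by n bytes (n >= 0). Fails with errno set to
   EOVERFLOW, leaving pos unchanged, if the result would go beyond the
   descriptor's offset maximum. */
int fd_advance(struct fd_entry *fd, int64_t n)
{
    assert(fd != NULL && n >= 0);
    if (n > fd->offset_max - fd->pos) {
        errno = EOVERFLOW;
        return -1;
    }
    fd->pos += n;
    return 0;
}
```

With offset_max set to 2147483647 (2GB - 1), advancing up to that offset
succeeds and any further advance fails.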

There are several different ways that a program could be compiled:

1. _LARGEFILE_SOURCE defined, compiled & linked with LFS flags to use a
large off_t;
2. _LARGEFILE_SOURCE defined, but compiled & linked with no LFS flags to
use a normal off_t;
3. _LARGEFILE64_SOURCE defined, compiled & linked with transitional
library.

The LFS flags are returned from getconf. There are CFLAGS, LDFLAGS and LIBS
variants, to pass on the compile & link lines. The example for case 1 from
the LFS doc is:

c89  $(getconf LFS_CFLAGS)  a.c         \
        $(getconf LFS_LDFLAGS)             \
        $(getconf LFS_LIBS)

and for case 2:

c89 -D_LARGEFILE_SOURCE     a.c

Case 2 allows you to use the functions fseeko and ftello, which fail and
set errno to EOVERFLOW when the offset cannot be stored in an off_t.

I don't think it's worth implementing the transitional API, so I won't
explain 3. See the LFS document for more details.

But this doesn't answer your question about off_t size. The off_t size can
be controlled by defining _FILE_OFFSET_BITS to the number of bits in
off_t. If it's not defined, then the default size should be used. If it's
defined, but does not correspond to a supported size (e.g. 37 bits), then
an error should be generated.
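A header could select the type along these lines. This is only a sketch:
the names MY_OFFSET_BITS and my_off_t are invented for illustration, the
32-bit default is an assumption, and static_assert needs a C11 compiler:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Fall back to an assumed default of 32 bits when the macro is unset. */
#ifndef _FILE_OFFSET_BITS
#define MY_OFFSET_BITS 32
#else
#define MY_OFFSET_BITS _FILE_OFFSET_BITS
#endif

#if MY_OFFSET_BITS == 32
typedef int32_t my_off_t;
#elif MY_OFFSET_BITS == 64
typedef int64_t my_off_t;
#else
#error "Unsupported _FILE_OFFSET_BITS value"
#endif

/* Compile-time check that the chosen type has the requested width. */
static_assert(sizeof(my_off_t) * CHAR_BIT == MY_OFFSET_BITS,
              "my_off_t width does not match requested offset bits");
```

Compiling with -D_FILE_OFFSET_BITS=37 would stop at the #error line.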

The tricky part of supporting various sizes of off_t via _FILE_OFFSET_BITS
is that #undef lseek, say, is not allowed to change the size of off_t that
lseek takes. This means that you can't alias, say, lseek to lseek64 just
using a #define.
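To see why a plain #define is not enough, consider this sketch. All the
names (demo_seek, demo_seek64, width_with_macro, width_after_undef) are
invented, and the functions just report the width of their offset
argument:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for a 32-bit and a 64-bit seek function. */
size_t demo_seek(int32_t off)   { return sizeof off; }
size_t demo_seek64(int64_t off) { return sizeof off; }

/* Naive aliasing: route demo_seek to the 64-bit variant. */
#define demo_seek demo_seek64

/* Here the call expands to demo_seek64(0). */
size_t width_with_macro(void) { return demo_seek(0); }

/* The program is allowed to do this... */
#undef demo_seek

/* ...and now the same call reaches the real 32-bit function. */
size_t width_after_undef(void) { return demo_seek(0); }
```

The first wrapper sees an 8-byte offset and the second a 4-byte one, which
is exactly the change in off_t size that #undef must not cause.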

glibc appears to use some assembly magic to do the thunking. If the
compiler doesn't support this method, then it falls back on...#defines.

So it seems like this work can be split into stages:

1. Make the file descriptor functions aware of the offset maximum. Add
support for EOVERFLOW (and other error codes) as listed in the LFS/POSIX
drafts.

2. Add support for larger off_t based on _FILE_OFFSET_BITS.

3. Add support for transitional API, if anyone cares.

BTW autoconf has tests for the large file features, which fileutils uses.

Bye, Rich =]

-- 
Richard Dawe
http://www.phekda.freeserve.co.uk/richdawe/
