Mail Archives: djgpp-workers/2012/09/28/17:55:28

From: Juan Manuel Guerrero <juan DOT guerrero AT gmx DOT de>
To: djgpp-workers AT delorie DOT com
Subject: Re: djtar and pax/posix headers
Date: Fri, 28 Sep 2012 23:56:21 +0200
User-Agent: KMail/1.9.10
References: <201209280053 DOT 02782 DOT juan DOT guerrero AT gmx DOT de> <83fw62hatk DOT fsf AT gnu DOT org>
In-Reply-To: <83fw62hatk.fsf@gnu.org>
MIME-Version: 1.0
Message-Id: <201209282356.21546.juan.guerrero@gmx.de>
Reply-To: djgpp-workers AT delorie DOT com

On Friday, 28 September 2012, Eli Zaretskii wrote:

> > From: Juan Manuel Guerrero <juan DOT guerrero AT gmx DOT de>
> > Date: Fri, 28 Sep 2012 00:53:02 +0200
> > 
> > At least the GNU tar program shipped with the Linux distribution I use
> > already defaults to the pax/posix format as described in:
> >   <http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_02>
> > When the tar archives this program produces are extracted with djtar or
> > other older tar programs, the target directory gets cluttered with
> > directories and files corresponding to the pax header data blocks.  These
> > extra directories are called ./PaxHeaders.NNNNN, where NNNNN stands for
> > some number.  AFAIK the information stored in the pax headers is of no use
> > for the plain DOS file system.
> 
> You are right.  The question is, does the untar program itself need
> these headers to untar correctly?  (I don't know the answer.)  If it
> doesn't, then skipping them is TRT.

No, the untar program never needs the pax header to work properly.
The standard mandates that older tar programs treat unrecognized
typeflag values as regular files; this allowed the format to put
all the extended information into a pseudo-regular file that precedes
each real file.  This is exactly what happens when djtar tries to extract
a tar archive containing pax headers.  A pax header is not recognized
as an extension of the ustar (aka normal or old) header that follows it,
but as the header of a "real" file made up of the data blocks that
immediately follow the pax header.  There is always a "normal or old style"
header preceding the file, and it contains all the information needed to
create the file.
Even if we assume that the pax header is used to store a _very_ long
path, one exceeding the 100 characters of the name field plus the 155
characters of the prefix field, such a path may already exceed what a FAT
file system can handle, so the information in the pax header becomes useless
and ignoring it will not break anything that is not already broken.  In
other words, extraction will fail no matter whether the header is ignored
or not.  If the header is used to store names in some locale other than the
C locale, that will also exceed what djtar can handle when extracting the
archive contents.
AFAIK the ustar header always holds valid values for its flags and
fields; these are overridden by the data supplied in the pax headers
iff the "format reading tool", aka tar program, is able to decode this
additional information.
Please note that this is my reading of the standard and of what I have
seen in the GNU tar sources.  Since I am not a native English speaker,
I may have missed something in the standard.
If I am wrong, please let me know.


> > If someone wants a tar program with full pax/posix support
> > it will be better to port latest GNU tar or GNU pax than trying to implement
> > this support in djtar.  I tested this patch with all formats produced by
> > GNU tar 1.26.
> 
> If we are looking at enhancing our Tar support beyond djtar, I'd
> suggest to look at libarchive and the bsdtar and bsdcpio programs in
> there.  Unlike the current GNU Tar maintainers, who are quite
> unfriendly to non-Posix platforms, libarchive is very friendly, and in
> fact already supports a Windows port out of the box (well, almost: I
> needed a few patches when I built it last year, which were readily
> accepted by the maintainers).  The advantage of bsdtar is that it
> supports every archive format out there, including xz, rpm, rar, 7z,
> etc. etc. -- you name it, it's already there.  Another advantage is
> that it performs all the compression/decompression internally, instead
> of invoking external programs via unportable pipes, which is a real
> win for DOS ports, where pipes are synchronous.

Sounds more promising than trying to port GNU tar to DJGPP again.
I will look at libarchive when I decide to start a tar port.  Of course,
anyone else is welcome to take on that job instead of me.


> > As usual, suggestions, objections, comments are welcome.
>
> I have only one question: should djtar display anything about the pax
> headers it encounters?  (Maybe it already does; I didn't try the
> patch.)

No, the patched djtar did not output any information about pax headers
contained in the archive.  They were skipped silently.

I have modified the code so that some pax header specific information
will be displayed if the -v flag is passed.  If the -v flag is omitted
no warning at all is given to the user.  If the -v flag is passed then
the output will look like this:


prompt> djtar.exe -v _pax.tar


-- `_pax.tar' is uncompressed --

00000000    644 Sep 26 18:44:50 2012        90 ./PaxHeaders.10074/dir  [extended header + 1 data block(s) skipped]
00000400    755 Sep 26 18:44:50 2012         0 dir/
00000600    644 Sep 26 18:22:13 2012        90 dir/PaxHeaders.10074/dir2  [extended header + 1 data block(s) skipped]
00000a00    755 Sep 26 18:22:13 2012         0 dir/dir2/
00000c00    644 Sep 26 18:22:13 2012        90 dir/dir2/PaxHeaders.10074/file2  [extended header + 1 data block(s) skipped]
00001000    644 Sep 26 18:22:13 2012         0 dir/dir2/file2
00001200    644 Sep 26 21:03:04 2012        90 dir/PaxHeaders.10074/file  [extended header + 1 data block(s) skipped]
00001600    644 Sep 26 21:03:04 2012         4 dir/file
00001a00    644 Sep 26 21:03:04 2012        90 dir/PaxHeaders.10074/linkfile  [extended header + 1 data block(s) skipped]
00001e00    644 Sep 26 21:03:04 2012         0 dir/linkfile link to dir/file
00002000    644 Sep 26 18:22:08 2012        90 dir/PaxHeaders.10074/dir1  [extended header + 1 data block(s) skipped]
00002400    755 Sep 26 18:22:08 2012         0 dir/dir1/
00002600    644 Sep 26 21:03:13 2012        89 dir/dir1/PaxHeaders.10074/file1  [extended header + 1 data block(s) skipped]
00002a00    644 Sep 26 21:03:13 2012         5 dir/dir1/file1


For example, the listing shows no global extended header, but an extended
header (PaxHeaders.10074) followed by one data block (512 bytes long, 90
bytes used) that precedes the ustar header (at 0x0400) of the directory
"dir".  The 90 bytes encode the atime, mtime and ctime of the directory
"dir", 30 bytes per timestamp.


If no -v flag is supplied, no warning at all is issued and the output
looks the way it always has:

prompt> djtar.exe _pax.tar


-rwx Sep 26 18:44:50 2012         0 dir/
-rwx Sep 26 18:22:13 2012         0 dir/dir2/
-rw- Sep 26 18:22:13 2012         0 dir/dir2/file2
-rw- Sep 26 21:03:04 2012         4 dir/file
-rw- Sep 26 21:03:04 2012         0 dir/linkfile link to dir/file
-rwx Sep 26 18:22:08 2012         0 dir/dir1/
-rw- Sep 26 21:03:13 2012         5 dir/dir1/file1



If something different is preferred, please let me know.


Regards,
Juan M. Guerrero



Logging in to :pserver:anonymous AT cvs DOT delorie DOT com:2401/cvs/djgpp
Index: djgpp/src/docs/kb/wc204.txi
===================================================================
RCS file: /cvs/djgpp/djgpp/src/docs/kb/wc204.txi,v
retrieving revision 1.201
diff -U 5 -r1.201 wc204.txi
--- djgpp/src/docs/kb/wc204.txi	22 Jan 2012 23:40:28 -0000	1.201
+++ djgpp/src/docs/kb/wc204.txi	28 Sep 2012 21:34:53 -0000
@@ -1244,5 +1244,9 @@
 @findex STYP_NRELOC_OVFL AT r{, new flag bit added to @code{s_flags} of @acronym{COFF} section header}
 The @code{s_flags} of the @acronym{COFF} section header now honors the new @code{STYP_NRELOC_OVFL} bit
 that signals that the section contains extended relocations and that the @code{s_nreloc} counter has
 overflown.  The bit set in case of overflow by @code{STYP_NRELOC_OVFL} is @code{0x01000000}.
 
+@pindex djtar AT r{, support for @code{tar} archives with @code{pax} headers}
+The djtar program can now unpack @code{tar} archives that contain @code{pax} headers
+conforming to @acronym{POSIX} 1003.1-2001.  The @code{pax} headers are always skipped
+and their contents are ignored.
Index: djgpp/src/utils/utils.tex
===================================================================
RCS file: /cvs/djgpp/djgpp/src/utils/utils.tex,v
retrieving revision 1.24
diff -U 5 -r1.24 utils.tex
--- djgpp/src/utils/utils.tex	10 Jan 2004 21:55:49 -0000	1.24
+++ djgpp/src/utils/utils.tex	28 Sep 2012 21:34:54 -0000
@@ -220,10 +220,17 @@
 exclusive open of the given file (it will refuse to overwrite an
 existing file), it will prompt you for a new name.  You may type in
 either a complete path, a replacement file name (no directory part), or
 just hit return (the file is skipped).
 
+If a @code{tar} archive contains @code{pax} extended headers as defined
+by @acronym{POSIX} 1003.1-2001, @command{djtar} will skip them and ignore
+any information contained in the data blocks that may follow the @code{pax}
+headers.  If you specify the @samp{-v} switch, the names of the headers,
+the number of data blocks following each header and the position of the
+header in the @code{tar} archive will be shown.
+
 If @command{djtar} is called as @command{djtart}, it behaves as if it were
 called with the @samp{-t} switch; when called as @command{djtarx}, it
 behaves like @command{djtar -x}.  Thus you can create 2 links to
 @file{djtar.exe} which will save you some typing.
 
@@ -252,11 +259,13 @@
 @item -v
 
 This option modifies the output format slightly to aid in debugging tar
 file problems.  It also causes @command{djtar} to emit more verbose warning
 messages and print the compression method for compressed archives.
-
+If the @code{tar} archive contains @code{pax} extended headers,
+their names and the number of following data blocks will be displayed.
+ 
 @item -.
 
 Enable the automatic conversion of dots to underscores and dashes.  This
 is the default.
 
Index: djgpp/src/utils/djtar/untar.c
===================================================================
RCS file: /cvs/djgpp/djgpp/src/utils/djtar/untar.c,v
retrieving revision 1.10
diff -U 5 -r1.10 untar.c
--- djgpp/src/utils/djtar/untar.c	24 Sep 2012 18:46:12 -0000	1.10
+++ djgpp/src/utils/djtar/untar.c	28 Sep 2012 21:34:54 -0000
@@ -32,30 +32,84 @@
 extern int list_only;
 
 extern FILE *log_out;
 
 /*------------------------------------------------------------------------*/
+/* tar Header Block, from POSIX 1003.1-1990.  */
 
-typedef struct {
-  char name[100];
-  char operm[8];
-  char ouid[8];
-  char ogid[8];
-  char osize[12];
-  char otime[12];
-  char ocsum[8];
-  char flags[1];
-  char filler[355];
+/* POSIX header.  */
+
+typedef struct posix_header
+{                            /* byte offset */
+  char name[100];            /*   0 */
+  char mode[8];              /* 100 */
+  char uid[8];               /* 108 */
+  char gid[8];               /* 116 */
+  char size[12];             /* 124 */
+  char mtime[12];            /* 136 */
+  char chksum[8];            /* 148 */
+  char typeflag;             /* 156 */
+  char linkname[100];        /* 157 */
+  char magic[6];             /* 257 */
+  char version[2];           /* 263 */
+  char uname[32];            /* 265 */
+  char gname[32];            /* 297 */
+  char devmajor[8];          /* 329 */
+  char devminor[8];          /* 337 */
+  char prefix[155];          /* 345 */
+  char filler[12];           /* 500 */
+                             /* 512 */
 } TARREC;
 
+
+#define NAME_FIELD_SIZE     100
+#define PREFIX_FIELD_SIZE   155
+#define FIRST_CHKSUM_OCTET  148
+#define LAST_CHKSUM_OCTET   155
+
+
+#define IS_USTAR_HEADER(magic)  ((magic)[0] == 'u' &&  \
+                                 (magic)[1] == 's' &&  \
+                                 (magic)[2] == 't' &&  \
+                                 (magic)[3] == 'a' &&  \
+                                 (magic)[4] == 'r' &&  \
+                                 (magic)[5] == '\0')
+
+#define IS_PAX_HEADER(h)        ((((h).typeflag == XGLTYPE) || ((h).typeflag == XHDTYPE)) &&  \
+                                 IS_USTAR_HEADER((h).magic))
+
+#define IS_CHKSUM_OCTET(d)      ((d) > (FIRST_CHKSUM_OCTET - 1) &&  \
+                                 (d) < (LAST_CHKSUM_OCTET + 1))
+
+
+/* tar files are made in basic blocks of this size.  */
+#define BLOCKSIZE 512
+
+
+/* Values used in typeflag field.  */
+#define REGTYPE  '0'    /* regular file */
+#define AREGTYPE '\0'   /* regular file */
+#define LNKTYPE  '1'    /* link */
+#define SYMTYPE  '2'    /* reserved */
+#define CHRTYPE  '3'    /* character special */
+#define BLKTYPE  '4'    /* block special */
+#define DIRTYPE  '5'    /* directory */
+#define FIFOTYPE '6'    /* FIFO special */
+#define CONTTYPE '7'    /* reserved */
+
+#define XHDTYPE  'x'    /* Extended header referring to the
+                           next file in the archive */
+#define XGLTYPE  'g'    /* Global extended header */
+
+
 static TARREC header;
 static int error_message_printed;
 static int looking_for_header;
 static char *changed_name;
 static int first_block = 1;
 static File_type file_type = DOS_BINARY;
-static long perm, uid, gid, size;
+static long mode, uid, gid, size;
 static long posn = 0;
 static time_t ftime;
 static struct ftime ftimes;
 static struct tm *tm;
 static int r;
@@ -69,11 +123,11 @@
   int should_be_written, batch_file_processing = 0;
 
   while (buf_size)
   {
     int write_errno = 0;
-    int dsize = 512, wsize;
+    int dsize = BLOCKSIZE, wsize;
 
     if (skipping)
     {
       if (skipping <= buf_size)
       {
@@ -86,23 +140,65 @@
           return 0;
       }
       else
       {
         bytes_out += buf_size;
-        skipping -= buf_size;
+        skipping  -= buf_size;
         return 0;
       }
     }
 
     if (looking_for_header)
     {
+      char name[PREFIX_FIELD_SIZE + 1 + NAME_FIELD_SIZE + 1];
       char *extension;
       int head_csum = 0;
       int i;
       size_t nlen;
 
       memcpy(&header, buf, sizeof header);
+
+      /* Skip global extended and extended pax headers.  */
+      if (IS_PAX_HEADER(header))
+      {
+        /*
+         *  The pax header block is identical to a ustar header block
+         *  except that two additional typeflag values are defined:
+         *    x: represents extended header records for the following
+         *       file in the archive (with its one ustar header block).
+         *    g: represents global extended header records for the
+         *       following files in the archive.
+         *
+         *  Skip header plus all following pax data blocks.
+         */
+
+        sscanf(header.mode, " %lo", &mode);
+        sscanf(header.size, " %lo", &size);
+        sscanf(header.mtime, " %o", &ftime);
+        memcpy(name, header.name, sizeof header.name);
+        name[sizeof header.name] = '\0';
+
+        skipping = (size + (BLOCKSIZE - 1)) & ~(BLOCKSIZE - 1);
+
+        if (v_switch)
+        {
+          fprintf(log_out, "%08lx %6lo %.20s %9ld %s", posn, mode, ctime(&ftime) + 4, size, name);
+          if (header.typeflag == XGLTYPE)
+            fprintf(log_out, "  [global extended header + ");
+          else if (header.typeflag == XHDTYPE)
+            fprintf(log_out, "  [ extended header + ");
+          fprintf(log_out, "%d data block(s) skipped ]\n", skipping / BLOCKSIZE);
+        }
+
+        posn += BLOCKSIZE + skipping;
+        buf += sizeof header;
+        buf_size -= sizeof header;
+        bytes_out += sizeof header;
+
+        continue;
+      }
+
       if (header.name[0] == 0)
       {
         bytes_out += buf_size;  /* assume everything left should be counted */
         return EOF;
       }
@@ -118,20 +214,20 @@
          so we will extract them with DOS-style EOL. */
       extension = strrchr(basename(header.name), '.');
       if (extension && !stricmp(extension, ".bat"))
         batch_file_processing = 1;  /* LF -> CRLF */
 
-      sscanf(header.operm, " %lo", &perm);
-      sscanf(header.ouid, " %lo", &uid);
-      sscanf(header.ogid, " %lo", &gid);
-      sscanf(header.osize, " %lo", &size);
-      sscanf(header.otime, " %o", &ftime);
-      sscanf(header.ocsum, " %o", &head_csum);
+      sscanf(header.mode, " %lo", &mode);
+      sscanf(header.uid, " %lo", &uid);
+      sscanf(header.gid, " %lo", &gid);
+      sscanf(header.size, " %lo", &size);
+      sscanf(header.mtime, " %o", &ftime);
+      sscanf(header.chksum, " %o", &head_csum);
       for (i = 0; i < (int)(sizeof header); i++)
       {
         /* Checksum on header, but with the checksum field blanked out.  */
-        int j = (i > 147 && i < 156) ? ' ' : *((unsigned char *)&header + i);
+        int j = IS_CHKSUM_OCTET(i) ? ' ' : *((unsigned char *)&header + i);
 
         head_csum -= j;
       }
       if (head_csum && !ignore_csum)
       {
@@ -147,55 +243,72 @@
         looking_for_header = 1;
         bytes_out += buf_size;
         return EOF;
       }
 
-      changed_name = get_new_name(header.name, &should_be_written);
+      /* Accept file names as specified by
+         POSIX.1-1996 section 10.1.1.  */
+      changed_name = name;
+      if (header.prefix[0] && IS_USTAR_HEADER(header.magic))
+      {
+        /*
+         *  A new pathname shall be formed by concatenating
+         *  prefix (up to the first NUL character), a slash
+         *  character, and name; otherwise, name is used alone.
+         */
+        size_t len = 0;
+        while (len < sizeof header.prefix && header.prefix[len])
+          *changed_name++ = header.prefix[len++];
+        *changed_name++ = '/';
+      }
+      memcpy(changed_name, header.name, sizeof header.name);
+      changed_name[sizeof header.name] = '\0';
+
+      changed_name = get_new_name(name, &should_be_written);
 
       if (v_switch)
-        fprintf(log_out, "%08lx %6lo ", posn, perm);
+        fprintf(log_out, "%08lx %6lo ", posn, mode);
       else
         fprintf(log_out, "%c%c%c%c ",
-                S_ISDIR(perm)  ? 'd' : header.flags[0] == '2' ? 'l' : '-',
-                perm & S_IRUSR ? 'r' : '-',
-                perm & S_IWUSR ? 'w' : '-',
-                perm & S_IXUSR ? 'x' : '-');
+                S_ISDIR(mode)  ? 'd' : header.typeflag == SYMTYPE ? 'l' : '-',
+                mode & S_IRUSR ? 'r' : '-',
+                mode & S_IWUSR ? 'w' : '-',
+                mode & S_IXUSR ? 'x' : '-');
       fprintf(log_out, "%.20s %9ld %s", ctime(&ftime) + 4, size, changed_name);
 #if 0
       fprintf(log_out, "(out: %ld)", bytes_out);
 #endif
-      if (header.flags[0] == '2')
-        fprintf(log_out, " -> %s", header.filler);
-      else if (header.flags[0] == '1')
-        fprintf(log_out, " link to %s", header.filler);
+      if (header.typeflag == SYMTYPE)
+        fprintf(log_out, " -> %s", header.linkname);
+      else if (header.typeflag == LNKTYPE)
+        fprintf(log_out, " link to %s", header.linkname);
       fprintf(log_out, "%s\n",
               !should_be_written && !list_only ? "\t[ skipped ]" : "");
-      posn += 512 + ((size + 511) & ~511);
+      posn += BLOCKSIZE + ((size + (BLOCKSIZE - 1)) & ~(BLOCKSIZE - 1));
 #if 0
-      fprintf(log_out, "%6lo %02x %12ld %s\n", perm, header.flags[0], size, changed_name);
+      fprintf(log_out, "%6lo %02x %12ld %s\n", mode, header.typeflag, size, changed_name);
 #endif
 
-      if (header.flags[0] == '1' || header.flags[0] == '2')
+      if (header.typeflag == LNKTYPE || header.typeflag == SYMTYPE)
       {
         /* Symbolic links always have zero data, but some broken
            tar programs claim otherwise.  */
         size = 0;
       }
       if (should_be_written == 0)
       {
-        skipping = (size + 511) & ~511;
-        if (!skipping)	/* an empty file or a directory */
+        skipping = (size + (BLOCKSIZE - 1)) & ~(BLOCKSIZE - 1);
+        if (!skipping)    /* an empty file or a directory */
         {
           looking_for_header = 1;
           if (buf_size < (long)(sizeof header))
             return 0;
         }
         continue;
       }
       else if ((changed_name[nlen = strlen(changed_name) - 1] == '/'
-                || header.flags[0] == '5') /* '5' flags a directory */
-               && !to_stdout)
+                || header.typeflag == DIRTYPE) && !to_stdout)
       {
         if (changed_name != new)
         {
           memcpy(new, changed_name, nlen + 2);
           changed_name = new;
@@ -224,11 +337,11 @@
           {
             if (change(changed_name, "Cannot exclusively open file", 0))
               goto open_file;
             else
             {
-              skipping = (size + 511) & ~511;
+              skipping = (size + (BLOCKSIZE - 1)) & ~(BLOCKSIZE - 1);
               continue;
             }
           }
         }
         else
@@ -246,16 +359,16 @@
       char tbuf[1024];
       char *wbuf = buf;
 
       if (buf_size <= 0)    /* this buffer exhausted */
         return 0;
-      if (size < 512)
+      if (size < BLOCKSIZE)
         dsize = size;
-      else if (buf_size < 512)
+      else if (buf_size < BLOCKSIZE)
         dsize = buf_size;
       else
-        dsize = 512;
+        dsize = BLOCKSIZE;
       if (batch_file_processing && !to_tty)
       {
         /* LF -> CRLF.
            Note that we don't alter the original uncompressed
            data so as not to screw up the CRC computations.  */
@@ -285,12 +398,12 @@
           /* If they asked for text files to be written Unix style, or
              we are writing to console, remove the CR and ^Z characters
              from DOS text files.
              Note that we don't alter the original uncompressed data so
              as not to screw up the CRC computations.  */
-          char *s=buf, *d=tbuf;
-          while (s-buf < dsize)
+          char *s = buf, *d = tbuf;
+          while (s - buf < dsize)
           {
             if (*s != '\r' && *s != 26)
               *d++ = *s;
             s++;
           }
@@ -329,24 +442,25 @@
       ftimes.ft_day = tm->tm_mday;
       ftimes.ft_month = tm->tm_mon + 1;
       ftimes.ft_year = tm->tm_year - 80;
       setftime(r, &ftimes);
       close(r);
-      chmod(changed_name, perm);
+      chmod(changed_name, mode);
     }
     batch_file_processing = 0;
     looking_for_header = 1;
     if (write_errno == ENOSPC)  /* target disk full: quit early */
     {
       bytes_out += buf_size;
       return EOF;
     }
     else if (write_errno)       /* other error: skip this file, try next */
-      skipping = (size - dsize + 511) & ~511;
-    else    /* skip the slack garbage to the next 512-byte boundary */
-      skipping = 512 - dsize;
+      skipping = (size - dsize + (BLOCKSIZE - 1)) & ~(BLOCKSIZE - 1);
+    else    /* skip the slack garbage to the next BLOCKSIZE-byte boundary */
+      skipping = BLOCKSIZE - dsize;
   }
+
   return 0;
 }
 
 /*------------------------------------------------------------------------*/
 
