Mailing-List: contact cygwin-apps-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-apps-owner AT sourceware DOT cygnus DOT com
Date: Thu, 4 Oct 2001 21:20:30 -0400
From: Christopher Faylor <cgf AT redhat DOT com>
To: cygwin-patches AT cygwin DOT com
Cc: cygwin-apps AT cygwin DOT com
Subject: Re: File handling in setup.exe
Message-ID: <20011004212030.C1118@redhat.com>
Reply-To: cygwin-apps AT cygwin DOT com
Mail-Followup-To: cygwin-patches AT cygwin DOT com, cygwin-apps AT cygwin DOT com
References: <3BBD05EB DOT 2357D53A AT etr-usa DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3BBD05EB.2357D53A@etr-usa.com>
User-Agent: Mutt/1.3.21i

FWIW, I really like what you've proposed.  It feels right.

Feel free to create a branch in the cinstall directory using the tools
in the winsup/maint directory, if you'd like to get working on a real
proof of concept.

Although, I guess we should wait for a little more input first.

Btw, since there was no patch associated with this message, I redirected
this to cygwin-apps, which is slightly more appropriate than cygwin-patches.
cygwin-developers has been the list where setup.exe was most often discussed
but maybe we should move setup.exe discussions to cygwin-apps to spare setup
developers from cygwin DLL low-level discussions.

cgf

On Thu, Oct 04, 2001 at 06:59:23PM -0600, Warren Young wrote:
>This is regarding the *.cwp stuff that was discussed last month.  It was
>agreed that my initial patch had good ideas, but that as long as I was
>in there, I might as well clean up the code some.  I've looked into the
>code, and have realized that I need some input before proceeding.
>
>My initial idea when I agreed to take this on was to just refactor and
>OOP-ify the code around tar.cc some.  I can do that, but some comments
>from Robert Collins got me on the track of looking into handling
>alternate sources for package files.
>
>This implies some kind of link between archive handling and the current
>NetIO hierarchy.  This would also require changes to geturl.cc and the
>code that calls functions in geturl.cc.  The foremost issue is, should I
>be chasing this at all, or should I simply refactor the tar handling
>mechanism as it exists right now?  
>
>If we want a Grand Refactoring and not just some reworking of tar.cc and
>friends, here's my proposal:
>
>I assume that reading packages from the network would be useful for
>allowing setup.exe to install directly from the network, without writing
>the packages out to disk first as it does today.  Yet, we need to keep
>that "caching" mechanism somehow, because it's useful.  Currently, file
>handling logic exists in geturl.cc, nio-file.cc, tar.cc, and probably
>other places.  To deal with all that, I have in mind something like
>this:
>
>class Source {
>public:
>	Source(out_pathname);
>	virtual int read(buffer, size);
>	virtual int write(buffer, size);
>
>	...
>private:
>	Source() { }	// can't create Source objects directly
>
>	FILE* fp_out;
>};
>
>class HTTPSource : public Source {
>public:
>	HTTPSource(in_url, out_pathname = 0);
>	...
>};
>
>
>By default, Source reads data from a file and has the option to cache
>the data it reads out to another file.  (If out_pathname == 0, the data
>isn't cached to a file as it's read.)  Subclasses override the
>constructor and read() to retrieve data from various network sources.
>(HTTP, FTP, WinInet.dll, etc.)  When reading straight from a file, you
>would set the Source to non-cacheable, but when reading via HTTP, you
>could elect to either cache the data to a file, or simply read the data
>in without caching it.
>
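A minimal sketch of the caching behaviour described in the quoted paragraph above, not the actual setup.exe code. It departs from the quoted class outline in one respect: read() is non-virtual and calls a virtual fetch(), so the cache write lives in one place instead of in every subclass. The names fetch(), MemorySource, and read_all() are hypothetical, and MemorySource is only an in-memory stand-in for something like HTTPSource, since no network code is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical sketch: reads from a local file by default and can
// optionally copy everything it reads out to a cache file.
class Source {
public:
    // cache_path == nullptr means "do not cache".
    explicit Source(const char* in_path, const char* cache_path = nullptr)
        : fp_in_(in_path ? std::fopen(in_path, "rb") : nullptr),
          fp_out_(cache_path ? std::fopen(cache_path, "wb") : nullptr) {}

    virtual ~Source() {
        if (fp_in_) std::fclose(fp_in_);
        if (fp_out_) std::fclose(fp_out_);
    }

    Source(const Source&) = delete;
    Source& operator=(const Source&) = delete;

    // Returns the number of bytes read (0 at end of data) and mirrors
    // those bytes to the cache file when one was requested.
    int read(char* buffer, int size) {
        assert(buffer != nullptr && size >= 0);
        int n = fetch(buffer, size);
        if (n > 0 && fp_out_) std::fwrite(buffer, 1, n, fp_out_);
        return n;
    }

protected:
    // Subclasses override this to pull bytes from somewhere else.
    virtual int fetch(char* buffer, int size) {
        if (!fp_in_) return 0;
        return static_cast<int>(std::fread(buffer, 1, size, fp_in_));
    }

private:
    FILE* fp_in_;
    FILE* fp_out_;
};

// Hypothetical stand-in for a network source: serves bytes from memory.
class MemorySource : public Source {
public:
    explicit MemorySource(const std::string& data,
                          const char* cache_path = nullptr)
        : Source(nullptr, cache_path), data_(data), pos_(0) {}

protected:
    int fetch(char* buffer, int size) override {
        std::size_t left = data_.size() - pos_;
        std::size_t want = static_cast<std::size_t>(size);
        std::size_t n = left < want ? left : want;
        std::memcpy(buffer, data_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }

private:
    std::string data_;
    std::size_t pos_;
};

// Hypothetical helper: drains a Source into a string.
std::string read_all(Source& src) {
    std::string out;
    char buf[64];
    int n;
    while ((n = src.read(buf, static_cast<int>(sizeof buf))) > 0)
        out.append(buf, n);
    return out;
}
```

Constructing a MemorySource with a cache path and draining it leaves a copy of the data on disk, which a plain Source can then read back without caching it again.
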
>This implies a fairly major refactoring all by itself.  As I stated
>above, there's a lot of code that assumes that it can write data out to
>disk and read it back.  My proposal would mean that everything deals
>with Source objects.  Because the data may not be cached, you'd want to
>keep the data pipeline simple: in the HTTP case, you'd read the data
>from the network, pass it to the gz/bz unpacker, and pass that stream to
>the tar file unpacker.  That is, go from initial network connection open
>to final unpacking, all in one operation.
>
>This implies two other class hierarchies:
>
>class Decomp {	// a cleaned-up version of class gzbz from tar.cc
>public:
>	// this is decomp_factory(), from my original patch
>	static Decomp* factory(Source*);
>
>	virtual ~Decomp();	// gzbz::close()
>
>	virtual int read(buf, len) = 0;
>	virtual off_t tell() = 0;
>
>protected:
>	FILE* fp;
>
>private:
>	Decomp(Source*);
>};
>
>class GZDecomp : public Decomp ...
>class BZDecomp : public Decomp ...
>
>
>class Archive {
>public:
>	Archive(Decomp*);
>
>	virtual int read(buf, len) = 0;
>	virtual off_t tell() = 0;
>	virtual const char* next_file_name() = 0;
>};
>
>class TarArchive : public Archive ...
>class RPMArchive : public Archive ...
>
>
>These are just "sketches" to give you an idea of where I'm headed with
>all this.  Don't worry about critiquing the actual member names or even
>the minor structures I've sketched out.  The main thing is the class
>chain structure I've sketched.
>
>As you can see, you create a Source object to retrieve (and optionally
>cache) the data, then you create a Decomp object to read data from the
>Source and decompress it, and finally an Archive object to parse the
>data from the Decomp object, extracting files and other things found in
>tar/rpm/deb/whatever files.
>
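A rough, self-contained illustration of the Source -> Decomp -> Archive chain described in the quoted text above, under several assumptions. The Source here is an in-memory stand-in, PassThroughDecomp does no real decompression (a real factory would presumably sniff for gzip or bzip2 data and return a matching subclass), and TarArchive reads only the name and size fields of each 512-byte tar header. PassThroughDecomp and make_tar_member() are hypothetical names, and std::unique_ptr is used in place of the raw pointers in the quoted outline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Stage 1: raw bytes. In-memory stand-in for a file or network source.
class Source {
public:
    explicit Source(std::string data) : data_(std::move(data)), pos_(0) {}
    virtual ~Source() = default;
    virtual int read(char* buf, int len) {
        std::size_t left = data_.size() - pos_;
        std::size_t want = static_cast<std::size_t>(len);
        std::size_t n = left < want ? left : want;
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }
private:
    std::string data_;
    std::size_t pos_;
};

// Stage 2: decompression. Only a pass-through is sketched here.
class Decomp {
public:
    virtual ~Decomp() = default;
    virtual int read(char* buf, int len) = 0;
    static std::unique_ptr<Decomp> factory(std::unique_ptr<Source> src);
};

class PassThroughDecomp : public Decomp {
public:
    explicit PassThroughDecomp(std::unique_ptr<Source> src)
        : src_(std::move(src)) {}
    int read(char* buf, int len) override { return src_->read(buf, len); }
private:
    std::unique_ptr<Source> src_;
};

std::unique_ptr<Decomp> Decomp::factory(std::unique_ptr<Source> src) {
    // Assumption: always pass-through in this sketch.
    return std::unique_ptr<Decomp>(new PassThroughDecomp(std::move(src)));
}

// Stage 3: archive parsing, streaming straight from the Decomp.
class TarArchive {
public:
    explicit TarArchive(std::unique_ptr<Decomp> in) : in_(std::move(in)) {}

    // Returns the next member name, or "" at end of archive.
    // The member's data is skipped rather than extracted.
    std::string next_file_name() {
        char hdr[512];
        if (!read_fully(hdr, 512) || hdr[0] == '\0') return "";
        // Name: bytes 0..99. Size: octal text in bytes 124..135.
        const void* nul = std::memchr(hdr, 0, 100);
        std::size_t name_len =
            nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - hdr)
                : 100;
        char size_field[13] = {0};
        std::memcpy(size_field, hdr + 124, 12);
        long size = std::strtol(size_field, nullptr, 8);
        // Member data is padded to a multiple of 512 bytes.
        char skip[512];
        for (long done = 0; done < size; done += 512)
            if (!read_fully(skip, 512)) break;
        return std::string(hdr, name_len);
    }

private:
    bool read_fully(char* buf, int len) {
        int got = 0;
        while (got < len) {
            int n = in_->read(buf + got, len - got);
            if (n <= 0) return false;
            got += n;
        }
        return true;
    }
    std::unique_ptr<Decomp> in_;
};

// Hypothetical helper: builds one tar member (header plus padded data)
// with only the name and size fields filled in. Name must be < 100 bytes.
std::string make_tar_member(const std::string& name, const std::string& data) {
    std::string hdr(512, '\0');
    hdr.replace(0, name.size(), name);
    char size_field[12];
    std::snprintf(size_field, sizeof size_field, "%011lo",
                  static_cast<unsigned long>(data.size()));
    hdr.replace(124, 11, size_field);
    std::string body = data;
    body.resize((data.size() + 511) / 512 * 512, '\0');
    return hdr + body;
}
```

Because each stage only pulls from the one before it, nothing in the chain needs a file on disk; destroying the TarArchive releases the Decomp and the Source with it.
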
>The get_url_*() functions can't exist in this scheme.  They only know
>how to read files in from what I'm calling Sources.  I haven't traced
>the code out beyond the get_url_* functions to find out how the data
>within the archives is dealt with.  My idea, however, is to make all
>that code look something like this:
>
>	// Given the URL, the options the user picked, and whether
>	// we have the file locally already or not, create a Source 
>	// subclass to read the archive in.
>	Source* source = open_source(url);
>
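>	// Archive is abstract, so create a concrete subclass.
>	Archive* arch = new TarArchive(Decomp::factory(source));
>	// Assumes next_file_name() returns 0 when the archive is exhausted.
>	while (arch->next_file_name()) {
>		// munch on archive, update UI, spit files out to disk...
>	}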
>	delete arch;	// closes cache file (if any) as well
>			// as network connections, etc.
>
>I'm leaving the issue here until I hear back from the people whose
>opinions matter.  :)  I don't want to jump in and start all this rework
>if this idea is somehow broken, or simply too grandiose w.r.t. where
>people want to see setup.exe go.
>
>I'm thinking this will take a week of ideal hacking time, which is a lot
>considering that I'm doing all this in my spare time here at work.  In
>real terms, this may take a month or more.
>--
>'Net Address: http://www.cyberport.com/~tangent/
>ICBM Address: 36.8274040 N, 108.0204086 W, alt. 1714m

-- 
cgf AT cygnus DOT com                        Red Hat, Inc.
http://sources.redhat.com/            http://www.redhat.com/