Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT sources DOT redhat DOT com Delivered-To: mailing list cygwin AT sources DOT redhat DOT com Subject: re: *.cwp file recognistion patch for setup From: Robert Collins To: cygwin AT cygwin DOT com Cc: Warren Young Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-+xuoZhQ9HvJz36zHwL62" X-Mailer: Evolution/0.13 (Preview Release) Date: 08 Sep 2001 23:49:07 +1000 Message-Id: <999956948.9501.93.camel@lifelesswks> Mime-Version: 1.0 X-OriginalArrivalTime: 08 Sep 2001 13:36:00.0858 (UTC) FILETIME=[32D54BA0:01C1386B] --=-+xuoZhQ9HvJz36zHwL62 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Hi Warren,=20 thanks for submitting a patch. Nice to see some code coming in (not meaning to imply that assisting on the support lists/documenting/package management etc aren't contributions as well :}). I've got a few comments about the patch though :[. In overview, I don't thinks its ready to put into setup. It would work, but it also would become an impediment to other later changes, and thus make more work ... If you're willing to put in a little more time, I think we'd have a real good solution. I hope my critique below parses as constructive criticism - thats all it's meant as. 1) the #if 1...#else...#endif's. Either the code is good or bad - CVS is for tracking changes, not compiler directives! If your intention was to prevent clobbering the existing code, don't worry about that :]. CVS will let use recover any version, so it's best to have only the production code in the files. Under rare circumstances, such as when a feature is needed for some folk, but not others, then keeping it #ifdef'd makes sense... but then it would be usual to give it a symbolic name - ie #ifdef CWP_SUPPORT. 2) Your code is very C-like. That is, it doesn't make use of some the nice things C++ can do for us. See my suggested solution below for somewhat more detailed approach.=20 3) You've got magic numbers in the code (*groan*). By this I mean that there are constant strings/numbers (ie gz, bz2, cwp, BZh) in the code with a) no explanation (and while it's trivial code at the moment, these things have habit of growing, and before we know it they are in much less readable code :}) and b) not defined for reuse. A better way to code that sort of thing is via #defines or enums. (ie #define GZ_EXT ".gz" #define BZ2_EXT ".bz2" #define CWP_EXT ".cwp" and then use strncmp (ac, CWP_EXT, "4"). and so on. This may seem like work for the sake of work, bu I assure you it's not. Those #defines accomplish several things: the magic string tests become more clear (ie GZ_MAGIC vs \037\213; they become centralised (easily accessible list of known identifiers, even if the code is more spread out), and they can get reused, if the same test needs to be made in different contexts and a function isn't appropriate. (I'm not claiming to be guilt free on this point btw :}). 4) filename comparisons should use strncmp not strcmp on win32. tsk tsk. 5) in find_tar_ext, you shouldn't need to copy the path, strrchr is non dsestructive. 6) You have a robustness bug in the code: a cwb that is a renamed gz, not a renamed .tar.gz or .tar.bz2 will die. IIRC we have an explicit test for .gz on its own now - and that needs to be moved from choose.cc to the tar.cc code - because we are now content dependent.=20 mmm thats about it on the patch you submitted, now for how I'd suggest it's done to get the maximum benefit from the change. This approach is IMO actually easier to code and maintain, but theres a little more legwork to do to get it up and running... 1) just _ignore_ the file extention. For choose.cc accept anything from the .ini. If scanning local non-ini listed files, call something like magic_is_archive(path), which will attempt to magic_open_archive(path) and then close the resulting stream. if the open fails it would return a error.=20 2) When we go to use the file we feed the path into a thing called magic_open_archive, which we expect to return NULL (file was not an archive) or an archive object (not currently existing theoretical parent of tar object). (We don't need to create an archive class for this patch, I'm just pointing out that the next step down the rpm/deb/whatever track requires multiple archive types, but they all have the characteristic of containing multiple files, and possibly being a compressed file themselves. magic_open_archive first makes a stream from the path, and then does a magic_open_archive_stream (probably just overload magic_open_archive - it's easier to read :]). magic_open_archive_stream detects the file type by magic number (tar|gz|bz) and then=20 a) fails if it's not recognised or has too little content for magic detection. b) returns an archive object (again here that can simply be a tar object - one step at a time) if the magic number was for a tar file.=20 c) if it's a known compression type, gets a stream from it (ie calls bz() _and then feeds that back into magic_open_archive_stream_ (ie return magic_open_archive_stream(compressed_stream)). This recursion means that we support compressed archives within other compressed files - or other archives. bz() and gz() need to be tweaked to support streams as well as paths. For ease of use we probably want we probably want a magic constructor that takes a stream, checks the magic and returns a stream that handles the contents. (ie raw for unknown, gz() for gz, bz() for bz, tar for tar file.) That makes magic_open_archive_stream even easier, all it has to do is test if the streamable object is an archive, not a compressed file - so a couple of virtual functions is_compressed and is_archive will solve that.=20 This has several advantages: files that are not tar archives are detected be exclusion, not by inclusion and without reliance on knowing the extension. (This is beneficial - consider when if add ar support. We would otherwise have to alter every function that tests for .tar :-/ ) Secondly its easily expandable. The file name no longer plays a part in how we handle the files. Even setup.ini could be tweaked :]. It's probably a bit more work :] but I think that like the hierarchy and category stuff (not that they are in use yet!) it's well worth it. It will make dealing with rpms and other multi-archive formats much easier (just pass the outer stream to magic_open_archive_stream at the beginning of the embedded archive, and keep reading after that embedded archive is processed). Rob --=-+xuoZhQ9HvJz36zHwL62 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iEYEABECAAYFAjuaIdMACgkQf4izsFjHGB4a3wCguuVGprLkcD8t/lZVC6CN9vlA Xx4An0gi7WPRWPjYzHG80rc6SP2yrd1j =6Xh1 -----END PGP SIGNATURE----- --=-+xuoZhQ9HvJz36zHwL62--