From: David Fritz
Date: Mon, 05 Apr 2004 00:02:47 -0400
To: cygwin AT cygwin DOT com
Subject: Re: Bogus assumption prevents d2u/u2d/conv/etal working on mixed files.

You guys are missing the point. Charles Wilson mentioned a side effect of
the code at issue in the original post and suggested that it was valuable.
Personally, I don't care whether they attempt to detect binary files or not.
My point was (and is) this: *if detection of binary files is desirable*,
why not implement it in a more robust manner, and inform the user rather
than silently skipping "binary" files?

Hannu E K Nevalainen wrote:
>>From: David Fritz
>>Sent: Sunday, April 04, 2004 6:46 AM
>
>>Charles Wilson wrote:
>>[...]
>>
>>> (2) it's an attempt to prevent users from permanently scrogging binary
>>> files. See: d2u, on a binary file, is an irreversible operation. So,
>>> if you do "d2u *" you'll probably kill something deep inside some binary
>>> file, and you can't fix it -- unless some minimal safeguards are in place.
>>>
>>> u2d MAY be reversible -- IF there were no pre-existing \r\n
>>> combinations in the file to begin with -- so when (OMG-fixit-)d2u is
>>> run, obviously the first '\n' is preceded by a (newly-added) '\r', so
>>> the prog merrily replaces ALL '\r\n' with '\n'...which MAY fix your
>>> oops, but maybe not.
>>>
>>> So, with the current code, if you snarf the first "line" -- all chars
>>> until the first '\n' -- if it's a binary file the odds are pretty low
>>> that the immediately-preceding character is a '\r' -- so d2u as
>>> currently coded will bail out, and no harm is done.
>>>
>>> It doesn't work so well in the other direction -- by the same logic
>>> above, you'll almost never bail out early if you run 'u2d' on a binary
>>> file -- but if you immediately do a 'd2u' you MIGHT be able to recover.)
>>>
>>
>>[...]
>>
>>If detection of binary files is desirable, why not use an explicit test
>>with a more robust methodology? GNU grep detects binary files by looking
>>for a '\0' byte. Such a test could be used by both d2u and u2d; they could
>>bail out with a message like "skipping binary file".
>>
>>Cheers
>
>
> A more "foolproof" (? does such a thing exist) test would be to disallow
> using d2u/u2d on anything in directories found in $PATH. But that one
> has its disadvantages too, though fewer IMO.
>
> I find all this "safety"-related stuff to be a PITA at times. Any kind of
> test is prone to fail in some instances, and in others it is mostly just a
> cause for confusion -> a lot of bug-hunting, for so little gain.
>
> How about running d2u/u2d, say, on a regedit 5 file (i.e., mostly ASCII,
> but due to the encoding every other character is a NUL)?
> Would that be considered "legal"? IMO it should be: a fast and easy way to
> strip the garbage, to create a file that can be used with normal tools.
>

Huh? u2d/d2u will not strip the "garbage". For that, use iconv, as in:

  $ iconv -f UTF-16LE -t UTF-8 < in > out

> IMO, stay away from all of these safety thingies, or at _LEAST_ allow them
> to be bypassed, e.g. with --force. I will be using that switch all the time.
>
> There are a lot of these foolhardy "traps" one can fall into, e.g.:
>   $ cd /;rm -rf *
> Are you gonna find a "safety" hatch for that too?
>
>
> Noo... Please, remove all of these safety checks.
> There must be some kind of user sanity presupposition. Or else the tools
> will soon be crippled to a state where they are unusable for normal work.
>
> Make Backups, Not War! -> MBNW! ;-P
> OLOCA?

[...]

Cheers