Date: Tue, 7 Mar 2000 10:49:51 +0200 (IST)
From: Eli Zaretskii
X-Sender: eliz AT is
To: Alain Magloire
cc: Nate Eldredge, djgpp-workers AT delorie DOT com
Subject: Re: DJGPP innovations ?????
In-Reply-To: <200003070250.VAA31684@qnx.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: dj-admin AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

On Mon, 6 Mar 2000, Alain Magloire wrote:

> > Running a program in background is easy, but finding files that are
> > identical (efficiently) is not.
>
> find $1 -xdev -type f -printf '%p %s\n' | \
>   sort -nk1 | tee candidates | \
>   uniq -f1 >uniquefiles && \
>   comm -3 candidates uniquefiles >redundant && \
>   join -1 2 -2 2 -o 2.1 1.1 redundant uniquefiles | xargs -n2 ln -f

I'm probably missing something: the above doesn't seem to compare the
files' contents, only their names and sizes, right?  If so, this is not
what I think was the intent: identical names and sizes do not mean that
the files' contents are identical.  You need `cmp' somewhere in that
pipe.

When I said ``efficiently'', I meant an efficient comparison of file
contents, one that avoids the quadratic behavior of comparing every
file against every other file.
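
To illustrate the point about content comparison without quadratic behavior, here is a minimal sketch in Python. It is not the method anyone in this thread proposed. It assumes that grouping files first by size and then by a content digest is acceptable, so that each candidate file is read once for hashing instead of being compared against every other file. A final byte-for-byte check plays the role that `cmp' would play in the pipe. The names `file_digest' and `find_duplicates' are hypothetical, chosen for this example only.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root with identical contents."""
    # Pass 1: bucket regular files by size. A file whose size is unique
    # cannot have a duplicate, so it is never opened.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Pass 2: bucket same-size files by digest (one read per file).
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        for candidates in by_digest.values():
            if len(candidates) < 2:
                continue
            # Pass 3: confirm byte for byte against one representative,
            # which is the job `cmp' would do in a shell pipeline.
            first = candidates[0]
            same = [first] + [p for p in candidates[1:]
                              if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(sorted(same))
    return sorted(groups)
```

Calling `find_duplicates(".")` returns a list of groups, each group being the paths of files with identical contents. The sketch only reports duplicates and leaves it to the caller to decide whether to hard-link them. It does not try to stay on one filesystem the way `-xdev' does, and files that are already hard links to each other will show up as duplicates.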