Mail Archives: djgpp-workers/2000/03/07/04:38:30
On Mon, 6 Mar 2000, Alain Magloire wrote:
> > Running a program in background is easy, but finding files that are
> > identical (efficiently) is not.
> 
> find $1 -xdev -type f -printf '%p %s\n' | \
>  sort -nk1 | tee candidates | \
>  uniq -f1 >uniquefiles && \
>  comm -3 candidates uniquefiles >redundant && \
>  join -1 2 -2 2 -o 2.1 1.1 redundant uniquefiles | xargs -n2 ln -f  
I'm probably missing something: the above doesn't seem to compare
the files' contents, only their names and sizes, right?  If so, this is
not what I think was intended: identical names and sizes do not
mean the files' contents are identical.  You need `cmp' somewhere in
that pipe.
When I said ``efficiently'', I was thinking of an efficient comparison of
file contents, one that avoids the quadratic behavior of comparing every
file against every other.
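
To make the point concrete, here is a minimal sketch of one way to get there;
it is an illustration, not anybody's posted solution.  The idea is to let
sorting do the grouping: only files that share a size are checksummed, and
only files that share a checksum are handed to `cmp'.  The function name
find_dups is made up for this example, and the sketch assumes GNU find (for
-printf), `cksum', and file names that contain no tabs or newlines.

```shell
#!/bin/sh
# find_dups DIR (hypothetical helper): print "original<TAB>duplicate" for
# every file whose contents are identical to an earlier file under DIR.
# Assumes GNU find and file names without tabs or newlines.
find_dups() {
    tab=$(printf '\t')
    find "$1" -xdev -type f -printf '%s\t%p\n' |
    sort -n |
    awk -F'\t' '
        # Pass on only the files whose size is shared with another file.
        BEGIN { prev = -1 }
        $1 != prev { prev = $1; held = $2; next }   # first file of this size
        { if (held != "") { print held; held = "" } print $2 }
    ' |
    while IFS= read -r f; do
        # Checksum only the candidates; skip files we cannot read.
        sum=$(cksum < "$f") || continue
        printf '%s\t%s\n' "$sum" "$f"
    done |
    sort |
    {
        prev_sum= first=
        while IFS=$tab read -r sum f; do
            if [ "$sum" != "$prev_sum" ]; then
                # New checksum group: remember its first member.
                prev_sum=$sum first=$f
            elif cmp -s "$first" "$f"; then
                # Same checksum and cmp agrees: a true duplicate.
                printf '%s\t%s\n' "$first" "$f"
            fi
            # Same checksum but different contents (a collision): skipped
            # in this simplified version.
        done
    }
}
```

Each output line could then be fed to something like `xargs -n2 ln -f', as in
the quoted pipeline, subject to the same restriction on file names.  The two
sorts dominate the cost, and `cmp' runs once per suspected duplicate rather
than once per pair of files.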