Mail Archives: djgpp-workers/2000/03/07/04:38:30
On Mon, 6 Mar 2000, Alain Magloire wrote:
> > Running a program in background is easy, but finding files that are
> > identical (efficiently) is not.
>
> find $1 -xdev -type f -printf '%p %s\n' | \
> sort -nk1 | tee candidates | \
> uniq -f1 >uniquefiles && \
> comm -3 candidates uniquefiles >redundant && \
> join -1 2 -2 2 -o 2.1 1.1 redundant uniquefiles | xargs -n2 ln -f
I'm probably missing something: the above doesn't seem to compare the
files' contents, only their names and sizes, right? If so, this is
not what I think was the intent: identical names and sizes do not
mean the files' contents are identical. You need `cmp' somewhere in
that pipe.
When I said ``efficiently'', I meant an efficient comparison of file
contents, one that avoids the quadratic behavior of comparing every
file against every other file.
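
One way to get there, offered only as a rough sketch and not as anything
proposed in this thread: checksum each file once with `cksum', sort on
checksum and size so that likely-identical files become neighbors, and
run `cmp' only on neighbors whose keys match. The function name
list_dupes is made up for illustration, and the sketch assumes file
names contain no newlines and no leading or trailing blanks.

```shell
#!/bin/sh
# list_dupes DIR (hypothetical helper, not from the thread):
# print "kept<TAB>duplicate" for files under DIR whose contents match.
# Assumes file names have no newlines or leading/trailing blanks.
list_dupes() {
    prev_key= prev_path=
    # cksum prints "CRC SIZE NAME"; every file is read exactly once here.
    find "$1" -xdev -type f -exec cksum {} + |
    # Sorting on CRC and size puts candidate duplicates next to each other.
    sort -k1,1n -k2,2n -k3 |
    while read -r crc size path; do
        # Equal CRC and size is only a hint; cmp confirms the contents.
        if [ "$crc $size" = "$prev_key" ] && cmp -s "$prev_path" "$path"; then
            printf '%s\t%s\n' "$prev_path" "$path"
        else
            # New group (or a rare CRC collision): this file becomes the
            # reference for its followers. After a collision, later copies
            # of the earlier reference would be missed by this sketch.
            prev_key="$crc $size"
            prev_path=$path
        fi
    done
}
```

The cost is one pass over the data plus a sort, so roughly n log n
instead of n^2 comparisons. Grouping by size first and checksumming only
the files that share a size would save reading files whose size is
unique. For names without whitespace, the pairs could then be fed to
`xargs -n2 ln -f' as in the quoted pipeline.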