Mail Archives: djgpp/1995/01/09/06:35:32
Two announcements concerning the Yaseppochi-gumi archive:
(1) I am making Eli Zaretskii's beta FAQ available. I am adding a
search capability for it.
(2) I now believe that the *.stripped.gz files have no loss of
content. From my announcement:
One caveat: to speed up searches, I have stripped duplicate headers
generated by RMail and nuisance headers (such as "Reply-to:" and
"Received:") from the archives. However, the reduction in size of the
*.gz files is suspiciously large, [...]
I was right. This has been fixed; the *.stripped.gz files are now
substantially larger in several cases (you can look at the DU-sorted
file on my server---of course I used the "-s" option; before comes
first). Some are smaller because I added "X400-[-A-Za-z]+:" to the
list of nuisance headers.
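A minimal sketch of this kind of header stripping, for illustration only.
The function name is hypothetical and the pattern covers only the headers
named in this post ("Reply-to:", "Received:", and the X400 regexp); the
actual filter and its full nuisance list are not shown here, and the
removal of duplicate RMail headers is not attempted.

```python
import re

# Assumption: only the header names mentioned in the post are listed.
NUISANCE = re.compile(r"^(Reply-to|Received|X400-[-A-Za-z]+):", re.IGNORECASE)


def strip_nuisance_headers(message: str) -> str:
    """Drop nuisance header fields, with their continuation lines, from one message."""
    # Headers end at the first blank line; the body is never touched.
    head, sep, body = message.partition("\n\n")
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: it shares the fate of the field it continues.
            if not skipping:
                kept.append(line)
            continue
        skipping = bool(NUISANCE.match(line))
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body
```

Because only the header block is filtered, a body line that happens to
start with "Received:" survives, which is the sort of content loss the
fix above is meant to avoid.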
Some FAQs on the archive search (well, these are the *only* questions
I've got so far, so they're the most F-lyAQs ;)
That is probably just the Received headers. Your message came here
as seen below, so you can see that you should expect some shrinking.
I deleted the appended "Received:" headers; we all know what they look
like---and if we don't, we don't want to. That's why I filter them.
(Well, much more important, it substantially speeds up the greps.)
As Bob Babcock (I think it was) pointed out, if stuff like "Received:"
headers can make the *.gz files balloon (in one case, to 3 times the
size!), gzip ain't on the job. In fact, when I filtered my own .sig,
I was stripping large amounts of content from a couple of files. This
is due to the fact that the "last-line-of-my-sig" regexp didn't catch
some variant .sigs I use ;-) I don't use my .sig all the time (that's
why whole files didn't disappear).
Also, I just tried searching for "unsubscibe" and came up with way
                                         ^
                                         |
typo ;-) --------------------------------+
too little text. I don't know why.
These are my personal received-mail files---my left middle finger sits
on the 'd' key just to filter "unsubscribe". I will eventually use
the Clarkson archives, but for the moment I'm using my personal stuff
as it's more easily available to me right at the moment (my Clarkson
copy is about 4 months old and offline). When I *do* get the Clarkson
archives, I will filter all messages with fewer than 4 lines of content
containing "subscribe", "add", or "delete" ;-)
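The planned filter could look roughly like the sketch below. This is an
assumption about how it might work, not the actual implementation: the
function name is hypothetical, "lines of content" is taken to mean
non-blank body lines, and "subscribe" is matched as a substring so that
"unsubscribe" is caught too.

```python
import re

# "subscribe" matches anywhere (so "unsubscribe" counts); "add" and
# "delete" must be whole words so that "address" is not a hit.
KEYWORDS = re.compile(r"subscribe|\b(?:add|delete)\b", re.IGNORECASE)


def is_list_noise(body: str, max_lines: int = 4) -> bool:
    """True for a body with fewer than max_lines non-blank lines that mentions a keyword."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    return len(lines) < max_lines and any(KEYWORDS.search(ln) for ln in lines)
```

A one-line "unsubscribe" message would be flagged, while a longer message
that merely mentions subscribing would be kept.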
I hope there's nothing too mortifyingly personal or insulting in
there ;-)
--Steve <turnbull AT shako DOT sk DOT tsukuba DOT ac DOT jp>