Date: Sat, 24 Dec 1994 17:02:17 +0900
From: Stephen Turnbull
To: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: DJGPP list archives

OK, the Yaseppochi-gumi DJGPP mailing list archives are now up-to-date
(well, they will be until this message goes out :-).

I noticed that the November archive was up to 2.2MB before gzip'ing,
and the December archive to date is already 1.2MB. I don't see any
reason for the level of traffic to decrease substantially over the next
couple of months, except that a revised FAQ seems likely to come out
shortly, which will help a little. But most of the traffic seems to be
V2 or high-tech; the newbies are generally scared away by the volume, I
think. This is unfortunate. Access to the archives might be a
reasonable alternative to subscribing, and a way to get familiar with
the list.

gzip'ed, those archives are respectively 555KB and 365KB. A bit much
for anyone operating over a telephone line, even at 14.4Kbps.

I can see a couple of possibilities for improving this situation. But
unless they can be easily automated, they're too much like work for me
to be willing to do. Here's what I know how to automate, and will start
doing shortly:

(1) filter out subscribe and unsubscribe messages (actually, this is
    already done, mostly)
(2) filter out nuisance headers, especially the duplicate set produced
    by RMail
(3) filter out certain well-known Warlord-style .sigs (I'm an
    occasional offender myself, but I don't think you need to see them
    a dozen (or dozen-score) times in an archive; I'll probably put
    them in a separate file so you can just download the bunch once :-)

I may be able to filter duplicates (certainly if they have the same
message ID, as happens when I save both the direct copy to me and the
listserv generated copy) also. But there aren't too many of these.
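As a rough illustration of filters (1) and (2) and the message-ID
duplicate check, here is a minimal Python sketch. It is not the
archive's actual tooling. The header names in NUISANCE_HEADERS, the
pattern used to spot (un)subscribe requests, and every function name
are assumptions made for the example; the real nuisance set would have
to be read off the archive itself.

```python
import re

# Hypothetical nuisance headers; adjust after inspecting the real archive.
NUISANCE_HEADERS = ("received", "x-mailer", "status")

# Assumed shape of a list request: subject or body starts with the keyword.
SUBSCRIBE_RE = re.compile(r"^\s*(un)?subscribe\b", re.IGNORECASE)


def split_message(text):
    """Split one message into (header lines, body) at the first blank line."""
    head, _, body = text.partition("\n\n")
    return head.split("\n"), body


def header_value(header_lines, name):
    """Return the value of the first header called `name`, or None."""
    prefix = name.lower() + ":"
    for line in header_lines:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None


def strip_headers(header_lines):
    """Drop nuisance headers together with their folded continuation lines."""
    kept, dropping = [], False
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # A continuation line shares the fate of the header above it.
            if not dropping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        dropping = name in NUISANCE_HEADERS
        if not dropping:
            kept.append(line)
    return kept


def filter_messages(messages):
    """Return cleaned messages, skipping list requests and repeated IDs."""
    seen_ids = set()
    kept = []
    for text in messages:
        headers, body = split_message(text)
        subject = header_value(headers, "Subject") or ""
        if SUBSCRIBE_RE.match(subject) or SUBSCRIBE_RE.match(body):
            continue  # filter (1): subscribe/unsubscribe requests
        msg_id = header_value(headers, "Message-ID")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue  # duplicate: same message ID already kept
            seen_ids.add(msg_id)
        cleaned = strip_headers(headers)  # filter (2): nuisance headers
        kept.append("\n".join(cleaned) + "\n\n" + body)
    return kept
```

The keyword test is deliberately crude: a genuine post whose body
happens to begin with "subscribe" would also be dropped, so a real
filter would probably want a stricter pattern. Messages without a
message ID are always kept, since there is nothing to compare.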

I would appreciate any suggestions for

(a) other easy-to-filter nuisances, either by line or by message
(b) code (perl or gawk) for doing the job
(c) other filtering tools (especially AI programs capable of filtering
    replies with 100 quoted lines and 2 lines of new content!)
(d) dividing messages into files by content and *how to recognize
    content* (maybe threading a la newsreaders could sort of be done?)
(e) dividing files into messages. Currently I plan to use ASCII FF
    (^L), but if there's a good reason to use something else, let me
    know
(f) tools to use to 'grep' the archives. Currently I plan to use a
    batch file calling gawk but if there's a better tool that can
    easily be made message-oriented (I don't know how to do that with
    grep; perl is pretty big compared to gawk), I'd love to hear about
    it. Needs to be fairly newbie-transparent; this isn't for me, it's
    for people who would otherwise not have regexp search available.

I would like to know if there is any interest in extending the archives
backward by date; so far no one has requested this, but if there's a
reason to do so and the process is pretty automatic, I'd do it for
historical interest if nothing else.

Let me know about anything else that might make the archives more
useful. Don't hesitate to suggest things that look burdensome; if I
don't like it, I'll just ignore it ;-)

--Steve
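
To make items (e) and (f) concrete, a message-oriented regexp search
over an archive whose messages are separated by ASCII FF might look
roughly like the Python sketch below. It only illustrates the idea and
is not the planned gawk batch file; the function names and the
case-insensitive default are assumptions made for the example.

```python
import re

FORM_FEED = "\f"  # ASCII FF (^L), the separator proposed in (e)


def split_archive(archive_text):
    """Split an archive into messages, discarding empty chunks."""
    chunks = archive_text.split(FORM_FEED)
    return [chunk.strip("\n") for chunk in chunks if chunk.strip()]


def grep_messages(archive_text, pattern, flags=re.IGNORECASE):
    """Return every whole message in which the regexp matches somewhere."""
    regex = re.compile(pattern, flags)
    return [msg for msg in split_archive(archive_text) if regex.search(msg)]
```

Given the text of an archive in a string, grep_messages(text, r"go32")
returns the complete messages that mention go32 rather than isolated
matching lines, which is the message-oriented behaviour asked for in
(f). The approach assumes that FF never occurs inside a message body.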