Gentoo Archives: gentoo-dev

From: Henrik Treadup <hetr9922@××××××××××.se>
To: gentoo-dev@g.o
Subject: [gentoo-dev] Spam filters
Date: Fri, 27 Sep 2002 07:28:10
Message-Id: 20020927122808.31B2EA30B0@mbox1.su.se
1 Spam is (I assume) a problem not only for the gentoo-dev mailing list but
2 also for all the other Gentoo lists. Closing gentoo-dev doesn't solve the
3 problem. The other lists will still get spam.
4 Should the other lists also be closed? (Closing gentoo-newbies is probably a
5 bad idea.) The solution, IMHO, is a filter on the Gentoo mail server.
6
7 A lot of people mention SpamAssassin. There is a world of difference between
8 SpamAssassin and a Bayesian filter like Bogofilter.
9
10 SpamAssassin is written in Perl. Bogofilter is written in C.
11 SpamAssassin uses header analysis, text analysis, blacklists, and Razor (a spam
12 tracking database).
13 Bogofilter uses word counts. Which one do you think is faster?
14
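To make "word counts" concrete, here is a minimal sketch in Python of the
bookkeeping such a filter needs. It is not Bogofilter's actual code (which is
written in C); the tokenizer rule and the sample messages are made up for
illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word-like tokens (assumed rule)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def count_words(messages):
    """Count how often each token occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

# Hypothetical training data: one pile of spam, one pile of legitimate mail.
spam_counts = count_words(["Buy cheap pills now", "cheap loans, click now"])
ham_counts = count_words(["emerge sync fails on my box",
                          "patch for the ebuild is attached"])
```

Each incoming message then costs one pass over its words plus a lookup per
word in the two tables.
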
15 I've learnt about two great ideas this year: Gentoo (the idea of a ports
16 system) and Bayesian spam filters.
17
18 If you want to feel a happy, warm, glowing feeling for the rest of the day, go
19 and read http://www.paulgraham.com/spam.html
20 From Graham's article:
21
22 "To the recipient, spam is easily recognizable. If you hired someone to read
23 your mail and discard the spam, they would have little trouble doing it. How
24 much do we have to do, short of AI, to automate this process?
25
26 I think we will be able to solve the problem with fairly simple algorithms.
27 In fact, I've found that you can filter present-day spam acceptably well
28 using nothing more than a Bayesian combination of the spam probabilities of
29 individual words. Using a slightly tweaked (as described below) Bayesian
30 filter, we now miss less than 5 per 1000 spams, with 0 false positives."
31
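The "Bayesian combination of the spam probabilities of individual words" can
be sketched in a few lines of Python. This is a rough illustration of the
idea, not Graham's or Bogofilter's code. The function names, the default of
0.4 for unseen words, the cutoff of 15 words and the sample probabilities are
all assumptions chosen for the example.

```python
from math import prod

def combine(probabilities):
    """Naive Bayes combination of per-word spam probabilities."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)

def spam_score(words, word_probs, unknown=0.4, keep=15):
    """Score a message using the words whose probability is furthest from 0.5."""
    probs = [word_probs.get(w, unknown) for w in set(words)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    return combine(probs[:keep])

# Hypothetical per-word probabilities, kept away from exactly 0 and 1
# so that neither product collapses to zero.
word_probs = {"cheap": 0.99, "pills": 0.95, "ebuild": 0.01, "patch": 0.05}

spammy = spam_score(["cheap", "pills", "now"], word_probs)
hammy = spam_score(["patch", "for", "the", "ebuild"], word_probs)
```

With these made-up numbers the first message scores close to 1 and the second
close to 0; a filter would mark a message as spam when its score is above
some chosen threshold.
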
32 /Henrik Treadup
33 hetr9922@××××××××××.se
34
35 PS. I have yet to see a spam email on this list that would have gotten
36 through a Bayesian filter.
