Spam is (I assume) a problem not only for the gentoo-dev mailing list but
also for all the other Gentoo lists. Closing gentoo-dev doesn't solve the
problem. The other lists will still get spam.
Should the other lists also be closed? (Closing gentoo-newbies is probably a
bad idea.) The solution, IMHO, is a filter on the Gentoo mail server.

A lot of people mention SpamAssassin. There is a world of difference between
SpamAssassin and a Bayesian filter like Bogofilter.

SpamAssassin is written in Perl. Bogofilter is written in C.
SpamAssassin uses header analysis, text analysis, blacklists and Razor (a spam
tracking database).
Bogofilter uses word counts. Which one do you think is faster?
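To illustrate why a word-count filter is cheap, here is a minimal Python sketch. It is not Bogofilter's code or scoring method. It counts how many spam and non-spam training messages each word appears in, then turns those counts into a per-word spam probability using a simple ratio of relative frequencies. The tokenizer, the function names and the formula are assumptions chosen for illustration. Scoring a new message then costs one dictionary lookup per word.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def train(messages):
    """Count how many messages each token appears in (once per message)."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Share of a word's relative frequency that comes from spam.

    Simple illustrative ratio, not Bogofilter's formula.
    Words never seen in training get a neutral 0.5.
    """
    spam_freq = spam_counts[word] / n_spam if n_spam else 0.0
    ham_freq = ham_counts[word] / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5
    return spam_freq / (spam_freq + ham_freq)
```

For example, after training on two spams that both contain "cheap" and two list posts that never do, `word_spam_probability("cheap", ...)` is 1.0. A real filter would also need a minimum count and some smoothing so that rarely seen words do not get extreme scores.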

I've learnt about two great ideas this year: Gentoo (the idea of a ports
system) and Bayesian spam filters.

If you want to feel a warm, happy, glowing feeling for the rest of the day, go
and read http://www.paulgraham.com/spam.html
From Graham's article:

"To the recipient, spam is easily recognizable. If you hired someone to read
your mail and discard the spam, they would have little trouble doing it. How
much do we have to do, short of AI, to automate this process?

I think we will be able to solve the problem with fairly simple algorithms.
In fact, I've found that you can filter present-day spam acceptably well
using nothing more than a Bayesian combination of the spam probabilities of
individual words. Using a slightly tweaked (as described below) Bayesian
filter, we now miss less than 5 per 1000 spams, with 0 false positives."
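The "Bayesian combination of the spam probabilities of individual words" can be sketched in a few lines of Python. The sketch assumes the per-word probabilities p1..pn have already been computed. It combines them as (p1·…·pn) / (p1·…·pn + (1−p1)·…·(1−pn)). The constants are modelled on the values described in Graham's article and should be checked against it: clamping to [0.01, 0.99], keeping the 15 words farthest from a neutral 0.5, and a 0.9 spam threshold. The function names are hypothetical, and the article's tweaks to the per-word step are not shown.

```python
import math


def combined_spam_probability(word_probs, max_words=15):
    """Combine per-word spam probabilities naive-Bayes style:
    P = prod(p) / (prod(p) + prod(1 - p)).
    """
    # Clamp so that no single word can force the result to exactly 0 or 1.
    clamped = [min(0.99, max(0.01, p)) for p in word_probs]
    # Keep the most "interesting" words: those farthest from a neutral 0.5.
    interesting = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)[:max_words]
    if not interesting:
        return 0.5  # no evidence either way
    spamminess = math.prod(interesting)
    hamminess = math.prod(1.0 - p for p in interesting)
    return spamminess / (spamminess + hamminess)


def is_spam(word_probs, threshold=0.9):
    """Flag a message whose combined probability exceeds the threshold."""
    return combined_spam_probability(word_probs) > threshold
```

Two strongly spammy words (0.99 each) give a combined probability of about 0.9999. One spammy and one equally innocent word (0.9 and 0.1) cancel out to 0.5. That is why a handful of telltale words is usually enough to decide.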

/Henrik Treadup
hetr9922@××××××××××.se

PS. I have yet to see a spam email on this list that would have gotten
through a Bayesian filter.