On Fri, Nov 08, 2002 at 09:03:29PM +0100, Javier Marcet wrote:
>
> A couple hours ago I committed a new ebuild for a fast spam filter based
> on Bayesian filters, popular since the publication of Paul Graham's
> paper A Plan For Spam: http://www.paulgraham.com/spam.html
>
> You can find the ebuild on bugzilla:
> http://bugs.gentoo.org/show_bug.cgi?id=10435

Just my $0.02 about bogofilter....

I started using spamassassin a while back and thought it was great.
When the "plan for spam" article came out I ignored it until someone
suggested bogofilter. I used it a bit and was *very* disappointed.
Unlike SA, I had to go in and individually tell it what was spam and
what wasn't, and it took a while to start getting them straight, due to
its need to create a good spamword database and "learn".

I went back to spamassassin, but when I did I realized that I was
missing the feeling of *power* that I got from being able to say "this
is spam". Unlike bogofilter, with spamassassin you can't do anything if
it misses a spam, or gets a false positive (never happened on any that I
noticed, but I've heard other people have had problems). With
bogofilter you can say "this isn't spam", or "this is spam" and know
that that choice is reflected in the database.

I did a bit of searching and found some good stuff on Google Groups here:
http://groups.google.com/groups?q=how+to+train+bogofilter&hl=en&lr=&ie=UTF-8&selm=slrnao3uve.cdh.chisel%40chisel.herlpacker.co.uk&rnum=7
Basically it's the obvious answer: you have to train bogofilter with
lots of spam, and lots of "good" email. So I fed it my entire collection
of mail (probably around 150-200 MB of mail, with about 4 MB of that
being spam) and suddenly my results got a *lot* better. I haven't had a
false positive since then (though if I send a single-word email ('test')
to myself, that is still considered spam), and any spam that does sneak
through does so only a couple of times before it is filed where it
belongs.
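
To make the training idea above concrete, here is a toy sketch in Python
of a word-count filter in the spirit of the "plan for spam" article. It is
NOT bogofilter's actual algorithm or database format; the class name
ToyWordFilter and all of its methods are hypothetical names invented for
illustration. It only shows why more registered mail gives better word
statistics, and how a "this is spam" / "this isn't spam" correction ends up
in the word lists.

```python
import math
import re
from collections import Counter


class ToyWordFilter:
    """Toy word-count spam filter (an illustration, not bogofilter itself)."""

    def __init__(self):
        # One word list per class, plus the number of messages seen in each.
        self.counts = {"spam": Counter(), "good": Counter()}
        self.messages = {"spam": 0, "good": 0}

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def register(self, text, label):
        """Record a message as 'spam' or 'good'; the word lists change at once."""
        self.counts[label].update(self.tokens(text))
        self.messages[label] += 1

    def correct(self, text, wrong, right):
        """Move a message that was registered under `wrong` over to `right`."""
        self.counts[wrong].subtract(self.tokens(text))
        self.messages[wrong] -= 1
        self.register(text, right)

    def spam_probability(self, text):
        # Sum per-word log-odds with add-one smoothing; equal priors assumed.
        log_odds = 0.0
        for word in self.tokens(text):
            in_spam = max(self.counts["spam"][word], 0)
            in_good = max(self.counts["good"][word], 0)
            p_spam = (in_spam + 1) / (max(self.messages["spam"], 0) + 2)
            p_good = (in_good + 1) / (max(self.messages["good"], 0) + 2)
            log_odds += math.log(p_spam / p_good)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


f = ToyWordFilter()
# With empty word lists every message scores exactly 0.5 (no evidence).
untrained = f.spam_probability("cheap pills")

for text in ("buy cheap pills now", "cheap loans now"):
    f.register(text, "spam")
for text in ("ebuild committed to portage", "meeting notes for the ebuild"):
    f.register(text, "good")

trained_spam = f.spam_probability("cheap pills")   # above 0.5 after training
trained_good = f.spam_probability("new ebuild")    # below 0.5 after training
```

With only a handful of messages registered, most words have never been
seen, so the score stays near 0.5; that is the toy version of the "took a
while to start getting them straight" phase. The correct() method is the
toy version of telling the filter that it filed something wrongly.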

The two limitations of bogofilter that someone needs to address are:
1. You can only "interact" with it through mutt, or through some custom
   hacking in other programs (I think that Emacs VM can notify
   spam/non-spam with bogofilter if you know the right spell).
2. It needs enough spam and non-spam to properly set up the word lists.

However, if 2 is done right, then 1 isn't a real problem.

Anyway, that's my $0.02/testimonial :) Spread the love!

alan

--
Alan "Arcterex" <alan@×××××.org> -=][=- http://arcterex.net
"I used to herd dairy cows. Now I herd lusers. Apart from the isolation, I
think I preferred the cows. They were better conversation, easier to milk, and
if they annoyed me enough, I could shoot them and eat them." -Rodger Donaldson

--
gentoo-dev@g.o mailing list