Gentoo Archives: gentoo-dev

From: Alan <alan@×××××.org>
To: gentoo-dev@g.o, gentoo-user@g.o
Subject: Re: [gentoo-dev] [ANNOUNCE] Bayesian SPAM filter 'bogofilter' ebuild on bugzilla
Date: Fri, 08 Nov 2002 20:23:46
Message-Id: 20021108202257.GD24287@ufies.org
In Reply to: [gentoo-dev] [ANNOUNCE] Bayesian SPAM filter 'bogofilter' ebuild on bugzilla by Javier Marcet
On Fri, Nov 08, 2002 at 09:03:29PM +0100, Javier Marcet wrote:
>
> A couple hours ago I committed a new ebuild for a fast spam filter based
> on Bayesian filters, popular since the publication of Paul Graham's
> paper A Plan For Spam: http://www.paulgraham.com/spam.html
>
> You can find the ebuild on bugzilla:
> http://bugs.gentoo.org/show_bug.cgi?id=10435

Just my $0.02 about bogofilter...

I started using spamassassin a while back and thought it was great.
When the "plan for spam" article came out I ignored it until someone
suggested bogofilter. I used it a bit and was *very* disappointed.
Unlike SA, I had to go in and individually tell it what was spam and
what wasn't, and it took a while to start getting them straight, due to
its need to build a good spamword database and "learn".

I went back to spamassassin, but when I did I realized that I was
missing the feeling of *power* I got from being able to say "this
is spam". Unlike bogofilter, spamassassin gives you no way to act if
it misses a spam or produces a false positive (that never happened on
any mail I noticed, but I've heard other people have had problems).
With bogofilter you can say "this isn't spam" or "this is spam" and
know that the choice is reflected in the database.

I did a bit of searching and found some good material on Google here:
http://groups.google.com/groups?q=how+to+train+bogofilter&hl=en&lr=&ie=UTF-8&selm=slrnao3uve.cdh.chisel%40chisel.herlpacker.co.uk&rnum=7
Basically it's the obvious answer: you have to train bogofilter with
lots of spam and lots of "good" email. So I fed it my entire collection
of mail (probably around 150-200 MB, of which about 4 MB was spam),
and suddenly my results got a *lot* better. I haven't had a false
positive since then, though a single-word email ('test') that I send
to myself is still considered spam. Any spam that does sneak through
does so only a couple of times before it is filed where it belongs.
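
The training described above can be illustrated with a toy Bayesian filter in the spirit of Graham's "A Plan for Spam". This is a minimal Python sketch, not bogofilter's actual code or database format: the names WordLists, register, reclassify and spam_probability are hypothetical, and the constants (doubling good-mail counts, ignoring tokens seen fewer than 5 times, 0.4 for unknown tokens, clamping to 0.01..0.99, combining the 15 most interesting tokens, a 0.9 spam threshold) follow Graham's essay.

```python
import math
import re
from collections import Counter

# Toy Bayesian filter after Paul Graham's "A Plan for Spam".
# Illustrative sketch only: NOT bogofilter's code or word-list format.
# All class/method names here are hypothetical.

TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    """Lowercase the message and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class WordLists:
    """Token counts for spam and good mail: the "spamword database"."""

    def __init__(self):
        self.good = Counter()
        self.spam = Counter()
        self.n_good = 0  # messages registered as good
        self.n_spam = 0  # messages registered as spam

    def register(self, text, is_spam, count=1):
        """Add (count=1) or remove (count=-1) one message from a word list."""
        table = self.spam if is_spam else self.good
        for token in tokenize(text):
            table[token] += count
        if is_spam:
            self.n_spam += count
        else:
            self.n_good += count

    def reclassify(self, text, now_spam):
        """"This is spam" / "this isn't spam" for a message that was already
        registered in the other list: move it from one list to the other."""
        self.register(text, is_spam=not now_spam, count=-1)
        self.register(text, is_spam=now_spam, count=1)

    def token_prob(self, token):
        """Estimated probability that a message containing `token` is spam."""
        g = 2 * self.good[token]  # Graham doubles good counts to avoid false positives
        b = self.spam[token]
        if g + b < 5:
            return 0.4  # too little evidence: slightly-innocent default
        good_freq = min(1.0, g / max(self.n_good, 1))
        spam_freq = min(1.0, b / max(self.n_spam, 1))
        return max(0.01, min(0.99, spam_freq / (good_freq + spam_freq)))

    def spam_probability(self, text, n_interesting=15):
        """Combine the token probabilities farthest from 0.5 using Bayes'
        rule under a naive independence assumption."""
        probs = sorted((self.token_prob(t) for t in set(tokenize(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
        spam = math.prod(probs)
        ham = math.prod(1 - p for p in probs)
        return spam / (spam + ham)

    def is_spam(self, text, threshold=0.9):
        return self.spam_probability(text) > threshold
```

Feeding a whole mail archive, as described above, amounts to calling register() once per stored message, while reclassify() plays the role of saying "this is spam" or "this isn't spam" about a message already in the database. In this sketch, an undertrained database leaves most tokens under the 5-occurrence cutoff at the 0.4 default, which is one way to see why training on lots of spam and lots of good mail matters.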

The two limitations of bogofilter that need to be addressed at some point are:
1 you can only "interact" with it through mutt or some custom hacking
  of other programs (I think emacs vm can report spam/non-spam to
  bogofilter if you know the right spell)
2 it needs enough spam and non-spam to properly set up the word lists.

However, if 2 is done right, then 1 isn't a real problem.

Anyway, that's my $0.02/testimonial :) Spread the love!

alan


--
Alan "Arcterex" <alan@×××××.org> -=][=- http://arcterex.net
"I used to herd dairy cows. Now I herd lusers. Apart from the isolation, I
think I preferred the cows. They were better conversation, easier to milk, and
if they annoyed me enough, I could shoot them and eat them." -Rodger Donaldson

--
gentoo-dev@g.o mailing list