Re: Bayesian Spam Filtering and Bogofilter

* Dean Jackson <[email protected]> [2003-01-06 15:33+1100]
> Joseph has already answered no, and I agree with him.

So, having just done it, the answer is indeed no if you have a decent
system to install bogofilter on.

On tux, I had to compile libdb4 beforehand, which took me a while.
Anyway, I have succeeded.

> I just ran bogofilter over my spambox, then over my
> inbox (and a few other spam-free mail files) and it was
> done. Then I set up mutt keystrokes to train bogofilter
> (although I've never used them - it's easier to do as
> Joseph recommends, a monthly/weekly training).

Here is my Mutt setup (with my muttrc cpp-processing[1]):

 # Bogofilter
 # Show spam headers
 unignore X-Bogosity
 #define BOGOFILTER_NONSPAM "|bogofilter -n\n"
 #define BOGOFILTER_SPAM "|bogofilter -s\n"
 #define SYNCHRONIZE_BOGOFILTER "!unison bogofilter\n"
 macro index \eS BOGOFILTER_SPAM "Declare to bogofilter as spam"
 macro pager \eS BOGOFILTER_SPAM "Declare to bogofilter as spam"
 macro index \eN BOGOFILTER_NONSPAM "Declare to bogofilter as non-spam"
 macro pager \eN BOGOFILTER_NONSPAM "Declare to bogofilter as non-spam"
 macro index \eU SYNCHRONIZE_BOGOFILTER "Synchronize bogofilter databases"

More on Unison synchronization below.
 
> The way I use bogofilter is slightly different from Joseph, and
> is probably obvious enough that it isn't worth polluting your
> email with, but here goes! I run bogofilter on the server with
> spamassassin. I send anything with two "Yes" votes into spambox,
> and anything with one "Yes" and one "No" to a maybe box.
[..]
> > - since I am anal, instead of just letting it live its life and don't
> >   care about my spam like I do with SpamAssassin, I was going to spend
> >   my time training it to make it better, and better, and better!
>
> Drugs may help with this problem.

Well, I decided to keep away from drugs, for now at least, and used
your "maybe folder" technique to train bogofilter.
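
To make the two-vote idea concrete, here is a minimal sketch in Python.
It is an illustration only, not the recipe that actually runs on
anyone's server: the function name route and the "inbox" destination
for two "No" votes are my assumptions, while "spambox" and "maybe" come
from the description quoted above.

```python
def route(spamassassin_says_spam: bool, bogofilter_says_spam: bool) -> str:
    """Pick a destination folder from the two filters' verdicts.

    Hypothetical helper: the folder name "inbox" is an assumption,
    since the quoted description only names the spambox and the
    maybe box.
    """
    yes_votes = int(spamassassin_says_spam) + int(bogofilter_says_spam)
    if yes_votes == 2:
        # Both filters agree the message is spam.
        return "spambox"
    if yes_votes == 1:
        # The filters disagree: keep it for manual review and training.
        return "maybe"
    # Neither filter flagged the message.
    return "inbox"
```

Only the messages on which the two filters disagree land in the maybe
folder, so that is where manual training effort goes.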

I will train bogofilter with what is in this folder, after having fed
it with the content of my private folder for good vocabulary. I expect
to have to do a fair bit of work at the beginning, but I am sure it
will decrease fairly rapidly, and when I am satisfied with it, I will
probably stop using SpamAssassin.
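
A rough sketch of that bulk training, assuming the folder is stored in
mbox format: each message is piped to bogofilter on its own, just as
the Mutt macros above do, using the same -s and -n switches. The
function name train_from_mbox and the run parameter are hypothetical,
introduced only for this example.

```python
import mailbox
import subprocess


def train_from_mbox(path, is_spam, run=subprocess.run):
    """Pipe every message of an mbox file to bogofilter, one at a time.

    Assumptions: the folder is an mbox file and bogofilter is on PATH.
    The run argument exists only so the command can be swapped out
    (for a dry run, for instance).
    Returns the number of messages fed to bogofilter.
    """
    flag = "-s" if is_spam else "-n"  # same switches as the Mutt macros
    count = 0
    box = mailbox.mbox(path, create=False)  # fail if the folder is missing
    try:
        for message in box:
            # Send the raw message on standard input, like Mutt's pipe.
            run(["bogofilter", flag], input=message.as_bytes(), check=True)
            count += 1
    finally:
        box.close()
    return count
```

For example, train_from_mbox("private", is_spam=False) would register
every message of a folder file named "private" as non-spam.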

Regarding the Unison[2] synchronization, I have several copies of my
mail thanks to isync[3], which means that I want to do my bogofilter
filtering locally, and then propagate the changes to my mail server
where procmail does its magic. I use Unison to do so.

One gotcha: I *think* that libdb3 and libdb4 don't use the same
format, or so it seemed when I ran a few tests. I have compiled mine
with libdb4 since Sarge's bogofilter uses libdb4. However, Woody only
has libdb3, so on Woody you will need to compile libdb4 yourself too.

 1. http://larve.net/people/hugo/2002/04/mutt-cpp
 2. http://www.cis.upenn.edu/~bcpierce/unison/
 3. http://www.cs.hmc.edu/~me/isync/
--
Hugo Haas - http://larve.net/people/hugo/

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny