Re: Bayesian Spam Filtering and Bogofilter

* Joseph Reagle <[email protected]> [2003-02-11 08:59-0500]
> On Tuesday 11 February 2003 01:23, Gerald Oskoboiny wrote:
> > I switched to Bogofilter (from Spamassassin) last Thursday, and
> > I'm really happy with it so far. My mail still gets labelled by
> > spamassassin on the way in, but I don't use it to decide where
> > my mail goes any more.
>
> I too no longer use spamassassin as a filtering criterion, though I still run
> it to get rid of most of the horrid email on the POP server, and compare
> the results. I'm fairly happy with bogofilter, though it does let some
> stupid spam through occasionally, and isn't catching the latest spams, which
> are just a non-spammy natural-language sentence or two and a link. (Also, it
> still misses some HTML mail, and I'm willing to consider that as very
> probable spam from the start.)
>
> > I decided to keep the spamassassin-labelling stuff around because
> > bogofilter can learn probability ratings for spamassassin tokens,
> > e.g. "user_agent_mutt" currently has pgood=0.007957, pbad=0.000354.
>
> What do you mean, the spam-assassin headers are still part of the email when
> you run bogofilter? That's just cheating then, right? (I originally trained
> bogofilter with the headers included and was stunned by its performance
> when the headers were present, but not when they weren't.) That's why I run:
>   sed -e "/X-KMail/d" -e "/X-Spam/d" -e "/X-Bogosity/d" -e "/  * /d" $f
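
A rough Python equivalent of that header-stripping pass, working on parsed headers instead of raw lines. This is only a sketch: `strip_filter_headers` and `STRIP_PREFIXES` are hypothetical names, and unlike the sed command it leaves the message body alone, so it does not reproduce the `/  * /d` rule or any deletion of body lines.

```python
import email
import sys

# Hypothetical prefix list, lower-cased, mirroring the header patterns in the sed command.
STRIP_PREFIXES = ("x-kmail", "x-spam", "x-bogosity")


def strip_filter_headers(raw: str) -> str:
    """Return the message text with filter-added headers removed (a sketch)."""
    msg = email.message_from_string(raw)
    # Collect the distinct matching names first; don't delete while iterating keys().
    doomed = {name for name in msg.keys() if name.lower().startswith(STRIP_PREFIXES)}
    for name in doomed:
        # Case-insensitive; removes every occurrence, including folded continuation lines.
        del msg[name]
    return msg.as_string()


if __name__ == "__main__":
    # Use it as a filter, like the sed one-liner: message on stdin, cleaned copy on stdout.
    sys.stdout.write(strip_filter_headers(sys.stdin.read()))
```

One reason to parse rather than grep: a folded header such as a multi-line `X-Spam-Status` is dropped together with its indented continuation lines, which a line-oriented `/X-Spam/d` would leave behind for the tokenizer to see.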

Not cheating at all! This isn't a competition between SA and bogofilter; if the
bogofilter algorithms can be used to take into account features that SA detects,
so much the better.

Very loose analogy: it's like a multi-layer feedforward neural network, where the earlier
layers reorganise the data and emphasise salient features in such a way as to make
it easier for later layers to do even more useful processing...

Dan
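
To make the layering concrete: once SpamAssassin's labels are in the message, bogofilter simply sees them as more tokens with learned probabilities. The sketch below assumes a Robinson-style per-token probability combined with Fisher's chi-square method, and assumes `pgood`/`pbad` are the fractions of ham and spam messages containing the token. Bogofilter's actual arithmetic and smoothing differ between versions, so the function names and formulas here are illustrative, not bogofilter's code. It reuses Gerald's figures for `user_agent_mutt`; the two body-token probabilities are made up.

```python
import math


def token_spamminess(pgood: float, pbad: float) -> float:
    """Robinson-style p(w): the share of the token's frequency that comes from spam.

    Assumes pgood/pbad are per-class message fractions (an assumption, see above).
    """
    return pbad / (pbad + pgood)


def chi2q(x2: float, dof: int) -> float:
    """P(chi-square with an even number `dof` of degrees of freedom >= x2)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(probs: list[float]) -> float:
    """Combine per-token probabilities (each strictly between 0 and 1) Fisher-style.

    s near 1 means strong spam evidence, h near 1 strong ham evidence;
    the result is 0.5 when they cancel out.
    """
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0


if __name__ == "__main__":
    mutt = token_spamminess(pgood=0.007957, pbad=0.000354)
    body = [0.6, 0.55]  # made-up probabilities for two mildly spammy body tokens
    print(f"p(user_agent_mutt) = {mutt:.4f}")
    print(f"body only:         {fisher_spamicity(body):.3f}")
    print(f"body + SA token:   {fisher_spamicity(body + [mutt]):.3f}")
```

With Gerald's numbers, `user_agent_mutt` comes out at roughly 0.04, a strong ham signal, so adding it pulls the combined score for an otherwise borderline message well down. That is the "earlier layer" effect: SA's feature detection becomes one more input that the Bayesian stage weighs by what it has actually learned.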

Re: Bayesian Spam Filtering and Bogofilter

On Tuesday 11 February 2003 09:05, Dan Brickley wrote:
> Not cheating at all! This isn't a competition between SA and bogofilter;

Well, it is when I'm testing them. <smile/>

> if the bogofilter algorithms can be used to take into account features
> that SA detects, so much the better.

OK, so for the folks using bogofilter in this mode: do you find that
bogofilter is then able to correct SA's false negatives and positives, or
is it just parroting what you would've learned from SA in the first place?
(I expect SA+Bayesian to add value, but it didn't with bogofilter...)
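
One way to answer that empirically, sketched under the assumption that you have a hand-labelled test corpus and have recorded each filter's yes/no verdict per message. The tuple layout and the `compare_verdicts` name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def compare_verdicts(results: Iterable[Tuple[bool, bool, bool]]) -> Counter:
    """Tally how bogofilter's verdict relates to SpamAssassin's.

    Each item is (is_spam, sa_says_spam, bogo_says_spam), where is_spam is
    the hand-assigned truth label.
    """
    tally = Counter()
    for is_spam, sa, bogo in results:
        if sa == bogo:
            # Both filters agree; record whether they were jointly right or wrong.
            tally["agree_right" if sa == is_spam else "agree_wrong"] += 1
        elif bogo == is_spam:
            # SA got it wrong and bogofilter got it right.
            tally["bogo_corrects_sa"] += 1
        else:
            # SA got it right and bogofilter got it wrong.
            tally["bogo_breaks_sa"] += 1
    return tally
```

If `bogo_corrects_sa` stays near zero while agreement is high, bogofilter is mostly parroting SA; a healthy margin of corrections over breakages is the added value.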

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny