* Joseph Reagle <
[email protected]> [2003-02-11 08:59-0500]
> On Tuesday 11 February 2003 01:23, Gerald Oskoboiny wrote:
> > I switched to Bogofilter (from Spamassassin) last Thursday, and
> > I'm really happy with it so far. My mail still gets labelled by
> > spamassassin on the way in, but I don't use it to decide where
> > my mail goes any more.
>
> I too no longer use spamassasin as a filtering criteria though I still run
> it to get rid of most of the horrid email on the pop server, and compare
> the results. I'm fairly happy with bogofilter though it does let some
> stupid spam through occasionally, and isn't catching the latest spams which
> is just a non-spammy natural language sentence or two and a link. (Also, it
> still misses some html mail, and I'm willing to consider that as very
> probable spam from the start.)
>
> > I decided to keep the spamassassin-labelling stuff around because
> > bogofilter can learn probability ratings for spamassassin tokens,
> > e.g. "user_agent_mutt" currently has pgood=0.007957, pbad=0.000354.
>
> What do you mean, the spam-assassin headers are still part of the email when
> you run bogofilter? That's just cheating then, right? (I originally trained
> bogofilter with the headers included and was stunned by it's performance
> when the headers were present, but not when they weren't, so that's why I :
> sed -e "/X-KMail/d" -e "/X-Spam/d" -e "/X-Bogosity/d" -e "/ * /d" $f
Not cheating at all! This isn't a competition between SA and bogofilter; if they
bogofilter algorithms can be used to take into account features that SA detects,
so much the better.
Very loose analogy: its like a multi-layer feedforward neural network, where the earlier
layers reorganise the data and emphasise salient features in such a way as to make
it easier for later layers to do even more useful processing...
Dan