Re: Bayesian Spam Filtering and Bogofilter

On Tuesday 11 February 2003 01:23, Gerald Oskoboiny wrote:
> I switched to Bogofilter (from Spamassassin) last Thursday, and
> I'm really happy with it so far. My mail still gets labelled by
> spamassassin on the way in, but I don't use it to decide where
> my mail goes any more.

I too no longer use spamassassin as a filtering criterion, though I still run
it to get rid of most of the horrid email on the POP server and to compare
the results. I'm fairly happy with bogofilter, though it does let some
stupid spam through occasionally, and it isn't catching the latest spams, which
are just a non-spammy natural-language sentence or two and a link. (Also, it
still misses some HTML mail, and I'm willing to consider that very
probable spam from the start.)
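One way to act on the "HTML mail is probable spam" idea is a pre-check that runs before the Bayesian stage. The sketch below is a hypothetical Python helper, not part of bogofilter or spamassassin. It flags messages that carry an HTML body with no plain-text alternative. It assumes the message is available as a string and uses only the standard `email` package.

```python
from email import message_from_string
from email.message import Message


def is_html_only(msg: Message) -> bool:
    """Return True if the message has an HTML part but no text/plain part."""
    # walk() yields the message itself plus every MIME subpart.
    content_types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in content_types and "text/plain" not in content_types


html_mail = message_from_string(
    "Subject: offer\nContent-Type: text/html\n\n<p>Click here</p>\n"
)
plain_mail = message_from_string(
    "Subject: lunch\nContent-Type: text/plain\n\nSee you at noon.\n"
)
print(is_html_only(html_mail), is_html_only(plain_mail))
```

A multipart/alternative message with both a text/plain and a text/html part is not flagged, since it has a plain-text alternative.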

> I decided to keep the spamassassin-labelling stuff around because
> bogofilter can learn probability ratings for spamassassin tokens,
> e.g. "user_agent_mutt" currently has pgood=0.007957, pbad=0.000354.
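Here is what figures like pgood/pbad can turn into. This is a minimal sketch, assuming pgood and pbad are the fractions of good and spam messages that contained the token. It computes a Graham-style per-token spam probability, then applies a Robinson-style adjustment that pulls rarely seen tokens toward a neutral 0.5. The function names and the default s and x values are illustrative assumptions. Bogofilter's own formula and parameters may differ.

```python
def token_spamicity(pgood: float, pbad: float) -> float:
    """Graham-style spam probability of a token from per-class frequencies.

    pgood/pbad are assumed to be the fraction of good/spam messages that
    contained the token.
    """
    if pgood + pbad == 0:
        return 0.5  # token never seen in either class: neutral
    return pbad / (pgood + pbad)


def robinson_adjust(p: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson-style f(w): shrink p toward the prior x when n is small.

    n is the number of messages containing the token and s is the strength
    given to the prior. Defaults are illustrative, not bogofilter's settings.
    """
    return (s * x + n * p) / (s + n)


p = token_spamicity(0.007957, 0.000354)
print(round(p, 4), round(robinson_adjust(p, n=3), 4))
```

With the user_agent_mutt figures this gives a probability of roughly 0.04, so the token points strongly toward non-spam. The adjustment only matters much when n is small.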

What do you mean, the spamassassin headers are still part of the email when
you run bogofilter? That's just cheating then, right? (I originally trained
bogofilter with the headers included and was stunned by its performance
when the headers were present, but not when they weren't.) That's why I now
strip them first:
 sed -e "/X-KMail/d" -e "/X-Spam/d" -e "/X-Bogosity/d" -e "/  * /d" "$f"
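Here is a sketch of similar header stripping in Python, assuming the message is parsed with the standard `email` package. The prefix list mirrors the first three sed expressions. Unlike sed, it only touches header fields, so a body line that mentions X-Spam survives. It does not reproduce the last sed expression, which deletes any line containing two or more consecutive spaces. `strip_filter_headers` is a hypothetical name.

```python
from email import message_from_string
from email.message import Message

# Header-name prefixes added by other tools (KMail, spamassassin, bogofilter).
STRIP_PREFIXES = ("x-kmail", "x-spam", "x-bogosity")


def strip_filter_headers(msg: Message, prefixes=STRIP_PREFIXES) -> Message:
    """Delete every header whose name starts with one of the prefixes."""
    for name in set(msg.keys()):  # copy the names before mutating
        if name.lower().startswith(prefixes):
            del msg[name]  # removes all occurrences; no error if already gone
    return msg


raw = (
    "X-Spam-Status: No, hits=-2.0\n"
    "X-Bogosity: No\n"
    "Subject: hi\n"
    "\n"
    "This body mentions X-Spam.\n"
)
msg = strip_filter_headers(message_from_string(raw))
print(msg.as_string())
```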

But the combination of SA's features/heuristics and Bayesian filtering will
rock, and I'm looking forward to playing with that feature in the new
version of SA.

Re: Bayesian Spam Filtering and Bogofilter

On Tuesday 11 February 2003 09:05, Dan Brickley wrote:
> Not cheating at all! This isn't a competition between SA and bogofilter;

Well, it is when I'm testing them. <smile/>

> if the bogofilter algorithms can be used to take into account features
> that SA detects, so much the better.

OK, so for the folks using bogofilter in this mode: do you find that
bogofilter is then able to correct SA's false negatives and positives, or
is it just parroting what SA would have told you in the first place?
(I expected SA plus Bayesian filtering to add value, but it didn't with bogofilter...)
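One way to answer this is to hand-label a batch of messages and tally, per message, whether each filter got it right. The sketch below is hypothetical. It assumes you have already extracted a (hand label, SA verdict, bogofilter verdict) triple for each message, for example from the X-Spam and X-Bogosity headers, and it only does the counting.

```python
from collections import Counter
from typing import Iterable, Tuple

# (is_spam_by_hand, sa_says_spam, bogofilter_says_spam)
Verdicts = Tuple[bool, bool, bool]


def compare_filters(records: Iterable[Verdicts]) -> Counter:
    """Tally where bogofilter corrects, repeats, or introduces SA's mistakes."""
    tally = Counter()
    for truth, sa, bogo in records:
        sa_right = sa == truth
        bogo_right = bogo == truth
        if sa_right and bogo_right:
            tally["both right"] += 1
        elif bogo_right:
            tally["bogofilter corrected SA"] += 1
        elif sa_right:
            tally["bogofilter broke SA"] += 1
        else:
            tally["both wrong"] += 1
    return tally


records = [
    (True, True, True),     # spam, both caught it
    (True, False, True),    # spam SA missed, bogofilter caught
    (False, True, False),   # SA false positive, bogofilter right
    (False, False, True),   # bogofilter false positive
    (True, False, False),   # spam both missed
]
print(compare_filters(records))
```

A large "both wrong" count relative to "bogofilter corrected SA" would suggest the parroting effect.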

Re: Bayesian Spam Filtering and Bogofilter

* Joseph Reagle <[email protected]> [2003-02-11 08:59-0500]
> On Tuesday 11 February 2003 01:23, Gerald Oskoboiny wrote:
> > I switched to Bogofilter (from Spamassassin) last Thursday, and
> > I'm really happy with it so far. My mail still gets labelled by
> > spamassassin on the way in, but I don't use it to decide where
> > my mail goes any more.
>
> I too no longer use spamassassin as a filtering criterion, though I still run
> it to get rid of most of the horrid email on the POP server and to compare
> the results. I'm fairly happy with bogofilter, though it does let some
> stupid spam through occasionally, and it isn't catching the latest spams, which
> are just a non-spammy natural-language sentence or two and a link. (Also, it
> still misses some HTML mail, and I'm willing to consider that very
> probable spam from the start.)
>
> > I decided to keep the spamassassin-labelling stuff around because
> > bogofilter can learn probability ratings for spamassassin tokens,
> > e.g. "user_agent_mutt" currently has pgood=0.007957, pbad=0.000354.
>
> What do you mean, the spamassassin headers are still part of the email when
> you run bogofilter? That's just cheating then, right? (I originally trained
> bogofilter with the headers included and was stunned by its performance
> when the headers were present, but not when they weren't.) That's why I now
> strip them first:
>   sed -e "/X-KMail/d" -e "/X-Spam/d" -e "/X-Bogosity/d" -e "/  * /d" "$f"

Not cheating at all! This isn't a competition between SA and bogofilter; if the
bogofilter algorithms can be used to take into account features that SA detects,
so much the better.

Very loose analogy: it's like a multi-layer feedforward neural network, where the earlier
layers reorganise the data and emphasise salient features in such a way as to make
it easier for later layers to do even more useful processing...
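The analogy can be made concrete. Spamassassin acts as an earlier layer whose rule hits become extra tokens for the Bayesian layer, which is presumably where a token like "user_agent_mutt" comes from. This is a minimal sketch with a hypothetical function name. It assumes the X-Spam-Status header carries a tests=RULE_A,RULE_B list that may be folded across lines after a comma. The exact layout varies between spamassassin versions.

```python
import re
from typing import List


def sa_test_tokens(spam_status: str) -> List[str]:
    """Turn spamassassin rule hits into lowercase pseudo-tokens.

    Assumes the header value contains 'tests=RULE_A,RULE_B,...', possibly
    folded across lines after a comma.
    """
    value = re.sub(r",\s+", ",", spam_status)  # undo folding inside the list
    match = re.search(r"tests=(\S+)", value)
    if not match:
        return []
    return [name.lower() for name in match.group(1).split(",") if name]


header = (
    "No, hits=-4.4 required=5.0\n"
    "\ttests=QUOTED_EMAIL_TEXT,\n"
    "\tUSER_AGENT_MUTT\n"
    "\tversion=2.44"
)
print(sa_test_tokens(header))
```

These tokens would then be trained and scored like any word from the message body, so the Bayesian layer learns how much each SA rule is worth on your own mail.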

Dan

Re: Bayesian Spam Filtering and Bogofilter

Joseph Reagle <[email protected]> writes:

> I'm fairly happy with bogofilter, though it does let some stupid spam
> through occasionally, and it isn't catching the latest spams, which are
> just a non-spammy natural-language sentence or two and a
> link.

Those are the ones that bug me the most, and I don't know what to do
about them.

I'm still running SA and am going to add bogofilter on my mail server. My
plan is to do all filtering on the mail server, using its cycles and
having more of my mail processing take place before my mail comes to
me. For training purposes I'll resend a message to an alias on the
mail server, which will procmail it into bogofilter. I'll probably have
the procmail recipe (at least for training on non-spam) look at other
headers (Received, MUA, etc.) to keep outside influences from
tainting the training.
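Here is a sketch of that training gate, written as a hypothetical Python helper rather than a procmail recipe. It assumes the resend passes through your own relay, so the newest Received header names it. It also assumes the resend adds a Resent-From header with your address. TRUSTED_RELAY and MY_ADDRESS are placeholders. Spam is trained unconditionally, while non-spam is accepted only when both checks pass. The helper just returns the bogofilter argument list, where -s registers spam and -n registers non-spam.

```python
from email import message_from_string
from email.message import Message
from typing import List, Optional

# Placeholders: substitute your own relay host and resending address.
TRUSTED_RELAY = "mail.example.org"
MY_ADDRESS = "[email protected]"


def training_command(msg: Message, as_spam: bool) -> Optional[List[str]]:
    """Return a bogofilter training argv, or None if the resend looks foreign."""
    if as_spam:
        return ["bogofilter", "-s"]  # spam training is not gated
    received = msg.get_all("Received") or []
    newest = received[0] if received else ""  # Received headers are prepended
    resent_from = msg.get("Resent-From", "")
    if TRUSTED_RELAY in newest and MY_ADDRESS in resent_from:
        return ["bogofilter", "-n"]
    return None  # refuse to train non-spam from an unexpected path


ok = message_from_string(
    "Received: from mail.example.org by mail.example.org\n"
    "Received: from elsewhere.test\n"
    "Resent-From: Me <[email protected]>\n"
    "Subject: x\n\nbody\n"
)
print(training_command(ok, as_spam=False))
```

The raw message would then be fed to that command on standard input, for example with subprocess.run(argv, input=raw_bytes).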

Actually, I think I will just send mail for known sold (eg [email protected]) and
harvested addresses straight to the spam alias bucket. I might even make
some honeypots for autotraining in this manner.

> (Also, it still misses some HTML mail, and I'm willing to consider
> that very probable spam from the start.)

Ditto.

--
Ted Guild <[email protected]>
http://www.guilds.net

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny