RE: notes on SMTP-time spamassassin rejections

from "David A. Jones" <David.A.Jones@sympatico.ca>, Tue, 8 Jun 2004 15:21:07 -0700

impressive! :-) he he he

OK, but if a message comes in addressed to both a bogus address and a
legit address, that should also lead you to believe it's a bogus email.
For example:

to: Gerald@impressive.net, rald@impressive.net,
invalidaddress@impressive.net
from: spammaster@spamcity.bastards.org
subject: penis vagina penis vagina

crap crap crap crap

Now say spamcity.bastards.org is a legit domain with MX records and
such. The To address is legit too, but because it's accompanied by invalid
addresses at a domain you own, shouldn't this also be rejected? Is that
what the honeypots do, i.e. look at the entire SMTP and BSMTP headers?
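
To make the question concrete, here is a minimal Python sketch of the
policy I'm asking about: reject the whole message if any recipient is a
known-bogus address. It only illustrates the question and says nothing
about how the honeypots or exiscan-acl actually behave; HONEYPOTS and
should_reject are made-up names.

```python
# Hypothetical set of known-bogus (honeypot) addresses, lowercased.
HONEYPOTS = {"rald@impressive.net", "invalidaddress@impressive.net"}


def should_reject(recipients, honeypots=HONEYPOTS):
    """Return True if any recipient is a known honeypot address."""
    normalized = {r.strip().lower() for r in recipients}
    # Reject as soon as the recipient list overlaps the honeypot set.
    return not normalized.isdisjoint(honeypots)
```

With the example message above, the recipient list contains
rald@impressive.net, so should_reject would return True even though
Gerald@impressive.net on its own is fine.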

Cheers,

David.

-----Original Message-----
From: fogo-admin@impressive.net [mailto:fogo-admin@impressive.net] On
Behalf Of Gerald Oskoboiny
Sent: Tuesday, June 08, 2004 11:15 AM
To: fogo@impressive.net
Subject: Re: notes on SMTP-time spamassassin rejections

* Gerald Oskoboiny <gerald@impressive.net> [2004-06-01 13:01-0400]

> The spamassassin that runs at SMTP time is a generic one that
> doesn't learn over time because it doesn't have a bayes DB that
> it can write to (because it runs as user nobody), so it is much
> less effective than it could be.
>
> I had planned to figure out how to set up a bogus user with a
> bayes DB that I could train over time, but it seems tricky to do
> that with exiscan-acl so maybe I should just configure SA on
> mr-burns to use my personal bayes DB.

I did this, and started training spamassassin on any spam it
misses (maybe 5-10/day), set up a few honeypots (notes below),
and wow, what a huge improvement.

My spam intake has dropped to 1998 levels. It's actually eerily
quiet. I'm worried that I must be rejecting too much stuff, but
can't find any evidence of legit mail being blocked.

To set up honeypot addresses, I checked for the most common
unrouteable addresses in exim's rejectlog (somehow a bunch of
bogus addrs got onto spammers' lists, usually truncated versions
of real addresses, e.g. rald@impressive.net) and turned those
into aliases for a new user I created:

# spam honeypots (most common unrouteable addrs in rejectlog)
rald: spam-honeypot
ald: spam-honeypot
...

(I could have also just created a bunch of fake addrs and put
those on my web site to be crawled by email harvesting bots, but
might as well use addresses that were already known to spammers.)
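
The rejectlog tally described above can be sketched in Python roughly
like this. The log line shape matched by the regex is an assumption
about exim's rejectlog format and may need adjusting for your setup;
top_unrouteable and as_aliases are illustrative names, not part of exim
or spamassassin.

```python
import re
from collections import Counter

# Assumed line shape: "... rejected RCPT <addr>: Unrouteable address"
RCPT_RE = re.compile(
    r"rejected RCPT <?([^\s<>:]+@[^\s<>:]+)>?: .*Unrouteable address",
    re.IGNORECASE,
)


def top_unrouteable(lines, n=10):
    """Count rejected unrouteable recipients and return the n most common."""
    counts = Counter()
    for line in lines:
        match = RCPT_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)


def as_aliases(addresses, target="spam-honeypot"):
    """Turn full addresses into 'localpart: target' alias lines."""
    return [f"{addr.split('@', 1)[0]}: {target}" for addr in addresses]
```

Feeding it the lines of the rejectlog (for example via open() on
wherever your exim writes it) gives a ranked list of addresses, and
as_aliases turns the ones you pick into lines for the aliases file.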

The 'spam-honeypot' user has the same uid as gerald so it can
write to my bayes DB, and it feeds all its non-daemon mail into
sa-learn using a procmailrc like this:

# procmailrc for spam-honeypot user: feed all mail into sa-learn --spam

PATH=$HOME/bin:/usr/bin:/bin:/usr/local/bin

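# file bounces and other daemon mail separately; don't learn from it
:0:
* ^FROM_DAEMON
from-daemon

# pipe a copy of everything else to sa-learn as spam
:0c
| sa-learn --spam

# then save the message itself to a mailbox
:0:
sa-learned-spam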

and ~spam-honeypot/.spamassassin is symlinked to ~gerald/.spamassassin

(spam-honeypot etc above are actually called something else; I
didn't put the real names here because I don't want spammers
finding out the names of my honeypots and poisoning them with
legitimate mail.)

--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/