I get a lot of spam email. In the first half of December 2000, I received an average of more than 36 spam messages per day, out of 384 total messages per day.
I have tried various ways of filtering it in the past, and finally decided the best way to do it is to use whitelist-based filtering.
Most spam filtering systems use blacklists, where mail from a certain list of email addresses or matching a certain list of text patterns is rejected or otherwise filtered. These lists take a lot of time and effort to maintain, and in the end still don't work very well.
The way whitelist-based filtering works is: you create a list of addresses of people you expect to receive mail from, and filter anything that is not from them into a separate low-priority mailbox that you check once a week or once a month or something.
There are other features that could be implemented as well, such as automatically replying to unknown senders to notify them that you might not read their mail for a while, or possibly asking them to verify that the message was sent by a human (in which case it would then be delivered directly to your inbox).
I implemented a simple whitelist-based filtering system using procmail, described below.
To create my initial whitelist, I started with a list of staff addresses that I happened to have lying around, then added a bunch of others to it using my current mailboxes (which have already had spam purged from them manually) and a shell script I wrote called atw (add to whitelist). This script takes an email message or mailbox as input, and writes the email address from each From: line to my whitelist if it isn't already there.
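The atw script itself is not reproduced on this page. As a rough idea of what such a script might look like, here is a minimal sketch using only grep and sed. It is an assumption-laden reimplementation, not the original: the function body, the WHITELIST environment variable, and the address-parsing rules are all hypothetical, and it only handles the two common From: forms shown in the comments.

```shell
#!/bin/sh
# atw (sketch): read a message or mbox on stdin and append each From:
# address to the whitelist unless it is already listed.
# Hypothetical reimplementation; not the author's actual script.

atw() {
    # WHITELIST is an assumed override; the default matches the page's ~/.whitelist
    wl="${WHITELIST:-$HOME/.whitelist}"
    touch "$wl"

    # Reduce each From: header line to a bare address:
    #   From: Jane Doe <jane@example.org>  ->  jane@example.org
    #   From: jane@example.org (Jane Doe)  ->  jane@example.org
    grep -i '^From:' |
    sed -e 's/^[Ff][Rr][Oo][Mm]:[[:space:]]*//' \
        -e 's/.*<\([^>]*\)>.*/\1/' \
        -e 's/[[:space:]]*(.*)//' \
        -e 's/[[:space:]]//g' |
    while IFS= read -r addr; do
        [ -n "$addr" ] || continue
        # -F literal string, -i ignore case, -x whole line, -q quiet
        if ! grep -F -i -x -q -- "$addr" "$wl"; then
            printf '%s\n' "$addr" >> "$wl"
        fi
    done
}
```

To use this as a standalone command, one would save it as an executable file named atw with a final line that calls the function (`atw`). Note that this sketch matches any line beginning with "From:", so a forwarded message quoted in a body could add an extra address; a header-aware tool such as formail would avoid that.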
So the specific commands I used were something like:
    touch .whitelist
    atw < $MAIL
    atw < mail/w3c/inbox
    # (repeat with any other mailboxes you have that don't have spam)
Then I added the following to my .procmailrc, near the very bottom after all the other filters I have for filtering mail from mailing lists etc.:
    # do filtering against whitelists
    FROM=`formail -XFrom: | formail -r -xTo: | tr -d ' '`

    :0
    * ! ? grep -F -i -x -q "$FROM" $HOME/.whitelist
    mail/unknown-sender
This causes any mail from unknown senders to be filtered into a mailbox called 'unknown-sender'.
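The grep options in the recipe's condition do most of the work, so it may help to see them in isolation. The sketch below wraps the same grep call in a helper named is_whitelisted (a hypothetical name, used here only for illustration) and runs it against a throwaway whitelist file with made-up addresses.

```shell
#!/bin/sh
# is_whitelisted ADDRESS FILE: succeed only if ADDRESS is a whole line of FILE.
# Hypothetical helper that mirrors the grep call in the procmail recipe.
is_whitelisted() {
    # -F: treat the address as a literal string, so "." is not a wildcard
    # -i: ignore case
    # -x: the whole line must match, so a substring of an entry is not enough
    # -q: print nothing; only the exit status matters
    grep -F -i -x -q -- "$1" "$2"
}

# Demo against a temporary whitelist (example addresses, not real ones).
demo_wl=$(mktemp)
printf '%s\n' 'jane@example.org' 'bob@example.com' > "$demo_wl"

for addr in 'Jane@Example.ORG' 'xjane@example.org' 'jane@exampleXorg'; do
    if is_whitelisted "$addr" "$demo_wl"; then
        echo "$addr: listed"
    else
        echo "$addr: unknown sender"
    fi
done

rm -f "$demo_wl"
```

Only the first address is reported as listed: the match is case-insensitive but must cover the entire line, and the dot is compared literally rather than as a regular-expression wildcard.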
Then, once a week or so I can check my unknown-sender mailbox for non-spam, and if I find some I can add the previously unknown sender to my whitelist by typing |atw in my mail client, Mutt. (|atw pipes the current message to the atw shell script; this also works in elm and Pine.)
With Mutt, I can also tag multiple messages and then press ";" to invoke Mutt's tag-prefix function, which applies the next command to the tagged messages; with a pipe, that means the tagged messages are piped to the specified command. So the full key sequence in Mutt would be ;|atw (Mutt rules!)
Update, May 2002: I am fairly happy with the approach above, but since I am being deluged in spam at my work address where I regularly need to see email from people I don't know yet, I decided to give spamassassin a try. It is fantastic; I highly recommend it. Although I don't like heuristic-style filtering in general, spamassassin works well because it has a huge list of filters and uses a score-based approach. See also: Hugo's notes on spamassassin, integration with whitelists, Razor, etc.
Update, Oct 2003: I now receive about 500 spams/day, out of 700 total messages/day. Almost all the spam is trapped by spamassassin.
Update, Jan 2004: started rejecting most spam at SMTP time with Exim and spamassassin. (see config info)
Update, May 2004: started rejecting forged messages with Exim and SPF.
Update, June 2004: started auto-training spamassassin using spam honeypots as well as training it manually on any spam it misses; huge improvement!
Update, July 2004: reduced my threshold for spam rejection to 6.0
Update, August 2005: I am still delighted with my spam rejection setup; almost all spam is rejected at SMTP time (400+ msgs/day), with only a couple known false positives in the last year.
(some of this stuff is not related to my whitelist setup, but this is the only page I have on spam filtering/blocking so I put other stuff here for now as well)
Last modified: $Date: 2008/07/04 05:55:53 $
Gerald Oskoboiny, <gerald@impressive.net>