Whitelist-based spam filtering

by Gerald Oskoboiny


Photo: 19th-century spam fighting technology


I get a lot of spam email. In the first half of December 2000, I received an average of more than 36 spam messages per day, out of 384 total messages per day.

I have tried various ways of filtering it in the past, and finally decided that the best approach is whitelist-based filtering.

Most spam filtering systems use blacklists, where mail from a certain list of email addresses or matching a certain list of text patterns is rejected or otherwise filtered. These lists take a lot of time and effort to maintain, and in the end still don't work very well.

Whitelist-based filtering works like this: you create a list of addresses of people you expect to receive mail from, and filter anything that is not from them into a separate low-priority mailbox that you check once a week or once a month.

There are other features that could be implemented as well, such as automatically sending a reply to unknown senders to notify them that you might not read their mail for a while, or possibly asking them to verify that the message was sent by a human (in which case it would then be delivered directly to your inbox).

Implementation using procmail

I implemented a simple whitelist-based filtering system using procmail, described below.

To create my initial whitelist, I started with a list of staff addresses that I happened to have lying around, then added a bunch of others to it using my current mailboxes (which have already had spam purged from them manually) and a shell script I wrote called atw (add to whitelist). This script takes an email message or mailbox as input, and writes the email address from each From: line to my whitelist if it isn't already there.

So the specific commands I used were something like:

    touch .whitelist
    atw < $MAIL
    atw < mail/w3c/inbox
    # (repeat with any other mailboxes you have that don't have spam)
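
The atw script itself is not reproduced on this page. As a rough idea of what such a script could look like, here is a minimal sketch written as a shell function. It is an assumption-laden illustration, not my original script: the WHITELIST override variable and the awk/sed address parsing are made up for this example, and it does not handle From: headers that are folded across several lines.

```shell
#!/bin/sh
# atw (add to whitelist): hypothetical sketch, not the original script.
# Reads one message or an mbox from stdin and appends each From: address
# to the whitelist file if it is not already listed.

atw() {
    # WHITELIST is an assumed override; the default matches the file
    # name used elsewhere on this page.
    whitelist="${WHITELIST:-$HOME/.whitelist}"
    touch "$whitelist"

    # Only look at header blocks: a header block starts at the first line
    # of the input or at an mbox "From " separator, and ends at the first
    # blank line. This skips From: lines that appear in message bodies.
    awk '
        NR == 1 || /^From / { in_header = 1 }
        /^$/                { in_header = 0 }
        in_header && tolower($0) ~ /^from:/ { print }
    ' |
    # Reduce "Name <addr>", "addr (Name)" and plain "addr" to the bare address.
    sed -e 's/^[Ff][Rr][Oo][Mm]:[[:space:]]*//' \
        -e 's/.*<\([^>]*\)>.*/\1/' \
        -e 's/[[:space:]]*(.*)[[:space:]]*//' \
        -e 's/[[:space:]]//g' |
    while IFS= read -r addr; do
        [ -n "$addr" ] || continue
        # Same match the procmail recipe uses: fixed string, ignore case,
        # whole line, quiet.
        if ! grep -F -i -x -q -- "$addr" "$whitelist"; then
            printf '%s\n' "$addr" >> "$whitelist"
        fi
    done
}
```

Saved as a standalone script (with the function body as the script body), it would be used exactly like the commands above. Because the lookup ignores case, an address that differs only in capitalization is not added a second time.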

Then I added the following to my .procmailrc, near the very bottom after all the other filters I have for filtering mail from mailing lists etc.:

    # do filtering against whitelists
    FROM=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
    :0
    * ! ? grep -F -i -x -q "$FROM" $HOME/.whitelist
    mail/unknown-sender

This causes any mail from unknown senders to be filtered into a mailbox called 'unknown-sender'.
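
To see what the recipe's test would decide for a particular address without sending any mail, the same grep can be run by hand. The helper below is a small sketch for that purpose; the function name is_whitelisted and its optional second argument are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: run the same lookup as the procmail recipe.
# Exit status 0 means the address is on the whitelist (known sender).
#
# Example:
#   is_whitelisted someone@example.com && echo known || echo unknown

is_whitelisted() {
    # $1: address to look up
    # $2: whitelist file (assumed default: $HOME/.whitelist)
    # -F treats the address as a fixed string, so "." is not a wildcard;
    # -i ignores case; -x requires the whole line to match, so a partial
    # address does not count; -q prints nothing and only sets the status.
    grep -F -i -x -q -- "$1" "${2:-$HOME/.whitelist}"
}
```

The -x flag matters here: without it, any address that merely contains a whitelisted address as a substring would also be accepted.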

Then, once a week or so I can check my unknown-sender mailbox for non-spam, and if I find some I can add the previously unknown sender to my whitelist by typing |atw in my mail client, Mutt. (|atw pipes the current message to the atw shell script; this also works in elm and Pine.)

With Mutt, I can also tag multiple messages and then press ";" to invoke Mutt's tag-pipe function, where it pipes each tagged message to the specified command. So the full key sequence in Mutt would be ;|atw (Mutt rules!)

Update, May 2002: I am fairly happy with the approach above, but since I am deluged with spam at my work address, where I regularly need to see email from people I don't know yet, I decided to give spamassassin a try. It is fantastic; I highly recommend it. Although I don't like heuristic-style filtering in general, spamassassin works well because it has a huge list of filters and uses a score-based approach. See also: Hugo's notes on spamassassin, integration with whitelists, Razor, etc.

Update, Oct 2003: I now receive about 500 spams/day, out of 700 total messages/day. Almost all the spam is trapped by spamassassin.

Update, Jan 2004: started rejecting most spam at SMTP time with Exim and spamassassin. (see config info)

Update, May 2004: started rejecting forged messages with Exim and SPF.

Update, June 2004: started auto-training spamassassin using spam honeypots as well as training it manually on any spam it misses; huge improvement!

Update, July 2004: reduced my threshold for spam rejection to 6.0.

Update, August 2005: I am still delighted with my spam rejection setup; almost all spam is rejected at SMTP time (400+ msgs/day), with only a couple of known false positives in the last year.

(some of this stuff is not related to my whitelist setup, but this is the only page I have on spam filtering/blocking so I put other stuff here for now as well)

Todo

Related stuff


Last modified: 2008/07/04 05:55:53
Gerald Oskoboiny, <[email protected]>