Re: Spam filters

by Hugo Haas <hugo@larve.net>

 Date:  Mon, 15 Apr 2002 18:13:01 -0400
 To:  fogo@impressive.net
 References:  hugo gerald hugo2 gerald2 gerald3 hugo3
 Replies:  gerald4 hugo4 hugo5
* Hugo Haas <hugo@larve.net> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <hugo@larve.net> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering techniques taking into account the
new type of spam. I talked to Max who started using SpamAssassin[2]
and was happy about it. I had a look and found it cool. But I didn't
want to abandon my whitelist filtering.

I am therefore using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
  whitelist).
- emails not identified as spam from people I don't know.

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when the sender is on it. A message is therefore
more easily classified as spam if the sender's address is not on my
whitelist.
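
As a minimal sketch of that scoring idea (not SpamAssassin itself): the
-5.0 adjustment is the one from my rules further down, while the 5.0
threshold and the 6.2 base score are assumptions picked for
illustration, and verdict is a hypothetical helper.

```shell
#!/bin/sh
# Assumed threshold; a message at or above it counts as spam.
THRESHOLD=5.0
# Adjustment applied when the sender is on the whitelist.
WHITELIST_BONUS=-5.0

# verdict BASE_SCORE ON_WHITELIST(YES|NO) -> prints "spam" or "ham"
verdict() {
    awk -v base="$1" -v wl="$2" -v bonus="$WHITELIST_BONUS" -v thr="$THRESHOLD" '
        BEGIN {
            # Add the whitelist adjustment only for whitelisted senders.
            total = base + (wl == "YES" ? bonus : 0)
            print (total >= thr ? "spam" : "ham")
        }'
}

verdict 6.2 NO     # hypothetical message from an unknown sender
verdict 6.2 YES    # the same message from a whitelisted sender
```

With these made-up numbers the first call prints "spam" and the second
prints "ham": the same content is judged differently depending on who
sent it.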

I have also enabled Vipul's Razor[3] to increase my detection
accuracy. When I detect spam which isn't registered in Razor, I
report it.

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

  # Whitelist-based filtering
  WHITELIST_DIR=$HOME/whitelist
  WHITELIST=$WHITELIST_DIR/whitelist
  ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
  :0fhw
  * ? grep -F -i -x -q "$ffield" $WHITELIST
  | formail -i "X-HH-Whitelist: YES"

  :0Efhw
  | formail -i "X-HH-Whitelist: NO"
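
The lookup in that recipe can be tried outside procmail. The following
is only a sketch: the addresses and the temporary whitelist file are
made up, and on_whitelist is a hypothetical helper wrapping the same
grep flags.

```shell
#!/bin/sh
# Hypothetical whitelist: one address per line.
WHITELIST=$(mktemp)
trap 'rm -f "$WHITELIST"' EXIT
printf '%s\n' 'alice@example.org' 'bob@example.net' > "$WHITELIST"

# on_whitelist ADDRESS -> exit status 0 if ADDRESS is a whole line of the file
on_whitelist() {
    # -F: fixed string, so "." in an address is not a wildcard
    # -i: ignore case
    # -x: the whole line must match, not just a substring
    # -q: print nothing, only set the exit status
    grep -F -i -x -q -e "$1" "$WHITELIST"
}

# Case differences do not matter.
on_whitelist 'Alice@Example.ORG' && echo "X-HH-Whitelist: YES"
# An address that merely contains a whitelisted one does not match.
on_whitelist 'alice@example.org.evil.test' || echo "X-HH-Whitelist: NO"
```

The -x flag is what keeps a longer address that embeds a whitelisted
one from slipping through.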

Spam filtering:

  SPAMASSASSINDIR=$HOME/spam/spamassassin

  :0fw
  | $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

  :0:
  * ^X-Spam-Flag: YES
    spam

If something hasn't been classified as spam, see if I know the guy:

  INCLUDERC=$HOME/.procmail/spamfiltering

  :0:
  * ^X-HH-Whitelist: NO
    unknown
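
Taken together, the delivery recipes amount to the decision below. This
is only a sketch: folder_for is a hypothetical helper, and "inbox"
stands for whatever the default mailbox is, which the recipes above do
not name.

```shell
#!/bin/sh
# folder_for SPAM_FLAG WHITELIST_HEADER -> prints the destination folder
# The order of the tests mirrors the order of the recipes.
folder_for() {
    if [ "$1" = "YES" ]; then
        echo spam        # X-Spam-Flag: YES
    elif [ "$2" = "NO" ]; then
        echo unknown     # not spam, sender not on the whitelist
    else
        echo inbox       # not spam, sender on the whitelist (assumed default)
    fi
}

folder_for YES YES
folder_for NO NO
folder_for NO YES
```

Because the spam test comes first, a whitelisted sender whose message
still scores as spam ends up in the spam folder.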

-*- SpamAssassin
================

Here is how I use my whitelist:

  # Whitelist filtering
  header          ON_WHITELIST    X-HH-Whitelist  =~      /^YES$/
  describe        ON_WHITELIST    Sender whitelisted
  score           ON_WHITELIST    -5.0

I have a few other unrelated settings:

  # Don't rewrite the subject
  rewrite_subject 0

  # Leave the content-type alone
  defang_mime 0

  # Report in the header
  report_header 1
  use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

  # Spam stuff
  # Show spam headers
  unignore X-Spam-Status X-Spam-Report
  # Highlight spam
  #ifndef USE_IMAP
  color index     red     default "~h '^X-Spam-Flag: YES'"
  color index     red     blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
  #endif
  # How to report spam
  #define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
  macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
  macro index \eS REPORT_BULK_SPAM
  macro pager \eS REPORT_BULK_SPAM

Note that there are cpp directives (#ifndef, #define) because I
preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

  2. http://spamassassin.taint.org/
  3. http://razor.sf.net/
  4. http://larve.net/people/hugo/2002/04/mutt-cpp
-- 
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny