Re: Spam filters


--cNdxnHkX5QqsyA0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
>     http://impressive.net/people/gerald/2000/12/spam-filtering.html

I have finally switched from Junkfilter[1] to whitelist-based
filtering. I read Gerald's page and made a few changes to his
implementation.

1/ Add to whitelist (atw) script

Gerald's version is unsafe: it doesn't lock the whitelist before
writing to it, and it executes a program that it creates in /tmp.

I have rewritten it and it is attached to this email.
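
As an illustration of the lock-before-write idea, here is a minimal
sketch. It is not the attached script: it assumes a temporary directory
instead of $HOME/whitelist, and it uses mkdir as a portable stand-in for
procmail's lockfile. The helper names are made up for the example.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for $HOME/whitelist, and
# mkdir is used as the lock primitive instead of procmail's lockfile.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' 0
whitelist=$dir/whitelist
lockdir=$dir/whitelist.lock

# mkdir either creates the directory or fails, so only one process
# at a time can get past this loop.
acquire_lock() {
  until mkdir "$lockdir" 2>/dev/null
  do
    sleep 1
  done
}

release_lock() {
  rmdir "$lockdir"
}

# Append one address while holding the lock.
append_locked() {
  acquire_lock
  printf '%s\n' "$1" >> "$whitelist"
  release_lock
}

append_locked 'alice@example.org'
append_locked 'bob@example.org'
cat "$whitelist"
```

Two writers running append_locked at the same time would take turns
instead of interleaving their output in the whitelist.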

I have added two new features: it is possible to add a list of email
addresses and to import email addresses from Mutt aliases:
- 'atw': processes RFC822 messages from stdin.
- 'atw -a': takes a list of email addresses as arguments.
- 'atw -M': processes Mutt aliases from stdin.
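
To give an idea of what importing Mutt aliases involves, here is a
rough sketch using awk rather than the Perl one-liner in the attached
script. The function name alias_addresses and the sample aliases are
made up, and only the first address on each alias line is kept.

```shell
#!/bin/sh
# Sketch only: alias_addresses is a hypothetical helper, and only the
# first address of each alias line is kept.
alias_addresses() {
  awk '$1 == "alias" {
    for (i = 3; i <= NF; i++) {
      if ($i ~ /@/) {
        gsub(/[<>,]/, "", $i)   # strip angle brackets and commas
        print $i
        break
      }
    }
  }'
}

alias_addresses <<'EOF'
alias hugo Hugo Haas <hugo@example.org>
set folder=~/mail
alias bob bob@example.net
EOF
```

The output could then be handed to 'atw -a', for example through xargs.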

2/ Procmail rule

It has a few minor mistakes:
- it does not lock the folder.
- it uses regular expression matching, which has three problems:
 + it is case sensitive.
 + a substring of a known email address could match.
 + there are special characters (at least '.') in email addresses.

Here is the rule I use:

   # Whitelist-based filtering
   WHITELIST_DIR=$HOME/whitelist
   WHITELIST=$WHITELIST_DIR/whitelist
   OTHERS=$WHITELIST_DIR/others
   ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
   :0:
   * ! ? grep -F -i -x -q "$ffield" $WHITELIST $OTHERS
     INBOX-unknown
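
The difference between plain regular expression matching and the
'grep -F -i -x' used in the rule above can be checked with a small
test. This is only a sketch: the whitelist is a temporary file, and the
addresses and function names are invented for the example.

```shell
#!/bin/sh
# Sketch only: the whitelist file and the addresses are made up.
wl=$(mktemp)
trap 'rm -f "$wl"' 0
printf '%s\n' 'alice@example.org' > "$wl"

# Plain regex matching: case sensitive, substrings match, '.' is a wildcard.
regex_match() {
  if grep -q "$1" "$wl"; then echo "match"; else echo "no match"; fi
}

# Fixed string (-F), case insensitive (-i), whole line (-x), as in the rule.
exact_match() {
  if grep -F -i -x -q "$1" "$wl"; then echo "match"; else echo "no match"; fi
}

for sender in 'Alice@Example.org' 'ice@example.org' 'a.ice@example.org'
do
  echo "$sender: regex: $(regex_match "$sender"), exact: $(exact_match "$sender")"
done
```

With the regular expression, the legitimate sender written in a
different case is rejected while the two bogus senders are accepted;
with -F -i -x it is the other way round.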

I think that it is going to be really cool. Thanks Gerald for
helping me switch.

 1. http://www.pobox.com/~gsutter/junkfilter/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
- Surely you can't put a price on your family's lives! - I didn't think
so either, but here we are. -- Homer J. Simpson

--cNdxnHkX5QqsyA0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
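 [ -f $WHITELIST ] || touch $WHITELIST
 # grep exits with 2 instead of 1 when a file is missing, which
 # add_address would report as "already listed"
 [ -f $OTHERS ] || touch $OTHERS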
}

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin and add the From line to
# the white list.

if [ "$1" = '-a' ]
then
 lock
 shift
 while [ $# != 0 ]
 do
   add_address $1
   shift
 done
 exit
elif [ "$1" = '-m' ]
then
 lock
 add_address `formail -XFrom: | formail -r -xTo: | tr -d " "`
 exit
elif [ "$1" = '-M' ]
then
 # Keep only alias lines, and skip those without an email address
 perl -n -e 'next if (! m/^\s*alias\s/); chomp; $_ =~ m/[^ ]+@[^ ]+/ or next; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";' | exec xargs $0 -a
 exit
fi

exec formail -s $0 -m

--cNdxnHkX5QqsyA0e--

Re: Spam filters


--HcAYCG3uE/tztfnV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Something else that I noted is that Gerald says in his documentation[1]:

  Then:
   touch .whitelist
   atw < $MAIL
   atw < mail/w3c/inbox
   # (repeat with any other non-list mailboxes you have that don't have spam)

A good way to get a list of email addresses that are known to be valid
is to extract the recipients of emails that you have sent.

I have added a '-t' option to atw (attached) which scans the To and Cc
fields instead of the From field:

atw -t < ootbox
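
For readers who want to see the idea without formail and Perl, here is a
simplified sketch of pulling the To and Cc addresses out of a single
message. The function name recipients and the sample message are made
up, and folded (multi-line) header fields are not handled.

```shell
#!/bin/sh
# Sketch only: recipients is a hypothetical helper. It reads one message
# on stdin and ignores folded header lines.
recipients() {
  sed '/^$/q' |                       # keep the header only
  grep -i -E '^(To|Cc):' |            # keep the To and Cc fields
  sed -e 's/^[^:]*://' -e 's/"[^"]*"//g' |   # drop field name and quoted names
  tr ',' '\n' |                       # one recipient per line
  sed -e 's/.*<\([^>]*\)>.*/\1/' -e 's/^ *//' -e 's/ *$//' |
  grep '@'
}

recipients <<'EOF'
From: me@example.org
To: "Doe, Jane" <jane@example.org>, bob@example.net
Cc: Carol <carol@example.com>
Subject: hello

To: not-a-header@example.org
EOF
```

Quoted display names are removed before splitting on commas, so a name
such as "Doe, Jane" does not break the list apart.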

 1. http://impressive.net/people/gerald/2000/12/spam-filtering.html

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
What kind of side dishes will we be enjoying this evening with our
frozen waffles?

--HcAYCG3uE/tztfnV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
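 [ -f $WHITELIST ] || touch $WHITELIST
 # grep exits with 2 instead of 1 when a file is missing, which
 # add_address would report as "already listed"
 [ -f $OTHERS ] || touch $OTHERS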
}

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# Add a list of addresses
add_addresses() {
 for email in $*
 do
   add_address $email
 done
 exit
}

# If -t is given as a first argument, scan the To and Cc fields instead of
# the From line.
if [ "$1" = '-t' ]
then
 to='-t'
 shift
fi

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin.

if [ "$1" = '-a' ]
then
 lock
 shift
 add_addresses $*
elif [ "$1" = '-m' ]
then
 lock
 if [ "$to" = '-t' ]
 then
   addresses=`formail -x To -x Cc | perl -pn -e 's/,/\n/g' | perl -n -e 'chomp; s/\".*?\"//g; m/[^ ]+@[^ ]+/; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`
 else
   addresses=`formail -XFrom: | formail -r -xTo: | tr -d " "`
 fi
 add_addresses $addresses
elif [ "$1" = '-M' ]
then
 lock
 # Keep only alias lines, and skip those without an email address
 addresses=`perl -n -e 'next if (! m/^\s*alias\s/); chomp; s/\".*?\"//g; m/[^ ]+@[^ ]+/ or next; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`
 add_addresses $addresses
fi

exec formail -s $0 $to -m

--HcAYCG3uE/tztfnV--

Re: Spam filters

* Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <[email protected]> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering techniques taking into account the
new type of spam. I talked to Max who started using SpamAssassin[2]
and was happy about it. I had a look and found it cool. But I didn't
want to abandon my whitelist filtering.

I therefore am using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
 whitelist).
- emails not identified as spam from people I don't know.

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when somebody is on my whitelist. It is therefore
easier to be considered a spammer if the address is not on my
whitelist.

I have also enabled Vipul's Razor[3] to increase my detection
accuracy. When I detect spam which isn't registered in Razor, I
register it.

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

 # Whitelist-based filtering
 WHITELIST_DIR=$HOME/whitelist
 WHITELIST=$WHITELIST_DIR/whitelist
 ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
 :0fhw
 * ? grep -F -i -x -q "$ffield" $WHITELIST
 | formail -i "X-HH-Whitelist: YES"

 :0Efhw
 | formail -i "X-HH-Whitelist: NO"

Spam filtering:

 SPAMASSASSINDIR=$HOME/spam/spamassassin

 :0fw
 | $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

 :0:
 * ^X-Spam-Flag: YES
   spam

If something hasn't been classified as spam, see if I know the guy:

 INCLUDERC=$HOME/.procmail/spamfiltering

 :0:
 * ^X-HH-Whitelist: NO
   unknown
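
Taken together, the recipes above sort each message into one of three
places. The decision can be summarized as a small function. This is
only a sketch: the function name route is made up, and "inbox" stands
for whatever the default delivery is, which the rules above do not name.

```shell
#!/bin/sh
# Sketch only: $1 is the X-Spam-Flag value, $2 the X-HH-Whitelist value.
# "inbox" is an assumed name for the default destination.
route() {
  if [ "$1" = "YES" ]
  then
    echo "spam"        # flagged by SpamAssassin, whitelisted or not
  elif [ "$2" = "NO" ]
  then
    echo "unknown"     # not spam, but sender is not on the whitelist
  else
    echo "inbox"       # not spam, sender is on the whitelist
  fi
}

route YES YES
route NO NO
route NO YES
```

The spam test comes first, so being on the whitelist only lowers the
score; it never bypasses SpamAssassin.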

-*- SpamAssassin
================

Here is how I use my whitelist:

 # Whitelist filtering
 header          ON_WHITELIST    X-HH-Whitelist  =~      /^YES$/
 describe        ON_WHITELIST    Sender whitelisted
 score           ON_WHITELIST    -5.0

I have a few other unrelated settings:

 # Don't rewrite the subject
 rewrite_subject 0

 # Leave the content-type alone
 defang_mime 0

 # Report in the header
 report_header 1
 use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

 # Spam stuff
 # Show spam headers
 unignore X-Spam-Status X-Spam-Report
 # Highlight spam
 #ifndef USE_IMAP
 color index     red     default "~h '^X-Spam-Flag: YES'"
 color index     red     blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
 #endif
 # How to report spam
 #define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
 macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
 macro index \eS REPORT_BULK_SPAM
 macro pager \eS REPORT_BULK_SPAM

Note that there are cpp commands because I preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

 2. http://spamassassin.taint.org/
 3. http://razor.sf.net/
 4. http://larve.net/people/hugo/2002/04/mutt-cpp
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

Re: Spam filters

On Mon, Apr 15, 2002 at 06:13:01PM -0400, Hugo Haas wrote:
> * Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> > I have finally switched from Junkfilter[1] to whitelist based
> > filtering.
> [..]

> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.

SpamAssassin looks excellent from what I have seen. I understand
it has some kind of automatic whitelist feature: every time you
receive non-spam from someone, their whitelist score increases?
(or something like that)

Thanks for the docs on your setup, I'm sure that will be useful.

> Note that there are [cpp] commands because I preprocess my muttrc[4].

>   2. http://spamassassin.taint.org/
>   4. http://larve.net/people/hugo/2002/04/mutt-cpp

I'm curious why you need to use cpp; I have most of my settings
in my .muttrc [5], and use a couple extra files [6] for other stuff
that is specific to a certain environment (personal or w3c mail)

For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
   zot "w3c mail"; localsuffix="-w3c" mutt

(zot just changes the rxvt title bar; it's called zot because that's
what it was called when I got it from a friend 10 years ago)

Hmm... I guess you tried something like that before switching to
cpp; I'm just wondering what it was you finally needed cpp for.

[5] http://impressive.net/people/gerald/misc/dotfiles/muttrc
[6] http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-devo
   http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-w3c

(I don't need different configs for local/remote, since I always
store all my mail on my laptop.)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

* Gerald Oskoboiny <[email protected]> [2002-04-15 23:07-0400]
> SpamAssassin looks excellent from what I have seen. I understand
> it has some kind of automatic whitelist feature: every time you
> receive non-spam from someone, their whitelist score increases?
> (or something like that)
[..]

Yes, there is an auto-whitelist feature. I haven't tried it yet. I
wasn't sure about how to leverage my existing whitelist to bootstrap
it, so I preferred to try and integrate my whitelist another way, and
maybe I will play with the auto-whitelist later.

I was somewhat worried that the whitelist would let everything
through. By default, you need 5 points to be declared as spam. An
email from the 'EMail-IT' True Stealth System that I was complaining
about[7] scores as follows:

 X-Spam-Report:   13.4 hits, 5 required;
   * -0.3 -- Cc: contains similar domains at least 10 times
   *  1.7 -- BODY: Includes a link to send a mail with a subject
   * -0.2 -- BODY: Includes a URL link to send an email
   *  3.5 -- BODY: Link to a URL containing "remove"
   *  3.0 -- Listed in Razor, see http://razor.sourceforge.net/
   *  4.5 -- HTML-only mail, with no text version
   *  0.2 -- From and To the same address
   *  1.0 -- Received via a relay in orbs.dorkslayers.com
     [RBL check: found 150.82.130.139.orbs.dorkslayers.com.]

With my whitelist's 5-point bonus, it scores 8.4 and is still recognized
as spam. From what I have seen, their whitelist has a 100-point bonus,
which seems far too much[8]:

 header   From: address is in the user's white-list   USER_IN_WHITELIST   -100.0

There must be something I haven't understood about it yet.
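
The arithmetic above can be reproduced with a few lines of shell and
awk. This is only a sketch: the hit values are copied from the report
above, the function names are made up, and it assumes that a message
counts as spam when its score reaches the required number of hits.

```shell
#!/bin/sh
# Sketch only: hit values copied from the X-Spam-Report above.
report='-0.3
1.7
-0.2
3.5
3.0
4.5
0.2
1.0'

# Sum of all the hits in the report.
total() {
  printf '%s\n' "$report" | awk '{ s += $1 } END { printf "%.1f\n", s }'
}

# $1 = raw score, $2 = whitelist bonus, $3 = required hits.
# Assumes spam means "score at or above the threshold".
verdict() {
  awk -v s="$1" -v b="$2" -v t="$3" 'BEGIN {
    f = s - b
    printf "%.1f %s\n", f, (f >= t ? "spam" : "ham")
  }'
}

total                  # 13.4
verdict 13.4 5 5       # 8.4 spam
verdict 13.4 100 5     # -86.6 ham
```

With a 5-point bonus this message is still caught; with a 100-point
bonus a forged whitelisted sender would get it through.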

> > Note that there are [cpp] commands because I preprocess my muttrc[4].
>
> >   2. http://spamassassin.taint.org/
> >   4. http://larve.net/people/hugo/2002/04/mutt-cpp
>
> I'm curious why you need to use cpp; I have most of my settings
> in my .muttrc [5], and use a couple extra files [6] for other stuff
> that is specific to a certain environment (personal or w3c mail)
>
> For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
>     zot "w3c mail"; localsuffix="-w3c" mutt
>
> (zot just changes the rxvt title bar; it's called zot because that's
> what it was called when I got it from a friend 10 years ago)
>
> Hmm... I guess you tried something like that before switching to
> cpp; I'm just wondering what it was you finally needed cpp for.

Indeed I tried something like that, but it rapidly became very complex.
On my laptop, depending on whether I read my private mail or my work
mail, and on whether I use isync to read my IMAP folders locally or read
them remotely, I have 4 aliases:

 imutt='my_mutt -DWORK_CONF -DON_LAPTOP -DUSE_IMAP --'
 imuttp='my_mutt -DON_LAPTOP -DUSE_IMAP --'
 mutt='my_mutt -DWORK_CONF -DON_LAPTOP --'
 muttp='my_mutt -DON_LAPTOP --'

and at work, I have:

 mutt='my_mutt -DWORK_CONF --'
 muttp='my_mutt --'

My configuration is fairly complex because I have lots of different
settings for each of them. Here is an example:

#ifdef WORK_CONF
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}mail"
  #else
    #define FOLDER "~/mail"
  #endif
#else
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}private-mail"
  #else
    #define FOLDER "~/private-mail"
  #endif
#endif

set folder=FOLDER

and another one:

#ifndef USE_IMAP
  #ifndef ON_LAPTOP
    # Hide the IMAP server messages
    folder-hook . "push \"<limit> ! (~s 'DELETE THIS MESSAGE -- FOLDER INTERNAL DATA' ~f MAILER-DAEMON)\n\""
  #endif
#endif

I used your technique for a long time, but it just became too complex
to manage so many configurations.
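
For comparison, the nested #ifdef block that selects FOLDER above
corresponds to the following decision, written here as a shell function.
This is only a sketch: the function name folder_for is made up, and the
flags are passed as plain words rather than as -D options to cpp.

```shell
#!/bin/sh
# Sketch only: mirrors the WORK_CONF / USE_IMAP logic of the FOLDER example.
folder_for() {
  work=no
  imap=no
  for flag in "$@"
  do
    case $flag in
      WORK_CONF) work=yes ;;
      USE_IMAP)  imap=yes ;;
    esac
  done
  if [ "$work" = yes ]; then name=mail; else name=private-mail; fi
  if [ "$imap" = yes ]
  then
    echo "{localhost:1430}$name"
  else
    echo "~/$name"
  fi
}

folder_for WORK_CONF ON_LAPTOP USE_IMAP   # {localhost:1430}mail
folder_for ON_LAPTOP                      # ~/private-mail
```

Two flags already give four results; each extra setting that depends
on them multiplies the number of separate config files that would be
needed without a preprocessor.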

 7. http://impressive.net/archives/fogo/[email protected]
 8. http://spamassassin.taint.org/tests.html
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Kids, your mother's under a lot of pressure, why don't we let her clear
the table in peace? -- Homer J. Simpson

Re: Spam filters

* Hugo Haas <[email protected]> [2002-04-15 18:13-0400]
> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.
>
> I therefore am using 3 different folders:
> - emails identified as spam.
> - emails not identified as spam from people I know (who are on my
>   whitelist).
> - emails not identified as spam from people I don't know.
>
> SpamAssassin works with a scoring system. I use my whitelist to
> decrease the score when somebody is on my whitelist. It is therefore
> easier to be considered as a spammer if the address in not on my
> whitelist.
>
> I have also enabled Vipul's Razor[3] for increasing my detection
> accuracy. When I detect spam which isn't registered in Razor, I do so.

I wanted to give an update on my spam filtering system. With the new
versions of SpamAssassin (2.20) and Razor (1.20), my spam filter
catches about 98% of the spam (I lowered the threshold to 3.6 hits and
tweaked a couple of other rules). The 2% of spam that got through
went into my unknown sender folder.

The only non-spam emails I saw it catch were bounces from mailing
lists.

In order to make sure that I improve my (and everybody else's) spam
filtering, I systematically bounce spam that went through to
spamassassin-sightings[4] (ESC-B in my Mutt session) and register all
confirmed spam with Razor[5] (ESC-R ; ESC-Z in my Mutt session). This
is easy enough that it just takes a few seconds every day or two.

Basically, I am *very* happy with this new system, and would
encourage people to use it: the more people use Razor and report spam
to it, the less spam we will see.

 4. http://lists.sourceforge.net/lists/listinfo/spamassassin-sightings
 5. http://razor.sourceforge.net/
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
But then, it all fits together!

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny