Re: Spam filters

from Hugo Haas <hugo@larve.net>, Sat, 27 Jan 2001 19:49:57 -0500

Replies:

Parents:

--cNdxnHkX5QqsyA0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
> http://impressive.net/people/gerald/2000/12/spam-filtering.html

I have finally switched from Junkfilter[1] to whitelist-based
filtering. I read Gerald's page and made a few changes to his
implementation.

1/ Add to whitelist (atw) script

Gerald's version is unsafe: it doesn't lock the whitelist before
writing to it, and it executes a program that it creates in /tmp.

I have rewritten it and it is attached to this email.

I have added two new features: it is possible to add a list of email
addresses and to import email addresses from Mutt aliases:
- 'atw': process RFC822 message from stdin.
- 'atw -a': takes a list of email addresses as argument.
- 'atw -M': process Mutt aliases from stdin.

2/ Procmail rule

It has a few minor mistakes:
- it does not lock the folder.
- it uses regular expression matching, which has three problems:
+ it is case sensitive.
+ a subset of a known email address could match.
+ there are special characters (at least '.') in email addresses.
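
The three problems can be reproduced with a throwaway whitelist. This is
only a sketch: the addresses and the helper names regex_match and
exact_match are made up for illustration, and it assumes the sender
address is used as the grep pattern. It compares plain regular
expression matching with fixed-string, case-insensitive, whole-line
matching (grep's -F, -i and -x options).

```shell
#!/bin/sh
# regex_match ADDRESS FILE: the address is used as a regular expression.
regex_match() {
    if grep -q "$1" "$2"; then echo yes; else echo no; fi
}

# exact_match ADDRESS FILE: fixed string (-F), any case (-i), whole line (-x).
exact_match() {
    if grep -F -i -x -q "$1" "$2"; then echo yes; else echo no; fi
}

# Hypothetical whitelist with two known senders.
wl=$(mktemp)
printf '%s\n' 'ab@example.org' 'johnxdoe@example.org' > "$wl"

# 1st address: known sender in upper case (regex misses it).
# 2nd address: a substring of a known address (regex accepts it).
# 3rd address: its '.' matches the 'x' in a known address (regex accepts it).
for addr in 'AB@EXAMPLE.ORG' 'b@example.org' 'john.doe@example.org'; do
    echo "$addr regex=$(regex_match "$addr" "$wl") exact=$(exact_match "$addr" "$wl")"
done
rm -f "$wl"
```

Only the first address belongs to a known sender, and only the exact
match gets all three cases right.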

Here is the rule I use:

# Whitelist-based filtering
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
:0:
* ! ? grep -F -i -x -q "$ffield" $WHITELIST $OTHERS
INBOX-unknown

But I think that it is going to be really cool. Thanks Gerald for
helping me switch.

1. http://www.pobox.com/~gsutter/junkfilter/

--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
- Surely you can't put a price on your family's lives! - I didn't think
so either, but here we are. -- Homer J. Simpson

--cNdxnHkX5QqsyA0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
lockfile $LOCKFILE
# Ensure that the lock will be removed when we are done
trap "rm -f $LOCKFILE" 0 2 3 15
[ -f $WHITELIST ] || touch $WHITELIST
}

# Add an email address to the whitelist
add_address() {
echo -n "Checking for $1... "
grep -F -i -x -q "$1" $WHITELIST $OTHERS
if [ $? = 1 ]
then
echo "$1" >> $WHITELIST
echo "added."
else
echo "already listed."
fi
}

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin and add the From line to
# the white list.

if [ "$1" = '-a' ]
then
lock
shift
while [ $# != 0 ]
do
add_address $1
shift
done
exit
elif [ "$1" = '-m' ]
then
lock
add_address `formail -XFrom: | formail -r -xTo: | tr -d " "`
exit
elif [ "$1" = '-M' ]
then
perl -n -e 'next if (! m/^\w*alias/); chomp; $_ =~ m/[^ ]+@[^ ]+/; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";' | exec xargs $0 -a
fi

exec formail -s $0 -m

--cNdxnHkX5QqsyA0e--

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Sun, 28 Jan 2001 02:38:25 -0500

Replies:

None.

Parents:

--HcAYCG3uE/tztfnV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Something else that I noted is that Gerald says in his documentation[1]:

Then:
touch .whitelist
atw < $MAIL
atw < mail/w3c/inbox
# (repeat with any other non-list mailboxes you have that don't have spam)

Another good way to collect email addresses that are known to be valid
is to extract the recipients of emails that you have sent.

I have added a '-t' option to atw (attached) which scans the To and Cc
fields instead of the From field:

atw -t < outbox
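
As a rough illustration of what scanning the To and Cc fields involves,
here is a sketch that uses only awk, grep, tr and sort. It is not the
code in the attached script, which relies on formail and perl. The
function name recipients is hypothetical, and the address pattern is a
simplification: a quoted display name that itself contains an '@' would
be picked up as well.

```shell
#!/bin/sh
# recipients: read one RFC 822 message on stdin and print the addresses
# found in its To: and Cc: headers, lower-cased, one per line, no duplicates.
recipients() {
    awk '
        /^$/     { exit }                     # a blank line ends the headers
        /^[ \t]/ { if (keep) print; next }    # folded continuation line
        {
            keep = (tolower($0) ~ /^(to|cc):/)
            if (keep) { sub(/^[^:]*:/, ""); print }
        }
    ' |
    grep -E -o '[^[:space:]<>,;:"()]+@[^[:space:]<>,;:"()]+' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}

# Example message with a folded To: header (all addresses are made up).
printf '%s\n' \
    'From: me@example.org' \
    'To: "Doe, John" <John@Example.org>,' \
    ' jane@example.net' \
    'Cc: bob@example.com' \
    'Subject: hi' \
    '' \
    'body text mentioning x@example.invalid' | recipients
```

The output could then be passed to 'atw -a', which takes a list of
addresses as arguments.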

1. http://impressive.net/people/gerald/2000/12/spam-filtering.html

--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
What kind of side dishes will we be enjoying this evening with our
frozen waffles?

--HcAYCG3uE/tztfnV
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
lockfile $LOCKFILE
# Ensure that the lock will be removed when we are done
trap "rm -f $LOCKFILE" 0 2 3 15
[ -f $WHITELIST ] || touch $WHITELIST
}

# Add an email address to the whitelist
add_address() {
echo -n "Checking for $1... "
grep -F -i -x -q "$1" $WHITELIST $OTHERS
if [ $? = 1 ]
then
echo "$1" >> $WHITELIST
echo "added."
else
echo "already listed."
fi
}

# Add a list of addresses
add_addresses() {
for email in $*
do
add_address $email
done
exit
}

# If -t is given as a first argument, scan the To and Cc fields instead of
# the From line.
if [ "$1" = '-t' ]
then
to='-t'
shift
fi

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin.

if [ "$1" = '-a' ]
then
lock
shift
add_addresses $*
elif [ "$1" = '-m' ]
then
lock
if [ "$to" = '-t' ]
then
addresses=`formail -x To -x Cc | perl -pn -e 's/,/\n/g' | perl -n -e 'chomp; s/\".*?\"//g; m/[^ ]+@[^ ]+/; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`
else
addresses=`formail -XFrom: | formail -r -xTo: | tr -d " "`
fi
add_addresses $addresses
elif [ "$1" = '-M' ]
then
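# Lock before writing, as the -a and -m branches do
lock
# Match lines of the form "alias <name> <address>"
addresses=`perl -n -e 'next if (! m/^\s*alias\s/); chomp; s/\".*?\"//g; m/[^ ]+@[^ ]+/; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`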
add_addresses $addresses
fi

exec formail -s $0 $to -m

--HcAYCG3uE/tztfnV--

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Mon, 15 Apr 2002 18:13:01 -0400

Replies:

Parents:

* Hugo Haas <hugo@larve.net> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <hugo@larve.net> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering techniques taking into account the
new type of spam. I talked to Max who started using SpamAssassin[2]
and was happy about it. I had a look and found it cool. But I didn't
want to abandon my whitelist filtering.

I therefore am using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
whitelist).
- emails not identified as spam from people I don't know.

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when the sender is on my whitelist. A message is
therefore more likely to be classified as spam if the sender's address
is not on my whitelist.

I have also enabled Vipul's Razor[3] to increase my detection
accuracy. When I detect spam that isn't registered in Razor, I register
it.

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

# Whitelist-based filtering
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
:0fhw
* ? grep -F -i -x -q "$ffield" $WHITELIST
| formail -i "X-HH-Whitelist: YES"

:0Efhw
| formail -i "X-HH-Whitelist: NO"

Spam filtering:

SPAMASSASSINDIR=$HOME/spam/spamassassin

:0fw
| $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

:0:
* ^X-Spam-Flag: YES
spam

If something hasn't been classified as spam, see if I know the guy:

INCLUDERC=$HOME/.procmail/spamfiltering

:0:
* ^X-HH-Whitelist: NO
unknown
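
The net effect of the recipes above is a three-way split. A minimal
sketch of that decision, written as a plain shell function rather than
as procmail recipes: the function name route is hypothetical, and
"inbox" merely stands for the default delivery, which the recipes do not
name.

```shell
#!/bin/sh
# route SPAM_FLAG WHITELIST_FLAG: print the folder a message would land in.
#   SPAM_FLAG       value of the X-Spam-Flag header ("YES" when flagged)
#   WHITELIST_FLAG  value of the X-HH-Whitelist header ("YES" or "NO")
route() {
    if [ "$1" = YES ]; then
        echo spam          # flagged by SpamAssassin, whitelisted or not
    elif [ "$2" = NO ]; then
        echo unknown       # not flagged, sender not on the whitelist
    else
        echo inbox         # not flagged, known sender (assumed default)
    fi
}

route YES YES
route NO NO
route NO YES
```

The spam check comes first, so a whitelisted sender whose message still
gets flagged goes to the spam folder.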

-*- SpamAssassin
================

Here is how I use my whitelist:

# Whitelist filtering
header ON_WHITELIST X-HH-Whitelist =~ /^YES$/
describe ON_WHITELIST Sender whitelisted
score ON_WHITELIST -5.0

I have a few other unrelated settings:

# Don't rewrite the subject
rewrite_subject 0

# Leave the content-type alone
defang_mime 0

# Report in the header
report_header 1
use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

# Spam stuff
# Show spam headers
unignore X-Spam-Status X-Spam-Report
# Highlight spam
#ifndef USE_IMAP
color index red default "~h '^X-Spam-Flag: YES'"
color index red blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
#endif
# How to report spam
#define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
macro index \eS REPORT_BULK_SPAM
macro pager \eS REPORT_BULK_SPAM

Note that there are cpp commands because I preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

2. http://spamassassin.taint.org/
3. http://razor.sf.net/
4. http://larve.net/people/hugo/2002/04/mutt-cpp
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

Re: Spam filters

from Gerald Oskoboiny <gerald@impressive.net>, Mon, 15 Apr 2002 23:07:31 -0400

Replies:

hugo

Parents:

On Mon, Apr 15, 2002 at 06:13:01PM -0400, Hugo Haas wrote:
> * Hugo Haas <hugo@larve.net> [2001-01-27 19:49-0500]
> > I have finally switched from Junkfilter[1] to whitelist based
> > filtering.
> [..]

> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.

SpamAssassin looks excellent from what I have seen. I understand
it has some kind of automatic whitelist feature: every time you
receive non-spam from someone, their whitelist score increases?
(or something like that)

Thanks for the docs on your setup, I'm sure that will be useful.

> Note that there are [cpp] commands because I preprocess my muttrc[4].

> 2. http://spamassassin.taint.org/
> 4. http://larve.net/people/hugo/2002/04/mutt-cpp

I'm curious why you need to use cpp; I have most of my settings
in my .muttrc [5], and use a couple extra files [6] for other stuff
that is specific to a certain environment (personal or w3c mail)

For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
zot "w3c mail"; localsuffix="-w3c" mutt

(zot just changes the rxvt title bar; it's called zot because that's
what it was called when I got it from a friend 10 years ago)

Hmm... I guess you tried something like that before switching to
cpp; I'm just wondering what it was you finally needed cpp for.

[5] http://impressive.net/people/gerald/misc/dotfiles/muttrc
[6] http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-devo
http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-w3c

(I don't need different configs for local/remote, since I always
store all my mail on my laptop.)

--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Tue, 16 Apr 2002 07:40:00 -0400

Replies:

None.

Parents:

* Gerald Oskoboiny <gerald@impressive.net> [2002-04-15 23:07-0400]
> SpamAssassin looks excellent from what I have seen. I understand
> it has some kind of automatic whitelist feature: every time you
> receive non-spam from someone, their whitelist score increases?
> (or something like that)
[..]

Yes, there is an auto-whitelist feature. I haven't tried it yet. I
wasn't sure about how to leverage my existing whitelist to bootstrap
it, so I preferred to try and integrate my whitelist another way, and
maybe I will play with the auto-whitelist later.

I was somewhat worried that the whitelist would let everything
through. By default, you need 5 points to be declared as spam. An
email from the 'EMail-IT' True Stealth System that I was complaining
about[7] scores as follows:

X-Spam-Report: 13.4 hits, 5 required;
* -0.3 -- Cc: contains similar domains at least 10 times
* 1.7 -- BODY: Includes a link to send a mail with a subject
* -0.2 -- BODY: Includes a URL link to send an email
* 3.5 -- BODY: Link to a URL containing "remove"
* 3.0 -- Listed in Razor, see http://razor.sourceforge.net/
* 4.5 -- HTML-only mail, with no text version
* 0.2 -- From and To the same address
* 1.0 -- Received via a relay in orbs.dorkslayers.com
[RBL check: found 150.82.130.139.orbs.dorkslayers.com.]

With my 5-point whitelist bonus, it would score 8.4 and still be
recognized as spam. From what I have seen, their whitelist has a
100-point bonus, which seems far too much[8]:

header From: address is in the user's white-list USER_IN_WHITELIST -100.0

There must be something I haven't understood about it yet.
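
The arithmetic behind the two bonuses, as a small sketch. The function
name verdict is hypothetical, and it assumes a message counts as spam
once its score reaches the threshold.

```shell
#!/bin/sh
# verdict RAW_SCORE BONUS THRESHOLD: subtract the whitelist bonus from the
# raw score and print the adjusted score followed by "spam" or "ham".
verdict() {
    awk -v raw="$1" -v bonus="$2" -v limit="$3" 'BEGIN {
        score = raw - bonus
        printf "%.1f %s\n", score, (score >= limit ? "spam" : "ham")
    }'
}

verdict 13.4 5 5      # 5-point bonus: prints "8.4 spam"
verdict 13.4 100 5    # 100-point bonus: prints "-86.6 ham"
```

With a 100-point bonus, the same message from a whitelisted address
would get through.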

> > Note that there are [cpp] commands because I preprocess my muttrc[4].
>
> > 2. http://spamassassin.taint.org/
> > 4. http://larve.net/people/hugo/2002/04/mutt-cpp
>
> I'm curious why you need to use cpp; I have most of my settings
> in my .muttrc [5], and use a couple extra files [6] for other stuff
> that is specific to a certain environment (personal or w3c mail)
>
> For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
> zot "w3c mail"; localsuffix="-w3c" mutt
>
> (zot just changes the rxvt title bar; it's called zot because that's
> what it was called when I got it from a friend 10 years ago)
>
> Hmm... I guess you tried something like that before switching to
> cpp; I'm just wondering what it was you finally needed cpp for.

Indeed I tried something like that, but it rapidly became very complex.
On my laptop, depending on whether I read my private mail or my work
mail, and whether I use isync to read my IMAP folders locally or read
them remotely, I have 4 aliases:

imutt='my_mutt -DWORK_CONF -DON_LAPTOP -DUSE_IMAP --'
imuttp='my_mutt -DON_LAPTOP -DUSE_IMAP --'
mutt='my_mutt -DWORK_CONF -DON_LAPTOP --'
muttp='my_mutt -DON_LAPTOP --'

and at work, I have:

mutt='my_mutt -DWORK_CONF --'
muttp='my_mutt --'

My configuration is fairly complex because I have lots of different
settings for each of them. Here is an example:

#ifdef WORK_CONF
#ifdef USE_IMAP
#define FOLDER "{localhost:1430}mail"
#else
#define FOLDER "~/mail"
#endif
#else
#ifdef USE_IMAP
#define FOLDER "{localhost:1430}private-mail"
#else
#define FOLDER "~/private-mail"
#endif
#endif

set folder=FOLDER

and another one:

#ifndef USE_IMAP
#ifndef ON_LAPTOP
# Hide the IMAP server messages
folder-hook . "push \"<limit> ! (~s 'DELETE THIS MESSAGE -- FOLDER INTERNAL DATA' ~f MAILER-DAEMON)\n\""
#endif
#endif

I used your technique for a long time, but it just became too complex
to manage so many configurations.
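
For reference, the nested FOLDER conditionals above resolve to four
possible values. The sketch below spells out that decision table as a
shell function. It mirrors the logic only and does not run cpp; the
function name folder_for and the 1/0 argument convention are made up for
illustration.

```shell
#!/bin/sh
# folder_for WORK_CONF USE_IMAP: each argument is 1 if the corresponding
# macro would be defined with -D and 0 otherwise. Prints the FOLDER value.
folder_for() {
    if [ "$1" = 1 ]; then dir=mail; else dir=private-mail; fi
    if [ "$2" = 1 ]; then
        echo "{localhost:1430}$dir"
    else
        echo "~/$dir"      # quoted, so the tilde is printed literally
    fi
}

folder_for 1 1   # work mail over IMAP
folder_for 1 0   # work mail, local folders
folder_for 0 1   # private mail over IMAP
folder_for 0 0   # private mail, local folders
```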

7. http://impressive.net/archives/fogo/20020403161017.GA26122@jibboom.w3.org
8. http://spamassassin.taint.org/tests.html
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Kids, your mother's under a lot of pressure, why don't we let her clear
the table in peace? -- Homer J. Simpson

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Tue, 23 Apr 2002 08:30:31 -0400

Replies:

None.

Parents:

* Hugo Haas <hugo@larve.net> [2002-04-15 18:13-0400]
> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.
>
> I therefore am using 3 different folders:
> - emails identified as spam.
> - emails not identified as spam from people I know (who are on my
> whitelist).
> - emails not identified as spam from people I don't know.
>
> SpamAssassin works with a scoring system. I use my whitelist to
> decrease the score when somebody is on my whitelist. It is therefore
> easier to be considered as a spammer if the address is not on my
> whitelist.
>
> I have also enabled Vipul's Razor[3] for increasing my detection
> accuracy. When I detect spam which isn't registered in Razor, I do so.

I wanted to give an update on my spam filtering system. With the new
versions of SpamAssassin (2.20) and Razor (1.20), my spam filter
catches about 98% of the spam (I lowered the threshold to 3.6 hits and
tweaked a couple of other rules). The 2% of spam that got through
went into my unknown sender folder.

The only non-spam emails I saw it catch were bounces from mailing
lists.

In order to make sure that I improve my (and everybody else's) spam
filtering, I systematically bounce spam that went through to
spamassassin-sightings[4] (ESC-B in my Mutt session) and register all
confirmed spam with Razor[5] (ESC-R ; ESC-Z in my Mutt session). This
is easy enough that it just takes a few seconds every day or two.

Basically, I am *very* happy with this new system, and would
encourage people to use it: the more people use Razor and report spam
to it, the less spam we will see.

4. http://lists.sourceforge.net/lists/listinfo/spamassassin-sightings
5. http://razor.sourceforge.net/
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Mais alors, tout se recoupe !