Re: Spam filters

from Hugo Haas <hugo@larve.net>, Mon, 15 Apr 2002 18:13:01 -0400

* Hugo Haas <hugo@larve.net> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <hugo@larve.net> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering technique to take the new type of spam
into account. I talked to Max, who started using SpamAssassin[2] and
was happy with it. I had a look and found it cool, but I didn't want to
abandon my whitelist filtering.

I am therefore using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
whitelist).
- emails not identified as spam from people I don't know.
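The three-way split can be sketched as a small routing function. This is only an illustration, assuming that SpamAssassin marks spam with an "X-Spam-Flag: YES" header and that the whitelist check writes a header, here called X-HH-Whitelist, set to YES or NO; the default folder name "inbox" is hypothetical.

```python
from email import message_from_string
from email.message import Message


def choose_folder(msg: Message) -> str:
    # Spam is checked first, so a whitelisted sender can still end up in "spam".
    if msg.get("X-Spam-Flag", "").strip() == "YES":
        return "spam"
    # Not spam, and the sender is not on the whitelist.
    if msg.get("X-HH-Whitelist", "").strip() == "NO":
        return "unknown"
    # Not spam, and the sender is known ("inbox" is a hypothetical name).
    return "inbox"


known = message_from_string("From: a@example.org\nX-HH-Whitelist: YES\n\nhi")
print(choose_folder(known))
```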

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when the sender is on my whitelist. A message is
therefore more easily classified as spam if the sender's address is
not on my whitelist.
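Roughly, the scoring works like this. It is a simplified sketch that assumes a 5-point spam threshold and a 5-point whitelist bonus, and treats a total that reaches the threshold as spam; the real values come from the SpamAssassin configuration.

```python
THRESHOLD = 5.0         # assumed "required hits" value
WHITELIST_BONUS = -5.0  # assumed score of the whitelist rule


def total_score(rule_hits, whitelisted):
    # Sum of every rule that fired, plus the bonus for known senders.
    score = sum(rule_hits)
    if whitelisted:
        score += WHITELIST_BONUS
    return score


def is_spam(rule_hits, whitelisted, threshold=THRESHOLD):
    return total_score(rule_hits, whitelisted) >= threshold


# The same message is spam from a stranger but not from a known sender.
print(is_spam([3.0, 2.5], whitelisted=False), is_spam([3.0, 2.5], whitelisted=True))
```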

I have also enabled Vipul's Razor[3] to increase my detection
accuracy. When I detect spam that isn't registered in Razor yet, I
report it.
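The idea behind Razor is a shared catalogue of signatures of known spam: once one person reports a message, everyone else can recognize the same message. The following is a toy model of that idea only. It assumes a plain SHA-1 of the whitespace-normalized body and keeps the catalogue in a local set; the class and function names are hypothetical, and real Razor signatures and its network protocol differ.

```python
import hashlib


def body_signature(body: str) -> str:
    # Collapse whitespace so trivial reformatting keeps the same signature.
    normalized = " ".join(body.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SignatureCatalogue:
    """Hypothetical local stand-in for a shared spam-signature database."""

    def __init__(self):
        self._known = set()

    def report(self, body: str) -> None:
        self._known.add(body_signature(body))

    def check(self, body: str) -> bool:
        return body_signature(body) in self._known


catalogue = SignatureCatalogue()
catalogue.report("Buy now!\nCheap stuff")
print(catalogue.check("Buy   now!  Cheap stuff"))
```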

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

# Whitelist-based filtering
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
:0fhw
* ? grep -F -i -x -q "$ffield" $WHITELIST
| formail -i "X-HH-Whitelist: YES"

:0Efhw
| formail -i "X-HH-Whitelist: NO"
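The whitelist test amounts to a case-insensitive, whole-line match of the sender's address against the whitelist file (grep -F -i -x). Here is a rough Python equivalent, assuming one address per line in the file. parseaddr only approximates the formail pipeline, whose reply-address logic is more involved, and unlike plain grep -x the sketch ignores blank lines and surrounding whitespace.

```python
from email.utils import parseaddr


def sender_address(from_header: str) -> str:
    # Approximation of `formail -XFrom: | formail -r -xTo: | tr -d ' '`.
    _, addr = parseaddr(from_header)
    return addr.replace(" ", "")


def load_whitelist(lines):
    # One address per line; compared lowercase to mimic grep -i.
    return {line.strip().lower() for line in lines if line.strip()}


def is_whitelisted(from_header: str, whitelist) -> bool:
    # Set membership gives the fixed-string, whole-line match of grep -F -x.
    return sender_address(from_header).lower() in whitelist


wl = load_whitelist(["hugo@larve.net\n", "Gerald@Impressive.net\n"])
print(is_whitelisted("Hugo Haas <hugo@larve.net>", wl))
```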

Spam filtering:

SPAMASSASSINDIR=$HOME/spam/spamassassin

:0fw
| $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

:0:
* ^X-Spam-Flag: YES
spam

If something hasn't been classified as spam, see if I know the guy:

INCLUDERC=$HOME/.procmail/spamfiltering

:0:
* ^X-HH-Whitelist: NO
unknown

-*- SpamAssassin
================

Here is how I use my whitelist:

# Whitelist filtering
header ON_WHITELIST X-HH-Whitelist =~ /^YES$/
describe ON_WHITELIST Sender whitelisted
score ON_WHITELIST -5.0
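A header rule like ON_WHITELIST applies its score when the named header matches the regular expression. The following toy model is not SpamAssassin's own code. It assumes headers are given as a plain dict with case-sensitive keys, a simplification of real mail headers.

```python
import re


class HeaderRule:
    """Toy model of a SpamAssassin-style 'header' rule (hypothetical class)."""

    def __init__(self, name, header, pattern, score):
        self.name = name
        self.header = header
        self.pattern = re.compile(pattern)
        self.score = score

    def evaluate(self, headers: dict) -> float:
        # The score applies only when the header exists and the regex matches.
        value = headers.get(self.header)
        if value is not None and self.pattern.search(value):
            return self.score
        return 0.0


ON_WHITELIST = HeaderRule("ON_WHITELIST", "X-HH-Whitelist", r"^YES$", -5.0)
print(ON_WHITELIST.evaluate({"X-HH-Whitelist": "YES"}))
```

As in Perl, `$` also matches just before a trailing newline, so a value of "YES\n" still triggers the rule.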

I have a few other unrelated settings:

# Don't rewrite the subject
rewrite_subject 0

# Leave the content-type alone
defang_mime 0

# Report in the header
report_header 1
use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

# Spam stuff
# Show spam headers
unignore X-Spam-Status X-Spam-Report
# Highlight spam
#ifndef USE_IMAP
color index red default "~h '^X-Spam-Flag: YES'"
color index red blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
#endif
# How to report spam
#define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
macro index \eS REPORT_BULK_SPAM
macro pager \eS REPORT_BULK_SPAM

Note that there are cpp commands because I preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

2. http://spamassassin.taint.org/
3. http://razor.sf.net/
4. http://larve.net/people/hugo/2002/04/mutt-cpp
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

Re: Spam filters

from Gerald Oskoboiny <gerald@impressive.net>, Mon, 15 Apr 2002 23:07:31 -0400

On Mon, Apr 15, 2002 at 06:13:01PM -0400, Hugo Haas wrote:
> * Hugo Haas <hugo@larve.net> [2001-01-27 19:49-0500]
> > I have finally switched from Junkfilter[1] to whitelist based
> > filtering.
> [..]

> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.

SpamAssassin looks excellent from what I have seen. I understand
it has some kind of automatic whitelist feature: every time you
receive non-spam from someone, their whitelist score increases?
(or something like that)

Thanks for the docs on your setup, I'm sure that will be useful.

> Note that there are [cpp] commands because I preprocess my muttrc[4].

> 2. http://spamassassin.taint.org/
> 4. http://larve.net/people/hugo/2002/04/mutt-cpp

I'm curious why you need to use cpp; I have most of my settings
in my .muttrc [5], and use a couple of extra files [6] for other stuff
that is specific to a certain environment (personal or w3c mail).

For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
zot "w3c mail"; localsuffix="-w3c" mutt

(zot just changes the rxvt title bar; it's called zot because that's
what it was called when I got it from a friend 10 years ago)

Hmm... I guess you tried something like that before switching to
cpp; I'm just wondering what it was you finally needed cpp for.

[5] http://impressive.net/people/gerald/misc/dotfiles/muttrc
[6] http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-devo
http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-w3c

(I don't need different configs for local/remote, since I always
store all my mail on my laptop.)

--
Gerald Oskoboiny <gerald@impressive.net>
http://impressive.net/people/gerald/

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Tue, 16 Apr 2002 07:40:00 -0400

* Gerald Oskoboiny <gerald@impressive.net> [2002-04-15 23:07-0400]
> SpamAssassin looks excellent from what I have seen. I understand
> it has some kind of automatic whitelist feature: every time you
> receive non-spam from someone, their whitelist score increases?
> (or something like that)
[..]

Yes, there is an auto-whitelist feature. I haven't tried it yet. I
wasn't sure how to leverage my existing whitelist to bootstrap it, so
I preferred to integrate my whitelist another way; maybe I will play
with the auto-whitelist later.

I was somewhat worried that the whitelist would let everything
through. By default, a message needs 5 points to be declared spam. An
email from the 'EMail-IT' True Stealth System that I was complaining
about[7] scores as follows:

X-Spam-Report: 13.4 hits, 5 required;
* -0.3 -- Cc: contains similar domains at least 10 times
* 1.7 -- BODY: Includes a link to send a mail with a subject
* -0.2 -- BODY: Includes a URL link to send an email
* 3.5 -- BODY: Link to a URL containing "remove"
* 3.0 -- Listed in Razor, see http://razor.sourceforge.net/
* 4.5 -- HTML-only mail, with no text version
* 0.2 -- From and To the same address
* 1.0 -- Received via a relay in orbs.dorkslayers.com
[RBL check: found 150.82.130.139.orbs.dorkslayers.com.]

With my 5-point whitelist bonus, it scores 8.4 and is still recognized
as spam. From what I have seen, SpamAssassin's own whitelist gives a
100-point bonus, which seems far too much[8]:

header From: address is in the user's white-list USER_IN_WHITELIST -100.0

There must be something I haven't understood about it yet.
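The arithmetic can be checked against the report above, using the rule scores exactly as listed in it and the default 5-point threshold. The sketch rounds to one decimal place, like the report does, to avoid floating-point noise.

```python
# Rule scores copied from the X-Spam-Report above.
REPORT_HITS = [-0.3, 1.7, -0.2, 3.5, 3.0, 4.5, 0.2, 1.0]
REQUIRED = 5.0


def verdict(hits, bonus, required=REQUIRED):
    total = round(sum(hits) + bonus, 1)
    return total, total >= required


print(verdict(REPORT_HITS, 0.0))     # (13.4, True)
print(verdict(REPORT_HITS, -5.0))    # (8.4, True)
print(verdict(REPORT_HITS, -100.0))  # (-86.6, False)
```

With a 5-point bonus, the rules still have to agree strongly before a known sender is flagged. With a 100-point bonus, no realistic combination of rules can push a whitelisted message over the threshold.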

> > Note that there are [cpp] commands because I preprocess my muttrc[4].
>
> > 2. http://spamassassin.taint.org/
> > 4. http://larve.net/people/hugo/2002/04/mutt-cpp
>
> I'm curious why you need to use cpp; I have most of my settings
> in my .muttrc [5], and use a couple extra files [6] for other stuff
> that is specific to a certain environment (personal or w3c mail)
>
> For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
> zot "w3c mail"; localsuffix="-w3c" mutt
>
> (zot just changes the rxvt title bar; it's called zot because that's
> what it was called when I got it from a friend 10 years ago)
>
> Hmm... I guess you tried something like that before switching to
> cpp; I'm just wondering what it was you finally needed cpp for.

Indeed, I tried something like that, but it quickly became very
complex. On my laptop, depending on whether I read my private mail or
my work mail, and whether I use isync to read my IMAP folders locally
or read them remotely, I have 4 aliases:

imutt='my_mutt -DWORK_CONF -DON_LAPTOP -DUSE_IMAP --'
imuttp='my_mutt -DON_LAPTOP -DUSE_IMAP --'
mutt='my_mutt -DWORK_CONF -DON_LAPTOP --'
muttp='my_mutt -DON_LAPTOP --'

and at work, I have:

mutt='my_mutt -DWORK_CONF --'
muttp='my_mutt --'

My configuration is fairly complex because I have lots of different
settings for each of them. Here is an example:

#ifdef WORK_CONF
#ifdef USE_IMAP
#define FOLDER "{localhost:1430}mail"
#else
#define FOLDER "~/mail"
#endif
#else
#ifdef USE_IMAP
#define FOLDER "{localhost:1430}private-mail"
#else
#define FOLDER "~/private-mail"
#endif
#endif

set folder=FOLDER

and another one:

#ifndef USE_IMAP
#ifndef ON_LAPTOP
# Hide the IMAP server messages
folder-hook . "push \"<limit> ! (~s 'DELETE THIS MESSAGE -- FOLDER INTERNAL DATA' ~f MAILER-DAEMON)\n\""
#endif
#endif
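The decision tree that cpp resolves from the -D flags can be spelled out in Python. This sketch covers only the two examples above, the FOLDER choice and the folder-hook. The function names are hypothetical, and the sets mirror the four laptop aliases.

```python
LAPTOP_ALIASES = {
    "imutt": {"WORK_CONF", "ON_LAPTOP", "USE_IMAP"},
    "imuttp": {"ON_LAPTOP", "USE_IMAP"},
    "mutt": {"WORK_CONF", "ON_LAPTOP"},
    "muttp": {"ON_LAPTOP"},
}


def select_folder(defines):
    # WORK_CONF picks the mailbox tree; USE_IMAP picks remote IMAP vs local copy.
    base = "mail" if "WORK_CONF" in defines else "private-mail"
    if "USE_IMAP" in defines:
        return "{localhost:1430}" + base
    return "~/" + base


def hides_imap_internal_message(defines):
    # The folder-hook is emitted only when neither USE_IMAP nor ON_LAPTOP is set.
    return "USE_IMAP" not in defines and "ON_LAPTOP" not in defines


for alias, defines in LAPTOP_ALIASES.items():
    print(alias, select_folder(defines), hides_imap_internal_message(defines))
```

At work, `mutt` defines only WORK_CONF, which selects ~/mail and enables the folder-hook.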

I used your technique for a long time, but it just became too complex
to manage so many configurations.

7. http://impressive.net/archives/fogo/20020403161017.GA26122@jibboom.w3.org
8. http://spamassassin.taint.org/tests.html
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
Kids, your mother's under a lot of pressure, why don't we let her clear
the table in peace? -- Homer J. Simpson

Re: Spam filters

from Hugo Haas <hugo@larve.net>, Tue, 23 Apr 2002 08:30:31 -0400

* Hugo Haas <hugo@larve.net> [2002-04-15 18:13-0400]
> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.
>
> I therefore am using 3 different folders:
> - emails identified as spam.
> - emails not identified as spam from people I know (who are on my
> whitelist).
> - emails not identified as spam from people I don't know.
>
> SpamAssassin works with a scoring system. I use my whitelist to
> decrease the score when somebody is on my whitelist. It is therefore
> easier to be considered as a spammer if the address in not on my
> whitelist.
>
> I have also enabled Vipul's Razor[3] for increasing my detection
> accuracy. When I detect spam which isn't registered in Razor, I do so.

I wanted to give an update on my spam filtering system. With the new
version of SpamAssassin (2.20) and Razor (1.20), my spam filter
catches about 98% of the spam (I lowered the threshold to 3.6 hits and
tweaked a couple of other rules). The remaining 2% that got through
went into my unknown-sender folder.

The only non-spam messages I saw it catch were bounces from mailing
lists.

To improve my (and everybody else's) spam filtering, I systematically
bounce any spam that slipped past the filter to spamassassin-sightings[4]
(ESC-B in my Mutt session) and register all confirmed spam with
Razor[5] (ESC-R ; ESC-Z in my Mutt session). This is easy enough that
it just takes a few seconds every day or two.

Basically, I am *very* happy with this new system, and would
encourage people to use it: the more people use Razor and report spam
to it, the less spam we will see.

4. http://lists.sourceforge.net/lists/listinfo/spamassassin-sightings
5. http://razor.sourceforge.net/
--
Hugo Haas <hugo@larve.net> - http://larve.net/people/hugo/
But then, it all fits together!