Re: Spam filters (was Re: mjd on heuristics, patents, stupid questions)

On Fri, Feb 18, 2000 at 08:55:58AM -0500, Hugo Haas wrote:
> On Thu, Feb 17, 2000, Gerald Oskoboiny wrote:
> [..]
> >     -- mjd, on http://www.perl.com/pub/2000/02/spamfilter.html
>
> Even if it was not the subject of Gerald's email, I read this article
> about spam filtering and I'd like to know how everybody is filtering
> spam.

I used something called rblcheck invoked from within my procmailrc
that looks up domains in various blackhole lists out there, and it
seemed to work pretty well (trapped a lot of spam), but occasionally
it would also trap some valid mail (like the confirmation messages
for my recent speaker purchase.) So I turned it off a while ago.

Here's the stuff I was using in my procmailrc:

###########################################################################
# check RBL for blackholed IPs
# see http://www.procmail.org/jari/pm-tips-body.html#software_rbl_lookup_tool__c
:0
* ^Received: from.*\[\/[0-9.]+\].*by omicron\.pair\.com
{
   IP = $MATCH
   # trim it down to just the IP address
   :0
   * IP ?? ^^\/[0-9.]+
   {
       IP = $MATCH
       :0 W:
       * ! ? /usr/local/bin/rblcheck -q $IP
       | formail -A"RBL-Check-Info: `echo; /usr/local/bin/rblcheck -t $IP | sed 's/^/ /'`" >> $MAILDIR/lists/rbl-filtered
   }
}

So now I just delete spam from my inbox as it arrives, and try not
to get annoyed by it.

I think if I ever try to deal with it again, I'll handle it using
a whitelist (as opposed to a blacklist), with a list of people or
domains I expect to receive mail from, and filter everything else
into a mailbox that I scan once a week or so.

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Sun, Feb 20, 2000 at 11:38:10PM -0500, Gerald Oskoboiny wrote:
:
> So now I just delete spam from my inbox as it arrives, and try not
> to get annoyed by it.
>
> I think if I ever try to deal with it again, I'll handle it using
> a whitelist (as opposed to a blacklist), with a list of people or
> domains I expect to receive mail from, and filter everything else
> into a mailbox that I scan once a week or so.

I've been getting a ton of spam lately (~36 messages per day this
December, out of a total of 384 messages per day), so I implemented
this whitelist-based filtering. Notes/code:

   http://impressive.net/people/gerald/2000/12/spam-filtering.html

Woohoo, no more spam in my inbox, ever!

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering.
[..]

Did you have a look at lbdb[1]? It basically does this and more, such as
queries inside Mutt.

 1. http://www.spinnaker.de/lbdb/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
La vraie paresse, c'est de se lever à 6 heures du matin pour avoir plus
longtemps à ne rien faire. -- Tristan Bernard

Re: Spam filters

On Mon, Dec 18, 2000 at 08:03:26AM -0500, Hugo Haas wrote:
> On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> > I've been getting a ton of spam lately (~36 messages per day this
> > December, out of a total of 384 messages per day), so I implemented
> > this whitelist-based filtering.
> [..]
>
> Did you have a look at lbdb[1]? It basically does this and more, such as
> queries inside Mutt.
>
>   1. http://www.spinnaker.de/lbdb/

No, my implementation is just simple standard formail/grep stuff.

That lbdb web page doesn't seem to say anything about what it
does or why you would want to use it!?

Anyway, I'm pretty sure I wouldn't want to use it for this.
(why would I want to do queries inside Mutt?)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> That lbdb web page doesn't seem to say anything about what it
> does or why you would want to use it!?

It is true that the Web page is not very explicit, but it is a set of
tools to:
- build a database of known email addresses (with lbdb-fetchaddr[1]).
- be able to access various lists of email addresses (from this
 particular database, a PGP keyring, Mutt aliases, Pine's addressbook,
 etc).

lbdb-fetchaddr is basically your atw, and I think m_inmail is your grep,
with an output format that is compatible with Mutt.

> Anyway, I'm pretty sure I wouldn't want to use it for this.
> (why would I want to do queries inside Mutt?)

If you want to send an email to somebody that you know but whose email
address is not in your Mutt aliases file, you can query the database you
built with lbdb-fetchaddr within Mutt. It is faster than digging into
your mail archives to find the exact email address.

I have been wanting to use that for quite a while now but haven't got
around to doing it yet.

 1. http://www.spinnaker.de/lbdb/lbdb-fetchaddr.html

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
I would kill everyone in this room for a drop of sweet beer. -- Homer
J. Simpson

lbdb and Mutt (was Re: Spam filters)

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> That lbdb web page doesn't seem to say anything about what it
> does or why you would want to use it!?

The Little Brother's Database homepage[1] says it does more or less
what the Insidious Big Brother Database[2] does, i.e. builds a
collection of email addresses that you can then query.

> Anyway, I'm pretty sure I wouldn't want to use it for this.
> (why would I want to do queries inside Mutt?)

Yesterday I ran all the email I have received through lbdb and added a
procmail rule. Now I have a list of all the people I have received
email from.

Suppose that I want to send an email to you, and I know that you are
Gerald something but I don't know your exact address. I can do a
query[3] inside Mutt, get a list of all the Geralds who ever
sent me email, and pick your email address from there. No screwing
around with Mutt aliases anymore.

Moreover, I have imported an LDIF address book into abook[4], and I can
look things up in there with an lbdb query too.

My procmailrc now includes:

:0hc
| $HOME/lbdb/bin/lbdb-fetchaddr

and my muttrc now specifies:

set query_command="$HOME/lbdb/bin/lbdbq '%s'"

If I press 'Q' in the index or '^T' in an address field, I can run a
query. And I like it.

How does this relate to whitelist spam filtering? From the list of
email addresses that you collected, you can get rid of the duplicates
(deduplication is not done automatically, for efficiency reasons) and
then have the same list you have. Of course, you would have to run
lbdb-fetchaddr manually and not from your procmailrc, but I was looking
for the query feature, not the filtering one.

 1. http://www.spinnaker.de/lbdb/
 2. http://www.jwz.org/bbdb/
 3. http://www.mutt.org/doc/manual/manual-4.html#ss4.5
 4. http://abook.sourceforge.net/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
A vaincre sans péril, on évite les ennuis.

Re: Spam filters


On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
>     http://impressive.net/people/gerald/2000/12/spam-filtering.html

I have finally switched from Junkfilter[1] to whitelist-based
filtering. I read Gerald's page, and I made a few changes to Gerald's
implementation.

1/ Add to whitelist (atw) script

Gerald's version is unsafe: it doesn't lock the whitelist before
writing to it, and it executes a program that it creates in /tmp.

I have rewritten it and it is attached to this email.
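
To illustrate the locking concern, here is a minimal sketch of a
lock-protected append. It is not the attached script: it assumes that
lockfile(1) may not be installed and swaps in mkdir, which is atomic, as
the lock primitive. The temporary directory and the sample address are
made up for the example.

```shell
#!/bin/sh
# Sketch only: serialize appends to a whitelist with a mkdir-based lock.
# mkdir stands in for procmail's lockfile(1); paths are illustrative.

dir=$(mktemp -d) || exit 1
list=$dir/whitelist
lock=$dir/whitelist.lock
trap 'rm -rf "$dir"' 0

# add_locked <address>: append the address unless it is already listed.
add_locked() {
  # mkdir either creates the directory or fails, so only one process
  # at a time gets past this loop.
  until mkdir "$lock" 2>/dev/null; do sleep 1; done
  touch "$list"
  if grep -F -i -x -q -- "$1" "$list"; then
    echo "already listed: $1"
  else
    printf '%s\n' "$1" >> "$list"
    echo "added: $1"
  fi
  rmdir "$lock"   # release the lock
}

add_locked 'alice@example.org'
add_locked 'ALICE@example.org'   # same address, different case
```

The second call finds the existing entry because the comparison ignores
case, so the list ends up with a single line.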

I have added two new features: it is possible to add a list of email
addresses and to import email addresses from Mutt aliases:
- 'atw': process RFC822 message from stdin.
- 'atw -a': takes a list of email addresses as argument.
- 'atw -M': process Mutt aliases from stdin.

2/ Procmail rule

It has a few minor mistakes:
- it does not lock the folder.
- it uses regular expression matching, which has three problems:
 + it is case sensitive.
 + a subset of a known email address could match.
 + there are special characters (at least '.') in email addresses.

Here is the rule I use:

   # Whitelist-based filtering
   WHITELIST_DIR=$HOME/whitelist
   WHITELIST=$WHITELIST_DIR/whitelist
   OTHERS=$WHITELIST_DIR/others
   ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
   :0:
   * ! ? grep -F -i -x -q "$ffield" $WHITELIST $OTHERS
     INBOX-unknown

But I think that it is going to be really cool. Thanks Gerald for
helping me switch.

 1. http://www.pobox.com/~gsutter/junkfilter/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
- Surely you can't put a price on your family's lives! - I didn't think
so either, but here we are. -- Homer J. Simpson

Attachment: atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
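lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
 # Make sure both lists exist: grep exits with status 2 on a missing
 # file, which add_address would treat as "already listed"
 [ -f $WHITELIST ] || touch $WHITELIST
 [ -f $OTHERS ] || touch $OTHERS
}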

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin and add the From line to
# the white list.

if [ "$1" = '-a' ]
then
 lock
 shift
 while [ $# != 0 ]
 do
   add_address $1
   shift
 done
 exit
elif [ "$1" = '-m' ]
then
 lock
 add_address `formail -XFrom: | formail -r -xTo: | tr -d " "`
 exit
elif [ "$1" = '-M' ]
then
 # Keep only alias lines, and skip lines that contain no address
 perl -n -e 'next if (! m/^\s*alias\s/); chomp; next if (! m/[^ ]+@[^ ]+/); $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";' | xargs $0 -a
 exit
fi

exec formail -s $0 -m

Re: Spam filters

Something else that I noted is that Gerald says in his documentation[1]:

  Then:
   touch .whitelist
   atw < $MAIL
   atw < mail/w3c/inbox
   # (repeat with any other non-list mailboxes you have that don't have spam)

A good way to get a list of email addresses that are known to work is to
extract the recipients of the emails that you have sent out.

I have added a '-t' option to atw (attached) which scans the To and Cc
fields instead of the From field:

atw -t < ootbox

 1. http://impressive.net/people/gerald/2000/12/spam-filtering.html

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
What kind of side dishes will we be enjoying this evening with our
frozen waffles?

Attachment: atw

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
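lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
 # Make sure both lists exist: grep exits with status 2 on a missing
 # file, which add_address would treat as "already listed"
 [ -f $WHITELIST ] || touch $WHITELIST
 [ -f $OTHERS ] || touch $OTHERS
}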

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# Add a list of addresses
add_addresses() {
 for email in $*
 do
   add_address $email
 done
 exit
}

# If -t is given as a first argument, scan the To and Cc fields instead of
# the From line.
if [ "$1" = '-t' ]
then
 to='-t'
 shift
fi

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin.

if [ "$1" = '-a' ]
then
 lock
 shift
 add_addresses $*
elif [ "$1" = '-m' ]
then
 lock
 if [ "$to" = '-t' ]
 then
   addresses=`formail -x To -x Cc | perl -pn -e 's/,/\n/g' | perl -n -e 'chomp; s/\".*?\"//g; m/[^ ]+@[^ ]+/; $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`
 else
   addresses=`formail -XFrom: | formail -r -xTo: | tr -d " "`
 fi
 add_addresses $addresses
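elif [ "$1" = '-M' ]
then
 # Take the lock here too, since add_addresses writes to the whitelist
 lock
 # Keep only alias lines, and skip lines that contain no address
 addresses=`perl -n -e 'next if (! m/^\s*alias\s/); chomp; s/\".*?\"//g; next if (! m/[^ ]+@[^ ]+/); $_ = $&; s/^[<]//g; s/[>]$//; print $_."\n";'`
 add_addresses $addresses
fi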

exec formail -s $0 $to -m

Re: Spam filters

* Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <[email protected]> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering techniques taking into account the
new type of spam. I talked to Max who started using SpamAssassin[2]
and was happy about it. I had a look and found it cool. But I didn't
want to abandon my whitelist filtering.

I therefore am using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
 whitelist).
- emails not identified as spam from people I don't know.

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when somebody is on my whitelist. It is therefore
easier to be considered as a spammer if the address is not on my
whitelist.

I have also enabled Vipul's Razor[3] for increasing my detection
accuracy. When I detect spam which isn't registered in Razor, I register it.

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

 # Whitelist-based filtering
 WHITELIST_DIR=$HOME/whitelist
 WHITELIST=$WHITELIST_DIR/whitelist
 ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
 :0fhw
 * ? grep -F -i -x -q "$ffield" $WHITELIST
 | formail -i "X-HH-Whitelist: YES"

 :0Efhw
 | formail -i "X-HH-Whitelist: NO"

Spam filtering:

 SPAMASSASSINDIR=$HOME/spam/spamassassin

 :0fw
 | $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

 :0:
 * ^X-Spam-Flag: YES
   spam

If something hasn't been classified as spam, see if I know the guy:

 INCLUDERC=$HOME/.procmail/spamfiltering

 :0:
 * ^X-HH-Whitelist: NO
   unknown
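
The three-way split can be sketched as a small decision function, under
the assumption that anything matching neither rule falls through to the
default inbox. The function name route and the folder name "inbox" are
illustrative and do not come from the configuration above.

```shell
#!/bin/sh
# Sketch only: the folder decision implied by the two header checks.
# "route" and "inbox" are hypothetical names.

# route <X-Spam-Flag value> <X-HH-Whitelist value>
route() {
  if [ "$1" = YES ]; then
    echo spam        # flagged by the spam filter, whatever the whitelist says
  elif [ "$2" = NO ]; then
    echo unknown     # not flagged, but the sender is not on the whitelist
  else
    echo inbox       # not flagged, sender is known
  fi
}

route YES YES
route NO  NO
route NO  YES
```

The spam check comes first, so a whitelisted sender whose message still
scores as spam lands in the spam folder.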

-*- SpamAssassin
================

Here is how I use my whitelist:

 # Whitelist filtering
 header          ON_WHITELIST    X-HH-Whitelist  =~      /^YES$/
 describe        ON_WHITELIST    Sender whitelisted
 score           ON_WHITELIST    -5.0

I have a few other non-related settings:

 # Don't rewrite the subject
 rewrite_subject 0

 # Leave the content-type alone
 defang_mime 0

 # Report in the header
 report_header 1
 use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

 # Spam stuff
 # Show spam headers
 unignore X-Spam-Status X-Spam-Report
 # Highlight spam
 #ifndef USE_IMAP
 color index     red     default "~h '^X-Spam-Flag: YES'"
 color index     red     blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
 #endif
 # How to report spam
 #define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
 macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
 macro index \eS REPORT_BULK_SPAM
 macro pager \eS REPORT_BULK_SPAM

Note that there are cpp commands because I preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

 2. http://spamassassin.taint.org/
 3. http://razor.sf.net/
 4. http://larve.net/people/hugo/2002/04/mutt-cpp
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

Re: Spam filters

On Mon, Apr 15, 2002 at 06:13:01PM -0400, Hugo Haas wrote:
> * Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> > I have finally switched from Junkfilter[1] to whitelist based
> > filtering.
> [..]

> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.

SpamAssassin looks excellent from what I have seen. I understand
it has some kind of automatic whitelist feature: every time you
receive non-spam from someone, their whitelist score increases?
(or something like that)

Thanks for the docs on your setup, I'm sure that will be useful.

> Note that there are [cpp] commands because I preprocess my muttrc[4].

>   2. http://spamassassin.taint.org/
>   4. http://larve.net/people/hugo/2002/04/mutt-cpp

I'm curious why you need to use cpp; I have most of my settings
in my .muttrc [5], and use a couple extra files [6] for other stuff
that is specific to a certain environment (personal or w3c mail)

For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
   zot "w3c mail"; localsuffix="-w3c" mutt

(zot just changes the rxvt title bar; it's called zot because that's
what it was called when I got it from a friend 10 years ago)

Hmm... I guess you tried something like that before switching to
cpp; I'm just wondering what it was you finally needed cpp for.

[5] http://impressive.net/people/gerald/misc/dotfiles/muttrc
[6] http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-devo
   http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-w3c

(I don't need different configs for local/remote, since I always
store all my mail on my laptop.)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

* Gerald Oskoboiny <[email protected]> [2002-04-15 23:07-0400]
> SpamAssassin looks excellent from what I have seen. I understand
> it has some kind of automatic whitelist feature: every time you
> receive non-spam from someone, their whitelist score increases?
> (or something like that)
[..]

Yes, there is an auto-whitelist feature. I haven't tried it yet. I
wasn't sure about how to leverage my existing whitelist to bootstrap
it, so I preferred to try and integrate my whitelist another way, and
maybe I will play with the auto-whitelist later.

I was somewhat worried that the whitelist would let everything
through. By default, you need 5 points to be declared as spam. An
email from the 'EMail-IT' True Stealth System that I was complaining
about[7] scores as follows:

 X-Spam-Report:   13.4 hits, 5 required;
   * -0.3 -- Cc: contains similar domains at least 10 times
   *  1.7 -- BODY: Includes a link to send a mail with a subject
   * -0.2 -- BODY: Includes a URL link to send an email
   *  3.5 -- BODY: Link to a URL containing "remove"
   *  3.0 -- Listed in Razor, see http://razor.sourceforge.net/
   *  4.5 -- HTML-only mail, with no text version
   *  0.2 -- From and To the same address
   *  1.0 -- Received via a relay in orbs.dorkslayers.com
     [RBL check: found 150.82.130.139.orbs.dorkslayers.com.]

With my whitelist's 5-point bonus, it scores 8.4 and is still recognized
as spam. From what I have seen, their whitelist has a 100-point bonus,
which seems far too much[8]:

header From: address is in the user's white-list USER_IN_WHITELIST -100.0

There must be something I haven't understood about it yet.
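
The arithmetic above can be checked with a short sketch. It assumes a
message counts as spam when its total reaches the required score; the
function name verdict is made up, and the per-test scores are copied
from the report quoted above.

```shell
#!/bin/sh
# Sketch only: recompute the quoted report total and apply a whitelist bonus.
# Assumes "total >= required" means spam.

report='-0.3 1.7 -0.2 3.5 3.0 4.5 0.2 1.0'   # per-test scores from the report
threshold=5

# verdict <bonus>: print the adjusted total and the resulting verdict.
verdict() {
  echo "$report" | awk -v bonus="$1" -v need="$threshold" '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    total += bonus
    printf "%.1f %s\n", total, (total >= need ? "spam" : "not spam")
  }'
}

verdict 0       # no whitelist bonus
verdict -5      # the 5-point bonus
verdict -100    # the stock USER_IN_WHITELIST score
```

With a bonus of 5 the message stays well above the threshold, whereas a
bonus of 100 pushes even this message far below it.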

> > Note that there are [cpp] commands because I preprocess my muttrc[4].
>
> >   2. http://spamassassin.taint.org/
> >   4. http://larve.net/people/hugo/2002/04/mutt-cpp
>
> I'm curious why you need to use cpp; I have most of my settings
> in my .muttrc [5], and use a couple extra files [6] for other stuff
> that is specific to a certain environment (personal or w3c mail)
>
> For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
>     zot "w3c mail"; localsuffix="-w3c" mutt
>
> (zot just changes the rxvt title bar; it's called zot because that's
> what it was called when I got it from a friend 10 years ago)
>
> Hmm... I guess you tried something like that before switching to
> cpp; I'm just wondering what it was you finally needed cpp for.

Indeed I tried something like that, but it rapidly got very complex.
On my laptop, depending on whether I read my private mail or my work mail,
and whether I use isync to read my IMAP folders locally or read them
remotely, I have 4 aliases:

 imutt='my_mutt -DWORK_CONF -DON_LAPTOP -DUSE_IMAP --'
 imuttp='my_mutt -DON_LAPTOP -DUSE_IMAP --'
 mutt='my_mutt -DWORK_CONF -DON_LAPTOP --'
 muttp='my_mutt -DON_LAPTOP --'

and at work, I have:

 mutt='my_mutt -DWORK_CONF --'
 muttp='my_mutt --'

My configuration is fairly complex because I have lots of different
settings for each of them. Here is an example:

#ifdef WORK_CONF
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}mail"
  #else
    #define FOLDER "~/mail"
  #endif
#else
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}private-mail"
  #else
    #define FOLDER "~/private-mail"
  #endif
#endif

set folder=FOLDER

and another one:

#ifndef USE_IMAP
  #ifndef ON_LAPTOP
    # Hide the IMAP server messages
    folder-hook . "push \"<limit> ! (~s 'DELETE THIS MESSAGE -- FOLDER INTERNAL DATA' ~f MAILER-DAEMON)\n\""
  #endif
#endif

I used your technique for a long time, but it just became too complex
to manage so many configurations.

 7. http://impressive.net/archives/fogo/[email protected]
 8. http://spamassassin.taint.org/tests.html
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Kids, your mother's under a lot of pressure, why don't we let her clear
the table in peace? -- Homer J. Simpson

Re: Spam filters

* Hugo Haas <[email protected]> [2002-04-15 18:13-0400]
> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.
>
> I therefore am using 3 different folders:
> - emails identified as spam.
> - emails not identified as spam from people I know (who are on my
>   whitelist).
> - emails not identified as spam from people I don't know.
>
> SpamAssassin works with a scoring system. I use my whitelist to
> decrease the score when somebody is on my whitelist. It is therefore
> easier to be considered as a spammer if the address in not on my
> whitelist.
>
> I have also enabled Vipul's Razor[3] for increasing my detection
> accuracy. When I detect spam which isn't registered in Razor, I do so.

I wanted to give an update on my spam filtering system. With the new
version of SpamAssassin (2.20) and Razor (1.20), my spam filter
catches about 98% of the spam (I lowered the threshold to 3.6 hits and
tweaked a couple of other rules). The 2% of spam that got through
went into my unknown sender folder.

The only non-spam emails I saw it catch were bounces from mailing
lists.

In order to make sure that I improve my (and everybody else's) spam
filtering, I systematically bounce spam that went through to
spamassassin-sightings[4] (ESC-B in my Mutt session) and register all
confirmed spam with Razor[5] (ESC-R ; ESC-Z in my Mutt session). This
is easy enough that it just takes a few seconds every day or two.

Basically, I am *very* happy about this new system, and would
encourage people to use it: the more people use Razor and report spam
to it, the less spam we will see.

 4. http://lists.sourceforge.net/lists/listinfo/spamassassin-sightings
 5. http://razor.sourceforge.net/
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Mais alors, tout se recoupe !

Re: Spam filters

On Mon, Dec 18, 2000 at 06:54:51AM -0500, Gerald Oskoboiny wrote:

> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
>     http://impressive.net/people/gerald/2000/12/spam-filtering.html
>
> Woohoo, no more spam in my inbox, ever!

I just noticed that almost all of the spam that gets trapped by
this filter has From: lines like this:

   From: [email protected]

rather than the usual formats:

   From: Gerald Oskoboiny <[email protected]>
   From: [email protected] (Gerald Oskoboiny)

i.e. they are missing full names. (this is the case for 90 out of
the 103 new messages in my current unknown-sender mailbox.)

So I modified my whitelist filter to only capture mail that does not
contain a space anywhere after '^From: ' in the header. (so from now
on, almost all the mail that gets trapped by this filter will be spam,
and I can check it even less often than before.)

The new line I added to my .procmailrc is:

   * ^From: .*\

(the trailing space is important)

Sample of recent crud that got trapped by my whitelist filter:

136   + Mar 17 Amazon.com      (9.6K) Your Source for Home Office Accessories  
137 N   Mar 17 carlos2576@500a (1.8K) INCREASE YOUR Sales!
138 N   Mar 17 [email protected] (4.2K) Watch Cable Tv!
139 N   Mar 18 globe002@talk21 (1.1K) Make Money NOW!
140 N   Mar 17 [email protected] (2.7K) Kiss Your Job Goodbye!!!!!!!
141 N   Mar 17 frank_zane@msn. (4.7K) Set Your Own Work Schedule!!!!!!!
142 N   Mar 18 [email protected] (5.2K) HUMAN GROWTH HORMONE - Reduce Body Fat
143 N + Mar 18 hm_renaolds@yah ( 14K) It's Your Turn !
144 N   Mar 18 seoinfo2@earthl (1.2K) Is Your Site Lost in Cyberspace?
145 N   Mar 18 [email protected] (2.7K) Set Your Own Work Schedule!!!!!!!
146 N   Mar 17 21770520@excite (3.9K) U.S. NEWS AND WORLD REPORT "A GOLD RUSH
147 N   Mar 18 15067040@excite (3.9K) U.S. NEWS AND WORLD REPORT "A GOLD RUSH
148 N   Mar 18 07333353@hotmai (3.1K) Exciting Business Opportunity not mlm $$
149 N   Mar 18 bk22bk89@yahoo. (1.5K) At Your Service
150 N   Mar 18 juzyrvreyy@hotb (0.5K) FREE Mortgage Rate Quote!
151 N   Mar 11 [email protected]     (6.6K) How To Advertise To 16-Million People
152 N   Mar 18 [email protected] (5.2K) HUMAN GROWTH HORMONE - Reduce Body Fat

(note that almost none of them have real names, just email addresses)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Tue, Mar 20, 2001 at 02:58:41AM -0500, Gerald Oskoboiny wrote:
:
> So I modified my whitelist filter to only capture mail that does not
> contain a space anywhere after '^From: ' in the header. (so from now
> on, almost all the mail that gets trapped by this filter will be spam,
> and I can check it even less often than before.)
>
> The new line I added to my .procmailrc is:
>
>     * ^From: .*\
>
> (the trailing space is important)

Oops... no, that doesn't work, because procmail strips leading
and trailing whitespace. (I thought I tested it and it worked,
but I guess not.)

So I changed that to:

   SPACE=" "
   :0
   *$ ^From: .*$SPACE

etc.
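
Outside procmail, the same heuristic can be sketched in plain sh. This
only illustrates the "is there a space after From: " test, not the
recipe itself; the function names and sample headers are invented.

```shell
#!/bin/sh
# Sketch only: detect whether a From: line has a space after the prefix,
# which usually means it carries a full name. Sample headers are made up.

has_display_name() {
  case "$1" in
    'From: '*' '*) return 0 ;;   # at least one space after "From: "
    *)             return 1 ;;
  esac
}

classify() {
  if has_display_name "$1"; then
    echo "has a name"
  else
    echo "bare address"
  fi
}

classify 'From: Jane Doe <jane@example.org>'
classify 'From: jane@example.org (Jane Doe)'
classify 'From: promo1234@example.com'
```

Both of the usual formats contain a space after the prefix, and the bare
address does not. A header with trailing whitespace would also count as
having a name, so this is a heuristic rather than a parser.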

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny