mjd on heuristics, patents, stupid questions

Mark-Jason Dominus writes a lot of great stuff.

I really like this line:

   "Of course, this is a heuristic, which is a fancy way of
   saying that it doesn't work."
   -- mjd, on http://www.perl.com/pub/2000/02/spamfilter.html

Lots more interesting/entertaining stuff is linked from his home page:

   http://www.plover.com/~mjd/

Some of my favorites are:

   Why Questions go Unanswered
   http://www.plover.com/~mjd/perl/Questions.html

   The Cardinal Rule of Reporting Technical Problems
   http://www.plover.com/~mjd/perl/Questions4.html

And here's a great one on stupid patents and boycotting amazon.com:

   Why I am Boycotting Amazon
   http://www.plover.com/~mjd/amazon.html

Included below:

> Why I am Boycotting Amazon
>
> Amazon recently sued Barnes and Noble for patent infringement.
>
> The patent in question is for an utterly trivial invention called
> `one-click ordering'. `One-click ordering' means that the first
> time you order, they remember your address and credit card number
> in a database, and key the database by a browser cookie. Then if
> you come back and order again, you don't have to fill out another
> form; they retrieve the information from the database.
>
> Patents are the result of an exchange between the community and
> the inventor. The community gives the inventor an exclusive
> license to an invention in return for the inventor divulging the
> secret. But in this case there is no secret at all; it is totally
> obvious to anyone who is even a little bit skilled at web
> programming. The patent office is not normally supposed to grant
> patents for inventions that would be obvious to skilled
> practitioners of the relevant art. I am skilled in the relevant
> art and to me this invention is really, really, obvious. It is
> obviously no invention at all. The patent office screwed up here,
> and Amazon received a valuable license for free at the public
> expense.
>
> Just because the patent office screwed up does not give Amazon
> leave to take advantage of the mistake in an unethical way. If
> your neighbor leaves their door unlocked, you are not entitled to
> go into their house and take their belongings.
>
> Why the Lawsuit is Unethical
>
> Amazon's lawsuit is unethical because it is bad for everyone but
> Amazon. You are not supposed to be able to enrich yourself to the
> detriment of the general public. If Amazon can sue Barnes and
> Noble for offering `one-click ordering', they can sue anyone.
> That means that nobody but Amazon is allowed to have this
> convenient and simple feature on their web site. Every web site
> in the United States is required to operate in a suboptimal way
> because of Amazon's actions. That hurts web site designers,
> programmers, and web customers. Amazon was probably only
> interested in sabotaging their competitor, Barnes and Noble, but
> to do it they did not balk at sabotaging everyone else too.
>
> How the Lawsuit Hurts Me
>
> I am an independent programmer. I make a living by programming
> for my clients, including web ordering systems. Now if one of my
> clients asks me for a `one-click' ordering system, instead of
> saying that I know how to do that and it will be easy, I will
> have to warn them that a `one-click' ordering system may lay them
> open to a big patent infringement lawsuit from Amazon, and
> probably they will not be willing to take the risk. Damages for
> patent infringement suits can be very large. So much the worse
> for them, their web site, their customers, and for me.
>
> Software patents threaten my livelihood. Every program I write
> becomes a ticking time bomb because every program is full of
> obvious techniques that have nevertheless been patented. Every
> time I write a program I am laying myself open to suits for the
> most trivial features, such as the use of exclusive-or to draw
> `rubber bands' in a window system. Big companies may be able to
> afford to defend against these suits; I can't. I might have to go
> out of business instead.
>
> Why Boycott?
>
> Amazon's suit is a direct threat to me and my customers. It is
> against my best interest to give money to a company that is
> acting directly to put me out of business. I will not do business
> with Amazon until they abandon their offensive patent lawsuit.
>
> I urge you to do the same. If we mount a strong boycott, Amazon
> may eventually end their harmful suit, and other companies with
> absurd software patents may decide not to enforce them for fear
> of angering their customers. Boycotting may also draw attention
> to the root issues and yield reform to the broken patent system
> that abetted Amazon in the first place.
>
> How to Join
>
> Simply buy your books from someone else. There are many
> booksellers online. Also be sure to write to amazon at
> [email protected] to tell them what you are doing, and why. I
> send a reminder letter to Amazon every time I buy a book from
> someone else.
>
> Send a copy of your message to [email protected] to let them know
> what you are doing.
>
> The Boycott So Far
>
> As of 20 January 2000, I have bought $581.96 of books from other
> companies instead of from Amazon.
>
> Note: Dr. John Keating points out that if you contact a publisher
> directly, and tell them that Amazon is offering a discounted
> price on one of their books, they will often sell you the same
> book for the same price.
>
> For More Information
>
> The Free Software Foundation is leading a boycott of Amazon. I am
> in support of the FSF's policies on this matter.
>
> Mail me at [email protected] if you
> have questions or remarks.
>
> Copyright (C) 1999 Mark-Jason Dominus.
>
> Verbatim copying and distribution of this entire article is
> permitted in any medium, provided this notice is preserved.


--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Spam filters (was Re: mjd on heuristics, patents, stupid questions)

On Thu, Feb 17, 2000, Gerald Oskoboiny wrote:
[..]
>     -- mjd, on http://www.perl.com/pub/2000/02/spamfilter.html

Even if it was not the subject of Gerald's email, I read this article
about spam filtering and I'd like to know how everybody is filtering
spam.

I use a filter which used to be very efficient: junkfilter[1]. It has a
list of domains to block, patterns to look for in the message (such as
'to be removed from our list', fields for entering credit card details,
etc.), and sophisticated header analysis.

However, it's not developed anymore: the latest version is dated
19990331, and the development mailing list is dead. Spammers seem to
have new techniques, and now one third of the spam I receive gets
through.

Is there a kick-ass procmail filter out there?

I found (but didn't try) The Spam Bouncer[2], which seems to be OK.
However, it has an auto-complain option, and I hate those automatic
spam reports: most of them are broken, and we receive several emails a
week at w3.org because a spam message contained the URI of an XML
namespace hosted at w3.org...

Life would be much better without spammers...

 1. http://www.pobox.com/~gsutter/junkfilter/
 2. http://www.hrweb.org/spambouncer/

--
Hugo Haas <[email protected]> - http://www.larve.net/people/hugo
Je crois ce que je vois, je vois ce que je regarde et je regarde ce que
je veux.

Re: Spam filters (was Re: mjd on heuristics, patents, stupid questions)

On Fri, 18 Feb 2000, Hugo Haas wrote:

> On Thu, Feb 17, 2000, Gerald Oskoboiny wrote:
> [..]
> >     -- mjd, on http://www.perl.com/pub/2000/02/spamfilter.html
>
> Even if it was not the subject of Gerald's email, I read this article
> about spam filtering and I'd like to know how everybody is filtering
> spam.

Try this:
http://pharos.inria.fr/Applications/search?term=s1:128
Pharos is a community directory, kinda cool.

> Je crois ce que je vois, je vois ce que je regarde et je regarde ce que
> je veux.

Zyva qu'il veux impressionner la taspeche!
(Sorry guys, you have to learn more French to understand this one ;) )
[Roughly, in French slang: "Come on, he's just trying to impress the chick!"]

     /\        Sometimes I think the surest sign that intelligent life
 /\ /  \       exists elsewhere in the universe is that none of it has
/  \    \/\    tried to contact us.                 -- Calvin & Hobbes
/    \   /  \                

Re: Spam filters (was Re: mjd on heuristics, patents, stupid questions)

On Fri, Feb 18, 2000, [email protected] wrote:
> Try this:
> http://pharos.inria.fr/Applications/search?term=s1:128
> Pharos is a community directory, kinda cool.

Yeah, interesting site.

> > Je crois ce que je vois, je vois ce que je regarde et je regarde ce que
> > je veux.
>
> Zyva qu'il veux impressionner la taspeche!
> (Sorry guys, you have to learn more french to understand this one ;) )

Well, here's the translation for those who don't understand French:

I believe what I see, I see what I look at, and I look at what I
want.

I did not write that. :-)

Unfortunately, I don't remember who did. I think it was Blaise Pascal,
but I'm not sure, and I couldn't find the information on the web even
though I spent more than an hour looking at quotation sites.

Any confirmation is welcome. Maybe I should add "Blaise Pascal (?)" at
the end and somebody will eventually flame me or tell me that I'm right.

--
Hugo Haas <[email protected]> - http://www.larve.net/people/hugo
Alright Brain... It's all up to you. -- Homer J. Simpson

Re: Spam filters (was Re: mjd on heuristics, patents, stupid questions)

On Fri, Feb 18, 2000 at 08:55:58AM -0500, Hugo Haas wrote:
> On Thu, Feb 17, 2000, Gerald Oskoboiny wrote:
> [..]
> >     -- mjd, on http://www.perl.com/pub/2000/02/spamfilter.html
>
> Even if it was not the subject of Gerald's email, I read this article
> about spam filtering and I'd like to know how everybody is filtering
> spam.

I used something called rblcheck, invoked from within my procmailrc,
which looks up the sending IP address in various blackhole lists. It
seemed to work pretty well (it trapped a lot of spam), but occasionally
it would also trap some valid mail (like the confirmation messages for
my recent speaker purchase). So I turned it off a while ago.

Here's the stuff I was using in my procmailrc:

###########################################################################
# check RBL for blackholed IPs
# see http://www.procmail.org/jari/pm-tips-body.html#software_rbl_lookup_tool__c
:0
* ^Received: from.*\[\/[0-9.]+\].*by omicron\.pair\.com
{
   IP = $MATCH
   # trim it down to just the IP address
   :0
   * IP ?? ^^\/[0-9.]+
   {
       IP = $MATCH
       :0 W:
       * ! ? /usr/local/bin/rblcheck -q $IP
       | formail -A"RBL-Check-Info: `echo; /usr/local/bin/rblcheck -t $IP | sed 's/^/ /'`" >> $MAILDIR/lists/rbl-filtered
   }
}
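rblcheck hides the DNS side of this. As a rough sketch of what a blackhole-list lookup involves, assuming the common convention where an IPv4 address a.b.c.d is listed when the name d.c.b.a.<zone> resolves (the same reversed form as the orbs.dorkslayers.com query in the SpamAssassin report further down this thread), here is a plain sh version. `rbl_query_name` and `rbl_listed` are hypothetical helpers, not rblcheck's interface, and `rbl.example.org` is a placeholder zone.

```shell
#!/bin/sh
# Sketch of a DNS blackhole-list (RBL) check. Assumption: the list uses
# the usual convention where a.b.c.d is listed if d.c.b.a.<zone> resolves.
# rbl.example.org below is a placeholder, not a real list.

# Print the DNS name to query for IPv4 address $1 in zone $2.
rbl_query_name() {
  printf '%s\n' "$1" |
    awk -F. -v zone="$2" '{ print $4 "." $3 "." $2 "." $1 "." zone }'
}

# Exit 0 if $1 appears to be listed in zone $2, non-zero otherwise.
# Assumes the `host` utility is installed, exits non-zero when the
# name does not resolve, and that DNS is reachable.
rbl_listed() {
  host "$(rbl_query_name "$1" "$2")" >/dev/null 2>&1
}

rbl_query_name 139.130.82.150 rbl.example.org
# prints: 150.82.130.139.rbl.example.org
```

To actually test an address you would run something like `rbl_listed 139.130.82.150 <zone> && echo listed` with a real list's zone; that part needs network access.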

So now I just delete spam from my inbox as it arrives, and try not
to get annoyed by it.

I think if I ever try to deal with it again, I'll handle it using
a whitelist (as opposed to a blacklist), with a list of people or
domains I expect to receive mail from, and filter everything else
into a mailbox that I scan once a week or so.

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Sun, Feb 20, 2000 at 11:38:10PM -0500, Gerald Oskoboiny wrote:
:
> So now I just delete spam from my inbox as it arrives, and try not
> to get annoyed by it.
>
> I think if I ever try to deal with it again, I'll handle it using
> a whitelist (as opposed to a blacklist), with a list of people or
> domains I expect to receive mail from, and filter everything else
> into a mailbox that I scan once a week or so.

I've been getting a ton of spam lately (~36 messages per day this
December, out of a total of 384 messages per day), so I implemented
this whitelist-based filtering. Notes/code:

   http://impressive.net/people/gerald/2000/12/spam-filtering.html

Woohoo, no more spam in my inbox, ever!

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering.
[..]

Did you have a look at lbdb[1]? It basically does this and more, such as
queries inside Mutt.

 1. http://www.spinnaker.de/lbdb/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
La vraie paresse, c'est de se lever à 6 heures du matin pour avoir plus
longtemps à ne rien faire. -- Tristan Bernard
(True laziness is getting up at 6 in the morning so as to have longer
to do nothing.)

Re: Spam filters

On Mon, Dec 18, 2000 at 08:03:26AM -0500, Hugo Haas wrote:
> On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> > I've been getting a ton of spam lately (~36 messages per day this
> > December, out of a total of 384 messages per day), so I implemented
> > this whitelist-based filtering.
> [..]
>
> Did you have a look at lbdb[1]? It basically does this and more, such as
> queries inside Mutt.
>
>   1. http://www.spinnaker.de/lbdb/

No, my implementation is just simple standard formail/grep stuff.

That lbdb web page doesn't seem to say anything about what it
does or why you would want to use it!?

Anyway, I'm pretty sure I wouldn't want to use it for this.
(why would I want to do queries inside Mutt?)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> That lbdb web page doesn't seem to say anything about what it
> does or why you would want to use it!?

It's true that the Web page is not very explicit, but lbdb is a set of
tools to:
- build a database of known email addresses (with lbdb-fetchaddr[1]).
- query various sources of email addresses (this particular database,
 a PGP keyring, Mutt aliases, Pine's address book, etc).

lbdb-fetchaddr is basically your atw, and m_inmail is, I think, your
grep, with output in a format that Mutt understands.

> Anyway, I'm pretty sure I wouldn't want to use it for this.
> (why would I want to do queries inside Mutt?)

If you want to send an email to somebody that you know but whose email
address is not in your Mutt aliases file, you can query the database you
built with lbdb-fetchaddr within Mutt. It is faster than digging into
your mail archives to find the exact email address.

I have been wanting to use that for quite a while now but haven't got
around to doing it yet.

 1. http://www.spinnaker.de/lbdb/lbdb-fetchaddr.html

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
I would kill everyone in this room for a drop of sweet beer. -- Homer
J. Simpson

lbdb and Mutt (was Re: Spam filters)

On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> That lbdb web page doesn't seem to say anything about what it
> does or why you would want to use it!?

The Little Brother's Database homepage[1] says it does more or less
what the Insidious Big Brother Database[2] does, i.e. builds a
collection of email addresses that you can then query.

> Anyway, I'm pretty sure I wouldn't want to use it for this.
> (why would I want to do queries inside Mutt?)

Yesterday I ran all the email I had received through lbdb and added a
procmail rule. Now I have a list of everyone I have received email
from.

Suppose I want to send you an email: I know that you are Gerald
something, but I don't know your exact address. I can run a query[3]
inside Mutt, get a list of all the Geralds who ever sent me email, and
pick your address from there. No more messing around with Mutt
aliases.

Moreover, I have imported an LDIF address book into abook[4], and I can
look things up in there with an lbdb query too.

My procmailrc now includes:

:0hc
| $HOME/lbdb/bin/lbdb-fetchaddr

and my muttrc now specifies:

set query_command="$HOME/lbdb/bin/lbdbq '%s'"

If I press 'Q' in the index or '^T' in an address field, I can run a
query. And I like it.

How does this relate to whitelist spam filtering? From this list of
addresses, you can remove the duplicates (which isn't done when
fetching, for efficiency reasons) and end up with the same list you
have. Of course, you would have to run lbdb-fetchaddr manually rather
than from your procmailrc, but I was after the query feature, not the
filtering one.
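The lbdb file layout isn't described in this thread. As a sketch, assuming a database with one entry per line and the address in the first tab-separated field (roughly what lbdb-fetchaddr writes, but check your own version), the de-duplication step could look like this. `db_to_whitelist` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: turn an address database into a sorted whitelist with no
# duplicates. Assumption: one entry per line, email address in the
# first tab-separated field.
db_to_whitelist() {
  cut -f1 "$1" |        # keep only the address column
    tr 'A-Z' 'a-z' |    # normalise case so duplicates compare equal
    grep '@' |          # skip blank or malformed lines
    sort -u             # drop the duplicates
}
```

For example, `db_to_whitelist "$HOME/.lbdb/m_inmail.list" > whitelist.new` (that database path is also an assumption). Lower-casing before `sort -u` collapses `Bob@Example.org` and `bob@example.org` into one entry, which fits the case-insensitive `grep -i` lookups used for whitelists elsewhere in this thread.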

 1. http://www.spinnaker.de/lbdb/
 2. http://www.jwz.org/bbdb/
 3. http://www.mutt.org/doc/manual/manual-4.html#ss4.5
 4. http://abook.sourceforge.net/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
À vaincre sans péril, on évite les ennuis.
(Win without peril and you avoid trouble.)

Re: Spam filters



On Mon, Dec 18, 2000, Gerald Oskoboiny wrote:
> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
>     http://impressive.net/people/gerald/2000/12/spam-filtering.html

I have finally switched from Junkfilter[1] to whitelist-based
filtering. After reading Gerald's page, I made a few changes to his
implementation.

1/ Add to whitelist (atw) script

Gerald's version is unsafe: it doesn't lock the whitelist before
writing to it, and it executes a program that it creates in /tmp.

I have rewritten it and it is attached to this email.

I have added two new features: it is now possible to add a list of
email addresses and to import email addresses from Mutt aliases:
- 'atw': process RFC 822 messages from stdin.
- 'atw -a': take a list of email addresses as arguments.
- 'atw -M': process Mutt aliases from stdin.

2/ Procmail rule

It has two minor mistakes:
- it does not lock the folder.
- it uses regular expression matching, which has three problems:
 + it is case sensitive.
 + a subset of a known email address could match.
 + there are special characters (at least '.') in email addresses.
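The three regular-expression problems are easy to reproduce. In this small sketch the addresses are made-up examples, and `regex_check` / `exact_check` are illustrative helpers: the first mimics a plain regular-expression lookup, the second the `grep -F -i -x` lookup used in the rule below.

```shell
#!/bin/sh
# Compare a plain regex lookup with a fixed-string, case-insensitive,
# whole-line lookup. Addresses are made-up examples.
whitelist=$(mktemp) || exit 1
trap 'rm -f "$whitelist"' 0
printf '%s\n' 'alice.bob@example.org' 'carol@example.org' 'dave@examplexorg' > "$whitelist"

# Regex lookup: case sensitive, matches anywhere in a line, '.' is a wildcard.
regex_check() {
  if grep -q "$1" "$whitelist"; then echo listed; else echo unknown; fi
}

# Fixed-string lookup: case insensitive, the whole line must match.
exact_check() {
  if grep -F -i -x -q "$1" "$whitelist"; then echo listed; else echo unknown; fi
}

for sender in 'bob@example.org' 'Carol@Example.ORG' 'dave@example.org'; do
  printf '%s: regex=%s exact=%s\n' "$sender" "$(regex_check "$sender")" "$(exact_check "$sender")"
done
# prints:
# bob@example.org: regex=listed exact=unknown
# Carol@Example.ORG: regex=unknown exact=listed
# dave@example.org: regex=listed exact=unknown
```

The first line is the substring problem, the second the case problem, and the third the '.' problem (it matches the 'x' in `examplexorg`).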

Here is the rule I use:

   # Whitelist-based filtering
   WHITELIST_DIR=$HOME/whitelist
   WHITELIST=$WHITELIST_DIR/whitelist
   OTHERS=$WHITELIST_DIR/others
   ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
   :0:
   * ! ? grep -F -i -x -q "$ffield" $WHITELIST $OTHERS
     INBOX-unknown

I think it is going to be really cool. Thanks, Gerald, for helping me
switch.

 1. http://www.pobox.com/~gsutter/junkfilter/

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
- Surely you can't put a price on your family's lives! - I didn't think
so either, but here we are. -- Homer J. Simpson

[Attachment: atw]

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
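 [ -f $WHITELIST ] || touch $WHITELIST
 # grep exits with 2, not 1, when a file is missing, so create $OTHERS too
 [ -f $OTHERS ] || touch $OTHERS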
}

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin and add the From line to
# the white list.

if [ "$1" = '-a' ]
then
 lock
 shift
 while [ $# != 0 ]
 do
   add_address $1
   shift
 done
 exit
elif [ "$1" = '-m' ]
then
 lock
 add_address `formail -XFrom: | formail -r -xTo: | tr -d " "`
 exit
elif [ "$1" = '-M' ]
then
 # Keep only alias lines and pull out the first word containing an @
 perl -n -e 'next unless m/^\s*alias\s/; chomp; next unless m/([^ ]+@[^ ]+)/; $_ = $1; s/^<//; s/>$//; print "$_\n";' | xargs $0 -a
 exit
fi

exec formail -s $0 -m


Re: Spam filters

Something else that I noted is that Gerald says in his documentation[1]:

  Then:
   touch .whitelist
   atw < $MAIL
   atw < mail/w3c/inbox
   # (repeat with any other non-list mailboxes you have that don't have spam)

A good way to build a list of addresses that you know are valid is to
extract the recipients of the emails that you have sent.

I have added a '-t' option to atw (attached) which scans the To and Cc
fields instead of the From field:

atw -t < outbox

 1. http://impressive.net/people/gerald/2000/12/spam-filtering.html

--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
What kind of side dishes will we be enjoying this evening with our
frozen waffles?

[Attachment: atw]

#! /bin/sh
# atw: Add To Whitelist
# Modification of Gerald Oskoboiny's original atw
# See:
# http://impressive.net/people/gerald/2000/12/spam-filtering.html
# (c) 2001 Hugo Haas - Public domain

PATH=/bin:/usr/bin
WHITELIST_DIR=$HOME/whitelist
WHITELIST=$WHITELIST_DIR/whitelist
OTHERS=$WHITELIST_DIR/others
LOCKFILE=$WHITELIST_DIR/whitelist.lock

umask 077

# Get a lock
lock() {
 lockfile $LOCKFILE
 # Ensure that the lock will be removed when we are done
 trap "rm -f $LOCKFILE" 0 2 3 15
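 [ -f $WHITELIST ] || touch $WHITELIST
 # grep exits with 2, not 1, when a file is missing, so create $OTHERS too
 [ -f $OTHERS ] || touch $OTHERS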
}

# Add an email address to the whitelist
add_address() {
 echo -n "Checking for $1... "
 grep -F -i -x -q "$1" $WHITELIST $OTHERS
 if [ $? = 1 ]
 then
   echo "$1" >> $WHITELIST
   echo "added."
 else
   echo "already listed."
 fi
}

# Add a list of addresses
add_addresses() {
 for email in $*
 do
   add_address $email
 done
 exit
}

# If -t is given as a first argument, scan the To and Cc fields instead of
# the From line.
if [ "$1" = '-t' ]
then
 to='-t'
 shift
fi

# If argument -a is given, add the list of email addresses given as arguments.
# If argument -m is given, add a single RFC822 message (from stdin).
# If argument -M is given, import a list of Mutt aliases (from stdin).
# Else read a list of RFC822 messages from stdin.

if [ "$1" = '-a' ]
then
 lock
 shift
 add_addresses $*
elif [ "$1" = '-m' ]
then
 lock
 if [ "$to" = '-t' ]
 then
   addresses=`formail -x To -x Cc | perl -pe 's/,/\n/g' | perl -n -e 'chomp; s/\".*?\"//g; next unless m/([^ ]+@[^ ]+)/; $_ = $1; s/^[<]//g; s/[>]$//; print $_."\n";'`
 else
   addresses=`formail -XFrom: | formail -r -xTo: | tr -d " "`
 fi
 add_addresses $addresses
elif [ "$1" = '-M' ]
then
 lock
 # Keep only alias lines and pull out the first word containing an @
 addresses=`perl -n -e 'next unless m/^\s*alias\s/; chomp; s/\".*?\"//g; next unless m/([^ ]+@[^ ]+)/; $_ = $1; s/^[<]//g; s/[>]$//; print $_."\n";'`
 add_addresses $addresses
fi

exec formail -s $0 $to -m


Re: Spam filters

* Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> I have finally switched from Junkfilter[1] to whitelist based
> filtering.
[..]

* Hugo Haas <[email protected]> [2002-04-03 11:10-0500]
[..]
> It seems that there is no immediate nor easy technological answer, and
> no easy legal action either.

I have changed my spam filtering techniques taking into account the
new type of spam. I talked to Max who started using SpamAssassin[2]
and was happy about it. I had a look and found it cool. But I didn't
want to abandon my whitelist filtering.

I therefore am using 3 different folders:
- emails identified as spam.
- emails not identified as spam from people I know (who are on my
 whitelist).
- emails not identified as spam from people I don't know.

SpamAssassin works with a scoring system. I use my whitelist to
decrease the score when somebody is on my whitelist. It is therefore
easier to be considered as a spammer if the address is not on my
whitelist.

I have also enabled Vipul's Razor[3] to improve my detection
accuracy. When I find spam that isn't registered in Razor yet, I
register it.

Here is what it looks like:

-*- Procmailrc
==============

Whitelist detection:

 # Whitelist-based filtering
 WHITELIST_DIR=$HOME/whitelist
 WHITELIST=$WHITELIST_DIR/whitelist
 ffield=`formail -XFrom: | formail -r -xTo: | tr -d ' '`
 :0fhw
 * ? grep -F -i -x -q "$ffield" $WHITELIST
 | formail -i "X-HH-Whitelist: YES"

 :0Efhw
 | formail -i "X-HH-Whitelist: NO"

Spam filtering:

 SPAMASSASSINDIR=$HOME/spam/spamassassin

 :0fw
 | $SPAMASSASSINDIR/spamassassin -c $SPAMASSASSINDIR/rules -P

 :0:
 * ^X-Spam-Flag: YES
   spam

If something hasn't been classified as spam, see if I know the guy:

 INCLUDERC=$HOME/.procmail/spamfiltering

 :0:
 * ^X-HH-Whitelist: NO
   unknown

-*- SpamAssassin
================

Here is how I use my whitelist:

 # Whitelist filtering
 header          ON_WHITELIST    X-HH-Whitelist  =~      /^YES$/
 describe        ON_WHITELIST    Sender whitelisted
 score           ON_WHITELIST    -5.0

I have a few other non-related settings:

 # Don't rewrite the subject
 rewrite_subject 0

 # Leave the content-type alone
 defang_mime 0

 # Report in the header
 report_header 1
 use_terse_report 1

-*- Muttrc
==========

A few things that I configured to make my life easier:

 # Spam stuff
 # Show spam headers
 unignore X-Spam-Status X-Spam-Report
 # Highlight spam
 #ifndef USE_IMAP
 color index     red     default "~h '^X-Spam-Flag: YES'"
 color index     red     blue "~h '^X-Spam-Flag: YES' ! ~h '^X-Spam-Status: .*RAZOR_CHECK'"
 #endif
 # How to report spam
 #define REPORT_BULK_SPAM ";|formail -s spamassassin -r -D\n"
 macro index \eR "T! ~s '\^[[]Moderator Action[]] ' ~h '\^X-Spam-Flag: YES' ! ~h '\^X-Spam-Status: .*RAZOR_CHECK'\n"
 macro index \eS REPORT_BULK_SPAM
 macro pager \eS REPORT_BULK_SPAM

Note that there are cpp commands because I preprocess my muttrc[4].

I am going to test that extensively and tweak it if necessary.

 2. http://spamassassin.taint.org/
 3. http://razor.sf.net/
 4. http://larve.net/people/hugo/2002/04/mutt-cpp
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Perhaps, but let's not get bogged down in semantics. -- Homer J.
Simpson

Re: Spam filters

On Mon, Apr 15, 2002 at 06:13:01PM -0400, Hugo Haas wrote:
> * Hugo Haas <[email protected]> [2001-01-27 19:49-0500]
> > I have finally switched from Junkfilter[1] to whitelist based
> > filtering.
> [..]

> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.

SpamAssassin looks excellent from what I have seen. I understand
it has some kind of automatic whitelist feature: every time you
receive non-spam from someone, their whitelist score increases?
(or something like that)

Thanks for the docs on your setup, I'm sure that will be useful.

> Note that there are [cpp] commands because I preprocess my muttrc[4].

>   2. http://spamassassin.taint.org/
>   4. http://larve.net/people/hugo/2002/04/mutt-cpp

I'm curious why you need to use cpp; I have most of my settings
in my .muttrc [5], and use a couple of extra files [6] for other stuff
that is specific to a certain environment (personal or w3c mail).

For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
   zot "w3c mail"; localsuffix="-w3c" mutt

(zot just changes the rxvt title bar; it's called zot because that's
what it was called when I got it from a friend 10 years ago)

Hmm... I guess you tried something like that before switching to
cpp; I'm just wondering what it was you finally needed cpp for.

[5] http://impressive.net/people/gerald/misc/dotfiles/muttrc
[6] http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-devo
   http://impressive.net/people/gerald/misc/dotfiles/muttrc-local-w3c

(I don't need different configs for local/remote, since I always
store all my mail on my laptop.)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

* Gerald Oskoboiny <[email protected]> [2002-04-15 23:07-0400]
> SpamAssassin looks excellent from what I have seen. I understand
> it has some kind of automatic whitelist feature: every time you
> receive non-spam from someone, their whitelist score increases?
> (or something like that)
[..]

Yes, there is an auto-whitelist feature. I haven't tried it yet. I
wasn't sure about how to leverage my existing whitelist to bootstrap
it, so I preferred to try and integrate my whitelist another way, and
maybe I will play with the auto-whitelist later.

I was somewhat worried that the whitelist would let everything
through. By default, you need 5 points to be declared as spam. An
email from the 'EMail-IT' True Stealth System that I was complaining
about[7] scores as follows:

 X-Spam-Report:   13.4 hits, 5 required;
   * -0.3 -- Cc: contains similar domains at least 10 times
   *  1.7 -- BODY: Includes a link to send a mail with a subject
   * -0.2 -- BODY: Includes a URL link to send an email
   *  3.5 -- BODY: Link to a URL containing "remove"
   *  3.0 -- Listed in Razor, see http://razor.sourceforge.net/
   *  4.5 -- HTML-only mail, with no text version
   *  0.2 -- From and To the same address
   *  1.0 -- Received via a relay in orbs.dorkslayers.com
     [RBL check: found 150.82.130.139.orbs.dorkslayers.com.]

With my 5-point whitelist bonus, it scores 8.4 and is still recognized
as spam. From what I have seen, SpamAssassin's own whitelist gives a
100-point bonus, which seems far too much[8]:

header From: address is in the user's white-list USER_IN_WHITELIST -100.0

There must be something I haven't understood about it yet.
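The arithmetic behind these numbers is plain addition against a threshold. As an illustrative sketch (the helper names are made up, and this is not how SpamAssassin itself is implemented), summing the report's scores and applying each whitelist adjustment shows why a 5-point bonus still leaves this message over the default threshold of 5, while a 100-point bonus lets it through.

```shell
#!/bin/sh
# Worked version of the scoring above: add up the rule scores and
# compare the total with the threshold (5 by default).

# Sum one score per line from stdin; print the total with one decimal.
score_total() {
  awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Print "spam" if total $1 >= threshold $2, else "not spam".
verdict() {
  awk -v s="$1" -v t="$2" 'BEGIN { if (s + 0 >= t + 0) print "spam"; else print "not spam" }'
}

# Scores copied from the X-Spam-Report above.
report=$(printf '%s\n' -0.3 1.7 -0.2 3.5 3.0 4.5 0.2 1.0 | score_total)
mine=$(printf '%s\n' "$report" -5.0 | score_total)            # ON_WHITELIST
sa_whitelist=$(printf '%s\n' "$report" -100.0 | score_total)  # USER_IN_WHITELIST

for total in "$report" "$mine" "$sa_whitelist"; do
  printf '%s: %s\n' "$total" "$(verdict "$total" 5)"
done
# prints:
# 13.4: spam
# 8.4: spam
# -86.6: not spam
```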

> > Note that there are [cpp] commands because I preprocess my muttrc[4].
>
> >   2. http://spamassassin.taint.org/
> >   4. http://larve.net/people/hugo/2002/04/mutt-cpp
>
> I'm curious why you need to use cpp; I have most of my settings
> in my .muttrc [5], and use a couple extra files [6] for other stuff
> that is specific to a certain environment (personal or w3c mail)
>
> For my w3c mail, I invoke mutt with "w3cmutt", which is aliased to:
>     zot "w3c mail"; localsuffix="-w3c" mutt
>
> (zot just changes the rxvt title bar; it's called zot because that's
> what it was called when I got it from a friend 10 years ago)
>
> Hmm... I guess you tried something like that before switching to
> cpp; I'm just wondering what it was you finally needed cpp for.

Indeed I tried something like that, but it quickly became very
complex. On my laptop, depending on whether I read my private mail or
my work mail, and whether I use isync to read my IMAP folders locally
or read them remotely, I have four aliases:

 imutt='my_mutt -DWORK_CONF -DON_LAPTOP -DUSE_IMAP --'
 imuttp='my_mutt -DON_LAPTOP -DUSE_IMAP --'
 mutt='my_mutt -DWORK_CONF -DON_LAPTOP --'
 muttp='my_mutt -DON_LAPTOP --'

and at work, I have:

 mutt='my_mutt -DWORK_CONF --'
 muttp='my_mutt --'

My configuration is fairly complex because I have lots of different
settings for each of them. Here is an example:

#ifdef WORK_CONF
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}mail"
  #else
    #define FOLDER "~/mail"
  #endif
#else
  #ifdef USE_IMAP
    #define FOLDER "{localhost:1430}private-mail"
  #else
    #define FOLDER "~/private-mail"
  #endif
#endif

set folder=FOLDER

and another one:

#ifndef USE_IMAP
  #ifndef ON_LAPTOP
    # Hide the IMAP server messages
    folder-hook . "push \"<limit> ! (~s 'DELETE THIS MESSAGE -- FOLDER INTERNAL DATA' ~f MAILER-DAEMON)\n\""
  #endif
#endif

I used your technique for a long time, but it just became too complex
to manage so many configurations.

 7. http://impressive.net/archives/fogo/[email protected]
 8. http://spamassassin.taint.org/tests.html
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
Kids, your mother's under a lot of pressure, why don't we let her clear
the table in peace? -- Homer J. Simpson

Re: Spam filters

* Hugo Haas <[email protected]> [2002-04-15 18:13-0400]
> I have changed my spam filtering techniques taking into account the
> new type of spam. I talked to Max who started using SpamAssassin[2]
> and was happy about it. I had a look and found it cool. But I didn't
> want to abandon my whitelist filtering.
>
> I therefore am using 3 different folders:
> - emails identified as spam.
> - emails not identified as spam from people I know (who are on my
>   whitelist).
> - emails not identified as spam from people I don't know.
>
> SpamAssassin works with a scoring system. I use my whitelist to
> decrease the score when somebody is on my whitelist. It is therefore
> easier to be considered as a spammer if the address is not on my
> whitelist.
>
> I have also enabled Vipul's Razor[3] for increasing my detection
> accuracy. When I detect spam that isn't registered in Razor, I report it.

I wanted to give an update on my spam filtering system. With the new
versions of SpamAssassin (2.20) and Razor (1.20), my spam filter
catches about 98% of the spam (I lowered the threshold to 3.6 hits and
tweaked a couple of other rules). The remaining 2% that got through
went into my unknown-sender folder.
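The three-folder scheme and the 3.6 threshold can be summed up as a small decision function. This is a Python sketch, not SpamAssassin's own logic: only the threshold comes from this message, while the whitelist adjustment of -5.0 and the folder names are hypothetical.

```python
# Only THRESHOLD comes from the message; the adjustment and folder names
# are hypothetical.
THRESHOLD = 3.6
WHITELIST_ADJUSTMENT = -5.0


def choose_folder(score, sender, whitelist):
    """Pick one of the three folders for a message with this score.

    `whitelist` is assumed to hold lower-case addresses.
    """
    known = sender.lower() in whitelist
    if known:
        score += WHITELIST_ADJUSTMENT  # whitelisted senders get a lower score
    if score >= THRESHOLD:
        return "spam"
    return "known-senders" if known else "unknown-senders"
```

With a large enough score, even a whitelisted sender still ends up in the spam folder; the whitelist only makes that harder, which matches the "decrease the score" approach described in the quoted message.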

The only non-spam messages I saw it catch were bounces from mailing
lists.

To improve my (and everybody else's) spam filtering, I systematically
bounce any spam that gets through to spamassassin-sightings[4] (ESC-B
in my Mutt session) and register all confirmed spam with Razor[5]
(ESC-R ; ESC-Z in my Mutt session). This is easy enough that it only
takes a few seconds every day or two.

Basically, I am *very* happy about this new system, and would
encourage people to use it: the more people use Razor and report spam
to it, the less spam we will see.

 4. http://lists.sourceforge.net/lists/listinfo/spamassassin-sightings
 5. http://razor.sourceforge.net/
--
Hugo Haas <[email protected]> - http://larve.net/people/hugo/
But then, it all fits together!

Re: Spam filters

On Mon, Dec 18, 2000 at 06:54:51AM -0500, Gerald Oskoboiny wrote:

> I've been getting a ton of spam lately (~36 messages per day this
> December, out of a total of 384 messages per day), so I implemented
> this whitelist-based filtering. Notes/code:
>
>     http://impressive.net/people/gerald/2000/12/spam-filtering.html
>
> Woohoo, no more spam in my inbox, ever!

I just noticed that almost all of the spam that gets trapped by
this filter has From: lines like this:

   From: [email protected]

rather than the usual formats:

   From: Gerald Oskoboiny <[email protected]>
   From: [email protected] (Gerald Oskoboiny)

i.e. they are missing full names. (this is the case for 90 out of
the 103 new messages in my current unknown-sender mailbox.)

So I modified my whitelist filter to only capture mail that does not
contain a space anywhere after '^From: ' in the header. (so from now
on, almost all the mail that gets trapped by this filter will be spam,
and I can check it even less often than before.)

The new line I added to my .procmailrc is:

   * ^From: .*\

(the trailing space is important)
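The recipe boils down to one question: does the From: line contain another space after "From: "? A Python sketch of the same check, using the `re` module with case-insensitive matching (procmail ignores case by default), looks like this; `has_full_name` is a hypothetical name:

```python
import re

# "From: " followed by anything on the same line and then a space.
# IGNORECASE mirrors procmail's default; MULTILINE lets ^ match at
# the start of any header line, not just the first.
FULL_NAME_FROM = re.compile(r"^From: .* ", re.IGNORECASE | re.MULTILINE)


def has_full_name(headers):
    """True if the From: line has a second space, i.e. likely a real name.

    A bare address ("From: someone@example.com") has no space after
    "From: "; both "Name <addr>" and "addr (Name)" do.
    """
    return FULL_NAME_FROM.search(headers) is not None
```

Since `.` does not match a newline, spaces in other header lines (such as the Subject:) cannot satisfy the pattern.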

Sample of recent crud that got trapped by my whitelist filter:

136   + Mar 17 Amazon.com      (9.6K) Your Source for Home Office Accessories  
137 N   Mar 17 carlos2576@500a (1.8K) INCREASE YOUR Sales!
138 N   Mar 17 [email protected] (4.2K) Watch Cable Tv!
139 N   Mar 18 globe002@talk21 (1.1K) Make Money NOW!
140 N   Mar 17 [email protected] (2.7K) Kiss Your Job Goodbye!!!!!!!
141 N   Mar 17 frank_zane@msn. (4.7K) Set Your Own Work Schedule!!!!!!!
142 N   Mar 18 [email protected] (5.2K) HUMAN GROWTH HORMONE - Reduce Body Fat
143 N + Mar 18 hm_renaolds@yah ( 14K) It's Your Turn !
144 N   Mar 18 seoinfo2@earthl (1.2K) Is Your Site Lost in Cyberspace?
145 N   Mar 18 [email protected] (2.7K) Set Your Own Work Schedule!!!!!!!
146 N   Mar 17 21770520@excite (3.9K) U.S. NEWS AND WORLD REPORT "A GOLD RUSH
147 N   Mar 18 15067040@excite (3.9K) U.S. NEWS AND WORLD REPORT "A GOLD RUSH
148 N   Mar 18 07333353@hotmai (3.1K) Exciting Business Opportunity not mlm $$
149 N   Mar 18 bk22bk89@yahoo. (1.5K) At Your Service
150 N   Mar 18 juzyrvreyy@hotb (0.5K) FREE Mortgage Rate Quote!
151 N   Mar 11 [email protected]     (6.6K) How To Advertise To 16-Million People
152 N   Mar 18 [email protected] (5.2K) HUMAN GROWTH HORMONE - Reduce Body Fat

(note that almost none of them have real names, just email addresses)

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

Re: Spam filters

On Tue, Mar 20, 2001 at 02:58:41AM -0500, Gerald Oskoboiny wrote:
:
> So I modified my whitelist filter to only capture mail that does not
> contain a space anywhere after '^From: ' in the header. (so from now
> on, almost all the mail that gets trapped by this filter will be spam,
> and I can check it even less often than before.)
>
> The new line I added to my .procmailrc is:
>
>     * ^From: .*\
>
> (the trailing space is important)

Oops... no, that doesn't work, because procmail strips leading
and trailing whitespace. (I thought I tested it and it worked,
but I guess not.)

So I changed that to:

   SPACE=" "
   :0
   *$ ^From: .*$SPACE

etc.
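The fix relies on the order in which procmail handles the condition: whitespace is trimmed first, and the `$` modifier expands variables afterwards, so a space that arrives via $SPACE survives. That ordering is inferred from the fix working; the Python model below is a hypothetical illustration, not procmail's parser, and it leaves out the backslash from the original line.

```python
import re

VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def condition_after_parsing(raw, variables, expand):
    """Model: trim the condition, then (for '*$' conditions) expand $NAMEs."""
    cond = raw.strip()  # a literal trailing space is lost here
    if expand:
        # Unknown variables expand to the empty string, as in a shell.
        cond = VAR.sub(lambda m: variables.get(m.group(1), ""), cond)
    return cond
```

With a literal space, the condition collapses to `^From: .*`, which matches every From: line; with `$SPACE` it becomes `^From: .* ` and only matches From: lines that contain a second space.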

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/

HURL: fogo mailing list archives, maintained by Gerald Oskoboiny