* Gerald Oskoboiny <[email protected]> [2004-01-14 00:38-0500]
> impressive.net now rejects any mail that spamassassin scores
> higher than 10. Woohoo!
>
> The mail is rejected during the initial delivery attempt, with
> this error message:
>
> 550-Sorry, this smells like spam; rejected. For more info, please see
> 550 http://impressive.net/people/gerald/2004/01/spam.html
The following is only marginally interesting to others; I just
wanted to archive some thoughts and recent stats on my spam
filtering setup.
For the last few months any email to impressive.net with a SA
score > 10 has been rejected, and any email I receive with score
5-10 goes into a mailbox called 'probable-spam' which in theory I
could review periodically for false positives, but in practice
just gets ignored. (It has 38643 messages since Jan 14, about 292/day.)
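A rough sketch of that routing as a shell function, just to make the
thresholds concrete. The cutoffs (5 and 10) are the ones described above;
the function name and the action labels are made up for illustration, and
in the real setup the rejection happens at SMTP time while the 5-10 filing
happens later in procmail, so this collapses two stages into one.

```shell
# Hypothetical helper: map a SpamAssassin score to the action described above.
# awk is used because scores are decimals (e.g. 7.1), which plain sh
# arithmetic cannot compare.
classify_score() {
    awk -v s="$1" 'BEGIN {
        if (s + 0 > 10)      print "reject"         # refused during delivery
        else if (s + 0 >= 5) print "probable-spam"  # filed, rarely reviewed
        else                 print "deliver"        # whitelist/low-priority rules apply
    }'
}

classify_score 12.3   # prints "reject"
classify_score 7.1    # prints "probable-spam"
classify_score 2.0    # prints "deliver"
```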
I hate silently ignoring email, so I wonder if I should decrease
the rejection threshold to 5 or something. Or maybe set up a
challenge/response system for that mail: since mr-burns rejects
forgeries using SPF, I could tell anyone who complains about
bogus challenges to publish SPF records and leave me alone.
The spamassassin that runs at SMTP time is a generic one that
doesn't learn over time because it doesn't have a bayes DB that
it can write to (because it runs as user nobody), so it is much
less effective than it could be.
I had planned to figure out how to set up a bogus user with a
bayes DB that I could train over time, but it seems tricky to do
that with exiscan-acl so maybe I should just configure SA on
mr-burns to use my personal bayes DB.
recent stats:
263 msgs/day rejected by generic SA > 10 at SMTP time (no bayes DB)
in the last week.
292 msgs/day trapped with SA > 5 by autolearning SA (not manually trained)
(since Jan 14)
of those 292,
185 msgs/day scored >10 when rescored using my bayes DB (autolearned)
107 msgs/day scored >5 but <10 when rescored using my bayes DB
so if I switched the global spamassassin config on mr-burns to use my
personal bayes DB, I could start rejecting another 185 messages/day
immediately.
If I lowered the rejection threshold to 5, I could start rejecting
another 107 messages/day.
If I started training SA manually, I could do even better.
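Adding those up, as a quick sanity check. The three input figures are the
ones from the stats above; the variable names are mine, and the totals are
only approximate since the 263/day figure covers the last week while the
others are averages since Jan 14.

```shell
# Figures from the stats above (messages per day).
rejected_now=263      # rejected at SMTP time by the generic SA, score > 10
rescored_over_10=185  # trapped mail that scores > 10 with my bayes DB
rescored_5_to_10=107  # trapped mail that scores > 5 but < 10 with my bayes DB

# Rejections per day if the SMTP-time SA used the personal bayes DB.
with_bayes=$((rejected_now + rescored_over_10))

# Rejections per day if the threshold were also lowered to 5.
with_threshold_5=$((with_bayes + rescored_5_to_10))

echo "with personal bayes DB:     $with_bayes/day"        # 448/day
echo "and threshold lowered to 5: $with_threshold_5/day"  # 555/day
```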
(I still get about 50-100 spams/day in my low-priority mailbox:
stuff that scored < 5 and was not from someone on my whitelist)
Oh... I haven't received a single complaint about real mail being
rejected, though ~35k messages have been rejected by now. (But I
wouldn't necessarily hear about list subscriptions being cancelled.)
distributions of stuff that scored >5 using autolearned bayes DB:
$ cat probable-spam | formail -s formail -XX-Spam-Status: | fmt -1 | egrep ^hits= | cut -d. -f1 | sort -n -t= -k2 | uniq -c
1 hits=-4
1 hits=-0
4 hits=1
12 hits=2
7 hits=3
22 hits=4
2485 hits=5
2641 hits=6
3077 hits=7
2977 hits=8
2948 hits=9
3583 hits=10
3352 hits=11
2829 hits=12
2550 hits=13
2284 hits=14
1811 hits=15
1489 hits=16
1038 hits=17
755 hits=18
535 hits=19
394 hits=20
350 hits=21
285 hits=22
269 hits=23
230 hits=24
202 hits=25
284 hits=26
268 hits=27
337 hits=28
247 hits=29
270 hits=30
214 hits=31
167 hits=32
140 hits=33
133 hits=34
120 hits=35
76 hits=36
53 hits=37
45 hits=38
36 hits=39
43 hits=40
31 hits=41
21 hits=42
11 hits=43
4 hits=44
5 hits=45
3 hits=46
3 hits=47
(a few messages that scored < 5 went to probable-spam due to
various other filters in my procmailrc)
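To collapse a histogram like the one above into the three ranges I care
about, something like this should work. It is only a sketch: the function
name and output labels are made up, and it assumes input lines in the
"count hits=N" shape that uniq -c produces. Since the pipeline truncates
scores to whole numbers, the hits=10 row covers 10.0 through 10.9, so the
"10plus" bucket is slightly wider than "score > 10".

```shell
# Hypothetical helper: sum "count hits=N" lines into three score ranges.
bucket_hits() {
    awk '{
        split($2, kv, "=")            # "hits=12" -> kv[2] = "12"
        n = kv[2] + 0                 # force numeric comparison
        if (n < 5)       below += $1  # landed here via other procmail rules
        else if (n < 10) mid   += $1  # would be rejected with a threshold of 5
        else             high  += $1  # would be rejected with the bayes DB in use
    }
    END { printf "below5=%d 5to9=%d 10plus=%d\n", below, mid, high }'
}

# Example with three rows taken from the table above:
printf '4 hits=1\n2485 hits=5\n3583 hits=10\n' | bucket_hits
# prints "below5=4 5to9=2485 10plus=3583"
```

Appending "| bucket_hits" to the pipeline above would give the totals for
the whole mailbox.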
--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/