Sunday, February 19, 2006
Are you having issues with spam? I'm not!
Back on February 1st I mentioned my email server. Since then I've gotten a couple of inquiries into why I have one, and how it helps eliminate spam.
The reason I have one is that I like to read my email from several PCs, including PCs at other people's houses, without entrusting all my email to potentially snoopy providers, such as Gmail or my ISP, and without having to use a web client.
The reason I couldn't otherwise move between machines easily is that the popular email clients' Bayesian spam filters each work on only one machine. Each machine's client learns to filter a different subset of spam message types, so no one machine can do a very good job.
Because I have my own email server, I run a Bayesian filter on it, and my wife and I both train it as necessary. That server lets each of us use whatever email client we want (as long as it uses non-proprietary protocols), on any computer we want, while still having the spam consistently removed from our inboxes. When the filter finds a spammy message for my wife or me, it puts it into a personal "Likely Spam" folder.
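For the curious, here is a toy sketch of what a shared Bayesian filter does. It is not SpamProbe's actual algorithm; the class name `TinyBayesFilter`, the tokenizer, and the add-one smoothing are all assumptions made for illustration. The point is only that every message either of us trains goes into one set of word counts, so each person benefits from the other's training.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (illustrative only, not SpamProbe)."""

    def __init__(self):
        # Per-label count of how many trained messages contained each word.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Per-label count of trained messages.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lowercase the text and keep each distinct word once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, label, text):
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        # Start from the smoothed prior odds, then add one log-likelihood
        # ratio per word. Add-one smoothing keeps unseen words from
        # producing a zero probability.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# One filter instance on the server, trained by two people.
shared = TinyBayesFilter()
shared.train("spam", "cheap pills buy now")    # trained by one user
shared.train("spam", "win money now")          # trained by the other
shared.train("ham", "lunch on tuesday?")
shared.train("ham", "meeting notes attached")
```

With this made-up training data, `shared.spam_probability("cheap money now")` comes out well above one half and `shared.spam_probability("lunch meeting")` well below it, no matter which of us did the training.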
The "Likely Spam" folder takes no effort to patrol. I can't remember ever seeing a legitimate email message in that folder, but I still check it, just in case. For this folder I also use my email client's Bayesian filter, and because it's a different implementation that I train less often, it marks only the really, really spammy messages with a little trash icon. I just delete those messages from the "Likely Spam" folder without looking at them very hard. Then I scan the names and subjects of the remaining messages, rescue any that aren't really spam, mark the rest as spam to train my local email client, then delete them for good. It sounds harder than it is. I can do almost all of this in a few keystrokes.
My wife and I both train the shared spam filter with the newest stupid spamming tricks. When the server's filter lets a spam message through, we just drag it into our personal "Training for Spam" folder. That's all there is to it.
Although my wife and I each have a "Training for Ham" folder, we almost never have to use it. On the rare occasion that the filter flags a message that's not really spam, we copy it into our "Training for Ham" folder.
The server's spam filter periodically checks each person's training folders, and trains itself with the messages it finds.
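A periodic job like that could look roughly like the sketch below. It rests on several assumptions: that each user's mail is stored as a Maildir with the training folders as subfolders, that trained messages are removed afterwards so they aren't counted twice, and that `train` is a hypothetical callback standing in for whatever actually feeds a message to the filter. A cron job would call it once per user.

```python
import mailbox
from typing import Callable

# Folder names come from the post; mapping them to labels is an assumption.
TRAINING_FOLDERS = {"Training for Spam": "spam", "Training for Ham": "ham"}


def train_from_folders(user_maildir: str, train: Callable[[str, bytes], None]) -> int:
    """Feed each message in a user's training folders to `train`, then remove it.

    Returns the number of messages trained.
    """
    box = mailbox.Maildir(user_maildir, create=False)
    existing = set(box.list_folders())
    trained = 0
    for folder_name, label in TRAINING_FOLDERS.items():
        if folder_name not in existing:
            continue  # this user has no such folder yet
        folder = box.get_folder(folder_name)
        # Copy the keys first, because the folder is modified while looping.
        for key in list(folder.keys()):
            train(label, folder.get_bytes(key))
            folder.remove(key)  # assumption: don't train the same message twice
            trained += 1
    return trained
```

Because the users only ever drag messages into folders, nothing about this job is visible from the email client.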
Each week I still see a few spam messages sneak through, but I think I know how to cut that number by another order of magnitude without changing the user experience at all. In addition to the Bayesian filter called "SpamProbe", I plan to install "SpamAssassin", which uses entirely different techniques, such as looking up the sender's domain name to see if it's on a blacklist. I'm expecting the combination of the two filters to add a few more nines to the percentage of spam messages filtered.
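The arithmetic behind "more nines" can be sketched as follows. It assumes the two filters miss spam independently of each other, which is optimistic, and the catch rates used are made-up numbers, not measurements from my server. Under that assumption a message gets through only if every filter misses it, so the miss rates multiply.

```python
def combined_catch_rate(*catch_rates: float) -> float:
    """Catch rate of several filters in series, assuming independent misses."""
    miss = 1.0
    for rate in catch_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("catch rates must be between 0 and 1")
        miss *= 1.0 - rate  # spam survives only if this filter misses it too
    return 1.0 - miss


# Hypothetical numbers: one filter catches 99%, the other 90%.
both = combined_catch_rate(0.99, 0.90)
```

With those hypothetical rates, the combined miss rate is 0.01 × 0.10 = 0.001, which is one extra nine. In practice the filters' mistakes overlap somewhat, so the real gain would be smaller.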
I'm thinking about making a little network appliance with all of this software preinstalled on a small-form-factor hard drive. Do you think you would want to test one? Do you think you would buy one if it were cheap enough?