Followup on Spam Filtering

I figured that several readers are also bloggers in their own right, and might be interested in some information that I’ve gathered about spam and my efforts to block it.

This blog, which is not a terribly popular one, gets a substantial amount of comment spam. For example, here’s the number of spam comments received in each of the last four months:

Dec2010: 5,028
Jan2011: 6,544
Feb2011: 4,712
Mar2011: 5,596

Compare that to the 25-30 legitimate comments made monthly, and you see that the ratio is extremely skewed in favor of spam. Since this blog was founded in 2008, 53,881 spams have been received, compared to 854 total legitimate messages.
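To put a number on "extremely skewed", here is a quick back-of-the-envelope calculation in Python using only the figures quoted above. The variable names are mine; nothing here comes from WordPress or Akismet.

```python
# Figures quoted in this post.
monthly_spam = {"Dec2010": 5028, "Jan2011": 6544, "Feb2011": 4712, "Mar2011": 5596}
total_spam = 53_881   # spam received since the blog was founded
total_legit = 854     # legitimate comments over the same period

avg_monthly_spam = sum(monthly_spam.values()) / len(monthly_spam)
spam_per_legit = total_spam / total_legit
spam_share = total_spam / (total_spam + total_legit)

print(f"average spam per month: {avg_monthly_spam:.0f}")       # 5470
print(f"spam per legitimate comment: {spam_per_legit:.1f}")    # 63.1
print(f"spam share of all comments: {spam_share:.1%}")         # 98.4%
```

In other words, roughly 63 spams arrive for every real comment.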

Ideally, there would be no comment spam. Since this is not possible, I want to reduce spam by the maximum amount possible, inconvenience users as little as possible, and keep the spam queue in the WordPress administrative interface as empty as I can.

Now, WordPress comes with an outstanding spam filter called Akismet. When activated, all incoming comments are sent to Akismet for a spam/not-spam review. Since the service is centralized, they’re able to accumulate a huge amount of data about spammy and legitimate messages, adapt to changing spam patterns, and do remarkably well (99.96% accuracy according to my calculations) at detecting spam and allowing legitimate messages to pass. If it misses spam, or mistakenly flags a legitimate comment as spam, I can override the Akismet decision (and that override is sent to Akismet so it can adapt).

Messages flagged as spam by Akismet go into the spam queue for my review. Unfortunately, this means that more than 150 spams a day get shunted there. Reviewing these messages is tedious and time-consuming. What if I could block the spam from even being submitted, thus reducing the amount of spam that I need to wade through?

Since all WordPress blogs have the same comments.php file, spammers don’t even need to fill in the normal comments form on the website: they can submit their spam directly to the comments.php file with the appropriate fields already filled in. Of course, since this is all done automatically by software, a slight change to the comments.php file will leave the spambots unable to submit messages. Enter NoSpamNX, a very handy plugin that makes changes that break spambots but don’t affect humans. Specifically, it adds hidden fields to the human-readable comment form that are filled in with randomly generated text (to keep the spammers from adapting, it changes these random values every 24 hours).

If a comment does not include these hidden fields with that day’s random text, that means that the comment was not submitted through the ordinary human-readable form, and therefore must be spam. One can elect to then mark the message as spam, or simply delete it outright.
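For the curious, here is a minimal Python sketch of the general idea. It is not NoSpamNX's actual code (the plugin is written in PHP and I have not reproduced its logic). Instead of storing random text, this sketch derives each day's value from a secret key with an HMAC, which is a swapped-in technique of my own choosing. All of the names (SECRET_KEY, FIELD_NAME, daily_token, looks_like_bot) are hypothetical.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical names and values, for illustration only.
SECRET_KEY = b"replace-with-a-random-per-site-secret"
FIELD_NAME = "extra_token"


def daily_token(day: date, secret: bytes = SECRET_KEY) -> str:
    """Derive a hard-to-guess value that changes every calendar day."""
    return hmac.new(secret, day.isoformat().encode(), hashlib.sha256).hexdigest()


def render_hidden_field(day: date) -> str:
    """HTML to embed in the comment form that real browsers load."""
    return f'<input type="hidden" name="{FIELD_NAME}" value="{daily_token(day)}">'


def looks_like_bot(form: dict, day: date) -> bool:
    """True if the submission lacks that day's token, i.e. it skipped the form."""
    submitted = form.get(FIELD_NAME, "")
    # Compare as bytes in constant time; a missing field never matches.
    return not hmac.compare_digest(submitted.encode(), daily_token(day).encode())


today = date(2011, 4, 1)
browser_post = {"comment": "Nice post!", FIELD_NAME: daily_token(today)}
direct_post = {"comment": "Buy cheap pills"}
print(looks_like_bot(browser_post, today))  # False
print(looks_like_bot(direct_post, today))   # True
```

One wrinkle this sketch ignores: a visitor who loads the form just before the value rolls over and submits just after would be rejected, so a real implementation would probably want to accept the previous value for a while as well.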

This simple plugin has blocked 37,775 spams since I installed it in June 2010. During that same period, a total of 39,113 spams were submitted to my site. This means that NoSpamNX alone would have blocked about 96.6% of spam. Not bad, particularly for something that does not burden legitimate commenters with any additional steps like CAPTCHAs.
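If you want to check my arithmetic, the calculation is short. The two counts are the ones quoted above; the variable names are just for illustration.

```python
blocked_by_nospamnx = 37_775        # spams blocked since June 2010
total_spam_since_install = 39_113   # all spams submitted in that period

block_rate = blocked_by_nospamnx / total_spam_since_install
came_through_form = total_spam_since_install - blocked_by_nospamnx

print(f"blocked: {block_rate:.1%}")                          # blocked: 96.6%
print(f"submitted via the real form: {came_through_form}")   # 1338
```

The remainder, a bit over 1,300 spams, carried valid hidden fields, meaning they were submitted through the regular form rather than directly.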

In my particular case, I like contributing spam messages to Akismet since it improves their statistics, so I elected to have NoSpamNX simply mark messages as spam rather than deleting them (the deletion would occur before the messages get submitted to Akismet). Thus, my spam queue had lots of messages for me to review. I needed something more, something that would provide a second opinion to Akismet and NoSpamNX.

In my December 14th post, I mentioned that I was testing out a plugin called Conditional CAPTCHA. This one is particularly useful: it waits for messages to get reviewed by existing spam filters such as Akismet. If Akismet says the message is legitimate, Conditional CAPTCHA does nothing, and the message is posted immediately. However, if the message is flagged as spam, then Conditional CAPTCHA presents a reCAPTCHA. If the CAPTCHA is solved incorrectly or no attempt to solve it is made within 10 minutes, the message is silently deleted and not added to the spam queue. If the CAPTCHA is solved correctly, the message is then placed into the moderation queue (I’m a bit suspicious, as it was marked as spam, so I want to review it prior to it being posted).

Using Conditional CAPTCHA means that the vast majority of legitimate commenters are not inconvenienced by always facing a CAPTCHA. Only comments flagged as spam are presented with such a challenge.
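The decision flow described above can be sketched in a few lines of Python. This models only the behavior as I have configured and described it, not the plugin's real code; the Outcome names and the route_comment function are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PUBLISH = "publish immediately"
    MODERATE = "hold in moderation queue"
    DISCARD = "delete silently"


CAPTCHA_TIMEOUT_MINUTES = 10  # time limit mentioned in this post


def route_comment(flagged_as_spam: bool,
                  captcha_solved: Optional[bool] = None,
                  minutes_to_answer: Optional[float] = None) -> Outcome:
    """Decide what happens to a comment after the spam filter has reviewed it."""
    if not flagged_as_spam:
        # The spam filter is happy: no CAPTCHA is ever shown.
        return Outcome.PUBLISH
    answered_in_time = (minutes_to_answer is not None
                        and minutes_to_answer <= CAPTCHA_TIMEOUT_MINUTES)
    if captcha_solved and answered_in_time:
        # Solved, but it was still flagged, so a human looks at it first.
        return Outcome.MODERATE
    # Wrong answer, no answer, or too slow: never reaches the spam queue.
    return Outcome.DISCARD


print(route_comment(flagged_as_spam=False).value)
print(route_comment(True, captcha_solved=True, minutes_to_answer=2).value)
print(route_comment(True).value)
```

The three calls at the end cover a clean comment, a flagged comment whose CAPTCHA was solved in time, and a flagged comment from a bot that never attempts the CAPTCHA.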

So far, Conditional CAPTCHA has stopped 18,589 spams since it was installed, essentially 100% of the spam submitted to this site. There have been exactly four messages that were flagged as spam and resulted in the CAPTCHA being solved correctly. All of these have been spam, and never made it out of the moderation queue.

In my particular case, NoSpamNX is a bit redundant: I use it simply to keep a measure of how many spammers submit spam directly to the comments.php file versus how many submit comments using the human-readable form.

In conclusion, if you are a WordPress blogger and are inundated with spam, both on your site and in your spam queue, I heartily recommend using both Akismet (which you should already be using) and Conditional CAPTCHA. Doing so should reduce your spam to practically nothing.

If other bloggers out there have some statistics on the spam they receive, what they use to combat it, and how effective those measures are, I would be quite interested in hearing about it.

On Couchsurfing

Since I live in the Phoenix area (my wife has a condo there) and work in Tucson during the week, I’ve been couch-surfing with friends during the work week for the last year. This has spared me from renting a separate apartment, thus saving hundreds of dollars a month. In exchange for housing, I maintain my friends’ cars and computers and do other such tasks. So far, it’s worked out pretty well for both parties.

Imagine my interest when I discovered CouchSurfing.org. In essence, it consists of people willing to provide a place to sleep for others. It’s not meant to be permanent, but is geared towards a similar group of people as those who stay in hostels whilst traveling. Hosts and visitors each have a profile page that includes reviews from other CSers, so one can be reasonably assured that they’re not inviting in an axe murderer or crazy person. Very cool concept, and something that my wife and I will take advantage of when we move to Europe and travel frequently.

Hostels are inexpensive and nice (for the most part), but you don’t really get a feel for the people of an area in most cases. With CS, one actually stays with a local (or is a local and hosts a traveler), and so can get a much more in-depth feel for the people and culture in an area. Very cool.

I Got Nothing

Sorry folks. Nothing much has been happening recently. I haven’t been to the range in months, haven’t taken new shooters out in a while longer, have been about a month behind the times when it comes to gun-related news, have fallen behind in reading other blogs, etc.

I’m alive (at least for now; I’m going to be skiing all next week), excited about having gotten into graduate school, and generally getting along fine.

As an aside, if you haven’t played the video games Mass Effect and Mass Effect 2, you’re missing out. I was a bit skeptical of a third-person shooter/RPG, but I was wrong. They’ve seriously been the most-bang-for-the-buck entertainment that I’ve had in years (since Star Wars: Knights of the Old Republic which, interestingly enough, is made by the same company as Mass Effect). Tons of replay value, too.
