Meta: Blog Spam

One of the issues with maintaining a presence online, such as a public email address or blog, is spam. Comment spam is particularly obnoxious, and I’m doing what I can to reduce it, or eliminate it entirely, on this site.
I use several methods to help prevent spam on this blog:

  • Akismet. This is provided by Automattic, the developers of WordPress, and is quite effective. It works much like Google Mail’s spam filter: all comments posted here are submitted to Akismet, which makes the spam/not-spam decision and reports back to this blog. It can learn and adapt, so as to block new spam and avoid blocking legitimate messages. It handles both comment and trackback/pingback spam.
  • Yet Another WordPress Anti Spam Plugin. This runs entirely locally, and modifies the “backend” of how comments are submitted. Specifically, it changes the names of the “author”, “url”, “email”, and other such fields in the comment submission form to random values. These values change every 24 hours. Human users are not affected at all, but spam bots don’t expect these values; they attempt to send spam using the default field names and fail. So far, this seems quite effective, and it doesn’t require any sort of extra work from viewers, which keeps things simple. This only works against comment spam.
  • Honeypots. Spam bots follow links in an attempt to find email addresses, comment forms, or more sites to spam. The HTML code of this blog contains hidden links to several “honeypots”, which exist to attract harvesters and spammers. These links are not visible to users. When a honeypot is viewed, a unique, one-time email address is generated specifically for that viewer and information about the visit is logged in a database. If that unique address later receives spam, the spam can be matched with the harvester. A similar mechanism ensnares comment spammers. This data is then fed to the http:BL, which can be used to block spammers from being able to post to this blog. While I do have several honeypots, I don’t currently use the http:BL, as my other methods seem quite effective, but I may do so if spam levels increase. Unless you’re a spammer and go visiting these honeypots, this will not affect you at all, and no information about you will be logged.
  • Manual efforts. I routinely monitor trackbacks/pingbacks and comments to see if spam gets through and attempt to remove it manually. Similarly, I review the “spam queue” for Akismet and will mark messages as spam/not-spam as necessary.
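
For the curious, here is a rough sketch of how the rotating field names described above could work. This is not the plugin’s actual code (which is PHP, and which I haven’t reproduced here); it is only an illustration in Python under my own assumptions. The secret key, the `rotated_name` and `decode_submission` helpers, and the choice of HMAC-SHA256 to derive the names are all hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real site would keep its own long random value private.
SECRET_KEY = b"replace-with-a-long-random-secret"
ROTATION_SECONDS = 24 * 60 * 60  # field names change every 24 hours
FIELDS = ("author", "email", "url", "comment")


def rotated_name(field, now=None):
    """Return the obscured form-field name for `field` in the current 24-hour window."""
    if now is None:
        now = time.time()
    window = int(now // ROTATION_SECONDS)  # same value for a whole 24-hour period
    message = f"{field}:{window}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:12]  # leading letter so the name never starts with a digit


def decode_submission(posted, now=None):
    """Translate a posted form back to the real field names.

    Returns None when any of the current obscured names is missing, which is
    what happens when a bot posts with the default names.
    """
    mapping = {rotated_name(field, now): field for field in FIELDS}
    if not all(name in posted for name in mapping):
        return None
    return {mapping[name]: posted[name] for name in mapping}
```

A browser that loaded the real form sends the obscured names and is decoded normally, while a bot posting `author`, `email`, and so on gets `None` back and its comment is dropped. One thing this sketch does not handle is a reader who loads the page just before the names rotate and submits just after; a real implementation would presumably need some allowance for that.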

I was considering using reCAPTCHA to block spam, as it assists Carnegie Mellon University’s efforts to correctly transcribe books in the public domain. Many of the words it presents are ones that their Optical Character Recognition software (which turns scanned images of books into text) is unable to read, and so humans can help correctly “read” these words by entering them.
This benefits the university and general public (more knowledge in the form of books), and website owners (less spam). However, it can annoy readers who have to fill out the form each time they want to comment. It’s for this reason that I am not currently using CAPTCHAs for comments, but do use them on my contact information page to keep my email address from being harvested.
If you have any issues at all related to my anti-spam measures, please don’t hesitate to contact me. I want to provide a simple, seamless way for readers to leave comments while making it all but impossible for spammers to post their ads.
If something doesn’t work right, please let me know and I’ll disable it until I can figure out what went wrong and how to keep it from happening again.