Post Archive

Page 1 of posts tagged "spam" from November 26, 2011 to September 07, 2014.

Catching Comment Spam in a Honeypot

How a tempting target can reveal automated spammers.

A couple years ago I wrote about using Akismet to catch spam.

Since then, Akismet has successfully captured tens of thousands of spam comments to this site. However, since I'm not comfortable completely accepting the results from a Bayesian filter, I've dutifully been stuffing them into my database. It is getting a little silly, though:

$ sqlite3 main.sqlite
sqlite> SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam;
0|30
1|13656

Ouch. Let's clean that out and see what happens.

$ cp main.sqlite bak.sqlite
$ sqlite3 main.sqlite
sqlite> DELETE FROM blog_comments WHERE is_spam AND NOT visible;
sqlite> vacuum;
sqlite> .quit
$ ls -l
-rw-rw----  1 mikeboers mikeboers 19905536 Sep  7 16:35 bak.sqlite
-rw-rw----  1 mikeboers mikeboers  2811904 Sep  7 16:37 main.sqlite

17MB of my 20MB database was spam comments!
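For anyone who would rather script this than type it into the sqlite3 shell, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the same blog_comments table with is_spam and visible columns as in the session above; the function name purge_hidden_spam and its arguments are made up for illustration.

```python
import shutil
import sqlite3


def purge_hidden_spam(db_path, backup_path):
    """Back up the database, delete hidden spam comments, and reclaim space.

    Returns the number of rows deleted. Table and column names are assumed
    to match the blog_comments schema queried above.
    """
    # Same role as `cp main.sqlite bak.sqlite`.
    shutil.copyfile(db_path, backup_path)

    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "DELETE FROM blog_comments WHERE is_spam AND NOT visible"
        )
        deleted = cur.rowcount
        # VACUUM cannot run inside an open transaction, so commit first.
        conn.commit()
        conn.execute("VACUUM")
    finally:
        conn.close()
    return deleted
```

Calling purge_hidden_spam("main.sqlite", "bak.sqlite") should leave a backup next to a much smaller main file, just like the listing above.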

In my first post I outlined the various methods of spam detection: manual auditing, captchas, honeypots, and contextual filtering (e.g. Akismet). Let's quickly add another one of these to further increase our confidence.

Posted on September 07, 2014. Categories:

Cleaning Comments with Akismet

My site recently (finally) started to get hit by automated comment spam. There are a few ways that one can traditionally deal with this sort of thing:

Manual auditing: Manually approve each and every comment that is made to the website. Given the low volume of comments I currently have this wouldn't be too much of a hassle, but what fun would that be?
Captchas: Force the user to prove they are human. reCAPTCHA is the nicest in the field, but even it has been broken, and it doesn't stop humans who are being paid (very little) to post spam.
Honeypots: Add an extra field¹ to the form (e.g. last name, which I currently do not have) that is hidden by CSS. If it is filled out one can assume a robot did it and mark the comment as spam. This still doesn't beat humans.
Contextual filtering: Use Bayesian spam filtering to profile every comment as it comes in. By correcting misclassified comments we will slowly improve the quality of the filter. This is the only automated method which is able to catch humans.
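To make the honeypot idea concrete, here is a minimal sketch of the server-side check. It is not the code this site runs: the field names (last_name, author, content) and both function names are hypothetical, and the submitted form is assumed to arrive as a plain dict of strings.

```python
# Hypothetical name of the extra field that is hidden from humans via CSS.
HONEYPOT_FIELD = "last_name"


def is_honeypot_tripped(form):
    """Return True if the hidden honeypot field contains any non-blank text."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def classify_comment(form):
    """Build a comment record, flagged as spam when the honeypot was filled in."""
    return {
        "author": form.get("author", ""),
        "content": form.get("content", ""),
        "is_spam": is_honeypot_tripped(form),
    }
```

A human never sees the hidden field, so it comes back empty and is_spam stays False; a robot that fills in every input trips the flag.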

I decided to go with the last option, as offered by Akismet, from the fine folks who also provide Gravatar (which I have talked about before). They have a free API (for personal use) that is really easy to integrate into whatever project you are working on.
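As a rough idea of what the integration involves, here is a sketch of a comment-check call using only the Python standard library. The endpoint and parameter names follow Akismet's comment-check API as I understand it, so verify them against the current documentation; the function names, the placeholder User-Agent string, and the example values are my own assumptions, not the code this site uses.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check_request(api_key, blog_url, user_ip, user_agent,
                                content, author=""):
    """Build (but do not send) a POST request for Akismet's comment-check call."""
    # Assumed endpoint format: the API key is part of the hostname.
    url = "https://%s.rest.akismet.com/1.1/comment-check" % api_key
    params = {
        "blog": blog_url,            # front page of the site being protected
        "user_ip": user_ip,          # IP address of the commenter
        "user_agent": user_agent,    # the commenter's browser User-Agent
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    data = urlencode(params).encode("utf-8")
    # Placeholder User-Agent identifying the calling application.
    return Request(url, data=data, headers={"User-Agent": "example-blog/0.1"})


def is_spam(request):
    """Send the request; the service is expected to answer 'true' for spam."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "true"
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what is being sent, which matters because the more context the service gets, the better it can judge a comment.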

Now it is time to try it out. I've been averaging about a dozen automated spam comments a day. With luck, none of them will show up here.

*crosses his fingers*

Update:
I was just in touch with Akismet support to offer them a suggestion regarding their documentation. Out of nowhere they took a look at the API calls I was making to their service and pointed out how I could modify them to make my requests more effective in catching spam!

That is spectacular support!

¹ The previously linked article is dead as of Sept. 2014.

Posted on November 26, 2011. Categories:

There are no more posts tagged "spam".