Blog Archive

Viewing page 1 of posts tagged "website" from August 17, 2011 to September 07, 2014.

Catching Comment Spam in a Honeypot

How a tempting target can reveal automated spammers.

A couple years ago I wrote about using Akismet to catch spam.

Since then, Akismet has successfully captured tens of thousands of spam comments to this site. However, since I'm not comfortable completely accepting the results from a Bayesian filter, I've dutifully been stuffing them into my database. It is getting a little silly, though:

$ sqlite3 main.sqlite
sqlite> SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam;

Ouch. Let's clean that out and see what happens.

$ cp main.sqlite bak.sqlite
$ sqlite3 main.sqlite
sqlite> DELETE FROM blog_comments WHERE is_spam AND NOT visible;
sqlite> vacuum;
sqlite> .quit
$ ls -lh
-rw-rw----  1 mikeboers mikeboers 19905536 Sep  7 16:35 bak.sqlite
-rw-rw----  1 mikeboers mikeboers  2811904 Sep  7 16:37 main.sqlite

17MB of my 20MB database was spam comments!

In my first post I outlined the various methods of spam detection: manual auditing, captchas, honeypots, and contextual filtering (i.e. Akismet). Let's quickly add another one of these to exponentially increase our confidence.


This commit marks the end of an era for me -> . Farewell, my dear web framework.


Friendlier (and Safe) Blog Post URLs

Until very recently, the URLs for individual blog posts on this site looked something like:

The 601 is the ID of this post in the site's database. I have always had two issues with this:

  1. The ID is meaningless to the user, but it is what drives the site.
  2. The title is meaningless to the site (you could change it to whatever you want), but it is what appears important to the user.

What they would ideally look like is:

But since I tend to quickly get a new post up and then edit it a dozen times before I am satisfied (including the title), the URL would not be stable. The implementations I have seen in other blog platforms would force the URL to retain the original title of the post, not the current title.

So I have come up with something more flexible that gives me URLs very similar to what I want, but allows for (relatively) safe changes in the title of the post (and therefore the URL).


I don't like needing to patch live websites from a terminal on my phone, but I really appreciate that I can.


I'm starting to transition all my sites from Apache/FastCGI to nginx/gunicorn for the asynchronous deliciousness.


@akismet Just let me know (out of almost nowhere) how I could be using their API more effectively. Amazing support!


Cleaning Comments with Akismet

My site recently (finally) started to get hit by automated comment spam. There are a few ways that one can traditionally deal with this sort of thing:

  1. Manual auditing: Manually approve each and every comment that is made to the website. Given the low volume of comments I currently have this wouldn't be too much of a hassle, but what fun would that be?
  2. Captchas: Force the user to prove they are human. ReCaptcha is the nicest in the field, but even it has been broken. But this doesn't stop humans who are being paid (very little).
  3. Honey pots: Add an extra field[1] to the form (e.g. last name, which I currently do not have) that is hidden by CSS. If it is filled out one can assume a robot did it and mark the comment as spam. This still doesn't beat humans.
  4. Contextual filtering: Use Bayesian spam filtering to profile every comment as it comes in. By correcting incorrect profiles we will slowly improve the quality of the filter. This is the only automated method which is able to catch humans.
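The honeypot check from item 3, sketched as an added example (it is not the code behind this site). The `is_spam` and `visible` columns come from the queries shown on this page; the `last_name` field name, the `author` and `content` columns, and the sample POST bodies are assumptions constructed for illustration.

```python
import sqlite3
from urllib.parse import parse_qs

HONEYPOT_FIELD = "last_name"  # assumed decoy name, following the "last name" suggestion

# The decoy input is moved off-screen by CSS, so a person never fills it in.
FORM_HTML = """<form method="post">
  <input name="author"> <textarea name="content"></textarea>
  <div style="position:absolute; left:-9999px" aria-hidden="true">
    <input name="last_name" tabindex="-1" autocomplete="off">
  </div>
</form>"""

def parse_comment(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded POST body to single values."""
    return {k: v[0] for k, v in parse_qs(body, keep_blank_values=True).items()}

def is_honeypot_hit(form: dict) -> bool:
    """Any non-blank value in the hidden field means something automated filled it."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())

def store_comment(db, form: dict) -> bool:
    """Keep every comment, but flag honeypot hits as spam and hide them."""
    spam = is_honeypot_hit(form)
    db.execute(
        "INSERT INTO blog_comments (author, content, is_spam, visible) VALUES (?, ?, ?, ?)",
        (form.get("author", ""), form.get("content", ""), int(spam), int(not spam)),
    )
    return spam

def demo() -> dict:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blog_comments (author TEXT, content TEXT, is_spam INTEGER, visible INTEGER)")
    posts = [  # constructed sample submissions
        "author=Alice&content=Nice+post&last_name=",       # browser sends the decoy empty
        "author=bot&content=cheap+pills&last_name=Smith",  # decoy filled in
        "author=Bob&content=Thanks",                       # decoy absent: not counted as a hit here
        "author=bot2&content=links&last_name=+x+",         # padded with whitespace, still filled
    ]
    for body in posts:
        store_comment(db, parse_comment(body))
    rows = db.execute("SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam").fetchall()
    return dict(rows)

if __name__ == "__main__":
    print(demo())
```

Flagged comments are stored with `visible = 0` rather than dropped, so a false positive can still be reviewed before a cleanup like `DELETE FROM blog_comments WHERE is_spam AND NOT visible`.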

I decided to go with the last option, as offered by Akismet, the fine folks who also provide Gravatar (which I have talked about before). They have a free API (for personal use) that is really easy to integrate into whatever project you are working on.

Now it is time to try it out. I've been averaging about a dozen automated spam comments a day. With luck, none of them will show up here.

*crosses his fingers*

I was just in touch with Akismet support to offer them a suggestion regarding their documentation. Out of nowhere they took a look at the API calls I was making to their service and pointed out how I could modify it to make my requests more effective in catching spam!

That is spectacular support!

  1. The previously linked article is dead as of Sept. 2014. 


RoboHash and Gravatar

I recently discovered a charming web service called RoboHash which returns an image of a robot deterministically as a function of some input text. Take a gander at a smattering of random robots:

These would make an awesome fallback as an avatar for those without a Gravatar set up, since it will always give you the same robot if you enter the same email address. So of course I implemented it for this site!



MathJax, open source licenses, and more!

I have pushed a lot of changes to my website in the last week.

\[ J_\alpha(x) = \sum_{m=0}^\infty \frac{(-1)^m}{m! \, \Gamma(m + \alpha + 1)}{\left({\frac{x}{2}}\right)}^{2 m + \alpha} \]

Hopefully that doesn't look like a jumble of backslashes. *crosses his fingers*


Seems like #Firefox has restored #WebSockets and added Server-Sent-Events. Hooray! Now we just have to wait a decade for IE to catch up.
