How a tempting target can reveal automated spammers.
Since then, Akismet has successfully captured tens of thousands of spam comments to this site. Because I'm not comfortable completely trusting the results of a Bayesian filter, I've dutifully been storing them in my database. It is getting a little silly, though:
    $ sqlite3 main.sqlite
    sqlite> SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam;
    0|30
    1|13656
Ouch. Let's clean that out and see what happens.
    $ cp main.sqlite bak.sqlite
    $ sqlite3 main.sqlite
    sqlite> DELETE FROM blog_comments WHERE is_spam AND NOT visible;
    sqlite> vacuum;
    sqlite> .quit
    $ ls -lh
    -rw-rw---- 1 mikeboers mikeboers 19905536 Sep 7 16:35 bak.sqlite
    -rw-rw---- 1 mikeboers mikeboers 2811904 Sep 7 16:37 main.sqlite
17MB of my 20MB database was spam comments!
In my first post I outlined the various methods of spam detection: manual auditing, captchas, honeypots, and contextual filtering (e.g. Akismet). Let's quickly add another one of these to greatly increase our confidence.
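As a rough illustration of how a honeypot can sit alongside a contextual filter, here is a minimal sketch in Python. It assumes the comment form includes an extra field that is hidden from human visitors with CSS, so only automated submitters are likely to fill it in. The field name `website_url` and both function names are hypothetical, chosen for this example only, and are not taken from this site's code.

```python
# Hypothetical name of a form field that is hidden from humans via CSS.
# Real visitors never see it, so any value in it suggests an automated bot.
HONEYPOT_FIELD = "website_url"


def is_probably_bot(form):
    """Return True if the hidden honeypot field was filled in."""
    value = form.get(HONEYPOT_FIELD) or ""
    return bool(value.strip())


def classify_comment(form, filter_says_spam):
    """Combine the honeypot signal with a contextual filter's verdict.

    `filter_says_spam` stands in for whatever a service such as Akismet
    reported for this comment; this sketch does not call any such service.
    """
    if is_probably_bot(form):
        return "spam"
    return "spam" if filter_says_spam else "ham"


if __name__ == "__main__":
    human = {"author": "Alice", "body": "Nice post!", "website_url": ""}
    bot = {"author": "x", "body": "Buy now", "website_url": "http://spam.example"}
    print(classify_comment(human, filter_says_spam=False))
    print(classify_comment(bot, filter_says_spam=False))
```

The appeal of this arrangement is that the two signals are independent: the honeypot says nothing about the comment's content, and the contextual filter says nothing about how the form was submitted, so a comment flagged by both can be discarded with far more confidence than one flagged by either alone.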