Blog Archive

Viewing page 1 from archive of September 2014

CommonMark's olive branch FAQ changing from "Yes" to "We used to think so" was the saddest signal through the years. http://commonmark.org/

@mikeboers on Twitter.


Digital Ocean is Stingy on the Swap

Sometimes a little swap space is all you need, but you have to put in a little effort for it.

I've been provisioning a pile of tiny VPSes (from Digital Ocean) for tiny web services for the last few weeks. While tuning one such site, I made an incorrect assumption that caused MySQL to fall over: I assumed there was swap to fall back on, but Digital Ocean boxes default to having no swap space.

Assuming you want a little bit of leeway in your memory limits, it is easy to add some swap:

# Create a 1GB blank disk image, readable only by root
# (mkswap warns about looser permissions).
dd if=/dev/zero of=/var/swap.img bs=1M count=1024
chmod 600 /var/swap.img

# Activate it as swap space.
mkswap /var/swap.img
swapon /var/swap.img

# Set it to activate at startup.
echo "/var/swap.img    none    swap    sw    0    0" >> /etc/fstab
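
To confirm the swap is actually active, one option is to read the SwapTotal line from /proc/meminfo. The following is a minimal Python sketch, assuming a Linux box; the function name swap_total_kib is made up for illustration and is not part of the original steps.

```python
def swap_total_kib(meminfo_text):
    """Return the SwapTotal value (in KiB) found in /proc/meminfo-style text.

    Returns 0 when no SwapTotal line is present.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # The line looks like "SwapTotal:  1048572 kB".
            return int(line.split()[1])
    return 0
```

On the server itself you would call it as swap_total_kib(open("/proc/meminfo").read()); a result of 0 means no swap is configured.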

Twitter PSA: The ads dashboard offers #analytics on your tweets, even if you don't pay for any ads. See top of https://ads.twitter.com

@mikeboers on Twitter.


"You've blocked <FacebookDouchebag>. We're sorry that you've had this experience." Aww... thanks, #Facebook! *hugs*

@mikeboers on Twitter.


I look forward to all major software bugs having a snappy name and logo like #Heartbleed and #Shellshock. #netsec

@mikeboers on Twitter.


Streamlining MySQL Authentication

Quick tip for easy access.

I've gotten far too used to Postgres' ability to authenticate you by your system uid, and tire of the continual copy-paste of massive passwords for my MySQL servers.

However, there is a way to streamline this: create a .my.cnf file in your home directory that looks like:

[client]
user=myname
password=mypassword

Just make sure that you are the only one who can read it (chmod go= .my.cnf), and you are good to go!
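
If you are scripting this across several machines, the file can be created with restrictive permissions from the start, so the password is never world-readable even briefly. This is a rough Python sketch under that assumption; write_my_cnf is a hypothetical helper, not something MySQL provides.

```python
import os


def write_my_cnf(path, user, password):
    """Write a MySQL client option file that only its owner can read."""
    contents = "[client]\nuser={}\npassword={}\n".format(user, password)
    # O_EXCL refuses to overwrite an existing file;
    # mode 0o600 grants read/write to the owner and nothing to anyone else.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(contents)
```

For example, write_my_cnf(os.path.expanduser("~/.my.cnf"), "myname", "mypassword") would produce the file shown above.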


Catching Comment Spam in a Honeypot

How a tempting target can reveal automated spammers.

A couple years ago I wrote about using Akismet to catch spam.

Since then, Akismet has successfully captured tens of thousands of spam comments to this site. However, since I'm not comfortable completely accepting the results from a Bayesian filter, I've dutifully been stuffing them into my database, and it is getting a little silly:

$ sqlite3 main.sqlite
sqlite> SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam;
0|30
1|13656

Ouch. Let's clean that out and see what happens.

$ cp main.sqlite bak.sqlite
$ sqlite3 main.sqlite
sqlite> DELETE FROM blog_comments WHERE is_spam AND NOT visible;
sqlite> vacuum;
sqlite> .quit
$ ls -l
-rw-rw----  1 mikeboers mikeboers 19905536 Sep  7 16:35 bak.sqlite
-rw-rw----  1 mikeboers mikeboers  2811904 Sep  7 16:37 main.sqlite

17MB of my 20MB database was spam comments!

In my first post I outlined the various methods of spam detection: manual auditing, captchas, honeypots, and contextual filtering (i.e. Akismet). Let's quickly add another one of these to greatly increase our confidence.
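
As a rough illustration of the honeypot idea (not necessarily how this site implements it): the comment form carries an extra field that is hidden from humans with CSS, so anything that fills it in is almost certainly a bot. The field name "website" and the function looks_automated below are assumptions made for this sketch.

```python
HONEYPOT_FIELD = "website"  # hypothetical field name, hidden from humans via CSS


def looks_automated(form):
    """Return True when the hidden honeypot field was filled in.

    `form` is a dict of submitted field names to string values.
    """
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A comment for which looks_automated(form) is true can be rejected outright, or flagged as spam alongside whatever Akismet reports.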



Parsing Python with Python

(Ab)using the tokenize module.

A few years ago I started writing PyHAML, a Pythonic version of HAML for Ruby.

Since most of the HAML syntax is pretty straightforward, PyHAML's parser uses a series of regular expressions to get the job done. This proved generally inadequate any time there was Python source to be isolated, since Python isn't quite so straightforward to parse.

The earliest thing to bite me was nested parentheses in tag definitions. In PyHAML you can specify a link via %a(href="http://example.com"), essentially treating the %a tag as a function which accepts keyword arguments. The very next thing you will want to do is include a function call, e.g. %a(href=url_for('my_endpoint')).

At this point, you are going to have A Bad Time™ with regular expressions as you can't deal with arbitrarily deep nesting. I "solved" this particular problem by scanning character by character until we have balanced the parentheses, with something similar to:

def split_balanced_parens(line):
    depth = 0
    for pos, char in enumerate(line):
        depth += {'(': 1, ')': -1}.get(char, 0)
        if not depth:
            return line[:pos+1], line[pos+1:]
    return '', line

And things were great with PyHAML for a long time, until a number of odd restrictions started getting in the way. For example, you can't have a closing parenthesis in a string in a tag (like %img(title="A sad face looks like ):")), you can't have a colon in a control statement, and statements can't span lines via unbalanced brackets.

If only you could use Python to tokenize Python without fully parsing it...
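
As a hedged sketch of where this is heading (not necessarily the code PyHAML ended up with), the standard library's tokenize module can take over the character scan. Since a string literal arrives as a single STRING token, a parenthesis inside it never changes the depth. This version assumes a single-line input.

```python
import io
import tokenize


def split_balanced_parens(line):
    """Split `line` just after the parenthesis that closes the first one opened."""
    depth = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(line).readline):
            if tok.type != tokenize.OP:
                continue  # strings, names, numbers, etc. never change the depth
            if tok.string == '(':
                depth += 1
            elif tok.string == ')':
                depth -= 1
                if depth == 0:
                    end = tok.end[1]  # column just past the closing parenthesis
                    return line[:end], line[end:]
    except (tokenize.TokenError, SyntaxError):
        pass  # input that cannot be tokenized is treated as unbalanced
    return '', line
```

With this, split_balanced_parens('(title="A sad face looks like ):") rest') splits after the real closing parenthesis rather than the one inside the string.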



"Are we getting the same SSL cert?"

A tiny webapp to help answer that question.

A friend had an SSL scare on public WiFi while in a coffee shop today. Her browser was warning her that every SSL certificate was invalid (except for *.google.com). Eventually it stopped, and she overheard others in the coffee shop commenting that their tablet was finally able to connect (it was previously refusing).

I'm not sure, but this could be a man-in-the-middle attack on the WiFi, in which the attacker (somehow) had a valid Google certificate and provided DNS records to point at their own machine.

In this scenario the browser is perfectly content to allow you to connect to this spoofed service. If you are not extremely familiar with SSL certificate authorities, a good way to check that a cert is not a forgery is to compare it to a known-good copy of the certificate. If the signatures match, then you are good to go.

But where can you get a known-good copy?

To answer this question, I quickly made the SSL Cert Fetcher (the source of which is available on GitHub).

Take a look at the certificates for Twitter, Google, and Facebook and see if they match what you are getting. (I'll sit here with my fingers crossed for a while.)
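
To get the local half of that comparison, you can fetch whatever certificate your current network hands you and reduce it to a fingerprint. This is a minimal Python sketch using only the standard library; it is not the SSL Cert Fetcher's code, and using a SHA-256 hash of the DER-encoded certificate as the fingerprint is an assumption made here.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def fetch_fingerprint(host, port=443):
    """Fetch the certificate `host` serves (without validating it) and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)
```

Running fetch_fingerprint("twitter.com") on the suspect network and again from a connection you trust gives two values to compare; a mismatch is a hint rather than proof, since large sites can serve different certificates from different servers.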
