Post Archive

Page 2 from July 21, 2014 to December 05, 2014.

Repairing Python Virtual Environments

When upgrading Python breaks our environment, we can rebuild.

Python virtual environments are a fantastic method of insulating your projects from each other, allowing each project to have different versions of their requirements.

They work (at a very high level) by making a lightweight copy of the system Python, which symlinks back to the real thing whenever necessary. You can then install whatever you want in lib/pythonX.Y/site-packages (e.g. via pip), and you are good to go.

Depending on what provides your source Python, however, upgrading it can break things. For example, I use Homebrew, which (under the hood) stores everything it builds in versioned directories:

$ readlink $(which python)
../Cellar/python/2.7.8_2/bin/python

Whenever there even a minor change to this Python, symlinks back to that versioned directory may not work anymore, which breaks my virtual environments:

$ python
dyld: Library not loaded: @executable_path/../.Python
  Referenced from: /Users/mikeboers/Documents/Flask-Images/venv/bin/python
  Reason: image not found
Trace/BPT trap: 5

There is an easy fix: manually remove the links back to the old Python, and rebuild the virtual environment.

Posted on December 05, 2014. Categories:

The Evils of Gamifying Git

Your green squares do you little good, and encourage bad behaviour.

Nearly two years ago, GitHub introduced contribution calendars on everyone's profile, which roughly visualize how frequently one has been "contributing" for the past year. Through 2013, mine displayed some interesting patterns and features, many of which scream that they have a story:

Having used GitHub as the primary code host for multiple full-time jobs, and a few growing open-source projects, I now believe that these calendars introduce, for me, two negative effects that vastly outweigh their benefits.

Posted on October 26, 2014. Categories:

Checking IMAP for a Pulse

When you don't trust your mail client to tell you everything.

I have been having trouble receiving email from Gmail via IMAP today. The Google Apps status dashboard does not reveal anything is wrong, so lets go spelunking!

We use openssl to make the connection to transparently handle the TLS connection:

$ openssl s_client -crlf -connect imap.gmail.com:993
* OK Gimap ready for requests from 172.218.163.116 zh1mb199258645pbc

We have a connection! The -crlf is critical for Gmail as the line endings are important to the "Gimap" server (and the protocol in general, but other servers I have tested are more accepting). If you weren't using encryption, you would nc imap.gmail.com 143 instead of using OpenSSL.

Lets poke the server a little.

Posted on October 14, 2014. Categories:

When Normals Aren't

Classifying attributes as "normal" so Houdini will transform them.

I've been building a VFX pipeline for Shadows in the Grass that allows for artists to work in whatever applications they are the most skilled.

One little thing which has been a bit annoying is that somewhere between Blender and Houdini (of all pairings), geometry transferred via Collada does not have properly qualified normals. This results in some wacky behaviour when you start transforming your models:

Note that when spinning the above sphere, the normals are getting dragged along with the surface, but they remain pointing in the same direction in world space; they are not spinning with the surface. A very obvious effect of this is that the light appears to be spinning around the sphere, even though the light is stationary.

There is a simple enough fix, however.

Posted on October 07, 2014. Categories:

Houdini
/ VFX

Digital Ocean is Stingy on the Swap

Sometimes a little swap space is all you need, but you have to put in a little effort for it.

I've been provisioning a pile of tiny VPSes (from Digital Ocean) for tiny web services for the last few weeks. While tuning one such site, I made an incorrect assumption that caused MySQL to fall over: Digital Ocean boxes default to having no swap space.

Assuming you want a little bit of leeway in your memory limits, it is easy to add some swap:

# Create a 1GB blank disk image.
dd if=/dev/zero of=/var/swap.img bs=1M count=1024

# Activate it as swap space.
mkswap /var/swap.img
swapon /var/swap.img

# Set it to activate at startup.
echo "/var/swap.img    none    swap    sw    0    0" >> /etc/fstab

Posted on September 25, 2014. Categories:

devops
/ webdev

Streamlining MySQL Authentication

Quick tip for easy access.

I've gotten far too used to Postgres' ability to authenticate you by your system uid, and tire of the continual copy-paste of massive passwords for my MySQL servers.

However, there is a way to streamline this: create a .my.cnf file in your home that looks like:

[client]
user=myname
password=mypassword

Just make sure that you are the only one who can read it (chmod go= .my.cnf), and you are good to go!

Posted on September 11, 2014. Categories:

Catching Comment Spam in a Honeypot

How a tempting target can reveal automated spammers.

A couple years ago I wrote about using Akismet to catch spam.

Since then, Akismet has successfully captured tens of thousands of spam comments to this site. However, since I'm not comfortable completely accepting the results from a Baysian filter, I've dutifully been stuffing them into my database. However, it is getting a little silly:

$ sqlite3 main.sqlite
sqlite> SELECT is_spam, count(1) FROM blog_comments GROUP BY is_spam;
0|30
1|13656

Ouch. Lets clean that out and see what happens.

$ cp main.sqlite bak.sqlite
$ sqlite3 main.sqlite
sqlite> DELETE FROM blog_comments WHERE is_spam AND NOT visible;
sqlite> vacuum;
sqlite> .quit
$ ls -lh
-rw-rw----  1 mikeboers mikeboers 19905536 Sep  7 16:35 bak.sqlite
-rw-rw----  1 mikeboers mikeboers  2811904 Sep  7 16:37 main.sqlite

17MB of my 20MB database was spam comments!

In my first post I outlined the various methods of spam detection: manual auditing, captchas, honeypots, and contextual filtering (i.e. Akismet). Lets quickly add another one of these to exponentially increase our confidence.

Posted on September 07, 2014. Categories:

Parsing Python with Python

(Ab)using the tokenize module.

A few years ago I started writing PyHAML, a Pythonic version of HAML for Ruby.

Since most of the HAML syntax is pretty straight forward, PyHAML's parser uses a series of regular expressions to get the job done. This proved generally inadequate anytime that there was Python source to be isolated, since Python isn't quite so straight forward to parse.

The earliest thing to bite me was nested parenthesis in tag definitions. In PyHAML you can specify a link via %a(href="http://example.com"), essentially treating the %a tag as a function which accepts keyword arguments. The very next thing you will want to do is include a function call, e.g. %a(href=url_for('my_endpoint')).

At this point, you are going to have A Bad Time™ with regular expressions as you can't deal with arbitrarily deep nesting. I "solved" this particular problem by scanning character by character until we have balanced the parenthesis, with something similar to:

def split_balanced_parens(line):
    depth = 0
    for pos, char in enumerate(line):
        depth += {'(': 1, ')': -1}.get(char, 0)
        if not depth:
            return line[:pos+1], line[pos+1:]
    return '', line

And things were great with PyHAML for a long time, until a number of odd restrictions starting getting in the way. For example, you can't have a closing parenthesis in a string in a tag (like %img(title="A sad face looks like ):")), you can't have a colon in a control statement, and statements can't span lines via unbalanced brackets.

If only you could use Python to tokenize Python without fully parsing it...

Posted on August 23, 2014. Categories:

Python
/ how-to

"Are we getting the same SSL cert?"

A tiny webapp to help answer that question.

A friend had an SSL scare on public WiFi while in a coffee shop today. Her browser was warning her that every SSL certificate was invalid (except for *.google.com). Eventually it stopped, and she overheard others in the coffee shop commenting that their tablet was finally able to connect (it was previously refusing).

I'm not sure, but this could be a man in the middle attack on the WiFi, in which the attacker (somehow) had a valid Google certificate and provided DNS records to point at their own machine.

In this scenario the browser is perfectly content to allow you to connect to this spoofed service. If you are not extremely familiar with SSL certificate authorities, a good way to assert a cert is not a forgery is to compare it to a known-good copy of the certificate. If the signatures match, then you are good to go.

But where can you get a known-good copy?

To answer this question, I quickly make the SSL Cert Fetcher (the source of which is available on GitHub).

Take a look at the certificates for Twitter, Google, and Facebook and see if they match what you are getting. (I'll sit here with my fingers crossed for a while.)

Posted on July 23, 2014. Categories:

development
/ security

Which Python?

Finding packages, modules, and entry points.

which is a handy utility for discovering the location of executables in your shell; it prints the full path to the first executable on your $PATH with the given name:

$ which python
/usr/local/bin/python

In a similar stream, I often want to know the location of a Python package or module that I am importing, or what entry points are available in the current environment.

Lets write a few tools which do just that.

Posted on July 21, 2014. Categories:

Python
/ how-to

View posts before July 21, 2014