Dictionary Building for Word-Search
Mining Wikipedia for a "geek" lexicon.
One of the local pubs styles itself after geek/nerd culture (e.g. sci-fi, fantasy, and board games). The back of the coasters feature a word-search. It reports to contain 45 "geek words and phrases":
Writing code to solve a word-search isn't particularly tricky (if you remember to use a prefix trie) as long as you have a list of words to find, but in this case we are given no such clues.
But, since we are tremendously lazy, how can we solve this with code anyways?
First, I spent a few minutes searching the old-fashioned way (and not particularly well at that):
We have isolated some Star Trek, Star Wars, Game of Thrones, Alien, and Predator. Ergo, lets mine those terms to get a vocabulary.
I wrote a Wikipedia scraper in Python that can recursively walk from a starting point (caching pages at is goes for successive runs), and output the unique wordlist from that set of documents:
$ python wp_fetch.py -d 0 -o depth0.txt 'Star Trek' 'Star Wars' 'Game of Thrones' 'Alien (film)' 'Predator (film)' Star Trek 79383 Star Wars 114584 Game of Thrones 86865 Alien (film) 116904 Predator (film) 32345 $ wc depth0.txt 5335 5335 43433 depth0.txt
We collected 5335 words! Lets quickly OCR the puzzle, write a word-search solver in Python, and try it with our found dictionary:
$ python solve.py -w depth0.txt puzzle.txt Puzzle is 19 by 19 Loading depth0.txt 5335 words <manually snipped to remove lots of garbage results> * 2,11 E: ANDROID * 3, 8 E: PREDATOR * 4,11 E: DROID * 5,14 W: ROBOT * 6,16 W: WIZARD * 7, 2 SE: ENTERPRISE * 9, 5 E: HERO * 9,14 E: STARWARS * 10, 2 SW: ALIEN * 11,15 N: BATTLES * 12,12 SE: DRAGON * 13, 7 SW: MASTER * 15,16 E: WOLF * 17,13 N: STARTREK * 18, 2 SW: BEAST
That found a few more words for us (and a few times as much junk, but this isn't a very smart solver), but we are really missing quite a bit. Lets recurse once into Wikipedia:
$ python wp_fetch.py -d 1 -o depth1.txt 'Star Trek' 'Star Wars' 'Game of Thrones' 'Alien (film)' 'Predator (film)' | wc 2398 9038 57852 $ wc depth1.txt 61245 61245 561530 depth1.txt
That found 12 times as many unique words. Hopefully many more of them will be related to the topics at hand. Lets see!
$ python solve.py -w depth1.txt puzzle.txt Puzzle is 19 by 19 Loading depth0.txt 61245 words <manually snipped for interesting results> * 0,12 N: GOBLIN * 1,12 SE: HOBBIT * 2,11 E: ANDROID * 6,16 W: WIZARD * 8,14 N: LORDOFTHERINGS * 9,17 NW: SAURON * 10, 0 SE: LANNISTER * 11,15 N: BATTLESTAR * 12, 0 S: DUNGEON * 12,13 SW: WALKER * 16, 3 S: DARTHVADER * 18, 0 SW: TOLKIEN * 18,14 N: BRADBURY
Still only 35 words of the reported 45, but we found some words (e.g. "SAURON", "WIZARD", "WALKER") for which we didn't even list the source material. Score!