Automating the Wikipedia game
October 20, 2016
In the Wikipedia game, you typically click the random article button and aim for another Wikipedia article by clicking on the fewest links possible. At a recent Oxford CompSoc event we wrote bots to automate this process.
Before the event I wrote up a few Python scripts to play the game by using the MediaWiki API. There is an API call to fetch a list of links for a given page, but it unfortunately lists them alphabetically rather than in the order they appear. I therefore used the general API for fetching the MediaWiki markup for the page and scraped the links instead. This meant I could use a simple strategy of clicking the first link to get to Philosophy, and then using a known route from there to our goal page. Effectively this just performed a depth first search on the graph of links.
Most people chose to use and adapt this strategy, often by prioritising certain frequently linked to articles and using known routes from those articles, although at least a couple of people experimented with backlinks.
Whilst my “philosophy” strategy tended to reach the goal article within around forty links, the best strategy used a breadth-first search (capped at the first five links of an article) with certain well known routes used to reach the goal at the end. This was still a largely uninformed search, but by “knowing” the category of a Wikipedia article you could almost certainly reduce the number of clicks that you need.
Everyone stuck to search strategies and the Wikipedia API, rather than using the brute force approach of downloading a dump of all Wikipedia’s text, and running Dijkstra’s on it locally.