Being new to OpenStreetMap, I couldn’t help but be reminded of the Total Perspective Vortex in the classic Hitchhiker’s Guide.
The Total Perspective Vortex derives its picture of the whole Universe on the principle of extrapolated matter analyses. To explain – since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation – every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake.
The man who invented the Total Perspective Vortex did so basically in order to annoy his wife.
Trin Tragula – for that was his name – was a dreamer, a thinker, a speculative philosopher or, as his wife would have it, an idiot. And she would nag him incessantly about the utterly inordinate amount of time he spent staring out into space, or mulling over the mechanics of safety pins, or doing spectrographic analyses of pieces of fairy cake.
“Have some sense of proportion!” she would say, sometimes as often as thirty-eight times in a single day.
And so he built the Total Perspective Vortex – just to show her.
And into one end he plugged the whole of reality as extrapolated from a piece of fairy cake, and into the other end he plugged his wife: so that when he turned it on she saw in one instant the whole infinity of creation and herself in relation to it. To Trin Tragula’s horror, the shock completely annihilated her brain; but to his satisfaction he realized that he had proved conclusively that if life is going to exist in a Universe of this size, then the one thing it cannot afford to have is a sense of proportion.
Douglas Adams, The Restaurant at the End of the Universe, Chapter 11
Downloading the planet-latest file felt like peeking into the vortex. As OSM continues to gain momentum, it gets harder and harder to maintain a sense of proportion without adverse effects on one’s sanity. So if a dataset of this size is to exist, we simply cannot afford to have a sense of proportion. We must allow for a variety of helpful slivers of the planet to help us represent its entirety in a variety of situations.
As already established, OSM is massively, unapologetically HUGE(ly awesome)!!! We all want, love, take part in, and encourage this. The consequence, though, is that we have less and less insight into what is actually in there. It’s hard to gauge the coverage, consistency, and accuracy of the data when that data represents so many things to such a diverse community. It is up to that diverse community, then, to extract valuable subsets of the planet and focus on the quality of each subset independently of the rest of the data.
Mapzen has done just that, and we have chosen administrative boundaries as a starting point. This decision was inspired by the prior work of David Blackman with Foursquare – you can learn more about his efforts here:
The administrative boundary subset of OSM data has tremendous value in a variety of geo problems, so it seemed like a great place to start.
So what have we done exactly? We used osmfilter to extract all relations that have a combination of "admin_level"=* tags. You can read all about the meaning of these tags on the OSM wiki. We then threw those relations into the clutches of osmium and converted each one to a MultiPolygon, written out as GeoJSON. The planet extract contains one file per admin_level value, for example admin_level_2.geojson, as well as admin_level_other.geojson for all the polygons where the admin_level value was non-numeric.
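To make the file layout concrete, here is a minimal Python sketch of how features could be bucketed into per-level files. It is not the code we ran (that work was done by osmfilter and osmium). The function names are hypothetical, and it assumes each feature carries its OSM tags, including admin_level, in its GeoJSON properties.

```python
import re
from collections import defaultdict


def output_filename(admin_level):
    """Return the per-level file name for one admin_level tag value."""
    value = "" if admin_level is None else str(admin_level).strip()
    if re.fullmatch(r"[0-9]+", value):
        return f"admin_level_{value}.geojson"
    # Assumption: anything non-numeric (or missing) goes to the catch-all file.
    return "admin_level_other.geojson"


def group_features(features):
    """Group GeoJSON features by the file they would be written to."""
    groups = defaultdict(list)
    for feature in features:
        # Assumption: OSM tags are exposed as GeoJSON properties.
        props = feature.get("properties") or {}
        groups[output_filename(props.get("admin_level"))].append(feature)
    return dict(groups)


def as_feature_collection(features):
    """Wrap a list of features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}
```

Each group can then be wrapped with as_feature_collection and serialized with json.dump under its file name.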
To take it a step further, we decided to also slice that planet boundary data up into manageable country chunks, since often geo problems are highly localized and only require data for a specific country. We had to somehow programmatically decide which countries should be extracted, so why not use the planet boundaries we just extracted? All it took was grabbing the admin_level_2.geojson from the planet borders, which indicates countries, and pulling out anything that had a flag tag. This isn’t perfect, but it works surprisingly well, and is what some might call “eating our own dogfood”.
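The country selection described above can be sketched in a few lines of Python. This is an illustration rather than our actual tooling: the function name is hypothetical, and it assumes the flag tag shows up as a flag entry in each feature's GeoJSON properties.

```python
def countries_with_flag(feature_collection):
    """Keep only the features whose properties include a non-empty flag tag."""
    selected = []
    for feature in feature_collection.get("features", []):
        # Assumption: OSM tags are exposed as GeoJSON properties.
        props = feature.get("properties") or {}
        if props.get("flag"):
            selected.append(feature)
    return selected


# Tiny in-memory stand-in for admin_level_2.geojson (made-up values).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"name": "Example Country", "admin_level": "2",
                        "flag": "example-flag-value"}},
        {"type": "Feature", "geometry": None,
         "properties": {"name": "Disputed Area", "admin_level": "2"}},
    ],
}

selected = countries_with_flag(sample)
names = [f["properties"]["name"] for f in selected]
```

With a real file, the sample dictionary would be replaced by the result of json.load on admin_level_2.geojson.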
This GIF shows the increasingly detailed admin boundaries of Japan, from admin_level 2 to 7. (There are three more, running down to the town and neighborhood level, but they are not readily visible at this scale.)
We’ve made this data publicly available for download at mapzen.com/data/borders in the hope that it will help people in need of such data. More importantly, however, we hope that exposing this data will drive the community back to OSM to improve its quality. To highlight some of the problems we encounter during the extract process, we’ve included an errors.json file in the planet extract. So if a border you’re looking for isn’t present, check the errors to see if there is something you can do in OSM to fix it. We will continue to run the extraction process monthly to ensure data improvements are reflected.
In addition to the extracts being available for download, we’ve also made the tools used to generate these extracts open-source and available here:
Stay tuned for future posts on how to use the tools to make custom region extracts. As always, we welcome contributions from the community.
This border extract example is just the beginning: the beginning of a conversation about valuable subsets of goodness hiding within OSM data. These data sets need attention in order to make them complete, accurate, and all-around awesome. Neighborhoods, postal codes, and coastlines are just a few of the other potential extracts to consider. We’d love to hear what you think is important to extract out of OSM and improve as a community.
I should mention that there is one concern about us gaining insight. According to the Hitchhiker’s Guide, “There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable.” Let’s hope that’s not the case with OpenStreetMap!