Today we’re introducing Transitland, a community-edited data service aggregating transit networks across metropolitan and rural areas around the world. For a general overview, see Transitland: Open Transit Data for All.
Geohashes, created by Gustavo Niemeyer in 2008 and placed in the public domain, are an elegant and succinct geographic encoding. Geohashes work by reducing a two-dimensional longitude, latitude pair into a single alphanumeric string where each additional character adds precision to the location. Originally created as part of a URL-shortening service, geohashes have proven useful in a variety of contexts, including unique identifiers, spatial indexing, and search.
Transitland uses geohashes as a prime component of Onestop IDs, our system for creating stable identifiers for public transit operators, routes, and stops. The short length and arbitrary precision of geohashes are extremely useful for approximately geolocating millions of transportation locations. For example, the geohash “9q8znb12j1” is the location of the San Francisco Embarcadero subway stop, and forms part of the Onestop ID “s-9q8znb12j1-embarcadero”.
Constructing a geohash
The beauty of a geohash is in how it’s constructed. In brief, geohashes are a type of grid spatial index, where the world is recursively divided into smaller and smaller grids with each additional bit.
Start with the entire planet, and divide it in half along the Prime Meridian—the first half containing latitudes in the range of (-180,0) as 0 and the second half containing the range (0,180) as 1. The half containing your point becomes the first bit.
Now, divide that half again along the Equator, and add the second bit. Repeat this subdivision, alternating between longitude and latitude, until the remaining area is within the precision desired. Finally, encode the resulting binary sequence using the geohash base 32 character map to create the final geohash string. For example, the longitude, latitude coordinate (37.77564, -122.41365) results in the binary sequence “0100110110010001111011110” and produces the geohash “9q8yy”.
Decoding a geohash
Another way of thinking about about geohashes are as interleaved longitude (even bits) and latitude (odd bits). This provides a simple method for converting a geohash back into longitude and latitude.
As above, each additional bit divides the area in half in a binary search. Latitude begins with a range of (-90, 90) degrees; the first bit of 1 reduces this to (0, 90); the second bit of 0 to (0, 45); the third bit of 1 to (22.5, 45); and so on until with the 12th bit, the range is (37.753, 37.797) with a midpoint of 37.775.
In this case, the width of the area encoded by 12 bits of latitude is +/- 0.022 degrees, or approximately 2.4 km across at the equator. A longer geohash would encode more bits of precision; for example, increasing the geohash from 5 characters to 8 characters would increase latitude to 20 bits, reducing error to +/- 0.00085 degrees, or +/- 0.019 km.
As each character encodes additional precision, shared prefixes denote geographic proximity.
|New York City||dr5regw2z6y||40.71273||-74.00599|
You might notice that all the California locations all begin with “9q”, and that Oakland and Berkeley share the “9q9p” prefix. Shared prefixes allow nearby locations to be identified through very simple string comparisons and efficient searching of indexes.
However, it is important to note the converse is not true! The London and Greenwich geohashes are only about 6 km apart as the crow flies—but have no common prefix. This is because the Prime Meridian separates the two locations, dividing these two geohashes from with the very first flipped bit. Fortunately, it is straightforward to calculate the 8 neighbors for a given geohash. For example, the Southeast neighbor of “gcpv” is “u10h”, successfully jumping across this pesky imaginary boundary.
Searching an area
This method of expanding a geohash to include its neighbors can also be used to efficiently find the set of geohash prefixes to search an area. To find all prefixes within approximately 1 mile of the San Francisco geohash “9q8yykv551w”, first shorten the geohash to 6 characters, “9q8yyk”, then add in all 8 neighbors: 9q8yy7, 9q8yyt, 9q8yy5, 9q8yys, 9q8yym, 9q8yyj, 9q8yyk, 9q8yyh, 9q8yye. Any point in this area is known to start with one of these prefixes.
At Transitland, we use this property to specify simple bouding boxes for transit operators as part of a Onestop ID. For example, the SFMTA has over 3500 bus stops. As described above, it is quite possible that there is no common geohash prefix shared by every stop in the system (or, if there is, the be so short as to be limited usefulness). In the case of the SFMTA, only “9q” prefix is shared in common, which covers a very large portion of the Southwestern United States. However, the geohash prefix “9q8y”, when expanded to include neighbors, provides the 9 prefixes that can be used to find any stop, while covering a much smaller land area. This “neighbors implied” geohash is then used in the Onestop ID for SFMTA: “f-9q8y-sfmta”.
Start where you are
Try the Transitland Playground, a simple and attractive interface to browse transit operators, stops, and routes, including their Onestop IDs and geohashes.
Use our Python library to compute geohashes and their neighbors.
And do keep in touch
Transitland API and documentation links updated on May 3, 2016.