So, what’s new in Who’s On First?
Before the year’s end, I wanted to take a few moments to describe three issues that we’ve recently tackled in Who’s On First. Data issues in Who’s On First can be complex, and each one has to be addressed with its own set of tools and a specific workflow, following the simple rules we’ve outlined to ensure our changes and adjustments can be added to the database.
We’ve tackled many large-scale issues in 2017, from the Statoids and Mesoshapes imports, to updating neighbourhood records, to the GeoNames name localization work. Here, I wanted to outline a few of the smaller issues we’ve dealt with in the past few months and describe what it took to fix them.
The issues are below, each with a brief description of the problem and solution. Take a look and enjoy!
Santa Barbara Neighbourhoods
In October, a user reported that the record for the neighbourhood of Midco in California had a very large geometry. So large, in fact, that it spanned more than 50 miles, from Santa Maria, CA to Santa Barbara, CA. This geometry was causing the Who’s On First API to return peculiar hierarchy results for the area.
The issue was two-fold: first, we needed to shrink the Midco geometry down to a size that actually represents Midco; second, because Midco had covered the entire city of Santa Barbara, we needed to find a source for new neighbourhood records there.
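To see why an oversized polygon skews hierarchy results, think of a point-in-polygon lookup: any point that falls inside Midco’s geometry picks up Midco as its neighbourhood. The sketch below uses a plain ray-casting test and two hypothetical, simplified boxes (not the real Midco geometries, and not how the Who’s On First API is actually implemented); the Santa Barbara point is approximate.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]  # (lon, lat) vertices; closing edge is implied

def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Ray casting: count how many edges a rightward ray from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def containing(point: Tuple[float, float], polygons: Dict[str, Ring]) -> List[str]:
    """Names of every polygon that contains the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]

# Hypothetical boxes for illustration only: one huge, one small.
midco_oversized = [(-120.5, 34.35), (-119.6, 34.35), (-119.6, 35.0), (-120.5, 35.0)]
midco_fixed = [(-120.45, 34.93), (-120.40, 34.93), (-120.40, 34.97), (-120.45, 34.97)]

downtown_sb = (-119.70, 34.42)  # approximate point in Santa Barbara
print(containing(downtown_sb, {"Midco": midco_oversized}))  # ['Midco']
print(containing(downtown_sb, {"Midco": midco_fixed}))      # []
```

With the oversized shape, a point in downtown Santa Barbara resolves to Midco; once the shape is cut down, it no longer does, which is why Santa Barbara then needed neighbourhoods of its own.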
Once Midco was cut down to size, I found openly licensed neighbourhood data on Santa Barbara’s website. After downloading it, I minted new wof:id values for the new Who’s On First records (thank you, Brooklyn Integers!), mapped the source properties to the applicable Who’s On First properties, and imported them as new records!
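The mapping step might look roughly like the sketch below. It assumes the source data is GeoJSON with a hypothetical `NAME` attribute on each feature, and that new IDs come from a caller-supplied `mint_id` function standing in for a Brooklyn Integers request (the service’s actual API isn’t shown here). Only a handful of Who’s On First properties are filled in; a real import sets many more.

```python
import itertools
from typing import Any, Callable, Dict, List

def to_wof_feature(source: Dict[str, Any], wof_id: int, parent_id: int,
                   src_geom: str) -> Dict[str, Any]:
    """Map one source GeoJSON feature onto a minimal Who's On First-style record."""
    props = source["properties"]
    return {
        "type": "Feature",
        "id": wof_id,
        "geometry": source["geometry"],  # geometry carried over unchanged
        "properties": {
            "wof:id": wof_id,
            "wof:name": props["NAME"].strip(),  # "NAME" is an assumed source attribute
            "wof:placetype": "neighbourhood",
            "wof:parent_id": parent_id,
            "src:geom": src_geom,  # hypothetical source label
        },
    }

def import_neighbourhoods(source_fc: Dict[str, Any], mint_id: Callable[[], int],
                          parent_id: int, src_geom: str) -> List[Dict[str, Any]]:
    """Mint one new ID per source feature and return the mapped records."""
    return [to_wof_feature(f, mint_id(), parent_id, src_geom)
            for f in source_fc["features"]]

# Usage: a counter stands in for Brooklyn Integers, so these are NOT real WOF IDs,
# and parent_id=-1 is just a placeholder for "parent not yet assigned".
fake_ids = itertools.count(1000)
sample = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"NAME": " Example Neighbourhood "},
     "geometry": {"type": "Point", "coordinates": [-119.70, 34.42]}},
]}
records = import_neighbourhoods(sample, lambda: next(fake_ids),
                                parent_id=-1, src_geom="example-source")
```

Passing the ID minter in as a function keeps the mapping logic testable without network access.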
To download the Santa Barbara neighbourhoods, use our Bundler tool here, or take a look at the geometries below!
Above: New neighbourhood records for Santa Barbara, CA.
Australia Regions
While the geometry and property updates in Australia are not new to Who’s On First, they represent the kind of updates that many Who’s On First records have received in the past and will receive in the future. In a pull request in the whosonfirst-data repository, we added new geometries to the records for regions in Australia.
To do this, we loaded the source data from Australia Open Data into QGIS and added a new property field containing the corresponding wof:id for each feature; this essentially allows us to “link” the new source features to the existing Who’s On First records. Then, using a simple Python script, we updated the geometry of each Who’s On First record and saved its existing geometry as an alt-geometry file.
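The script itself isn’t included in this post, but a minimal sketch of the same idea follows. It assumes each source feature carries the linked ID in a hypothetical `wof_id` field (the one added in QGIS), that records are stored as GeoJSON files laid out by splitting the ID into three-digit directory chunks, and that an alt-geometry file is named `{id}-alt-{source}.geojson` after the source of the geometry it preserves. Check the whosonfirst-data repository for the exact conventions before relying on any of these.

```python
import json
from pathlib import Path
from typing import List, Tuple

def id2relpath(wof_id: int) -> Path:
    """e.g. 85632473 -> 856/324/73/85632473.geojson (assumed layout)."""
    s = str(wof_id)
    chunks = [s[i:i + 3] for i in range(0, len(s), 3)]
    return Path(*chunks) / f"{s}.geojson"

def swap_geometry(record: dict, new_geometry: dict, new_src: str) -> Tuple[dict, dict, str]:
    """Return (updated record, alt feature, alt filename) without touching disk."""
    wof_id = record["properties"]["wof:id"]
    old_src = record["properties"].get("src:geom", "unknown")
    # The alt feature preserves the old geometry and records where it came from.
    alt = {
        "type": "Feature",
        "id": wof_id,
        "geometry": record["geometry"],
        "properties": {"wof:id": wof_id, "src:geom": old_src},
    }
    # Copy rather than mutate, so the caller's record is left as it was.
    updated = {**record,
               "geometry": new_geometry,
               "properties": {**record["properties"], "src:geom": new_src}}
    return updated, alt, f"{wof_id}-alt-{old_src}.geojson"

def update_geometries(source_fc: dict, data_root: Path, new_src: str) -> List[int]:
    """Apply swap_geometry to every linked record on disk; return the IDs touched."""
    updated_ids = []
    for feature in source_fc["features"]:
        wof_id = int(feature["properties"]["wof_id"])  # hypothetical QGIS field
        path = data_root / id2relpath(wof_id)
        record = json.loads(path.read_text(encoding="utf-8"))
        updated, alt, alt_name = swap_geometry(record, feature["geometry"], new_src)
        (path.parent / alt_name).write_text(json.dumps(alt), encoding="utf-8")
        path.write_text(json.dumps(updated), encoding="utf-8")
        updated_ids.append(wof_id)
    return updated_ids
```

Keeping the swap as a pure function separates the “what changes” logic from file I/O, so it can be checked on in-memory records before anything is written.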
And, just like that, Australia regions were updated!
An example of these geometry updates is shown below. Zooming out, you’ll see the older, less detailed geometry we maintained for Christmas Island; zooming in, you’ll see the new, more detailed geometry. Quite the improvement!
Above: Comparison of region geometries for Christmas Island.
Leningrad and St. Petersburg
Piggybacking on the historical administrative work we started in Yugoslavia, we adjusted records in and around St. Petersburg, Russia, to reflect the administrative changes in that area since the Russian Revolution.
An issue filed in the whosonfirst-data repository let us know that Who’s On First had two distinct records for St. Petersburg: one for St. Petersburg and one for Leningrad. On further inspection, we found four duplicate records in that area, all of which were merged, superseded, and updated.
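A rough sketch of the bookkeeping behind superseding duplicates is below. It assumes one record is kept as the survivor and uses the Who’s On First properties `wof:supersedes`, `wof:superseded_by`, and `mz:is_current` as described in the comments; the name-merging rule (union of `name:*` lists) is a hypothetical simplification, and real merges also reconcile hierarchies, concordances, and geometries, which this leaves out.

```python
from typing import Dict, List

def supersede(records: Dict[int, dict], keep_id: int, duplicate_ids: List[int]) -> None:
    """Mark each duplicate as superseded by keep_id, updating records in place."""
    keeper = records[keep_id]["properties"]
    keeper["mz:is_current"] = 1
    for dup_id in duplicate_ids:
        if dup_id == keep_id:
            continue
        dup = records[dup_id]["properties"]
        # Hypothetical merge rule: copy over any names the survivor lacks.
        for key, values in dup.items():
            if key.startswith("name:"):
                merged = keeper.setdefault(key, [])
                for v in values:
                    if v not in merged:
                        merged.append(v)
        # The old record points forward to its replacement and is no longer current.
        superseded_by = dup.setdefault("wof:superseded_by", [])
        if keep_id not in superseded_by:
            superseded_by.append(keep_id)
        dup["mz:is_current"] = 0
        # The surviving record points back to what it replaces.
        supersedes = keeper.setdefault("wof:supersedes", [])
        if dup_id not in supersedes:
            supersedes.append(dup_id)
```

The membership checks make the function safe to run more than once on the same set of records.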
Take a look at the record for the St. Petersburg locality, here.
And, with that, I hope this has given you some insight into the day-to-day work of fixing geometry and property issues in Who’s On First!
Photo Credit: Carol VanHook. (Flickr, CC BY 2.0)