Coleman McCormick

Archive of posts with tag 'Geocoding'

Addresses and Geocoding: Do New Systems Improve What We Have?

August 8, 2015 • #

Over the last couple of years there’s been a boom in big tech companies trying to reach the periphery of the globe and bring Internet access to people without connectivity. Facebook is launching giant solar-powered drones with lasers, Google is floating balloons with antennae into the stratosphere, and smartphones are cheaper than ever.

It’s too early to quantify the success rate of these projects. But for the mapping industry, it’s a fact that billions of people don’t have access to the kinds of map data we have in the US or Europe, and immature infrastructure and public services, like managed street addresses and quality map data, are holding back the advance of mobile location-based services. E-commerce companies like Amazon and logistics providers like UPS and FedEx rely on quality geographic data to conduct business. Cities like Lagos, Dhaka, or Kinshasa are enormous booming urban centers, but still don’t have reliable addressing systems for navigating city streets.

House number address

Given the combination of expanding connectivity to disconnected places and the vacuum of reliable geodata, a number of services have sprung up in recent years with systems for global wayfinding and geocoding. The particular focus here is to bring a mechanism for providing addresses to places where there are no other alternatives. When I first read that people were building new systems for geocoding it piqued my interest, so I dug into them to see what they’re all about, and what they might be bringing to the table that we don’t already have.

The Problem

The first step in understanding the problem at hand is to lay down some definitions that differentiate an “address” from a “coordinate”. An address is an identifier for a place where a person, organization, or the like is located or can be found, while a coordinate is a group of numbers used to indicate position in space.

This fundamental difference is important because addresses only truly matter where there are people, but coordinates are universal identifiers for anywhere on the globe. A location in the center of the North Atlantic has a position in any global geographic coordinate system, but having a human-readable address isn’t important; it’s unnecessary for everyday use. Coordinate or grid systems can function as addresses, but the reverse isn’t always the case.

I thought I’d compare some different geocoding systems to see where the pros and cons are. Are they really necessary, or can we make use of existing proliferated systems without reinventing this wheel?

The “neo-addressing” systems

Coordinates in several systems

These systems all provide similar capabilities, with a primary focus of providing memorable human-friendly identifiers for places. There are others out there in the wild, but I’ll just talk about some of the prominent ones I’ve run across:

  • Mapcode - Created by a Dutch non-profit founded by former TomTom employees
  • what3words - A system based on a global grid of 3m x 3m squares, with identifiers composed of triplets of everyday words
  • Open Location Code - An open source system developed and sponsored by Google

Each of these geocoding services has a similar set of objectives: to make addresses derivable for anywhere on Earth using algorithms, to assign shorter and more memorable codes than coordinate systems or postal codes, and to reduce ambiguity in the codes (by leaving out easily confused characters like “O” and “0”, or by using distinctly different words and phrases). The interesting thing with all of them is that because the codes are derived deterministically, the result can be controlled and deliberately made more human-friendly. In the case of what3words, the system assigns shorter and more memorable word combinations to areas with higher population density. So lives.magma.palace will take you to Philadelphia’s Independence Hall, while conservatory.thrashing.incinerated will get you to the remote Arctic islands of Svalbard. This is a clever method to optimize the pool of words for usage frequency, and obviously not something that can be controlled with traditional coordinate systems.

Algorithmic systems can also allow a user to shorten the code for a less granular location. With OLC, you can knock off the last couple characters and get a larger area containing the original location. 76VVQ9C6+ encompasses the few city blocks around our building. 76VVQ9C6+9M gets you right to my office. Because it represents an area rather than only a point, truncating to get successively larger areas is possible. Truncating a lat/lon coordinate moves the point entirely.
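To make the truncation idea concrete, here is a rough Python sketch that decodes the paired digits of an Open Location Code into the south-west corner and size of the cell it names. The function name `decode_pairs` is my own, and the sketch only handles unpadded codes of 2 to 10 digits (no padding characters, no short codes), so treat it as an illustration rather than a replacement for the official library.

```python
# The 20 symbols used for OLC digits; the set leaves out 0, 1, O, I and L.
OLC_ALPHABET = "23456789CFGHJMPQRVWX"

# Cell height/width after each digit pair, in units of 1/8000 of a degree
# (20, 1, 0.05, 0.0025 and 0.000125 degrees). Integers keep the sums exact.
PAIR_SIZES = [160_000, 8_000, 400, 20, 1]
UNITS_PER_DEGREE = 8_000


def decode_pairs(code):
    """Return (south, west, size) in degrees for an unpadded 2-10 digit code.

    Hypothetical helper for illustration: it ignores padded and short codes.
    """
    digits = code.replace("+", "").upper()
    if len(digits) % 2 != 0 or not 2 <= len(digits) <= 10:
        raise ValueError("expected 2, 4, 6, 8 or 10 code digits")

    lat_units = 0
    lon_units = 0
    for pair_index in range(len(digits) // 2):
        size = PAIR_SIZES[pair_index]
        # Each pair is one latitude digit followed by one longitude digit.
        lat_units += OLC_ALPHABET.index(digits[2 * pair_index]) * size
        lon_units += OLC_ALPHABET.index(digits[2 * pair_index + 1]) * size

    south = lat_units / UNITS_PER_DEGREE - 90
    west = lon_units / UNITS_PER_DEGREE - 180
    return south, west, size / UNITS_PER_DEGREE


for code in ("76VVQ9C6+9M", "76VVQ9C6+"):
    south, west, size = decode_pairs(code)
    print(code, "-> corner", (south, west), "cell size (degrees)", size)
```

The shorter code decodes to a larger cell whose corner sits at or south-west of the longer code's corner, which is why dropping trailing digits widens the area instead of relocating it.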

The what3words approach seems the most creative and truly memorable method, though it sounds sort of gimmicky. They’ve done a lot to accommodate for things like offensive words, avoiding homophones, removing ambiguous combinations, and even providing the system in several languages.

Spreading adoption for any of these systems will be an enormous challenge. They all seem to be different varieties of the same wheel. If I was developing mapping applications, which system should I support? All of them? Software developers will have to buy into one or more new systems and users will have to understand how they work.

Another issue is one of ownership. If a new scheme for addressing requires a special algorithm or code library for calculating coordinates, it should be in the public domain and serve as an open standard (if anyone expects adoption to grow). In the age of open source, no platform developer is going to license a proprietary system for generating coordinates with so many open alternatives out there. Both OLC and Mapcodes have an open license, but what3words is currently proprietary.

Let’s compare these tools to the coordinate schemes we already have.

Existing models, grids, and coordinate systems

USGS topographic map

Addresses in the classic sense of “123 Main St” make sense for navigation, particularly due to a hundred years of usage and understanding. When I’m searching for “372 Woodlawn Court” in my car, there are some conventions about addressing that help me get there without knowing specific geographic coordinates–odd numbers are on one side and even on the other, numbers follow a sequence in a specific direction–so people can still do some of the wayfinding themselves. Naturally this is reliant on having a trusted, known address format, but nonetheless, adoption of new geocoding systems should be valuable for everyone, not just in places without modern address systems.

How do new means of addressing physical space stack up to the pre-existing constructs we’ve had for decades (or centuries)? Do the benefits outweigh the costs of adopting something new?

Here are several of the common coordinate systems used globally for navigation and mapping:

  • Plain latitude and longitude - in decimal or degree-minute-second format
    • Example: 27.79987, -82.63402 or 27°47’59.5314” N 82°38’2.472” W
    • Pro: In use for centuries, supported across any mapping tools
    • Con: Lengthy coordinates needed to get accurate locations
  • UTM (Universal Transverse Mercator) - a grid-based map projection that segments the world into 60 east/west “zones” of 6° each, with coordinates expressed in meters as a “northing” (distance north of the equator) and an “easting” (distance east of a false origin placed 500,000 m west of the zone’s central meridian)
    • Example: 17N 339031 3076104
    • Pro: Uses meters for measurement, great for orienteering with paper maps, nearby coordinates can be compared to measure distance easily
    • Con: Long coordinates, requires knowledge of reference zones to find position, some tools don’t support
  • MGRS (Military Grid Reference System) - another grid-based standard used by NATO militaries, similar to UTM, but with different naming conventions
    • Example: 17R LL 39031 76104
    • Pro: Same as UTM, somewhat more intuitive scheme with smaller grid cells
    • Con: Same as UTM
  • Geohash - an encoded system similar to the ones mentioned earlier, but the underlying algorithm has been in the public domain since 2008, and there are existing tools that already support it
    • Example: dhvnpsg9zz2
    • Pro: Existing algorithm-based system, open standard, short codes
    • Con: Not human-readable
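As a small illustration of how the UTM and MGRS examples in the list relate, here is a Python sketch. The function names are mine, the zone formula ignores the special-case zones around Norway and Svalbard, and it does not attempt the MGRS band or 100 km square letters, only the numeric part.

```python
import math
from typing import Tuple


def utm_zone(lon_deg: float) -> int:
    """UTM zone number (1-60) for a longitude, ignoring special-case zones."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def central_meridian(zone: int) -> int:
    """Longitude in degrees of the meridian running down the middle of a zone."""
    return zone * 6 - 183


def mgrs_digits(easting_m: int, northing_m: int) -> Tuple[int, int]:
    """Meters east and north within the 100 km grid square.

    These are the two numeric groups at the end of a 1 m precision MGRS
    reference; the letters identifying the square are not derived here.
    """
    return easting_m % 100_000, northing_m % 100_000


zone = utm_zone(-82.63402)
print(zone, central_meridian(zone), mgrs_digits(339031, 3076104))
```

For the example longitude this gives zone 17, and the UTM easting and northing reduce to the 39031 and 76104 that appear in the MGRS example.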
MGRS grid coverage in the US
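Of the schemes listed above, Geohash is the easiest to sketch in a few lines. This is a minimal encoder in Python (the function name and default length are my own choices): it alternately halves the longitude and latitude ranges, records one bit per halving, and emits a base-32 character for every five bits.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=11):
    """Encode a latitude/longitude pair as a geohash string of `length` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Five bits select one of the 32 alphabet characters.
            chars.append(GEOHASH_ALPHABET[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


print(geohash_encode(27.79987, -82.63402))
print(geohash_encode(27.79987, -82.63402, length=5))
```

Because each extra character only subdivides the previous cell, a shorter geohash is always a prefix of the longer one for the same point, the same nesting property that makes truncation work in the newer systems.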

These systems have some distinct advantages over building something new (and naturally some disadvantages). But I think the gains from algorithmic libraries and services like those mentioned above aren’t enough to warrant convincing millions of people to adopt something new.

If you look back at Open Location Codes or what3words, the primary benefit is memorability. I’ll grant that what3words has a leg up in this department, but the others, not so much. Is 17RLL3861573116 really that much worse than 76VVQ9F6+4V? Neither is very human-friendly to me, but at least something like MGRS has a worldwide existing base of understanding, users, and tools supporting it.

I would concede that memorability and reduced ambiguity could help to replicate the ease-of-use we get with classic addresses. But in the days of ubiquitous GPS, smartphones, and apps, people don’t realistically memorize anything about location anymore. We punch everything into a mapping app or the in-car navigation system. Given that, what benefit is left in inventing a new system of expressing location?

I think it’s wise to push adoption of established systems like MGRS or UTM before we start asking citizens of developing countries to adopt systems that no one else is using yet, even if those systems do come with some new benefits.

Other Interesting Reading

If you’re interested in reading more background on some of these systems, check out these links:

✦

An Open Database of Addresses

March 27, 2015 • #

One of the coolest open source / open data projects happening right now is OpenAddresses, a growing group effort assembling around the problem of geocoding, the process of turning human-friendly addresses into spatial coordinates (and its reverse). I’ve been following the project for close to a year now, but it seems to have really gained momentum in the last 6 months.

The project was started last year and is happening over on GitHub. It now has over 60 contributors, with over 100 million aggregated address points from 20 countries, and growing by the day. There’s also a live-updating data repository where you can download the entire OpenAddresses dataset online—it’s currently at about 1.1 gigabytes of address points.

Pinellas addresses

Here’s how it works:

Contributors identify data out in the wild online, and contribute small index files with some pointers to where the data is hosted, and some other details indicating how to merge it with the rest of the project’s data format. There’s no need to download any of the data, only find where the CSV file or web service lives and how to get to it. The technique for this is neat in its simplicity; more on this later.

It sounds weird to think something as basic as address data could be so fascinating and exciting. Most people in the geo community understand the potential impact of projects like this on our industry, but let me review for the uninitiated why this is cool.

Why care about boring addresses?

Address data is what makes almost any map useful: it connects our human-friendly identifiers for places into real locations on the ground. Almost everything that consumers do with maps these days has to do with places of interest: Foursquare checkins, Instagramming, turn-by-turn directions. Without connecting the places as we know them to actual map coordinates a computer can understand, we don’t have many useful mapping applications.

There are existing APIs and resources out there to build mapping applications that require addressing and geocoding, but none of them are open to build on. They’re proprietary systems that either have unfriendly licensing structures for use, or are costly to use. Having to pay money for a high quality geocoding service like Google’s isn’t crazy or surprising — building universally searchable and uniform address databases is insanely expensive and hard. Building good geocoding systems is one of the perennial pains in the ass of the geospatial problem set, so it’s understandable that when someone solves it, they’d want to charge for it.

There is the OpenStreetMap project, the free and open map database for the globe, which has tons of potential as a resource for geocoding. By a quick estimate, the OSM database contains something like 50 million specific address points for the globe. But its license is not compatible with most commercial requirements for republication of data, so developers looking for an open resource have had to look elsewhere. There’s still no good worldwide, open resource for address geocoding that app developers and mappers can use with no strings attached. (OSM’s license and its “friendliness” for commercial use have a long history of debate and argument in the community. It’s complicated. I’m not a lawyer.)

Address data harder than it looks

Simple data, big problem

The data that composes a postal address is pretty straightforward: house number, street name, city, admin boundary, postal code. That set of five properties gets you to a fixed coordinate on the Earth in most places with organized addressing schemes. Pretty simple, right?

But addressing systems are non-standard, vary widely with geography, and are actually non-existent in many countries. The data literally carpets the developed world and comes in dozens of shapes and formats, so bringing it all together into a consistent, unified whole to create a platform for applications is a huge deal.

In the US, for example, one of the biggest challenges is that there isn’t a single standardized structure for the data, and even worse, no single “owner” of address data. Sometimes data’s maintained at the county level, and sometimes the city level. One county’s GIS division will manage it, and in another it’s the E911 system manager. Then you have the challenge of finding the actual data files. It’s becoming commonplace for municipalities to publish this stuff online, but it’s far from universal. To get data for some (especially rural) counties, you better be ready to take a hard drive down to the property appraiser’s office to get the data, or pay them to burn you a CD.

To me this is where the OpenAddresses model gets interesting. The project brings a powerful capability for building a massive open dataset: a distributed network of contributors focusing their resources around a common goal. Creating a central place around which the contributors can mobilize and gradually accrete data into a larger and larger whole, that’s the unique angle to this project. Anyone with enough time and energy can go chase down hundreds of datasets, but it’s much easier when a group with a defined mission can divide and conquer, intersecting the open source contribution model with a data production line. It’s not just a platform for aggregating this data into a single database, it’s a petitioning system to start the process of tracking down the data, and to advocate for it to be made open if it currently isn’t publicly available.

OpenAddresses US status

Building the glue

The OpenStreetMap method of contribution is one where contributors are manually finding, converting, and adding data to a separate database. For addresses, this strategy makes ingesting the individual datasets and the thousands of updates per year a huge pain. OA takes a different approach. Instead of manually finding and merging all the datasets together, the main OA repository is a huge pile of index files that function as the glue between all the disparate sources out on the web and a centralized core. It’s an open source ETL system for all flavors of address datasets. People go out and find all the building blocks, and OA is the place where we write the instructions to put them all together.
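To give a feel for what that glue does, here is a rough Python sketch. The record layout, the `conform` key, the column names, the placeholder URL, and the sample row are all invented for illustration; they are not copied from the project’s actual index format. The idea is only that a small record says where a source lives and which of its columns map to the shared fields, and a generic routine does the rest.

```python
import csv
import io

# Hypothetical index record: a pointer to the source plus a column mapping.
SOURCE = {
    "data": "https://example.com/county-addresses.csv",  # placeholder URL
    "conform": {
        "number": "HOUSE_NUM",
        "street": "STREET_NAME",
        "lon": "X",
        "lat": "Y",
    },
}

# Invented sample of what the remote CSV might contain.
RAW_CSV = "HOUSE_NUM,STREET_NAME,X,Y\n372,Woodlawn Court,-82.63402,27.79987\n"


def conform_rows(raw_csv, conform):
    """Yield rows from a source CSV renamed into one shared address schema."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    for row in reader:
        yield {
            "number": row[conform["number"]].strip(),
            "street": row[conform["street"]].strip(),
            "lon": float(row[conform["lon"]]),
            "lat": float(row[conform["lat"]]),
        }


for address in conform_rows(RAW_CSV, SOURCE["conform"]):
    print(address)
```

Adding a new county then means writing another small record rather than another import script, which is what lets lots of contributors work in parallel.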

The project isn’t only the data. It’s tools for working with the data, resources for teaching local advocacy for acquiring the data, and a system of ETL “glue” to bring the sources together to build a platform for other tools and creative mapping projects. Go over to the project and check it out. If you know where some address data is for your neighborhood, dive in and contribute to the effort.

✦