Was it my way, or was it highway? An introduction to map matching.

To make sense of any data it's often important to be able to connect it to the real world. When a vehicle sends its location to the server, it's important to be sure where the vehicle actually is, even if the received information isn't perfect. The problem we'll tackle here is called map matching. Map matching (sometimes called road snapping) is a process of matching the received vehicle coordinates with most likely road segment of a digital road map. Route reconstruction is a process of guessing the most probable route the vehicle traveled, based on the received coordinates (and maybe some other factors).

The use of map matching (and usually more importantly, of route reconstruction) is wide, and if it wasn't such a hard problem, would be even higher. From personal bike riding logs to fleet tracking services, live traffic services and driving range estimations - all of them have use of it. And, at the end, it's visually more appealing than a set of scattered points on a map.

Let's see an example:

Red arrows on the image show the original GPS positions and the direction vehicle is moving, while the blue ones show the result of map matching and the lines that connect them show the result of route reconstruction algorithm. In real life, GPS location received from tracking devices is imprecise - the actual location normally varies a couple of meters, and it can be even more. This might happen for various reasons like small amount of visible satellites which overlap each other or signals bouncing off tall buildings around us. Devices usually provide this information along with other data in some form, usually just a number that the user should interpret. It could be something like "confidence", "precision", "accuracy" as a percent, estimated error in metres or just a grade.

The good measure of precision which is often used is called HDOP. HDOP is short for
horizontal dilution of precision - a number telling us the confidence of the position we received. Without getting into the details, the smaller, the better. Ideal value is 0, values up to 5 are considered accurate, and values above 20 are considered very inaccurate. For example, we in Mireo ignore all positions with HDOP bigger than 20. For an interested reader, more about HDOP and other DOPs can be found here.

Another common set of problems is that sometimes it's impossible to have a good GPS position for the sole reason there is no way that our receiver can receive the position. Common case would be that we are underground - in a garage, shopping mall or in a tunnel. Another common case is vehicle being on a ferry, which can result in some very unusual/weird/uncommon observations/sensor readings (at once). Vehicle travels, but its engine isn't on, and it's not connected to any road(s).

With all that in mind, another key factor in map matching is the frequency of data. If the data is too sparse, it's hard (or even impossible) to determine vehicle's real position and its likely route. It sounds reasonable to expect for vehicle to send the data as often as possible, but that introduces new sort of problems. Some of them are data overload - servers need to process huge amounts of data while they could give the same results with less, which introduces higher production costs, and as well a possibly slower result. Also, as the data transfered is usually done via GPS carriers, the cost of sending the data nonstop could be unreasonably high.

Map matching is a computation-intensive task and often a stumbling block for most systems. Let's see the general approaches usually considered in the standard literature to see why:
  • Geometric analysis focuses only on the geometry of the map and ignore the road connectivity. We can try matching our position to another point on the map (point-to-point), to a curve (point-to-curve) or even our trajectory to a curve (curve-to-curve). This method is simplest and fastest, but lacks in accuracy.
  • Topological analysis takes road connectivity into consideration on top of geometrical analysis. Points are connected to the segment which best matches distance and orientation, only considering roads that are quickly reachable from the current segment.
  • Probabilistic methods take position confidence into consideration as well as other factors mentioned previously. Such method might consider candidates within the error region of the position's precision (mentioned above), grade them based on their distance from the received point, direction (often called heading) relative to the road on which candidate lies and connectivity to the previous point.

Having all that in mind, we can see why map matching can be challenging. While computational geometry isn't something unheard of, it's still CPU intensive - calculating distances between points and curves, projections of points to the roads, closest points on a road and more takes some time. Doing that for several candidates, and evaluating them takes some more.

Let's consider the same image from the start of this article with some more points received:

The result on the first image might have looked erroneous as the northern road might be a better match for the received positions - but further route reconstruction gave us a more probable result.

Here comes another important aspect of the map matching algorithm - is it working
live (online) or offline?

In an offline mode, we can take our time to analyze a history of positions and reconstruct most probable locations and routes for the vehicle. While it is CPU intensive, it can be done in bulk (batches) with multiple processors. That way, by increasing the computational power of a system, we can solve this problem fairly well. There is a problem to consider - in this case a part of history either isn't available (it wasn't processed yet) or can be changed when it's processed. In either case, displaying vehicle's history or generating reports has proven to be a frustrating experience.

In a
live system which wants to display and use the matched data in (almost) real-time, the problem becomes more apparent. Now we cannot solve the problem in batches, we need to solve it as it happens, and it's even more so important to have a good algorithm and infrastructure to support it. Taking us slightly back to the image above, if the route ended after first eight points, we could (and probably should) have taken the northern road as a result. As it's the case, it would have been wrong. Here comes a compromise - how fast do we consolidate the result in the balance of giving ourselves an option to "change our mind" and take the low road. Having the data available momentarily it's possible to use it to your advantage - creating real-time alarms, displaying the vehicle precisely on the map and generating consistent reports.

We in Mireo do it
live, with our own version of probabilistic method which uses our routing system, geometry, digital map and some more math. In the next blog we'll dive deeper into fine details of the process itself.