The Curious Case of the Sorted Array

How an understanding of geohashes and microprocessor architecture can speed up code — without changing it.

Microprocessors. Those tiny circuits with their billions of transistors working together at crazy speeds. As software engineers, we tend to think of them as an engine to get our work done, often one sitting in the cloud. Sure, we can have a big engine or a small engine, which in our high-level world usually only translates to being able to run a lot of threads concurrently and efficiently, or perhaps only a few, but it usually stops there.

I remember with fondness the day in the office when it didn’t stop there.

There is so much to this that I could write thousands of words, but in essence, we’re looking at a shortest-path search over a graph.

We have a lot of vertices and a lot of connections. We are also going to make heavy use of this service for a whole host of reasons. Thus we’re back in my comfort zone, where efficiency matters. For example, the cost of each connection between neighbours can be pre-calculated.

The goal here isn’t to teach A-Star; Wikipedia and other sources can do that. I just needed to set the scene so the data structure below makes sense.

Did I mention this thing needs to be fast? We chose Rust, and ended up with a data structure which looks something like this:
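The exact listing isn’t reproduced here. As a hypothetical stand-in, a structure of the kind described (a flat array of vertices, each owning a run of entries in one shared array of edges with pre-calculated costs) might look like the sketch below. Every name in it (`World`, `Vertex`, `Edge`, `first_edge`, `edge_count`, `neighbours`) is an assumption, not the author’s code.

```rust
/// One directed connection to a neighbouring vertex.
#[derive(Clone, Copy, Debug)]
struct Edge {
    to: u32,   // index of the neighbour in `World::vertices`
    cost: f32, // traversal cost, pre-calculated in advance
}

/// One location on the map. Its outgoing edges are the slice
/// `edges[first_edge .. first_edge + edge_count]` of the shared array.
#[allow(dead_code)] // lat/lon are not read in this small example
#[derive(Clone, Copy, Debug)]
struct Vertex {
    lat: f32,
    lon: f32,
    first_edge: u32,
    edge_count: u32,
}

/// The whole graph as two flat, contiguous arrays.
struct World {
    vertices: Vec<Vertex>,
    edges: Vec<Edge>,
}

impl World {
    /// Outgoing edges of vertex `v`, borrowed straight from the edge array.
    fn neighbours(&self, v: usize) -> &[Edge] {
        let vx = &self.vertices[v];
        let start = vx.first_edge as usize;
        &self.edges[start..start + vx.edge_count as usize]
    }
}

fn main() {
    // Two vertices joined in both directions.
    let world = World {
        vertices: vec![
            Vertex { lat: 40.71, lon: -74.01, first_edge: 0, edge_count: 1 },
            Vertex { lat: 40.74, lon: -74.17, first_edge: 1, edge_count: 1 },
        ],
        edges: vec![Edge { to: 1, cost: 1.5 }, Edge { to: 0, cost: 1.5 }],
    };
    for e in world.neighbours(0) {
        println!("0 -> {} (cost {})", e.to, e.cost);
    }
}
```

Keeping every edge in one shared `Vec` (a compressed-sparse-row layout) means the whole graph lives in two contiguous allocations, which is the kind of layout whose ordering matters later on.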

Even if Rust isn’t your thing, the above should be fairly self-explanatory.

So I spent a good deal of time measuring where the A-Star code was slow, picking the right priority-queue library, tweaking this and that. Then a colleague gave me some advice that seemed random. I tried it, and the code ran 15% faster even though I hadn’t modified the algorithm.

Sort the world array by geohash.

Six words. We need to really take these apart to understand where that 15% came from.

Imagine my postcode, my zip-code, is 1234567. As you are not familiar with this fictitious country, you might not know where that is, but if I were to say I knew someone living in 1234568, you could guess that probably wasn’t very far away — and most likely you’d be right. You’re almost looking at a geohash.

There are many hashing schemes. Here’s a simple one, to illustrate the point.

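Take a point on planet Earth. It’ll have two coordinates, latitude and longitude. Now get a pen and paper out. Is the point north of the equator? Write 1, otherwise 0. Is it west of the prime meridian? Write 1, otherwise 0. Then halve the band you landed in: is the point in the northern half of it? And is it in the western half of its longitude band?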

… and on we go, subdividing the planet literally bit by bit. If I’d chosen New York in the US, at roughly 40° N, 74° W, we’d get 1 (north), 1 (west), 0 (not further north), 0 (not further west), so the geohash so far would be 1100.
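A minimal Rust sketch of this pen-and-paper scheme, assuming, as the New York example implies, that bits alternate latitude then longitude and that 1 means the northern or western half of the current band. The helper name `geohash_bits` is hypothetical. This illustrative scheme differs from the standard Geohash, which starts with longitude, uses 1 for the eastern and northern halves, and writes the bits out in base 32; the locality property is the same.

```rust
/// Bisection encoding: alternate latitude and longitude bits, latitude first.
/// A latitude bit is 1 for the northern half of the current band,
/// a longitude bit is 1 for the western half. Returns the first `bits` bits.
fn geohash_bits(lat: f64, lon: f64, bits: u32) -> u64 {
    assert!(bits <= 64, "at most 64 bits fit in a u64");
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let (mut lon_lo, mut lon_hi) = (-180.0_f64, 180.0_f64);
    let mut hash = 0u64;
    for i in 0..bits {
        hash <<= 1;
        if i % 2 == 0 {
            // Latitude step: is the point in the northern half of its band?
            let mid = (lat_lo + lat_hi) / 2.0;
            if lat >= mid {
                hash |= 1;
                lat_lo = mid;
            } else {
                lat_hi = mid;
            }
        } else {
            // Longitude step: is the point in the western half of its band?
            let mid = (lon_lo + lon_hi) / 2.0;
            if lon < mid {
                hash |= 1;
                lon_hi = mid;
            } else {
                lon_lo = mid;
            }
        }
    }
    hash
}

fn main() {
    // New York, roughly 40.7° N, 74.0° W.
    let h = geohash_bits(40.7, -74.0, 4);
    println!("{:04b}", h); // prints "1100"
}
```

Asking for more bits only appends to the end: the first four bits of an 8-bit hash for New York are still 1100.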

What is 1100? Apart from being a 4-bit binary number that needs only half a byte to store, under this scheme it is a region of the map.

It is 1/16th of the planet. Add more bits (you can encode this properly in bytes; there’s no need to stick with binary strings) and you can get pretty precise. Google says there are about 510 million km² on the planet, so an innocent 32-bit number would narrow it down to 510 × 10⁶ / 2³² ≈ 0.12 km² on average. Note the units. This is quite precise!
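The same back-of-envelope sum as a small Rust function, so other bit lengths are easy to try. The helper name is hypothetical and the 510 million km² figure is the approximation quoted above. Because latitude/longitude bisection gives cells that shrink towards the poles, the result is an average cell area, not the area of any particular cell.

```rust
/// Approximate surface area of the Earth, as quoted in the text.
const EARTH_SURFACE_KM2: f64 = 510.0e6;

/// Average area of one cell of an n-bit geohash, in km².
/// Each bit halves every cell, so n bits split the surface into 2^n cells.
fn average_cell_area_km2(bits: i32) -> f64 {
    EARTH_SURFACE_KM2 / 2f64.powi(bits)
}

fn main() {
    for bits in [4, 16, 32, 40] {
        println!("{bits:>2} bits: {:.6} km²", average_cell_area_km2(bits));
    }
}
```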

Geohashes have the property that, apart from some boundary conditions, the longer the prefix two geohashes share, the closer together the two regions are. 11100101 and 11100111 are probably pretty close, and with longer geohashes the precision only improves. The boundary conditions bite at cell edges: two points either side of the equator, say, can be metres apart yet share no prefix at all.
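Measuring a shared prefix is cheap on integer geohashes: XOR the two values, the bits that agree become zeros, and the count of leading zeros is the length of the common prefix. A minimal sketch, with a hypothetical helper name, assuming each hash is stored right-aligned in a `u64` with a known bit length.

```rust
/// Number of leading bits two `len`-bit geohashes have in common.
/// XOR leaves 0 wherever the bits agree, so the leading zeros of a ^ b,
/// once the unused high bits of the u64 are shifted away, give the prefix.
fn shared_prefix_bits(a: u64, b: u64, len: u32) -> u32 {
    assert!(len >= 1 && len <= 64);
    let unused = 64 - len; // high bits of the u64 that are not part of the hash
    let diff = (a ^ b) << unused;
    diff.leading_zeros().min(len) // identical hashes share all `len` bits
}

fn main() {
    // The two 8-bit hashes from the text agree on their first 6 bits.
    println!("{}", shared_prefix_bits(0b1110_0101, 0b1110_0111, 8)); // prints 6
}
```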

How do you sort an array of 2D points? If you sort by latitude (Y coordinate), you’ll get longitude (X coordinate) values all over the place. Ditto if you sort the opposite way. No good.

If you sort by geohash, most (not all, there are boundary conditions) of the points in the final array will be geospatially close to their neighbours. New York will be next to New Jersey within the array, if you will.
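As a rough illustration, here is a sketch that sorts a handful of places by a 32-bit key from the bisection scheme described earlier (latitude first, 1 for north or west). `geohash_bits` and `sort_by_geohash` are hypothetical names, and the coordinates are approximate city-centre values chosen for the example.

```rust
/// Bisection geohash: alternate latitude/longitude bits, latitude first,
/// with 1 meaning the northern (latitude) or western (longitude) half.
fn geohash_bits(lat: f64, lon: f64, bits: u32) -> u64 {
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let (mut lon_lo, mut lon_hi) = (-180.0_f64, 180.0_f64);
    let mut hash = 0u64;
    for i in 0..bits {
        hash <<= 1;
        if i % 2 == 0 {
            let mid = (lat_lo + lat_hi) / 2.0;
            if lat >= mid {
                hash |= 1;
                lat_lo = mid;
            } else {
                lat_hi = mid;
            }
        } else {
            let mid = (lon_lo + lon_hi) / 2.0;
            if lon < mid {
                hash |= 1;
                lon_hi = mid;
            } else {
                lon_lo = mid;
            }
        }
    }
    hash
}

/// Sort (name, lat, lon) records by geohash. The hash is only a sort key;
/// `sort_by_cached_key` computes it once per element.
fn sort_by_geohash(places: &mut [(&str, f64, f64)]) {
    places.sort_by_cached_key(|&(_, lat, lon)| geohash_bits(lat, lon, 32));
}

fn main() {
    let mut places = vec![
        ("London", 51.51, -0.13),
        ("New York", 40.71, -74.01),
        ("Sydney", -33.87, 151.21),
        ("Newark, NJ", 40.74, -74.17),
        ("Paris", 48.86, 2.35),
    ];
    sort_by_geohash(&mut places);
    for (name, _, _) in &places {
        println!("{name}");
    }
}
```

Sorted, the list comes out as Sydney, Paris, New York, Newark, NJ, London: New York and its New Jersey neighbour sit side by side because their keys share a long prefix.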

So arrays are random access, right? We could have a vertex at, say, index 123 referring to a neighbour at index 1230000. We software engineers know that the indexing operator on an array is pretty fast. Random access. Fast.

Did you know that access time is not constant?

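So, to the silicon. A microprocessor uses a memory hierarchy, simply because i) super-fast memory costs a fortune and ii) modern systems need a lot of memory. So a typical modern CPU has a small set of registers, then several levels of cache (L1, L2 and often L3), each larger and slower than the last, and behind all of that, main memory, which is far larger and far slower still.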

A modern CPU is packed full of tricks (and thus huge numbers of transistors) aimed at making maximum use of the high-speed cache memory and registers, and avoiding main memory access whenever possible. Another trait of main memory is that, once accessed, it delivers a small contiguous chunk in one go. Caches are designed to accept these chunks, called cache lines. On a typical Intel chip, a cache line is 64 bytes.

So, we’re doing A-Star. We’re at node 123. We need to go to node 1230000. But the latter is not in our cache, because we haven’t read it before (a cache miss). Sad face. What the CPU actually does is load a whole cache line from main memory, grabbing the data you want and some bytes around it. This pays off because of locality of reference: it has been shown statistically that if a program accesses data at a certain address, it is likely to want data nearby shortly afterwards. If the processor can grab memory before the code even requests it, some of the slow main-memory latency is hidden and things run a lot faster.
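The effect of locality can be seen in a toy model. The sketch below is not a cache simulator; it just counts how many distinct 64-byte lines a sequence of array accesses would pull in, assuming 16-byte elements (a hypothetical record size), an array starting on a line boundary, and that a line, once loaded, stays cached. `lines_touched` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Toy model: how many distinct 64-byte cache lines does a sequence of
/// array accesses touch? Assumes the array starts on a line boundary and
/// that every line, once loaded, stays in cache.
fn lines_touched(indices: &[usize], elem_size: usize) -> usize {
    const CACHE_LINE: usize = 64;
    indices
        .iter()
        .map(|&i| i * elem_size / CACHE_LINE) // which line holds element i
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    let elem = 16; // hypothetical 16-byte vertex record
    let nearby = [120, 121, 122, 123, 124, 125, 126, 127];
    let scattered = [123, 1_230_000, 45_000, 980_000, 7, 600_000, 333_333, 2_000_000];
    println!("nearby: {} lines", lines_touched(&nearby, elem)); // prints 2
    println!("scattered: {} lines", lines_touched(&scattered, elem)); // prints 8
}
```

Eight accesses to neighbouring elements cost two line loads; eight scattered accesses cost eight.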

We should note at this point that, unlike in some languages such as Java, an array of structures in Rust is stored contiguously. This is not an array of object references; it is an array of cookie-cutter structures, their fields laid out in a pattern that repeats one after another. This is about to matter.
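A quick way to see the contiguous layout for yourself: the sketch below, using a hypothetical 16-byte `Vertex` (the field names are assumptions), measures the distance between consecutive elements of a `Vec` and how many records fit in one 64-byte line. A Java `Vertex[]`, by contrast, holds references, with each object allocated separately.

```rust
use std::mem::size_of;

/// Hypothetical 16-byte vertex record: four 4-byte fields, no padding.
/// `#[repr(C)]` pins the field order; without it Rust may reorder fields,
/// but the array elements are contiguous either way.
#[allow(dead_code)] // fields exist only to give the struct its size
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    lat: f32,
    lon: f32,
    first_edge: u32,
    edge_count: u32,
}

/// Typical cache-line size on Intel x86.
const CACHE_LINE: usize = 64;

/// Distance in bytes between the first two elements of a slice.
fn element_stride(v: &[Vertex]) -> usize {
    let first = &v[0] as *const Vertex as usize;
    let second = &v[1] as *const Vertex as usize;
    second - first
}

fn main() {
    let vertices = vec![Vertex { lat: 0.0, lon: 0.0, first_edge: 0, edge_count: 0 }; 8];
    println!("size_of::<Vertex>() = {} bytes", size_of::<Vertex>());
    println!("stride between elements = {} bytes", element_stride(&vertices));
    println!("vertices per {}-byte line = {}", CACHE_LINE, CACHE_LINE / size_of::<Vertex>());
}
```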

If, as a pre-processing step, we sort the array of vertices by their geohash (derived from the latitude and longitude coordinates), data referring to nearby locations on the planet will sit at nearby positions in the array, and thus in memory. We don’t even need to store the geohash; we just use it as a sort key. (Since the vertices move, any stored neighbour indices must be remapped to their new positions, but that too is a one-off pre-processing step.)
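The pre-processing step might be sketched as follows. Because the sort moves vertices, the sketch also rewrites every stored neighbour index so the graph itself is unchanged. The graph layout, the names and the 32-bit bisection key are assumptions for illustration; adjacency is kept as `Vec<Vec<usize>>` for brevity, and a flat edge array would be remapped the same way.

```rust
/// Hypothetical vertex record; `name` is only there to make output readable.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    name: &'static str,
    lat: f64,
    lon: f64,
}

/// Hypothetical graph: `neighbours[i]` holds indices into `vertices`.
struct Graph {
    vertices: Vec<Vertex>,
    neighbours: Vec<Vec<usize>>,
}

/// Bisection geohash (latitude first, 1 = north/west), as in the text.
fn geohash_bits(lat: f64, lon: f64, bits: u32) -> u64 {
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let (mut lon_lo, mut lon_hi) = (-180.0_f64, 180.0_f64);
    let mut hash = 0u64;
    for i in 0..bits {
        hash <<= 1;
        if i % 2 == 0 {
            let mid = (lat_lo + lat_hi) / 2.0;
            if lat >= mid { hash |= 1; lat_lo = mid; } else { lat_hi = mid; }
        } else {
            let mid = (lon_lo + lon_hi) / 2.0;
            if lon < mid { hash |= 1; lon_hi = mid; } else { lon_lo = mid; }
        }
    }
    hash
}

/// Reorder vertices by geohash and remap every neighbour index, so the
/// graph is the same and only its order in memory changes.
fn sort_by_geohash(g: &Graph) -> Graph {
    let n = g.vertices.len();
    // order[new] = old: vertex indices sorted by their geohash key.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by_cached_key(|&i| geohash_bits(g.vertices[i].lat, g.vertices[i].lon, 32));
    // new_index[old] = new: the inverse permutation, used to rewrite edges.
    let mut new_index = vec![0; n];
    for (new, &old) in order.iter().enumerate() {
        new_index[old] = new;
    }
    Graph {
        vertices: order.iter().map(|&old| g.vertices[old]).collect(),
        neighbours: order
            .iter()
            .map(|&old| g.neighbours[old].iter().map(|&nb| new_index[nb]).collect())
            .collect(),
    }
}

fn main() {
    let v = |name: &'static str, lat: f64, lon: f64| Vertex { name, lat, lon };
    let g = Graph {
        vertices: vec![
            v("London", 51.51, -0.13),
            v("New York", 40.71, -74.01),
            v("Sydney", -33.87, 151.21),
            v("Newark, NJ", 40.74, -74.17),
            v("Paris", 48.86, 2.35),
        ],
        // New York <-> Newark and London <-> Paris, stored as indices.
        neighbours: vec![vec![4], vec![3], vec![], vec![1], vec![0]],
    };
    let sorted = sort_by_geohash(&g);
    for (i, vx) in sorted.vertices.iter().enumerate() {
        let nbs: Vec<&str> = sorted.neighbours[i]
            .iter()
            .map(|&j| sorted.vertices[j].name)
            .collect();
        println!("{i}: {} -> {:?}", vx.name, nbs);
    }
}
```

Run on the five places shown, New York and Newark, NJ end up at adjacent indices and still point at each other; the code that walks the graph needs no changes.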

So when A-Star accesses node 123 and then wants neighbouring node 125 (not 1230000, because neighbours are now close together within the structure), there is a strong chance that the processor, when it pulled in the cache line holding node 123, already read node 125 as well. The performance penalty of accessing comparatively super-slow main memory has been partially hidden.

Thus pre-sorting the A-Star array by geohash, without touching the code that uses the data, resulted in a measured 15% speed increase.

I grant you this is not a game-changing difference in performance, but for me at least, the enjoyable thing was the journey. As an engineer, I simply love the fact that we can make the code faster with no runtime code changes and no hardware changes!

I feel privileged to be working with such an amazing team. A team which can conjure up suggestions such as “sort the array by geohash” because they know what a geohash is and how a modern microprocessor works!

This has just been some fun, but the insights we have on a daily basis allow us to dig deeper into the vast array of data we hold, extracting more meaning for our customers.
