Joins Don't Scale!

December 22, 2015

“Joins Don’t Scale”. Well, that’s what I heard MongoDB said anyway.

My response was “Huh? Yeh, they do”. So what gives? Who is right? Why the mixup?

Well, first thing to realise is that the implicit topic we are talking about is massively parallel (MPP) databases, so what they are talking about is the scalability of a join between two tables that are spread across multiple nodes of an MPP database, such as Postgres-XL.

All of the MPP databases I’ve worked with allow you to spread data across multiple nodes, typically using a simple hashing method, using one or more columns. That’s often called the Distribution Key, Primary Index or similar. If you have two tables with the same DK, then the join happens evenly across all nodes. That scales very nicely, meaning that if you have twice as many nodes the join happens twice as quickly. No data is passed between nodes at all; well, apart from the result set going back to the user.

If the Distribution Keys don’t match then the join plan requires a step called a “redistribution”, which puts the data in the right place so it can be joined. That step involves re-hashing the data and then sending it to the correct nodes. Which then requires each node to send data to all other nodes. Which needs N*(N-1) sessions between nodes and can be a lengthy operation. That sounds like it’s a problem, but redistributions are still scalable, as long as your network doesn’t hit limits. The more nodes you have the less work each node has to do.

OMG! O(N^2) operation detected! PANIC?!? Redistribution is an extra operation that is best avoided, much the same as a large sort or sequential scan is best avoided. Having said that, I don’t see any need to fear them. The physics of databases means that every MPP database needs to cope with this problem, so its not isolated to XL or any other MPP database. Even MongoDB suffers from it – just because you have only one table doesn’t mean you never need joins.

Anyway, that’s what I believe is the source of this weird meme that joins don’t scale. There’s also been some strange questions asked about Postgres-XL, as if XL has an issue here that other databases don’t, which it definitely doesn’t.

Still worried? Don’t be. The main purpose of the Star Schema data model is to reduce Business Intelligence problems down to One Big Table plus Lots of Small Ones. In that case, we simply avoid the redistribution task altogether. In the case of a Star Schema, joins take place by copying the smaller tables in the join to all nodes.

Some MPP databases, such as Postgres-XL are designed with star schema joins in mind, providing the ability to replicate reference data tables to all nodes. If the data already exists on all nodes, then we can just use it immediately, speeding up smaller joins.

If you have a more complex data model then you may need to do redistribution joins, which happen pretty much the same way on every MPP database.

Share this