Let's say I have an application that calculates and stores a mutual score between every combination of users.
I'm currently using Postgres to store this information.
So if my system had users with
id 1, 2, and 3 that table might look like
CREATE TABLE scores (id,user_id,friend_id,score) AS ( VALUES ( 1, 1, 2, 15.2 ), ( 2, 2, 1, 15.2 ), ( 3, 1, 3, 19.7 ), ( 4, 3, 1, 19.7 ), ( 5, 2, 3, 21.0 ), ( 6, 3, 2, 21.0 ) );
As you can see, the data itself is pretty simple.
- Every combination of users has a mutual score calculated between them (the derivation of that score or what it means isn't important here) and I store it in the table.
- Because I could look up either user, I have to store the pairings "both ways" (e.g. 1 -> 2 and 2 -> 1)
The problem is that this increases quadratically. With 10,000 users there are 100 million records in this table. With 100,000 users (which is a reasonable scope for my app) there would be 10 billion records.
So my question is – surely Postgres can't be the most scale-able way to store this information right? I don't need access to it constantly, just periodically when I'm running a batch job so I'm happy to store it in some separate type of DB on a separate server.
What kind of DB specializes in storing this type of paired data and can perform quick reads?
For example, I commonly query for all the ranked friends of a given user with
SELECT friend_id FROM my_table where user_id = ? ORDER BY score DESC so I'd like to run the equivalent of that query on this new database.
P.S Someone had suggested a Graph DB – didn't know if that would apply here.