Postgresql – What type of database can I use to store mutual scores between every combination of users


Let's say I have an application that calculates and stores a mutual score between every combination of users.

I'm currently using Postgres to store this information.

So if my system had users with id 1, 2, and 3 that table might look like

CREATE TABLE scores (id,user_id,friend_id,score)
    ( 1, 1, 2, 15.2 ),
    ( 2, 2, 1, 15.2 ),
    ( 3, 1, 3, 19.7 ),
    ( 4, 3, 1, 19.7 ),
    ( 5, 2, 3, 21.0 ),
    ( 6, 3, 2, 21.0 )

As you can see, the data itself is pretty simple.

  • Every combination of users has a mutual score calculated between them (the derivation of that score or what it means isn't important here) and I store it in the table.
  • Because I could look up either user, I have to store the pairings "both ways" (e.g. 1 -> 2 and 2 -> 1)

The problem is that this increases quadratically. With 10,000 users there are 100 million records in this table. With 100,000 users (which is a reasonable scope for my app) there would be 10 billion records.

So my question is – surely Postgres can't be the most scale-able way to store this information right? I don't need access to it constantly, just periodically when I'm running a batch job so I'm happy to store it in some separate type of DB on a separate server.

What kind of DB specializes in storing this type of paired data and can perform quick reads?

For example, I commonly query for all the ranked friends of a given user with SELECT friend_id FROM my_table where user_id = ? ORDER BY score DESC so I'd like to run the equivalent of that query on this new database.


P.S Someone had suggested a Graph DB – didn't know if that would apply here.

Best Answer

Because I could look up either user, I have to store the pairings "both ways"

Well, that's not true. You could normalize the data because the relations aren't directed anyway.

  user_id   int,
  friend_id int,
  score     int,
  CHECK ( user_id < friend_id ),
  PRIMARY KEY (user_id, friend_id)

INSERT INTO scores (user_id, friend_id,score)
  ( 1, 2, 15.2 ),
  ( 1, 3, 19.7 ),
  ( 2, 3, 21.0 );

Now you can find the relation easy.

FROM scores
WHERE user_id = 1
  AND friend_id = 2;

If you don't want to put the rule in the app, just rewrite the query into.

FROM scores
WHERE (user_id, friend_id) =
  ROW( least(1,2), greatest(1,2) );