I have a simple problem that has a simple solution with SQL, but would like to explore alternative ways to solve it if they turn out to be more efficient on large scale.
Let's assume that we have a system where we have users, videos and list of videos users have viewed:
CREATE TABLE `video` ( `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, `rank` INT NULL, `created_at` DATETIME NULL, PRIMARY KEY (`id`), INDEX `idx_rank` (`rank` ASC)); CREATE TABLE `user` ( `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, `created_at` DATETIME NULL, PRIMARY KEY (`id`)); CREATE TABLE `user_view` ( `user_id` BIGINT NOT NULL, `video_id` BIGINT NOT NULL, PRIMARY KEY (`user_id`, `video_id`)); CREATE TABLE `user_friend` ( `user_id` BIGINT NOT NULL, `friend_id` BIGINT NOT NULL, PRIMARY KEY (`user_id`, `friend_id`));
The question is: How would we find all the videos that friends of a user have not viewed?
In my solution, I would get the ids of friends (Let's assume user has 3 friends, with ids 1, 2, 3) and build a query:
SELECT v.id, uv1.video_id FROM video v LEFT JOIN user_view uv1 ON (v.id = uv1.video_id AND uv1.user_id = 1) LEFT JOIN user_view uv2 ON (v.id = uv2.video_id AND uv2.user_id = 2) LEFT JOIN user_view uv3 ON (v.id = uv3.video_id AND uv3.user_id = 3) WHERE 1 AND v.id > 100 AND uv1.video_id IS NULL AND uv2.video_id IS NULL AND uv3.video_id IS NULL ORDER BY v.rank LIMIT 30
Above query will work, but more friends a user has, more joins we'd have to add to the query.
Let's assume we're dealing with 1 billion videos, 100 million users and on averahe user having 50 friends.
Is there a more efficient way to do this in SQL?
Is there a way to do this with non traditional SQL way? Perhaps with noSql with mongodb, cassandra, riak, redis, couchdb or anything else? I'm wondering if there is anything else more efficient purpose built for this.
Any other programming/processing technique that would prove to be more efficient?
I would really appreciate your input.