Postgresql – Multicolumn indexes and OR clause

indexperformancepostgresqlpostgresql-performance

I have a message table in my application and one of the most commonly executed SQL queries on this table is the following one:

select * from message message0_ 
 where message0_.sender_id=? or message0_.recipient_id=?
 order by message0_.send_date asc;

I am basically querying for the messages the current user has either received or sent.

The value of the parameter for sender_id & recipient_id is the current user ID.

I have created the following index:

CREATE INDEX ON message(recipient_id, sender_id);

I would like to know if the order of the fields matters for my use case bearing in mind the OR clause in my SQL query.

Can someone please help?

Best Answer

For your use case, you actually need two different indexes:

CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);

Your multi-column indexed would be used by the recipient_id = ?? part of the query, but it would not be used by the other (it can be used by other databases that know how to do Skip Scanning, such as Oracle, although it might not be extremely efficient).

Depending on how your PRIMARY KEY is for the message(s) table, you might not need one of them (the implicity index associated with the PK will do the job).

You can then either use your query as is, and let PostgreSQL do a BitmapOr, or convert the query to a UNION and avoid the OR. Timings should be about the same in both cases. If you are sure that there are not messages where both the sender_id and the recipient_id are the same (i.e.: nobody sends a message to him/herself, or if s/he does, you don't mind showing it twice), then a UNION ALL (which is slightly quicker) could be used instead with the same result.

OR conditions tend not to be handled too well by most databases. This specific case, which is rather common and rather simple, is handled well by PostgreSQL.


Experimental check

Table definitions (simplified)

CREATE TABLE users
(
    user_id integer PRIMARY KEY,
    user_name text 
) ;

CREATE TABLE messages
(
    sender_id    integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    recipient_id integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    send_date    timestamp NOT NULL DEFAULT now(),
    message_text text,
    PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;

We populate the tables with some simulated data:

-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
    user_id, 'user' || user_id AS user_name
FROM
    generate_series(1, 1000) AS x(user_id) ;

-- Create (aprox.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
    (random()*999+1)::integer AS sender_id,
    (random()*999+1)::integer AS recipient_id,
    send_date
FROM
    generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date); 

ANALYZE;

And, at this point, we check which are the execution plans that we get:

-- Checking the query with OR
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;

gives ...

QUERY PLAN
1   Sort  (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
2     Sort Key: send_date
3     Sort Method: quicksort  Memory: 34kB
4     ->  Bitmap Heap Scan on messages  (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
5           Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
6           Heap Blocks: exact=157
7           ->  BitmapOr  (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
8                 ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
9                       Index Cond: (recipient_id = 123)
10                ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
11                      Index Cond: (sender_id = 123)
12  Planning time: 0.379 ms
13  Execution time: 0.527 ms

And using UNION:

-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;

you get the following query plan:

QUERY PLAN
1   Sort  (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
2     Sort Key: messages.send_date
3     Sort Method: quicksort  Memory: 34kB
4     ->  HashAggregate  (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
5           Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6           ->  Append  (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
7                 ->  Bitmap Heap Scan on messages  (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
8                       Recheck Cond: (recipient_id = 123)
9                       Heap Blocks: exact=89
10                      ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
11                            Index Cond: (recipient_id = 123)
12                ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
13                      Recheck Cond: (sender_id = 123)
14                      Heap Blocks: exact=86
15                      ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.021..0.021 rows=95 loops=1)
16                            Index Cond: (sender_id = 123)
17  Planning time: 0.178 ms
18  Execution time: 0.521 ms

For the sake of completeness: UNION ALL:

    QUERY PLAN
1   Sort  (cost=537.11..537.62 rows=201 width=48) (actual time=0.213..0.229 rows=160 loops=1)
2     Sort Key: messages.send_date
3     Sort Method: quicksort  Memory: 32kB
4     ->  Append  (cost=5.08..529.42 rows=201 width=48) (actual time=0.034..0.173 rows=160 loops=1)
5           ->  Bitmap Heap Scan on messages  (cost=5.08..264.74 rows=101 width=48) (actual time=0.034..0.086 rows=82 loops=1)
6                 Recheck Cond: (recipient_id = 123)
7                 Heap Blocks: exact=77
8                 ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.05 rows=101 width=0) (actual time=0.022..0.022 rows=82 loops=1)
9                       Index Cond: (recipient_id = 123)
10          ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.030..0.075 rows=78 loops=1)
11                Recheck Cond: (sender_id = 123)
12                Heap Blocks: exact=72
13                ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.020..0.020 rows=78 loops=1)
14                      Index Cond: (sender_id = 123)
15  Planning time: 0.222 ms
16  Execution time: 0.280 ms

You can see that, in both cases, the execution plan is nearly the same for the two firsts approaches, and better for the UNION ALL.

You can check all this at Rextester.

Your original setup

If you do exactly the same but don't have the CREATE INDEX ON messages (recipient_id) ; statement, what you would get is:

QUERY PLAN
1   Sort  (cost=2123.86..2124.36 rows=200 width=48) (actual time=27.305..27.320 rows=227 loops=1)
2     Sort Key: send_date
3     Sort Method: quicksort  Memory: 35kB
4     ->  Seq Scan on messages  (cost=0.00..2116.22 rows=200 width=48) (actual time=0.108..27.097 rows=227 loops=1)
5           Filter: ((recipient_id = 123) OR (sender_id = 123))
6           Rows Removed by Filter: 103454
7   Planning time: 0.474 ms
8   Execution time: 27.392 ms

QUERY PLAN
1   Sort  (cost=2135.60..2136.10 rows=201 width=48) (actual time=15.716..15.803 rows=227 loops=1)
2     Sort Key: messages.send_date
3     Sort Method: quicksort  Memory: 35kB
4     ->  HashAggregate  (cost=2125.90..2127.91 rows=201 width=48) (actual time=15.571..15.621 rows=227 loops=1)
5           Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
6           ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.061..15.371 rows=227 loops=1)
7                 ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.060..15.092 rows=117 loops=1)
8                       Filter: (recipient_id = 123)
9                       Rows Removed by Filter: 103564
10                ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.086..0.248 rows=110 loops=1)
11                      Recheck Cond: (sender_id = 123)
12                      Heap Blocks: exact=101
13                      ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.066..0.066 rows=110 loops=1)
14                            Index Cond: (sender_id = 123)
15  Planning time: 0.333 ms
16  Execution time: 16.006 ms

QUERY PLAN
1   Sort  (cost=2131.58..2132.08 rows=201 width=48) (actual time=14.847..14.865 rows=227 loops=1)
2     Sort Key: messages.send_date
3     Sort Method: quicksort  Memory: 35kB
4     ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.076..14.731 rows=227 loops=1)
5           ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.076..14.497 rows=117 loops=1)
6                 Filter: (recipient_id = 123)
7                 Rows Removed by Filter: 103564
8           ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.082..0.209 rows=110 loops=1)
9                 Recheck Cond: (sender_id = 123)
10                Heap Blocks: exact=101
11                ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.060..0.060 rows=110 loops=1)
12                      Index Cond: (sender_id = 123)
13  Planning time: 0.284 ms
14  Execution time: 14.931 ms

... which are much worse query plans, because you need some sequential scans. You can check this approach also at this Rextester.