# PostgreSQL – Efficient pagination for big tables

Tags: index, paging, performance, postgresql, postgresql-10, query-performance

Using PostgreSQL 10.5. I'm trying to create a pagination system where the user can go back and forth between pages of results.

To avoid using OFFSET, I pass the id of the last row on the previous page in a parameter called p (prevId). I then select the first three rows whose id is higher than the value passed in the p parameter (as described in this article).

For example, if the id of the last row on the previous page was 5, I'd select the first 3 rows with an id higher than 5:

```sql
SELECT id, firstname, lastname
FROM   people
WHERE  firstname = 'John'
AND    id > 5
ORDER  BY id ASC
LIMIT  3;
```


This works great, and the timing isn't bad either:

```
Limit  (cost=0.00..3.37 rows=3 width=17) (actual time=0.046..0.117 rows=3 loops=1)
  ->  Seq Scan on people  (cost=0.00..4494.15 rows=4000 width=17) (actual time=0.044..0.114 rows=3 loops=1)
        Filter: ((id > 5) AND (firstname = 'John'::text))
        Rows Removed by Filter: 384
Planning time: 0.148 ms
Execution time: 0.147 ms
```


If, on the other hand, the user would like to return to the previous page, things look a bit different.

First, I'd take the id of the first row and put a minus sign in front of it to indicate that I should select the rows with an id less than the (positive) p parameter. That is, if the id of the first row is 6, the p parameter would be -6. My query would then look like the following:

```sql
SELECT *
FROM  (
   SELECT id, firstname, lastname
   FROM   people
   WHERE  firstname = 'John'
   AND    id < 6
   ORDER  BY id DESC
   LIMIT  3
   ) AS d
ORDER BY id ASC;
```


In the query above, I first select the last 3 rows with an id less than 6, then reverse them so they are presented in the same order as in the first query.

This works as it should, but since the database goes through almost all my rows, performance suffers:

```
Sort  (cost=4252.75..4252.76 rows=1 width=17) (actual time=194.464..194.464 rows=0 loops=1)
  Sort Key: people.id
  Sort Method: quicksort  Memory: 25kB
  ->  Limit  (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.460..194.460 rows=0 loops=1)
        ->  Sort  (cost=4252.73..4252.73 rows=1 width=17) (actual time=194.459..194.459 rows=0 loops=1)
              Sort Key: people.id DESC
              Sort Method: quicksort  Memory: 25kB
              ->  Gather  (cost=1000.00..4252.72 rows=1 width=17) (actual time=194.448..212.010 rows=0 loops=1)
                    Workers Planned: 1
                    Workers Launched: 1
                    ->  Parallel Seq Scan on people  (cost=0.00..3252.62 rows=1 width=17) (actual time=18.132..18.132 rows=0 loops=2)
                          Filter: ((id < 13) AND (firstname = 'John'::text))
                          Rows Removed by Filter: 100505
Planning time: 0.116 ms
Execution time: 212.057 ms
```


Thanks for taking the time to read this far. My question is: how can I make the pagination more efficient?

---

The key to performance is a matching multicolumn index of the form:

```sql
CREATE UNIQUE INDEX ON people (firstname, id);
```


UNIQUE, because without it the sort order can be ambiguous and you can get arbitrary results among peers (rows with equal sort values).

A UNIQUE or PRIMARY KEY constraint serves as well.

As long as the first column is checked for equality as in your example (or sorted in the same direction as the query), this index is good for paging up and down, though it's a bit better for paging up.

With the index in place (and after running ANALYZE on the table), you won't see sequential scans any more (unless your table is small). The database is not "going through almost all your rows" any more.
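You can verify this yourself with EXPLAIN. A minimal sketch, assuming the table from the question (the explicit index name is illustrative; Postgres generates one if you omit it):

```sql
-- Illustrative check; assumes the people table from the question.
CREATE UNIQUE INDEX people_firstname_id_idx ON people (firstname, id);
ANALYZE people;

EXPLAIN (ANALYZE)
SELECT id, firstname, lastname
FROM   people
WHERE  firstname = 'John'
AND    id > 5
ORDER  BY id
LIMIT  3;
-- The plan should now show an Index Scan using people_firstname_id_idx
-- rather than a Seq Scan.
```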

Read the fine presentation by Markus Winand you are linking to.

If you want paging across more than one firstname, use ROW values. Example for paging down:

```sql
SELECT *
FROM  (
   SELECT id, firstname, lastname
   FROM   people
   WHERE  (firstname, id) < ('John', 6)  -- ROW values
   ORDER  BY firstname DESC, id DESC
   LIMIT  3
   ) d
ORDER BY firstname, id;
```
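Paging up works the same way with the comparison and sort order flipped. A sketch under the same assumptions, where 5 is the id of the last row on the previous page; no subquery is needed, since ascending order is already the display order:

```sql
SELECT id, firstname, lastname
FROM   people
WHERE  (firstname, id) > ('John', 5)  -- ROW values
ORDER  BY firstname, id
LIMIT  3;
```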



If the SELECT list only adds lastname, as in your example, you might add that column to the index to get index-only scans out of it:

```sql
CREATE UNIQUE INDEX ON people (firstname, id, lastname);
```


Index columns in this order.
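Note that index-only scans also depend on the visibility map being reasonably up to date. A minimal sketch, assuming the covering index above:

```sql
VACUUM people;  -- updates the visibility map

EXPLAIN
SELECT id, firstname, lastname
FROM   people
WHERE  firstname = 'John'
AND    id > 5
ORDER  BY id
LIMIT  3;
-- Look for "Index Only Scan" in the plan.
```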

The upcoming Postgres 11 allows indexes to INCLUDE columns, which results in a smaller index and better performance, and is applicable in more situations. Like:

```sql
CREATE UNIQUE INDEX ON people (firstname, id) INCLUDE (lastname);
```