I have 2 tables: the first one contains dated items (something like log records), and the second one contains multiple attributes of those items.
CREATE TABLE items (
    date     date    NOT NULL,
    point_id integer NOT NULL,
    /* several more columns */
);
CREATE TABLE attributes (
    attribute_id integer,
    value        double precision NOT NULL,
    point_id     integer,
    CONSTRAINT fk_point_id FOREIGN KEY (point_id)
        REFERENCES items (point_id) MATCH SIMPLE
        ON UPDATE RESTRICT ON DELETE CASCADE
);
So for a single record in items there are 1..N records in attributes. For now, the items table has about 35M rows, and for each of those rows there are 2 rows in the attributes table – one with attribute_id = 0 and a second with attribute_id = 1.
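To illustrate the shape of the data, a single item row and its two attribute rows would look something like this (the values are made up, purely for illustration):

-- hypothetical sample data, just to illustrate the 1..N relationship
INSERT INTO items (date, point_id) VALUES ('2016-12-01', 42);

INSERT INTO attributes (point_id, attribute_id, value)
VALUES (42, 0, 1.5),  -- attribute_id = 0
       (42, 1, 2.5);  -- attribute_id = 1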
I have separate btree indexes on all of the listed fields. What is unclear to me is why the following query leads to a Seq Scan of the attributes table:
EXPLAIN
SELECT s.date, a1.value AS value
FROM items s
INNER JOIN attributes a1 ON s.point_id = a1.point_id
WHERE a1.attribute_id = 0
  AND s.date >= '2016-12-01'::DATE
Here is the explain result:
Hash Join  (cost=14457.15..1529906.57 rows=310075 width=12)
  Hash Cond: (a1.point_id = s.point_id)
  ->  Seq Scan on attributes a1  (cost=0.00..1376027.10 rows=36352418 width=12)
        Filter: (attribute_id = 0)
  ->  Hash  (cost=10569.55..10569.55 rows=311008 width=8)
        ->  Index Scan using items_date_idx on items s  (cost=0.56..10569.55 rows=311008 width=8)
              Index Cond: (date >= '2016-12-01'::date)
I see that Postgres uses an index to filter the items table and fetch only the needed rows – there are about 300K of them. But why does it then iterate over the whole 70M-row attributes table to build the hash table? Since I have indexes on the point_id column in both tables, why not use them to perform a merge join over just those 300K rows?
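To be explicit about the indexes mentioned above, they look roughly like this. The names of the date and point_id indexes are taken from the plans; the attribute_id index name is written from memory, so treat the list as a sketch:

CREATE INDEX items_date_idx ON items (date);
CREATE INDEX items_point_id_idx ON items (point_id);
CREATE INDEX attributes_point_id_idx ON attributes (point_id);
CREATE INDEX attributes_attribute_id_idx ON attributes (attribute_id);  -- name approximate
-- plus plain btree indexes on the remaining listed columns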
I tried enabling/disabling various planner options, but of course that only made things worse. I understand that the planner does its job very well; there is probably just something wrong with my schema, and I need to help the planner somehow. I tried to solve the issue by adding a composite index like (date, point_id) – at least that helped in Mongo (after filtering by date you have a sorted list of point_id and can perform a merge join), but Postgres ignored those indexes and used the simple ones.
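For reference, the composite index I tried was along these lines (the exact name may have been different):

CREATE INDEX items_date_point_id_idx ON items (date, point_id);  -- name approximate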
So which index should I create (or how should I redesign my schema) to make the query above run faster and avoid the seq scan?
Not sure whether it can help, but in the items table each row has a larger point_id than the previous one and the same or a larger date. Also, I do not need an exact date/time – only the date part.
If I remove the date condition from the query above, I get the merge join as expected:
Merge Join  (cost=73.17..4229922.45 rows=36352418 width=12)
  Merge Cond: (s.point_id = a1.point_id)
  ->  Index Scan using items_point_id_idx on items s  (cost=0.44..1143957.50 rows=36461804 width=8)
  ->  Index Scan using attributes_point_id_idx on attributes a1  (cost=0.57..2540470.32 rows=36352418 width=12)
        Filter: (attribute_id = 0)
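In other words, the query that produces this plan is just the original one without the date filter:

EXPLAIN
SELECT s.date, a1.value AS value
FROM items s
INNER JOIN attributes a1 ON s.point_id = a1.point_id
WHERE a1.attribute_id = 0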
I use Postgres 9.5.
In theory there could be an arbitrary number of attributes; it is defined at runtime.
There is an index on attribute_id. I think it is not used because half of the table has attribute_id = 0 and the other half has attribute_id = 1.
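For what it's worth, the roughly 50/50 split can be confirmed with something like this (just a sketch):

SELECT attribute_id, count(*)
FROM attributes
GROUP BY attribute_id;

-- or look at the planner's statistics directly
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'attributes' AND attname = 'attribute_id';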