I have a star schema database in Redshift. I'm running an aggregation on a fact table
order_facts with the following relevant columns:
FLOAT – total order cost
INTEGER – payment date in the format
INTEGER foreign key to the shops dimension
I'm using the following dimension tables:
INTEGER – date in format
DATE – SQL date formatted field
INTEGER – primary key for the table
DATETIME – timestamp for when the shop was created
My goal is to obtain a table of the total amount of orders for each shop in its first 30 days.
My query is this, and should give an accurate answer. However, it's taking over a half-hour to run:
SELECT of.shop_id, SUM(of.total) FROM order_facts of INNER JOIN shops s ON of.shop_id = s.id INNER JOIN dates d ON of.payment_date = d.id WHERE d.date <= (s.created + INTERVAL '30 days') GROUP BY of.shop_id
I tried rewriting it like this, but the query still does not finish after over 20 minutes of runtime
SELECT SUM(r.total), r.shop_id FROM ( SELECT s.created, of.shop_id, of.total, of.payment_date FROM order_facts of INNER JOIN shops s ON of.shop_id = s.id ) r INNER JOIN dates d ON r.payment_date = d.id WHERE d.date <= (r.created + INTERVAL '30 days') GROUP BY r.shop_id
I'm not currently understanding why it's taking so long to run. Understanding that would help me know how to correct the query. Likewise, seeing a better version of the above query would help me understand where the inefficiency exists. Either way is extremely helpful for me.
Just getting the total order amount per shop for the past X days, or all-time, is a very fast query. Thus it seems I'm doing something suboptimal with the join to the dates table, but it's not clear what.
EDIT: output of
1 XN HashAggregate (cost=868301988.49..868301991.01 rows=1009 width=12) 2 -> XN Merge Join DS_DIST_NONE (cost=0.00..868286973.07 rows=3003084 width=12) 3 Merge Cond: ("outer".id = "inner".payment_date) 4 Join Filter: ("inner".shop_id = "outer".id) 5 -> XN Nested Loop DS_BCAST_INNER (cost=0.00..1773635510.00 rows=32472000 width=8) 6 Join Filter: (("outer".date)::timestamp without time zone > ("inner".created + '30 days'::interval)) 7 -> XN Seq Scan on dates d (cost=0.00..110.00 rows=11000 width=8) 8 -> XN Seq Scan on shops s (cost=0.00..88.56 rows=8856 width=12) 9 -> XN Seq Scan on order_facts "of" (cost=0.00..90092.52 rows=9009252 width=16) 10 ----- Nested Loop Join in the query plan - review the join predicates to avoid Cartesian products -----