I have a bulk insert function
set_interactions(arg_rows text) that looks like this:
with inserts as ( insert into interaction ( thing_id, associate_id, created_time) select t->>'thing_id', t->>'associate_id', now() from json_array_elements(arg_rows::json) t ON CONFLICT (thing_id, associate_id) DO NOTHING RETURNING thing_id, associate_id ) select into insert_count count(*) from inserts; -- Followed by an insert in an unrelated table that has two triggers, neither of which touch any of the tables here (also not by any of their triggers, etc.)
(I wrap it that way because I need to get a count of actual inserts, without the "fake row updates" trick.)
- Just one constraint: A multi-column primary key (thing_id, associate_id)
- No indices
- Just one trigger: After insert, for each row.
The trigger does this:
DECLARE associateId text; BEGIN -- Go out and get the associate_id for this thing_id BEGIN SELECT thing.associate_id INTO STRICT associateId FROM thing WHERE thing.id = NEW.thing_id; EXCEPTION WHEN NO_DATA_FOUND THEN RAISE EXCEPTION 'Could not map the thing to an associate!'; WHEN TOO_MANY_ROWS THEN RAISE EXCEPTION 'Could not map the thing to a SINGLE associate!'; -- thing PK should prevent this END; -- We don't want to add an association between an associate interacting with their own things IF associateId != NEW.associate_id THEN -- Insert the new association, if it doesn't yet exist INSERT INTO associations ("thing_owner", "associate") VALUES (associateId, NEW.associate_id) ON CONFLICT DO NOTHING; END IF; RETURN NULL; END;
associations have no more columns than you see in the above statements.
Occasionally, I get an error
deadlock detected from PostgreSQL 9.6.5 when the application calls
set_interactions(). It may call it with 1-100 rows' worth of data, unsorted; "conflicting" batches may or may not have identical input (at the whole batch level or for each conflicting row).
deadlock detected while inserting index tuple (37605,46) in relation "associations" SQL statement INSERT INTO associations ("thing_owner", "associate") VALUES (associateId, NEW.associate_id) ON CONFLICT DO NOTHING; PL/pgSQL function aud.addfriendship() line 19 at SQL statement SQL statement "with inserts as ( insert into interaction ( thing_id, associate_id, created_time) select t->>'thing_id', t->>'associate_id', now() from json_array_elements(arg_rows::json) t ON CONFLICT (thing_id, associate_id) DO NOTHING RETURNING thing_id, associate_id ) select count(*) from inserts" PL/pgSQL function setinteractions(text) line 7 at SQL statement Process 31370 waits for ShareLock on transaction 111519214; blocked by process 31418. Process 31418 waits for ShareLock on transaction 111519211; blocked by process 31370. error: deadlock detected
What I've tried
I thought that maybe the function was sometimes being called with duplicate data in a single call. Not so: That instead results in a guaranteed error,
ON CONFLICT DO UPDATE command cannot affect row a second time.
I'm not able to reproduce the deadlock even trying 1,000 calls to
set_interactions() all at once with identical parameters, or even with identical pairs of rows having (different within the pair)
associate_id values but other values too, so they don't get optimized away somehow before hitting PostgreSQL (they also shouldn't get optimized away by the database, since the function is marked
volatile.) This is from a single-threaded back end; but at the same time, the application itself only runs one such back end in production, where the deadlock is occurring. I've even tried running these 1,000 calls against a full copy of the production database, and even under load from a second back end, and additionally from pgAdmin via a very-long-running query that selects from
interactions. They succeed without complaint.
https://rcoh.svbtle.com/postgres-unique-constraints-can-cause-deadlock mentions trying to avoid relying on a unique index (which is what the PK amounts to, as I understand it) when inserting duplicates. However, that was before
ON CONFLICT DO UPDATE, which I thought would solve this issue.
How is this query deadlocking "randomly", and how can I fix it? (Also, why can't I reproduce it with the above method?)