Postgresql – Postgres: Why is the delete query using up all available disk space

linuxpostgresqltimescaledb

I have a postgres table set up like so:

                                                           Table "public.facts"
   Column    |           Type           | Collation | Nullable |              Default              | Storage  | Stats target | Description 
-------------+--------------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id          | integer                  |           | not null | nextval('facts_id_seq'::regclass) | plain    |              | 
 value       | character varying(100)   |           |          |                                   | extended |              | 
 measured_at | timestamp with time zone |           | not null |                                   | plain    |              | 
 received_at | timestamp with time zone |           |          |                                   | plain    |              | 
 written_at  | timestamp with time zone |           |          |                                   | plain    |              | 
 is_numeric  | boolean                  |           |          |                                   | plain    |              | 
 source_id   | character varying(100)   |           |          |                                   | extended |              | 
 site_id     | integer                  |           |          |                                   | plain    |              | 
Indexes:
    "facts_measured_at_idx" btree (measured_at DESC)
    "facts_source_id_measured_at_idx" btree (source_id, measured_at)
    "idx_source_id_measured_at" btree (source_id, measured_at)
Foreign-key constraints:
    "facts_site_id_fkey" FOREIGN KEY (site_id) REFERENCES site(id)
    "facts_source_id_fkey" FOREIGN KEY (source_id) REFERENCES source(id)
Triggers:
    ts_insert_blocker BEFORE INSERT ON facts FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_100_chunk,
              _timescaledb_internal._hyper_1_101_chunk,
              _timescaledb_internal._hyper_1_102_chunk,
              _timescaledb_internal._hyper_1_103_chunk,
              _timescaledb_internal._hyper_1_104_chunk,
              _timescaledb_internal._hyper_1_105_chunk,
              _timescaledb_internal._hyper_1_106_chunk,
              _timescaledb_internal._hyper_1_107_chunk,
              _timescaledb_internal._hyper_1_108_chunk,
              _timescaledb_internal._hyper_1_109_chunk,
              _timescaledb_internal._hyper_1_110_chunk,
              _timescaledb_internal._hyper_1_111_chunk,
              _timescaledb_internal._hyper_1_112_chunk,
              _timescaledb_internal._hyper_1_113_chunk,
              _timescaledb_internal._hyper_1_114_chunk,
              _timescaledb_internal._hyper_1_115_chunk,
              _timescaledb_internal._hyper_1_116_chunk,
              _timescaledb_internal._hyper_1_117_chunk,
              _timescaledb_internal._hyper_1_118_chunk,
              _timescaledb_internal._hyper_1_119_chunk,

Recently a bug was introduced that caused an infinite loop and many thousands of duplicate rows were written. I want to delete those duplicate rows and I'm using the following query:

DELETE FROM facts a USING facts b WHERE a.id > b.id AND a.source_id = b.source_id AND a.measured_at = b.measured_at AND a.value = b.value;

Disk usage prior to running the query:

ubuntu@ip-xx-xx-xx-xx:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G     0   32G   0% /dev
tmpfs           6.3G  844K  6.3G   1% /run
/dev/nvme0n1p1   49G  8.5G   40G  18% /
tmpfs            32G   44K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/loop0       29M   29M     0 100% /snap/amazon-ssm-agent/2012
/dev/loop1       98M   98M     0 100% /snap/core/10126
/dev/loop2       98M   98M     0 100% /snap/core/10185
/dev/nvme1n1p1  3.9T  181G  3.5T   5% /mnt                <-- database on this volume
tmpfs           6.3G     0  6.3G   0% /run/user/1000

disk usage 3 days after executing the query (still running):

ubuntu@ip-xx-xx-xx-xx:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G     0   32G   0% /dev
tmpfs           6.3G  836K  6.3G   1% /run
/dev/nvme0n1p1   49G  8.5G   40G  18% /
tmpfs            32G   44K   32G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/loop0       29M   29M     0 100% /snap/amazon-ssm-agent/2012
/dev/loop1       98M   98M     0 100% /snap/core/10126
/dev/loop2       98M   98M     0 100% /snap/core/10185
/dev/nvme1n1p1  3.9T  3.2T  527G  86% /mnt              <-- database on this volume
tmpfs           6.3G     0  6.3G   0% /run/user/1000

Nothing is being written to or read from the database while this query is executing. Most of the extra storage is in /mnt/postgres/12/main/base/pgsql_tmp. Why does it take >3TB to delete less than 200GB?

Please let me know if there's any other info I can provide.

Best Answer

It turns out all the extra child tables/chunks created by timescaledb were making this query run wild; running EXPLAIN as Schwern suggested was what keyed me into this. So here's what I ended up doing:

-- copy the table without any of the extra indices or child tables
CREATE TABLE facts_copy AS TABLE facts;

-- re-run the delete query, took about 23 hours, deleted 369225431 / 619858486 rows
DELETE FROM facts_copy a USING facts_copy b WHERE a.id > b.id AND a.source_id = b.source_id AND a.measured_at = b.measured_at AND a.value = b.value;

-- add fkeys and unique index so this can't happen again
ALTER TABLE facts_copy ADD CONSTRAINT facts_site_id_fkey FOREIGN KEY (site_id) REFERENCES site(id);
ALTER TABLE facts_copy ADD CONSTRAINT facts_source_id_fkey FOREIGN KEY (source_id) REFERENCES source(id);
CREATE UNIQUE INDEX unique_idx_source_id_measured_at ON facts_copy(source_id,measured_at);

-- drop the old table and rename the new one
DROP TABLE facts;
ALTER TABLE facts_copy RENAME TO facts;

-- re-create the hypertable
SELECT create_hypertable('facts','measured_at',migrate_data=>true);