PostgreSQL – Postgres long-running query not blocking table operations


We have a 3rd-party service that imports data into a local Postgres instance, which we then query and copy into our own view-data system. These queries can yield millions of rows, so importing them into our view-data can take a LONG time (sometimes days). That by itself isn't a problem for us, but the 3rd-party script occasionally performs table operations (like DROP) when it rebuilds itself, and unfortunately those operations get blocked while our queries are running. Right now, because of the size of the data, we're using a Postgres cursor (with the Ruby Sequel gem to assist) so we can hold a fixed chunk of data in memory at a time. Cursors, though, seem to keep a transaction open while we iterate through them.
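To make the locking concrete, here is a minimal sketch of what the cursor pattern looks like at the SQL level (`big_table` and `import_cur` are placeholder names): a cursor only exists inside a transaction, and that transaction keeps an ACCESS SHARE lock on the table for its whole lifetime, while DROP TABLE needs an ACCESS EXCLUSIVE lock and therefore waits.

```sql
BEGIN;
DECLARE import_cur CURSOR FOR SELECT * FROM big_table;

FETCH 1000 FROM import_cur;  -- repeated until exhausted; the ACCESS SHARE
                             -- lock on big_table is held the entire time

-- Meanwhile, "DROP TABLE big_table" in another session blocks,
-- because DROP needs an ACCESS EXCLUSIVE lock.

CLOSE import_cur;
COMMIT;  -- only now is the lock released and the DROP can proceed
```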

I recognize that the best approach is to take the results of the query, put them somewhere else as quickly as possible, and then scan that copy to import the data, and I know I could do that with temporary tables or other means. What I'm wondering is whether a query can buffer its results (say, to a file on disk) without my having to manage temporary tables, so that it no longer blocks schema operations.
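For comparison, the temporary-table approach mentioned above might look roughly like this (table and column names are placeholders; the initial copy still takes as long as the query itself, but the lock on the source table is released as soon as it commits):

```sql
BEGIN;
CREATE TEMP TABLE staging AS
    SELECT * FROM big_table;   -- locks big_table only for this statement's
COMMIT;                        -- transaction; released here

-- A cursor over the temp table no longer touches big_table at all,
-- so schema operations on it are free to proceed.
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM staging;
FETCH 1000 FROM c;
CLOSE c;
COMMIT;
```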

For instance, I found that there's a "single row mode" that streams query results, but I'm not entirely sure how Postgres implements it under the hood, or whether it would ultimately have the same effect on concurrently running schema operations.

Can someone enlighten me as to how "streaming" works and if this would solve my problem? Or do I have no other choice but to create a temporary table?

Best Answer

Ruby will close the cursor when it is done iterating over it. If you already know the iteration takes a long time, there is no reason to think it isn't working correctly. You can set log_statement = 'all' to see when the CLOSE and COMMIT arrive, if you want to double-check.
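For example, with statement logging turned on, the server log should show roughly this sequence for a cursor's lifecycle (the exact statement text depends on what the driver sends):

```sql
-- Per session (or in postgresql.conf for all sessions):
SET log_statement = 'all';

-- The server log then records each statement as it arrives, e.g.:
--   LOG:  statement: BEGIN
--   LOG:  statement: DECLARE ... CURSOR FOR ...
--   LOG:  statement: FETCH 1000 FROM ...
--   LOG:  statement: CLOSE ...
--   LOG:  statement: COMMIT
```

If the CLOSE and COMMIT appear promptly when the iteration finishes, the cursor is being cleaned up as expected.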

Using single-row mode won't change the situation: the query will still hold its locks until it is done. It will just change the chunk size fetched from the server from 1000 rows (or whatever the cursor uses) to 1.

I know that the dblink function does spool its results to disk, so that it can release locks (on the remote end) as quickly as possible. It is the only server-side mechanism I know of that does that. Perhaps you can use dblink to loop back to the same database server and run the query through it, then read the results in single-row mode, although I don't know how to get Ruby to use single-row mode; I've only seen it done from C. You can probably get Ruby to use a cursor over a dblink query instead.
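A hypothetical loopback through dblink might look like this (the connection string, table, and column types are placeholders for your setup): dblink runs the query on the "remote" end, spools the full result set, and releases the remote locks before you start reading rows on the local end.

```sql
-- The dblink extension ships with Postgres contrib.
CREATE EXTENSION IF NOT EXISTS dblink;

-- Loop back to the same server; the inner query's locks are released
-- once dblink has spooled its results.
SELECT *
FROM dblink('dbname=mydb host=localhost',
            'SELECT id, payload FROM big_table')
     AS t(id integer, payload text);
```

You could then iterate over this outer SELECT with an ordinary cursor from Ruby, since the locks being held belong to the already-finished inner query, not to the table you are scanning.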