I have been investigating an inefficiency in our ETL pipeline, which on one side has a Postgres database with logical replication slots.
A consumer keeps track of the LSN it has reached, and sometimes disconnects by design. When it reconnects, it tells the database the LSN from which it would like to resume. However, the time it takes for the replication slot to start returning any data seems to depend on the number of WAL files currently on disk. It looks as if the replication slot reads through all of them to find the right position, which can sometimes take hours.
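To put a number on how much WAL is involved, a minimal sketch like the one below converts textual LSNs into byte positions and estimates how many WAL segments lie between two of them. It could be fed, for example, the slot's `restart_lsn` from `pg_replication_slots` and the LSN the consumer asks to resume from. The helper names (`parse_lsn`, `wal_gap`) and the sample LSN values are hypothetical, and the 16 MiB segment size is an assumption based on the Postgres default, so adjust it if the cluster was initialized differently.

```python
from typing import Tuple

# Assumption: default WAL segment size of 16 MiB (changeable at initdb time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit byte position.

    The part before the slash is the high 32 bits, the part after it the low
    32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_gap(from_lsn: str, to_lsn: str) -> Tuple[int, int]:
    """Return (bytes, approximate segment count) between two LSNs."""
    gap = parse_lsn(to_lsn) - parse_lsn(from_lsn)
    if gap < 0:
        raise ValueError("to_lsn must not be earlier than from_lsn")
    segments = -(-gap // WAL_SEGMENT_BYTES)  # ceiling division
    return gap, segments


if __name__ == "__main__":
    # Hypothetical values: where the slot would start reading vs. where the
    # consumer wants to resume.
    gap_bytes, gap_segments = wal_gap("16/B374D848", "1A/00000000")
    print(f"{gap_bytes} bytes, roughly {gap_segments} WAL segments")
```

If the gap measured this way tracks the startup delay, that would at least support the impression that the delay scales with the amount of WAL being read rather than with anything on the consumer side.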
This behavior does not seem normal or necessary. Am I missing something?