I have a huge table. Each row has an ID, some columns which relate to that ID specifically, a short (DNA) Sequence, and some columns which relate to that sequence particularly. If these have not yet been calculated (for this row) then they are null, but the calculations will always come out identical for a given sequence.
In case it proves relevant: the sequence is indexed, and the DBMS is Postgres.
There are lots of duplicate sequences. Obviously this is non-optimal – both because we don't want to store the duplicates, and because we don't want to waste time recalculating those properties. There will already be duplicate calculated properties.
So I want to move the sequence's properties into a new table, using the sequence as a foreign key. The trouble here is the size of the table – hundreds of millions of records, and the properties are quite large as well.
With a small table this would be easy enough, but I need a better strategy for a huge table.