We want to store some genomic variant data but there are some problems, more important ones like problem of the data's immense size and variability.
1) Variant data can be huge. For example, a single individuals variant data could feasibly some day require a million rows of data in a table, or require of a gigabyte of raw storage on disk. Multiply this over several thousand individuals, and you could potentially end up with terabytes worth of information that you need to make sense of.
2) Each client and/or system we integrate with, will expose or want to see data slightly differently depending on their needs and use cases. This can potentially lead to hundreds of fields that we might need to store, all of which might need to be in different configurations based on the clients needs. So this variant data model will need to keep this in mind in order to remain easy to use, expandable and most importantly, scalable in the long term.
What do you think is better for such a problem? We were thining of having some coulmns in each table that point to an external database or even a file, where we save the huge BLOB data?