Postgresql – HStore vs. multiple tables with inheritance – do I have a good use case for HStore

database-designhstorepostgresql

My broader question here has to do with when it is appropriate to use HStore vs. multiple tables vs. one table for storing document-like objects.

Now, for my localized example: I’m designing a database structure for holding experimental results. My first idea was to have separate tables for each experiment. This would look like:

Multiple tables

However, this would require a lot of JOINs across tables since it would be fairly common for end-users to want to look for data across multiple experiments. (Side note: maybe a good use case for inheritance?). I thought I could avoid this by having a monolithic experiments table like this:

Monolithic table

That would certainly eliminate the need to have JOINs but I feel like it would be logical to have individual experiments be separate in some way. Also, it will be very common that one experiment is requested in its entirety. This is why I’m considering using HStore kind of like this:

HStore

This appeals to me because each individual result set is very document-like, but I’m afraid that I’ll be running into the same issues that the multiple tables approach would bring.

Other considerations and thoughts that may be relevant:

  • I’ve thought about having additional tables for storing meta-data/annotations relating to individual data points in individual experiments. Some of this would fit very nicely into the HStore experiments table.
  • Different users will be inputting experimental data, I would use schema for this
  • There will be different experiment types – I think the best way to handle this would be storing these in entirely different databases

I should also mention that I have very limited experience with PostgreSQL, but I’m already delighted at what it has to offer over MySQL!

Best Answer

That would certainly eliminate the need to have JOINs but I feel like it would be logical to have individual experiments be separate in some way. Also, it will be very common that one experiment is requested in its entirety

Both of those are good arguments for using table partitioning. PostgreSQL implements this with inheritance and constraint exclusion.

I’ve thought about having additional tables for storing meta-data/annotations relating to individual data points in individual experiments. Some of this would fit very nicely into the HStore experiments table

Nothing stops you having an hstore field for annotations, or a separate join table of hstore fields for them.

Different users will be inputting experimental data, I would use schema for this

Why? Surely the same argument applies, that you'll want to do cross-user aggregation too?

I'd just partition by experiment and have the user ID as part of the table's key.

There will be different experiment types – I think the best way to handle this would be storing these in entirely different databases

Why?

Do you ever think you might need to query across them or aggregate them? If so, do not store them in different DBs.

If they store similar kinds of data, just have them in the main table, and have some optional columns.

If they store mostly different kinds of data (mostly or entirely different columns) then use different tables in the same DB.