Mysql – Economic Database Architecture Possibilities w/ Data of Varying Length


I'm in the process of designing a MySQL database (InnoDB) that holds a large amount of economic data which our application scores. The size of the economic data varies a great deal: some measures, such as percentage changes, are fractions of a percent; others, such as national debt figures, are 14+ digits long. Furthermore, we have a business requirement that states some data points need to be correct to the 6th decimal place.

We currently have over 500,000 rows in the legacy database, but anticipate growing that number substantially in the future, as we are building the new database with point-in-time considerations and, for the most part, will not be deleting or updating rows, only adding new rows and superseding the old rows.

All potential tables containing this economic data will be structured as follows:

id | country_id | period_id | [economic_data] | data_type | date_created | date_superseded

My question is this:

Is it best to :

  1. Break out all of these individual economic data series into their own tables, given that the data comes in varying sizes, or
  2. Combine all this data into one massive table, given the identical structure of all economic data tables and the simplicity it would offer for writing queries?

We collect over 200 data series and are planning on increasing that number every year, so option 1 would require creating and maintaining 200+ tables.

Option 2 seems the easiest to develop and maintain, but I wonder what the implications might be on query performance and storage.

Any thoughts or suggestions?

Best Answer

I would separate the problem into transactions and analytics -- based on the question, seems that you are trying to find a design which would be optimal for both.

From a design point -- on the logical level -- I would use something like this, and would not worry about number of tables. Also, each attribute has proper data type, etc.

enter image description here

From this you may periodically (daily) publish to structures which are more analytic-friendly (flat OLAP tables, data marts ...). Depending on the performance -- and user expectation -- exposing 5NF views may be good enough.

On the physical level, I am not so sure :(

Structures like this are usually exposed to users as flat views (5NF) and via point-in-time functions. The main problem here is that the question is tagged MySQL. MySQL has a limit on number of tables that can appear in a join (61) and the query optimizer does not support table elimination; hence, forget the views. You would have to use the application level to "run-around" and join tables based on the ID and date; the application may be the ETL code that exports to analytic tables.

So, now it depends on how do you expose this to final users -- if they are supposed write custom queries this will not work.

It is a common approach to design a DB on a logical level without a regard for the target DB, but in this case the selection of the DB limits design options.