Sql-server – How to identify when to create a new table to hold data that can be obtained from a query

performancequery-performancesql serversql-server-2005

We have a payment table, and agents get commission on payments. Commission is based on a few different factors, such as how long it took to get the payment, so there is some calculations involved when figuring out the commission rate the agent gets, but nothing obscenely complex.

For example, it will probably never be more complex than this:

SELECT Payments.Amount * CASE 
    WHEN DateDiff(year, Client.Received, Payments.DatePaid) = 1 THEN Rates.Rate1
    WHEN DateDiff(year, Client.Received, Payments.DatePaid) = 2 THEN Rates.Rate2
    ELSE Rates.Rate3 END

Would it make sense to build a 2nd table to hold this data instead of querying for it anytime it's needed? Or should I just stick with run-time queries that pull the data whenever it's requested?

And more importantly, what are the factors to use when determining if a query should be run anytime the data is needed, or if the data should be stored in a separate table of it's own?

Best Answer

If the query is run fairly infrequently (for example a report) then building the table on the fly is probably better1. If the query is run frequently and the temp table is required for performance then you potentially have a problem.

  • If the table is cheap to build, then do it as a temp table. As long as the database is fast enough you may get away with it. However you will need to keep an eye on performance.

  • If the table doesn't have to be totally up to date but will be the subject of relatively frequent reporting activity than a periodic rebuild is probably the best way to go.

  • If the table is expensive to build but needs to be up to date you may need to manage it as a denormalised structure, either maintained as an indexed view or through triggers. This is rather more complicated and places an additional burden on write operations.

    In more extreme cases (i.e. large data volumes) you may need a hybrid approach where historical data is queried from a denormalised structure optimised for performance and current data is queried from the live application.

    The most extreme cases of this can get you into low-latency data mart feeds and hybrid OLAP solutions, so this is by far the most complex in terms of how deep the rabbit hole can go. It's best avoided unless you have a genuine requirement.

In the case you describe above a periodic rebuild of a reporting table sounds appropriate. If you need to close in the middle of a day to run reports then you could provide a facility to force an update from the application. Otherwise run it on an overnight process and the agents can see their commission 'as at midnight on the previous working day.'

1 select into queries creating temp tables are quite fast on SQL Server because the insert operations are minimally logged.

So to summarize, you use the following factors to determine if you should have a new table for your data or not:

  • How often the data is needed
  • How expensive it is to get the data
  • How up-to-date the data needs to be