Sql-server – SQL Server : refactor schema to avoid repetition

arraydatabase-sizeperformanceschemasql server

I have an enormous prod SQL Server database :

total size = 1.5 TB 
full backup duration = 4 hrs ( online )

that needs to be slimmed-down because it's causing a variety of issues for database maintenance :

  1. capacity planning & expansion ( cost of SAN storage , hit 2 TB LUN limit = must partition tables )
  2. backup duration & size of backups ( cost of storage )
  3. archive duration ( long-running SQL Agent jobs to delete rows in batches )

My [PlayerWeapon] table is a candidate for weight-reduction :

[PlayerWeapon] table
    [GUID] (PK , uniqueidentifier , NOT NULL)
    [PlayerGUID] (uniqueidentifier , NOT NULL)
    [WeaponGUID] (uniqueidentifier , NOT NULL)
    [BeltSlot] (tinyint , NULL)

ALTER TABLE [dbo].[PlayerWeapon] ADD  CONSTRAINT [PlayerWeapon_PK] PRIMARY KEY CLUSTERED 
(
    [GUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

ALTER TABLE [dbo].[PlayerWeapon] ADD  CONSTRAINT [PlayerWeapon_PlayerGUID_WeaponGUID_UK] UNIQUE NONCLUSTERED 
(
    [PlayerGUID] ASC,
    [WeaponGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Row Count = 500 M
Total Size = 125 GB
Data Size = 65 GB
Index Size = 60 GB

Because there's a set of 16 default weapons and each player has a maximum of 16 belt slots ( and those counts are not anticipated to ever be changed ) there's a possible "optimization" :

[Weapon] table
    [GUID] (PK , uniqueidentifier , NOT NULL)
    [DefaultIndex] (tinyint , NULL)

[Player] table
    [GUID] (uniqueidentifier , NOT NULL)
    [DefaultWeapons] (tinyint , NOT NULL)

where [DefaultWeapons] is a packed representation of a 16×16 bit array representing belt slot index vs default weapon index.

If a belt slot index does not have a corresponding default weapon index ( i.e. row is all zeros ), then [PlayerWeapon].[BeltSlot] is consulted for the (non-default) weapon for that belt slot.

A big con with this approach : if a later refactoring is needed then a complex data-transformation will be necessary on the prod dataset. Also, the unpacking ( either in t-sql or c# ) might not be highly performant.

What are the pros & cons of packing arrays into numeric fields ?

And I am wondering about other approaches to my problem of excessive table size ?

NOTE : table PKs are GUIDs ( rather than INT or BIGINT ) due to the design of an ORM layer built on top ( I have no ability to change this design ).

Best Answer

Ok, here we go.

To the particular problem: You should not use GUID's as primary keys. Move to ints. Use the GUID's as additional field with unique index (i.e. Weapons table has an int / long as PK, and the GUID as field).

Suddenly all your middle tables (PlayerWeapon) are a lot smaller. Case closed?

No. Because tehre is lot more wrong. You do not know your system well enough. Let me go through a number of your wrong assumptions:

capacity planning & expansion ( cost of SAN storage , hit 2 TB LUN limit = must partition tables )

No, you must not. You must have multiple FILES in your FILEGROUP. But a table relies in a FILEGROUP - so it can span multiple files. This is totally separate from partitioning, which is a SQL level thing (table partitioning). Which means this argument simply is non existing.

archive duration ( site downtime while rows are deleted )

No, you want those fast - learn about table partitioning. But also here there are nice ways to work around it, by slowly deleting during operations. Delete in a loop always 1000 or 2000 items (or more) and done - no downtime.