Postgresql – What can go wrong using the same sequence across multiple tables in postgres


We are considering using a shared sequence to assign ids to primary keys for all of the tables in our database. There are about 100 of them. Only a couple are inserted to frequently and regularly. We want to rule out it being "a terrible idea for an obvious reason" before we moved to the phase of actually trying it and testing it at load.

Our peak load is of the order of 1000 inserts a second, across a couple of tables.

Our research thus far indicates that
– sequence generation speed shouldn't be an issue
– sequence fragmentation (gaps) will happen, but shouldn't be an issue
– id exhaustion won't be an issue

We're not sure if we're missing other big things. We'd be grateful for people's opinions, especially from people who have tried it before and had either positive or negative experiences.

For context – we have two main motivations for doing this.

One motivation to do this is so that we can define a bunch of dictionaries (we call them scopes) and have human readable words assigned to those ids, so we want to make sure that ids in different tables never overlap. So, in one scope, id 12345 might be assigned the value "Green" and in another it might be assigned "Verde". (Actually, we don't use it for internationalisation, but we might one day).

The other motivation is to make it easy to have several deployments in the field and know (by uniquely setting each deployment's sequence couple of most significant digits) that our deployments won't overlap primary keys. (Like a GUID lite).

Best Answer

Three possible issues that spring to mind are:

  1. With any shared resource you are creating a potential bottleneck. My gut says that for your peak load this should not be an issue but I strongly suggest benchmarking any such solution in a production-like production-sized environment to be sure.

  2. You are essentially assigning meaning to surrogate keys which defeats part of their purpose in RDB theory. A surrogate key by its nature should not have meaning beyond being a key for identifying tuples in that relation. If the entities might have meaning together and so need collision free keys, is it correct that they are being modelled separately or has something been missed in the requirements and/or data model design?

  3. You are introducing a potential point of failure. What if a deployment doesn't get its initial sequence starting point set? You then either have a deployment blocking error or deployments start from the same place "breaking" your feature. Also, what will you do if somewhere down the line someone thinks it is a good idea to branch a deployment (in production perhaps a tenant company divests part of itself and needs to separate out the data). What if the seed somehow gets reset by a bad upgrade deployment or other migration?[0]

If none of those issues concerns you then go ahead, the idea isn't going to break anything IMO. Of course there may be better ways even if this one isn't wrong in itself.

When you say "UUID-lite" you imply that you have already considered and discounted UUIDs. Is that the case, and if so are there particular reasons for deciding they are not suitable for this project?

One possible reason for not using UUIDs is index fragmentation though the significance of that is often greatly over-stated[1]. SQL Server's answer to this is the "sequential GUID" which is pretty much equivalent to what you are suggesting if we discount assigning meaning to key values - perhaps postgres has an equivalent to that? Of course always increasing indexes can have their own performance issues (last-page contention, index stats growing stale) in some very specific high volume workloads[2].

Another common argument against UUIDs is key length: why use 16 bytes per value when 4 or 8 will suffice? If the uniqueness is truly a useful property then this will usually trump key-size concerns significantly. If key-size is a concern but you are happy to use a 64-bit INT rather than needing to keep inside 32-bits you could use your technique without adding a potential shared-resource contention issue by doing your seeded-integer-key idea per table[3] using a normal INT IDENTITY(<start>, 1)[4] column definition, though again this is adding deployment complexity (a small amount, but certainly not zero).

Human readability is sometimes cited as a problem, but that goes back to assigning meaning to surrogate keys.

Compressibility is a less common concern but one you might come across. To just about any compression algorithm UUIDs are likely to look like random (therefor uncompressible) data unless you are using something like SQL server's sequential UUIDs. This might be a concern for a very large set of links (or other block of data) that contains many entity IDs served to an application over a slow network, or if needing to use something like SQL Server's index compression features, though both these matters are essentially just restating the key size concern in a slightly different way and sequential UUIDs may help here too.

[0] this could happen for normal identity columns too of course, but as you are using a less common feature you are increasing the chance of a less experienced DBA after you missing the problem if it happens once you are off doing something new and exciting elsewhere!

[1] I'm a SQL Server guy, I suspect the potential issue is the same in postgres but for all I know it may have a different index layout that can mitigate the effect.

[2] Though again these may be SQL Server specific especially the latter of the two examples I listed

[3] Top two bytes: vary by database, next two: vary by table, remaining four: the incrementing bits

[4] That is MS SQL Server syntax, postgres syntax may vary but you shoudl see what I mean and be able to translate

tl;dr: if you find yourself reinventing the wheel, make sure that all of the existing designs really aren't suitable before starting to considering why a new one might or might not be.