Mysql – Performance impact of NOT using a foreign key (many dedicated 1:many key-key tables vs non-fk generic key-key table)


We're using a lot of association tables to manage 1:many relationships among a variety of different objects in our system.

To illustrate the question, two examples would be:

  1. users, events, ass_users_events. The ass_users_events would contain only the User_ID and Event_ID columns, both with foreign key relationships.
  2. projects, tasks, ass_projects_tasks. The ass_projects_tasks would contain only the Project_ID and Task_ID columns, both with foreign key relationships.

NB1: Every object table actually uses a combination of an Auto-Incrementing Integer Primary Key, plus a UUID column with a unique index which is the actual record ID. For the purpose of this question, we're using only UUID's so there is no chance of collision.

NB2: The reason we use this format rather than a straight Foreign Key column/index is that the reality is much more complex than this example, there are many different joins to many different tables and we don't want the ORM doing a lot of unnecessary work every time a record is loaded.

The problem is we're starting to create these associational tables for pretty much every new object type in the system for many of the other existing objects in the system and in the long run this doesn't seem sustainable, we'll end up with hundreds of association types.

A potential solution we're considering is getting rid of ALL of the current associational tables and instead creating one table with the following structure: obj_1_id, obj_1_type, obj_2_id, obj_2_type. Every column would be indexed, likely as composite indexes (i.e. INDEX object_1 (obj_1_type,obj_1_id) and INDEX object_2 (obj_2_type,obj_2_id)).

The examples above in the ass tables would become: abc,user,123,event, and def,project,456,task.

This solution gives us the flexibility to build as many relationship types as we want between different objects, and there's ample indexing on the ass table to remain performant. My question is is there a downside to using indexing only during joins, vs having defined foreign key relationships in smaller tables, but potentially hundreds of them?

EDIT: I think there are some misconceptions below so this might clear up a couple things:

  1. I use polymorphic object structures, but every object is stored in it's own table, i.e. user, product, category, event, etc.

  2. As far as the proposed ass table, it will only have 4 functional fields, with very straightforward datatypes (plus one ai_col as primary). The datatypes would be varchar(10) for the type cols and CHAR(32) / BINARY(16) for the id's. SELECT's will only ever have one join, to the ass table, i.e.

    SELECT event.* FROM event INNER JOIN ass ON ass.obj_1_type = 'event' AND ass.Object_ID_1 = event.Event_ID AND ass.obj_2_type='location'

there's no reason I would ever ask for two of these objects at once, I would only be filtering results from one table by existence of ass links.

  1. UUID's are the real Unique ID's here, but every table uses ai_col which is an INNODB construct for increasing the performance of their clustered index.

  2. This is a read heavy environment, I'm not near as concerned with insert/delete performance.

  3. We use handlersocket for simple read AND write queries removing the SQL overhead.

  4. The integrity of this table is acceptable at Eventually Consistent in this specific use case.

EDIT 2: To make this crystal clear, here's the why… if I have 200 objects each in their own tables, that I want to be able to link in all possible ways, I would either have to have 199 foreign key relationships in each object table, or I would have to create 40,000 tables with all the possible combinations and then all the business logic to go with that. Using a the method we're considering, there's one table and I have RI taken care of at the application level, which I indicated is ok at eventually consistent.

Best Answer

There is no performance impact of not using a foreign key. A foreign key constraint simply enforces referential integrity by constraining the values that can be inserted into the foreign key column (s).

If you are using UUIDS as identifiers, why do you need the type columns in the association table?

As you mention modelling this in the traditional manner would result in a large number of tables. Deleting, for example, a user would result in the RDBMS checking each foreign key constraint to ensure non are violated, which could take some time! With your proposed approach the RDBMS would have no foreign key constraints to check and the user would be deleted quicker. So you actually get better performance deleting objects. As you noted you'll have to have a background job that makes the association table eventually consistent. Regarding this, you could consider keeping a log of deleted objects and replaying it against the associations table. This would be better performing than carrying out a existence check on each row to see if the referenced objects still exist.

I've used this approach for several years now to allow anything to relate to anything else and haven't encountered any issues.