Let's talk about an online service where goat/sheep/cattle/llama farmers would register and keep track of their animals.
One of the features of such a service is tracking parent/child relationship. However, it's not as trivial as adding a
parent_id field. Here are the reasons why:
There are many farmers, each with their own account; they're selling animals to each other. This means that not every parent can be entered into the system with all its properties. So there is no
parent_idprimary key to link to.
There are two IDs, national and internal. None of them, one of them or both of them can be entered. There should be a priority in determining the animal's parent. Internal IDs have priorities. Then comes national IDs. This means that if both are specified, the parent should be determined by an internal ID.
Not all animal cards are made public by their owners. This means that we cannot rely national IDs and look for parents through the whole database and fetch parents' data to display the tree.
Furthermore, the National ID is a NOT a unique field among the database (but is unique for each user and species). This is made so to allow farmer1 enter national ID "ABC" even if farmer2 entered national ID "ABC" for another animal as well (e.g. by mistake).
As you can see, this all makes impossible to rely on a single parent_id column. That's where come various approaches to solve the issue, and I ask your advice.
Introduced the following columns:
- mom_national_id (nullable)
- mom_internal_id (nullable)
- dad_national_id (nullable)
- dad_internal_id (nullable)
Unique indexes (
- mom_id BIGINT UNSIGNED (nullable)
- is_mom_id_national BOOL
- dad_id BIGINT UNSIGNED (nullable)
- is_dad_id_national BIGINT UNSIGNED (nullable)
So, here are my questions
Q1. Is either of the approaches above correct, or can you recommend a better one?
Q2. If both of them correct, which one would be better in terms of querying speeds for building parent/child trees? Does it matter at all if all is needed is just up to ~50 generations up or down?
Q3. Which indexes are needed for best performance?
Despite the solution should not be specific to database type, the service is being built with using PostgreSQL.