Should I combine these tables into one, or keep them seperate

database-designnormalizationtable

In a question/answer forum component I am writing as part of a site, I have two tables that theoretically look like this(that is, I want said data kept for each entity):

THREAD                 POST
thread_id              post_id   
type_id (question)     writer_id
writer_id              thread_id  
title                  parent_id
body (optional)        post_body
followers              date
merge_id               upvotes
date                   dnvotes
upvotes                replies
dnvotes                views
replies                post_type_id (e.g. answer, comment to an answer etc.)
views
anonymous

The only real differences between these entities are merge_id (if a question is merged into another one, this id will reference said question_id), followers, and title. Combining them into one seems a bit cumbersome and bloated to me, and seems to be starting off a bit denormalized I think. And, if they're one, both body and title would have to be be nullable which they shouldn't (I know there can be safeguards in the code).

I also thought of having thread consist of only the attributes not in post which would look like:

THREAD
thread_id (or post_id which would be primary and foreign key refing POST)
title
merge_id
followers

but that doesn't feel quite right either.

Also, if it makes any difference, I plan to have the questions displayed along with the highest ranking answer quite often, in the style of quora.

Any design advice would be much appreciated.

(also, I am keeping them someone generic (thread/post) so that if needed I can use these tables for a blog or whatever else later on.)

Best Answer

I do think you need two tables - there are things in thread that apply to the entire conversation, not just the initial "question". It makes a lot of sense to have everything that represents a user-entered message of some sort stored in one place. There may be some differences in formatting for the initial post vs. follow-ups, but presumably most things that would apply to rows in post would also apply to parallel fields in thread.

I would, however, look at eliminating redundancy between thread and post.

A thread must have a title, and can have a body; all follow ups do not have individual titles, but must have a body. You have multiple options here; a couple are:

  • If only a title is provided for the starting post, store that in post.body; if a separate title and body are provided, store the title in thread.title, and the body in post.body.
  • Allow for a NULL or empty body if and only if post.parent_id is NULL.

I'll assume that upvotes, dnvotes, replies, and views are basically counts. If those are distinct between a thread and the posts that follow-up to the thread, then the only question is should we track these numbers for both the thread and the initial post.

While it's a little trickier to set up, I would actually be inclined to put the post_id of the initial post in the thread table. It should be true (I expect) that the post row with the desired thread_id and where parent_id is NULL should be the initial post. However, if users are allowed to delete posts, and if one user's follow-up to the initial post can be the parent for another user's post, then to delete a post with children, you'd have to set the children's parent_ids to the parent_id of the deleted post; you wouldn't have the option to leave the reply posts in the thread without having a parent_id.

If the initial post's ID is part of thread, then fields like upvotes and dnvotes would not exist at the thread level - the thread would simply pass on whatever values were tied to that initial post row.