I’m looking for the best database design approach of a future web application that will store thousands of films and where (maybe) thousands of users will save their own films validations with up to thirty attributes each validation (attributes are likely to increase as the app grows). Validations are private; they’re not shared with other users.
With this scenario, when I designed the database months ago I ended up that the best approach would be the following:
- One table for “users” containing UserID, nickname, password…
- As many tables as necessary like “users_xxxx” to store related data of all users together.
- One table to store “films” (FilmID, Name…)
- UserID-numbered-tables like “film_validations_[user-ID]” with the following columns: FilmID and the thirty attributes (at least). So it could store the validation of all films of a unique user.
Sadly, the PROS I think I am getting with this approach are merely based on suppositions, not on performance tests (I am too novice to know how to do this kind of task). And the CONS about maintenance and performance are an imminent incoming headache for sure, like:
- Thousands of tables to store all film validations.
- Almost impossible to offer statistics based on global users film validations. Because it will be really hard to join too many tables, also having in mind the total will grow as new users sign up into the application.
- Any day I want to offer more film validation attributes I will have to massively execute ALTER TABLE.
- The DBMS has to operate with thousands of tables and open them before it can execute the queries.
I am still aware about these problems, but I really can’t figure out how to design it better.
Designing a single table with the UserID column, plus the thousands FilmID as columns (these would contain a Boolean value), plus the tens of film attributes looks really weird.
If we accept this approach, here is another problem: a heavy amount of columns, which in addition, will grow in the future.
Although I have read that a database has no problem managing millions of records in a single table (it would contain the thousands of film validations of [I hope] thousands of users), and the MySQL columns limit is much higher than my requirements (4.096 columns as described here), I don’t know if the performance would be poor.
Other design errors I consider about the single table approach are:
- Each record would contain as many empty columns as films minus one, which is awful. I mean, each film validation would carry the rest of FilmID columns, and that has no sense because it wouldn’t be neither necessary nor optimised.
- Querying for films with a validation would be a waste of memory. I guess I would have to code a query within a for loop to check which of the thousands FilmID columns contains a “true” value. In my current design, film validations are inserted into the table “film_validations_[user-ID]” as the user fill them. So with just a “SELECT * FROM film_validations_[user-ID]” is enough because I am sure it only contains rated films.
I am a recent graduate student as a web developer. Although I think I have read and learned enough about database normalization (Wikipedia, many StackOverflow Q&A before asking you, and other sources) to consider the single table approach as a bad option in this scenario, maybe in this case is the opposite having in mind all its future maintenance benefits.
Well, after some months, I am still wondering what would be the best approach designing this kind of database. Right now I’m working on moving the unfinished web application fully coded as procedural to a MVC pattern. So I think this is the right moment to correct my database design if necessary.
That’s why am asking for your suggestions based on professional expertise.
Thank you very much in advance.