We have the following database structure
CREATE TABLE objects ( id int PRIMARY KEY, name text, address text ); CREATE TABLE tasks ( id int PRIMARY KEY, object_id int NOT NULL, actor_id int NOT NULL, description text ); CREATE TABLE actors ( id int PRIMARY KEY, name text );
The user enters a whitespace-separated list of words (search terms, basically) and we have to search for tasks, that satisfy the following: the task is a "match" if each search term occurs at least once in concatenation of task's description, the name and address of its associated object and the name of its associated actor.
Now, if we are not concerned about performance, we can just do something like this (given query "foo bar"):
SELECT t.id, t.description FROM tasks AS t INNER JOIN actors AS a ON t.actor_id = a.id INNER JOIN objects AS o ON t.object_id = o.id WHERE to_tsvector(concat_ws(' ', t.description, o.name, o.address, a.name)) @@ plainto_tsquery('foo bar');
Unfortunately, we are concerned about performance. The dataset will, probably, be as follows (and it is expected to grow):
- about 10000 objects
- about 1000 actors
- about 100000 tasks evenly distributed between objects
What I've considered:
Make a denormalized table like this:
CREATE TABLE task_documents ( id int PRIMARY KEY, doc tsvector )
The field "doc" will contain the concatenation of task's description, associated object's name and address and actor's name. We will have to create an index over this field and it will be used in full text search queries. This table will be updated in update / insert triggers on tasks, actors, objects.
Drawbacks: tons of duplicated data (this one I am not quite concerned about), and updates to the master tables will become unpredictable in terms of number of rows updated (say, you update a name of some object and now suddenly you must update thousands of rows in task_documents).
Honestly, I don't have any more (good) ideas. It is obviously impossible to create an index, spanning 3 tables, so that it will be used for the
WHERE clause in the original query.
Here's a sqlfiddle with DB schema and some data. I had to make it up, because we have no real data at the moment.