I am using an open-source columnar DBMS (PostgreSQL + cstore).
I am trying to speed up query response times. My tables consist mostly of string columns, and the values can be large (up to 3000 characters per value), so a linear scan takes too long to return results. My queries range from simple '=' comparisons to LIKE, contains, and wildcard patterns.
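For concreteness, the three query shapes can be mimicked over a plain Python list, which behaves like an unindexed column scan (the table contents here are hypothetical):

```python
# Naive linear scan over a string column: every predicate below
# must touch all rows, which is exactly the cost an index should avoid.

rows = ["Acme Corp", "Acme Corporation", "Globex", "Initech"]

# '=' : exact match
exact = [r for r in rows if r == "Globex"]

# LIKE 'Acme%' : prefix wildcard
prefix = [r for r in rows if r.startswith("Acme")]

# contains / LIKE '%Corp%' : substring anywhere in the value
contains = [r for r in rows if "Corp" in r]
```

Each of these is O(n) over the rows; the question is which index structure can serve all three faster.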
I am unable to decide which index to use on the string columns. I have a hash table, a trie, and a B+ tree index in mind.
The problem with a hash table is collisions, and even if I use perfect hashing there is a chance of returning false positives. Moreover, a hash index cannot answer LIKE, contains, or wildcard queries. Apart from that, my data has duplicate values in columns such as company name, and building a perfect hash over values containing duplicates is proving difficult.
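A minimal sketch of both points: an ordinary bucketed hash index handles '=' lookups and duplicates fine (each value maps to a list of row ids, so no perfect hash is needed), but a substring predicate cannot use it at all, because the hash of a fragment says nothing about which bucket holds the full value. The column data here is hypothetical:

```python
from collections import defaultdict

column = ["Acme", "Globex", "Acme", "Initech"]  # note the duplicate "Acme"

# Build: map each distinct value to all row ids holding it.
# Duplicates are handled naturally, without perfect hashing.
hash_index = defaultdict(list)
for row_id, value in enumerate(column):
    hash_index[value].append(row_id)

# '=' lookup: a single bucket probe.
acme_rows = hash_index["Acme"]

# contains / LIKE '%cme%' lookup: hash("cme") does not locate "Acme",
# so the only option is to fall back to a full scan of the column.
contains_rows = [i for i, v in enumerate(column) if "cme" in v]
```

This is why a hash index is attractive only if the workload is dominated by exact matches.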
Trie and B+ tree: a trie seems useful for all of these queries, but as I mentioned, my strings are very long. Can a trie scale to that length? I have the same question about a B+ tree.
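To make the scaling concern concrete, here is a bare-bones dict-based trie sketch (the inserted values are hypothetical). Each character adds one level, so a 3000-character value produces a 3000-node path, which is the crux of the question:

```python
def trie_insert(root, word):
    """Insert a word; the path depth equals len(word)."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})  # one dict node per character
    node["$"] = True  # end-of-word marker

def trie_has_prefix(root, prefix):
    """Answer a LIKE 'prefix%' style query by walking one node per character."""
    node = root
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True

root = {}
for name in ["Acme Corp", "Acme Corporation", "Globex"]:
    trie_insert(root, name)

# A single 3000-character value creates a 3000-deep chain of nodes:
trie_insert(root, "x" * 3000)
```

Prefix queries stay fast (cost proportional to the prefix, not the stored strings), but node count and memory grow with total stored characters, and a plain trie still does not help with '%contains%' patterns.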
So, to sum up: I have long string columns and I need fast query execution. Which index should I use, or is there another method to speed up searches besides the ones I've mentioned?