I have a PHP script to view Apache logs. I want to save the Apache logs into a MySQL database, and then add some rules for tagging URLs using MySQL's REGEXP search, like: UPDATE logs SET tag='some tag' WHERE url REGEXP 'some pattern';
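Concretely, a tagging rule would be an UPDATE like the sketch below (the table name, columns, and pattern are just placeholders):

```sql
-- Hypothetical table 'logs' with 'url' and 'tag' columns.
-- Tag every access to an archive file as a download.
UPDATE logs
SET tag = 'download'
WHERE url REGEXP '\\.(zip|tar\\.gz|exe)$';
```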
a) Should I use a single table that stores every URL access, even when URLs repeat, and then run the REGEXP search and apply the tag to all matching rows?
b) Or would it be better to have one table with unique URLs, and a second table with the URL's id and the access time? The tagging would then be applied to the unique-URLs table, which has far fewer rows when URLs repeat.
If option 'b' is better, what kind of index should I use for the unique URLs? A varchar(4000) primary key? I was thinking about creating an MD5 hash of the URL string and using that as the primary key, because it would be much shorter.
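What I have in mind for option 'b' is roughly this sketch (all names are placeholders). Note that a varchar(4000) primary key would not fit anyway: InnoDB limits index keys to 3072 bytes, which is why a fixed-size digest column seems attractive:

```sql
-- One row per unique URL; the MD5 digest is stored as raw bytes
-- via UNHEX(MD5(url)) so the primary key is a fixed 16 bytes.
CREATE TABLE urls (
  url_hash BINARY(16)    NOT NULL,   -- UNHEX(MD5(url))
  url      VARCHAR(4000) NOT NULL,
  tag      VARCHAR(100)  NULL,       -- filled in by the REGEXP rules
  PRIMARY KEY (url_hash)
) ENGINE=InnoDB;

-- One row per access, pointing back at the unique URL.
CREATE TABLE hits (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  url_hash    BINARY(16)   NOT NULL,
  accessed_at DATETIME     NOT NULL,
  PRIMARY KEY (id),
  KEY (url_hash),
  CONSTRAINT fk_hits_url FOREIGN KEY (url_hash) REFERENCES urls (url_hash)
) ENGINE=InnoDB;
```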
I'm asking because I want to know which option gives the best performance when:
- Tagging many URLs with a REGEXP search
- Importing thousands of URLs into one table while making sure they stay unique
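For the import step, what I'd expect to run under option 'b' is something like this sketch (URLs are made-up examples), where duplicate URLs are silently skipped by the unique key:

```sql
-- Bulk-insert a batch of URLs; INSERT IGNORE drops rows whose
-- url_hash already exists, so the table keeps one row per URL.
INSERT IGNORE INTO urls (url_hash, url)
VALUES
  (UNHEX(MD5('http://example.com/a')), 'http://example.com/a'),
  (UNHEX(MD5('http://example.com/b')), 'http://example.com/b');
```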