Mysql – help required with indexing and query optimzation

MySQLperformancequery-performance

After browsing most of the posts and after running out for options I am posting here.
I have a table with 76,222,954 (76M) rows. It collects web site traffic based on country.

Here is my table structure:

CREATE TABLE `publication_month` (
  `entity_id` int(11) NOT NULL,
  `category_id` int(11) NOT NULL,
  `region_id` int(4) NOT NULL,
  `partner_id` int(11) NOT NULL DEFAULT '-1',
  `country_code` int(3) NOT NULL DEFAULT '0',
  `metric` int(2) NOT NULL,
  `time_index` int(11) NOT NULL DEFAULT '0',
  `mobile_value` bigint(20) DEFAULT '0',
  `desktop_value` bigint(20) DEFAULT '0',
  `mobile_last_value` bigint(20) DEFAULT '0',
  `desktop_last_value` bigint(20) DEFAULT '0',
  `flag` int(1) NOT NULL,
  `area_id` int(11) NOT NULL,
  PRIMARY KEY (`entity_id`,`category_id`,`region_id`,`partner_id`,`country_code`,
               `metric`,`time_index`,`area_id`),
  KEY `group_by_index` (`time_index`,`metric`,`partner_id`),
  KEY `distinct_entities` (`region_id`,`category_id`,`time_index`,`entity_id`,`partner_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( time_index)
(PARTITION 2008_09 VALUES LESS THAN (2008100000) ENGINE = MyISAM,
 PARTITION 2008_10 VALUES LESS THAN (2008110000) ENGINE = MyISAM,
// many partitons
 PARTITION 2012_11 VALUES LESS THAN (2012120000) ENGINE = MyISAM,
 PARTITION 2012_12 VALUES LESS THAN (2013010000) ENGINE = MyISAM,
 PARTITION catchall VALUES LESS THAN MAXVALUE ENGINE = MyISAM) */ 

Here are my index information

mysql> show keys from publication_month;
+-------------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table             | Non_unique | Key_name          | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| publication_month |          0 | PRIMARY           |            1 | entity_id    | A         |      115140 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            2 | category_id  | A         |      266513 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            3 | region_id    | A         |     1249556 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            4 | partner_id   | A         |     1732339 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            5 | country_code | A         |    15244590 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            6 | metric       | A         |    76222954 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            7 | time_index   | A         |    76222954 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          0 | PRIMARY           |            8 | area_id      | A         |    76222954 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | group_by_index    |            1 | time_index   | A         |          13 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | group_by_index    |            2 | metric       | A         |          68 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | group_by_index    |            3 | partner_id   | A         |         340 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | distinct_entities |            1 | region_id    | A         |          95 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | distinct_entities |            2 | category_id  | A         |        4493 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | distinct_entities |            3 | time_index   | A         |        4493 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | distinct_entities |            4 | entity_id    | A         |     1249556 |     NULL | NULL   |      | BTREE      |         |
| publication_month |          1 | distinct_entities |            5 | partner_id   | A         |     1732339 |     NULL | NULL   |      | BTREE      |         |
+-------------------+------------+-------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+

Here is the query most of the times I run.

SELECT traffic.entity_id, traffic.metric, traffic.partner_id
FROM publication_month AS traffic
WHERE traffic.region_Id IN (0,1,2,3,4,5,6) AND traffic.category_id =533 AND
      traffic.time_index BETWEEN 2012050100 AND 2012080100 AND 
      traffic.partner_id IN (-999,-1) AND  traffic.entity_id > 0 AND  
      traffic.country_code  >=0 AND traffic.metric IN (1,2,3,4,5) 
GROUP BY traffic.entity_id, traffic.metric, traffic.partner_id;

More or less all my queries will as above or some times I may have to query upon a group of countries or group of entity_ids.

Here is the EXPLAIN output for the above query

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: traffic
         type: range
possible_keys: PRIMARY,group_by_index,distinct_entities
          key: distinct_entities
      key_len: 16
          ref: NULL
         rows: 485802
        Extra: Using where; Using temporary; Using filesort
1 row in set (0.00 sec)

It is scanning for 485,802 rows to return 1500 rows. I have tried to create an index on country-time-entitid-category-id but that couldn't help me.

Any ideas or suggestions I would love to hear it.

Best Answer

region_id int(4) NOT NULL,

Learn about TINYINT UNSIGNED (1 byte) and SMALLINT UNSIGNED (2 bytes). INT is always 4 bytes. Smaller -> more cacheable -> faster.

WHERE traffic.region_Id IN (0,1,2,3,4,5,6) AND traffic.category_id =533 AND
  traffic.time_index BETWEEN 2012050100 AND 2012080100 AND 
  traffic.partner_id IN (-999,-1) AND  traffic.entity_id > 0 AND  
  traffic.country_code  >=0 AND traffic.metric IN (1,2,3,4,5) 

For an index to be useful, it should start with "=" fields and "IN" fields. After that, only one "range" field will be used. So:

INDEX(category_id, region_Id, Partner_id, xx) -- the order of the first 3 is not critical; start xx with whatever is usually most selective (other than time_index - see below).

Generally, when PARTITIONing, it is best to put the partition key (time_index) last in any indexes it is in. The range key is used for "partition pruning" before the INDEX is used. So, even if is too late in the index, that's ok.

Use EXPLAIN PARTITIONS SELECT ... to see that it is doing the "pruning" correctly.