Mysql – How to improve performance of a query with an IN subquery

MySQLoptimizationperformancequery-performance

Get the count of other users having the x tag with order_id in order_id of users having the y tag

table: MyCustomTable (200K rows)

  • order_id
  • user_id
  • type ('x' and 'y') my custom tags

and the query:

SELECT   COUNT(order_id),user_id
FROM     MyCustomTable
WHERE    type='x'
AND      order_id IN (SELECT order_id
                      FROM   MyCustomTable
                      WHERE  type   ='y'
                      AND    user_id='56')
AND      user_id <> '56'
GROUP BY user_id

Create table output

CREATE TABLE `MyCustomTable` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `order_id` int(11) NOT NULL,
 `user_id` int(11) NOT NULL,
 `type` enum('x','y') NOT NULL DEFAULT 'x',
 PRIMARY KEY (`id`),
 KEY `customeKey` (`user_id`,`type`,`order_id`),
) ENGINE=InnoDB AUTO_INCREMENT=360806 DEFAULT CHARSET=utf8

Best Answer

One thing to try with IN is to change it to EXISTS (with the proper modifications to the subquery):

SELECT   COUNT(order_id), user_id
FROM     MyCustomTable a
WHERE    type='x'
AND      EXISTS (SELECT 1
                      FROM   MyCustomTable b
                      WHERE  type   ='y'
                      AND    user_id='56'
                      AND    a.order_id = b.order_id)
AND      user_id <> '56'
GROUP BY user_id

Another one is to get rid of the subquery:

SELECT   COUNT(distinct a.order_id), a.user_id
FROM     MyCustomTable a
JOIN     MyCustomTable b
             ON a.order_id = b.order_id
WHERE    a.type='x'
AND      b.type='y'
AND      a.user_id <> '56'
AND      b.user_id = '56'
GROUP BY a.user_id

Test them and use EXPLAIN to check the plans used to execute them. Your question does not show indexes in that table, for fast execution some indexes will be needed - a multicolumn index on (type, user_id, order_id) covers both the main query and the subquery for the first case (and the same for your original query with IN) and works for the second case too.