Mysql – Query Optimization for added value

MySQLperformancequery-performance

I have a query that's behaving a bit oddly. In my database I have a table called "records". It tells me a bunch of information about what applications a user ran on my company's machines. I'm trying to aggregate some statistics, but am having some odd issues with a query.

This query runs in about 6.5 minutes (~30 million entries in "records"). I would expect it to take longer when divisionName isn't specified, but it seems to be taking an unreasonable amount of time to finish (overnight and still chugging).

select divisionName, programName, count(usageID) 
    from records R 
    right join Programs P 
        on P.programID=R.usageProgramID 
    right join locate L 
        on L.computerID=R.usageComputerID 
    where divisionName="umbrella"
    group by programName
    order by programName asc
    INTO OUTFILE '/tmp/lab_prog_umbrella.csv'
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

Is there an alternate structure to speed up the query? I have an index on (computerID,divisionName) in locate and (programID,programName) in Programs as well as a multitude of indexes in records.

Note:
Programs contains 4 fields and locate contains 2. I don't think the joins are exceptionally large.

Edit:

Explain:

+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys   | key       | key_len | ref                  | rows | Extra                                        |
+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+
|  1 | SIMPLE      | L     | ref  | loc             | loc       | 27      | const                | 1195 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | R     | ref  | uprog,computers | computers | 34      | scf.L.computerID     | 1627 |                                              |
|  1 | SIMPLE      | P     | ref  | pid_name        | pid_name  | 43      | scf.R.usageProgramID |    1 | Using index                                  |
+----+-------------+-------+------+-----------------+-----------+---------+----------------------+------+----------------------------------------------+

Records Description:

+-----------------+-------------+------+-----+---------------------+-------+
| Field           | Type        | Null | Key | Default             | Extra |
+-----------------+-------------+------+-----+---------------------+-------+
| usageID         | varchar(24) | NO   | PRI | NULL                |       |
| usageWhen       | datetime    | NO   | PRI | 0000-00-00 00:00:00 |       |
| usageEnum       | int(11)     | YES  |     | NULL                |       |
| usageServerID   | int(11)     | YES  |     | NULL                |       |
| usageServerType | int(11)     | YES  |     | NULL                |       |
| usageProgramID  | varchar(40) | NO   | PRI |                     |       |
| usageLicenseID  | varchar(18) | YES  |     | NULL                |       |
| usageComputerID | varchar(31) | YES  | MUL | NULL                |       |
| usageExpansion  | varchar(0)  | YES  |     | NULL                |       |
| usageUser       | varchar(31) | YES  | MUL | NULL                |       |
| usageAddress    | varchar(28) | YES  |     | NULL                |       |
| usageGroup      | varchar(16) | YES  |     | NULL                |       |
| usageEvent      | int(11)     | YES  |     | NULL                |       |
| usageReason     | int(11)     | YES  |     | NULL                |       |
| usageTime       | int(11)     | YES  |     | NULL                |       |
| usageOtherTime  | varchar(25) | YES  |     | NULL                |       |
| usageGMTOffset  | int(11)     | YES  |     | NULL                |       |
| usageCount      | int(11)     | YES  |     | NULL                |       |
+-----------------+-------------+------+-----+---------------------+-------+

Locate Description:

+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| computerID   | varchar(31) | YES  | MUL | NULL    |       |
| divisionName | varchar(24) | YES  | MUL | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

Programs Description:

+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| programID      | varchar(40) | YES  | MUL | NULL    |       |
| programName    | varchar(63) | YES  | MUL | NULL    |       |
| programVersion | varchar(31) | YES  |     | NULL    |       |
| category       | varchar(30) | YES  |     | NULL    |       |
+----------------+-------------+------+-----+---------+-------+

Best Answer

  • Create foreign keys from RECORDS to PROGRAMS ans LOCATE ( you don't mention if they exist ).
  • Use LEFT JOIN instead of RIGHT JOIN. After all RECORDS is the "strong" table in this query.
  • Group by R.usageProgramID instead of by ProgramName.

select divisionName, programName, count(usageID) 
    from records R 
    left join Programs P 
        on P.programID=R.usageProgramID 
    left join locate L 
        on L.computerID=R.usageComputerID 
    where divisionName="umbrella"
    group by R.usageProgramID 
    order by programName asc

Another alternative is to try this:

select
    t.divisionName, P.programName, count(*) as total
from (
        select L.divisionName, R.usageComputerID
        from records R 
        left join locate L 
        on L.computerID=R.usageComputerID 
        where L.divisionName="umbrella"
      ) t 
    left join Programs P 
        on P.programID=t.usageProgramID 
group by
    group by P.programName
    order by P.programName asc

Since the absence of FK maybe not helping.