Mysql – Unusually low insert rate on MySQL Slaves

MySQLperconaperformancereplication

I have 4 MySQL nodes replicating like this:

M1 – S1
|
M2 – S2

Only the M1 master is writing, the hardware is similar (the slaves are a bit beefier), they all run Percona 5.7.

The trouble is that when M1 has a lot of inserts in a small time frame, the slaves lag behind. While M1 and M2 ar able to insert at a rate of thousands per second, S2 seems limited ad 120 inserts/s. S1 varies between 70 and 180 but not more.

Here's the slave status on S2 during this time:

mysql> show slave status\G;

               Slave_IO_State: Waiting for master to send event
                  Master_Host: *******
                  Master_User: *******
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000799
          Read_Master_Log_Pos: 677668480
               Relay_Log_File: db2-relay-bin.000101
                Relay_Log_Pos: 568744098
        Relay_Master_Log_File: mysql-bin.000799
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 568743885
              Relay_Log_Space: 677668945
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 1118
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
                  Master_UUID: 0b400f69-3459-16e6-a835-14feb5d6c592
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Reading event from the relay log
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:

And here's the general mysql config of all the nodes:

[mysqld]
sql_mode=NO_ENGINE_SUBSTITUTION

# MyISAM 
key-buffer-size                = 32M

# SAFETY 
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve

# DATA STORAGE 
datadir                        = /var/lib/mysql/

# BINARY LOGGING modified 4 slave replication
server-id                      = 3
binlog_do_db                   = ********
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1
binlog_format                  = ROW
relay_log_info_repository=TABLE
relay_log_recovery = ON

# CACHES AND LIMITS 
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 2000
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 100M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 90G
innodb_buffer_pool_instances   = 48

# LOGGING 
log-error                      = /var/lib/mysql/mysql-error.log
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/mysql-slow.log

Also, checking on the processes on S2 i only get the:

Waiting for master to send event
Reading event from the relay log

Any help or idea to get to the bottom of this would be highly appreciated.

Update: here's a visual enter image description here

Update 2: it's not just the inserts, it's everything except selects:
enter image description here

Best Answer

innodb-flush-log-at-trx-commit = 2 -- This will make lots of one-row inserts run faster, but be less crash-proof.

"Batch" inserts (multiple rows in a single INSERT) will run a lot faster. (Of course, this may or may not be practical.)

Use the Slowlog on the slave to determine whether something else is involved in the sluggishness. Include

long_query_time = 0.2
log_slow_admin_statements = ON
log_slow_slave_statements = ON

Use, for example, pt-query-digest to summarize the slowlog.

What is the ping time between each pair of nodes in the replication setup?

M1 seems to have a burst of SELECTs every 2-3 seconds. M2 has a similar situation, but every 10 seconds. What is going on?