# Mysql – Unusually low insert rate on MySQL Slaves

MySQLperconaperformancereplication

I have 4 MySQL nodes replicating like this:

M1 – S1
|
M2 – S2

Only the M1 master is writing, the hardware is similar (the slaves are a bit beefier), they all run Percona 5.7.

The trouble is that when M1 has a lot of inserts in a small time frame, the slaves lag behind. While M1 and M2 ar able to insert at a rate of thousands per second, S2 seems limited ad 120 inserts/s. S1 varies between 70 and 180 but not more.

Here's the slave status on S2 during this time:

mysql> show slave status\G;

Slave_IO_State: Waiting for master to send event
Master_Host: *******
Master_User: *******
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000799
Relay_Log_File: db2-relay-bin.000101
Relay_Log_Pos: 568744098
Relay_Master_Log_File: mysql-bin.000799
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 568743885
Relay_Log_Space: 677668945
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1118
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_UUID: 0b400f69-3459-16e6-a835-14feb5d6c592
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:


And here's the general mysql config of all the nodes:

[mysqld]
sql_mode=NO_ENGINE_SUBSTITUTION

# MyISAM
key-buffer-size                = 32M

# SAFETY
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve

# DATA STORAGE

# BINARY LOGGING modified 4 slave replication
server-id                      = 3
binlog_do_db                   = ********
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1
binlog_format                  = ROW
relay_log_info_repository=TABLE
relay_log_recovery = ON

# CACHES AND LIMITS
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 2000
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 100M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 90G
innodb_buffer_pool_instances   = 48

# LOGGING
log-error                      = /var/lib/mysql/mysql-error.log
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/mysql-slow.log


Also, checking on the processes on S2 i only get the:

Waiting for master to send event
Reading event from the relay log


Any help or idea to get to the bottom of this would be highly appreciated.

Update: here's a visual

Update 2: it's not just the inserts, it's everything except selects:

innodb-flush-log-at-trx-commit = 2 -- This will make lots of one-row inserts run faster, but be less crash-proof.

"Batch" inserts (multiple rows in a single INSERT) will run a lot faster. (Of course, this may or may not be practical.)

Use the Slowlog on the slave to determine whether something else is involved in the sluggishness. Include

long_query_time = 0.2

Use, for example, pt-query-digest to summarize the slowlog.
M1 seems to have a burst of SELECTs every 2-3 seconds. M2 has a similar situation, but every 10 seconds. What is going on?