This is MySQL Ver 14.14 Distrib 5.5.57, for debian-linux-gnu (x86_64) using readline 6.3 on Debian 8. This MySQL instance only hosts Zabbix, but it is a very active Zabbix with over 600 hosts constantly sending it metrics. The Zabbix database itself weighs over 350GB.
The monitoring team recently failed Zabbix over to a secondary MySQL server that has now become the principal. For some unknown reason, we started getting Primary Key constraint errors, so the decision was made to reinitialize replication from scratch on the old principal.
Here are the steps I'm taking to reinitialize replication:
On the Secondary (SECONDARY SERVER)
service mysql stop
On (PRIMARY SERVER) (NEED TWO OPEN SESSIONS IN SSH)
In the first SSH session, connect to MySQL and run:
flush tables; flush tables with read lock; show master status;
Keep this session open: the read lock is released as soon as the session that took it disconnects.
- write down the binlog file name and position
Example: MASTER_LOG_FILE = 'mysql-bin.001871', MASTER_LOG_POS = 21103447
In the second SSH session
- Create the snapshot:
lvcreate --size 10G --snapshot --name snapMY01 /dev/mapper/vg_01-lv_mysql00
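It can be worth checking that the snapshot actually exists and watching its copy-on-write usage while the copy runs, since a 10G snapshot area can fill up under a write load like this Zabbix, and a full snapshot is invalidated:
lvs vg_01
- watch the Data% column for snapMY01; at 100% the snapshot becomes unusable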
In the first SSH session, still connected to MySQL:
unlock tables;
In the second SSH session
Copy the snapshot contents via SCP to /var/lib/mysql on the secondary:
mount /dev/vg_01/snapMY01 /tmp/mysql_restore/
cd /tmp/mysql_restore/
scp -i ~/.ssh/some_ssh_key -r ./ somelogin@SECONDARY_SERVER:/var/lib/mysql/
Alternatively, you can use netcat.
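netcat skips scp's encryption overhead, which matters at 350GB; a minimal sketch, assuming port 9999 is open between the two servers:
On SECONDARY_SERVER, in /var/lib/mysql, start the listener first:
netcat -l -p 9999 | tar xf -
Then on PRIMARY_SERVER, in /tmp/mysql_restore/:
tar cf - . | netcat SECONDARY_SERVER 9999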
On (SECONDARY SERVER)
- switch to root account: sudo su -
cd /var/lib/mysql
rm debian-5.5.flag master.info mysqld-relay-bin.* relay-log.info mysql_upgrade_info
service mysql start
- Connect to MySQL
change master to MASTER_HOST = 'IP.ADDRESS.PRIMARY.SERVER', MASTER_USER = 'repli', MASTER_PASSWORD = 'repli', MASTER_LOG_FILE = 'mysql-bin.somefilenumber', MASTER_LOG_POS = some_log_pos;
start slave;
show slave status\G
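The fields to watch in that output are:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Last_Error:
Anything other than Yes/Yes, or a non-empty Last_Error, means the re-init didn't take.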
On (PRIMARY SERVER)
- Cleanup: unmount and remove the snapshot
umount /tmp/mysql_restore/
lvremove /dev/vg_01/snapMY01
Do you really want to remove and DISCARD active logical volume snapMY01? [y/n]: y
But at the point where MySQL is started, there are lots of InnoDB corruption errors and a memory dump from the application:
InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
Fatal signal 6 while backtracing
The InnoDB data and log files match in size between the principal and the secondary, so I don't understand where the corruption is coming from. I am able to start MySQL with innodb_force_recovery = 4 on the secondary, but that's useless to me.
So these are my questions so far:
Is flush tables; flush tables with read lock; sufficient when taking an LVM snapshot of a very active database?
If not, is there any way to take a coherent and viable snapshot of /var/lib/mysql without shutting down MySQL and Zabbix?
I am also considering mylvmbackup, but I don't see what difference that would make since I'm already creating an LVM snapshot.
I have another working replica, so is it possible to chain replicas?
This would result in master -> slave -> slave
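As far as I understand, chaining only works if the intermediate slave writes the changes it replicates into its own binlog, which is not the default. A minimal my.cnf sketch for the middle server (server-id values are placeholders):
[mysqld]
server-id = 2
log-bin = mysql-bin
log-slave-updates
The third server would then point its CHANGE MASTER TO at this intermediate instead of at the principal.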
2nd attempt: this time I shut down the first secondary and copied its data via SCP to the second server. I am still getting corruption errors on the ibdata and ib_logfile files.
I ran a mysqlcheck on the whole server and can see entries like this:
zabbix.alerts
Warning : InnoDB: Index 'alerts_1' contains 1525245 entries, should be 1525262.
Warning : InnoDB: Index 'alerts_2' contains 1525245 entries, should be 1525262.
Warning : InnoDB: Index 'alerts_3' contains 1525245 entries, should be 1525262.
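(For reference, the whole-server check was along the lines of mysqlcheck -u root -p --all-databases; exact options from memory.)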
CHECK TABLE alerts; runs without error on the secondary I copied from.
Still no solution at hand to bring back the other secondary server.
Yesterday I stopped MySQL and took a snapshot after the shutdown, then copied the files with the netcat solution:
On the receiving server, start the listener first:
netcat -l -p 9999 | tar xfv -
Then on the server holding the data:
tar cfv - * | netcat receiving_server_ip 9999
And then started MySQL.
Server start resulted in MySQL replaying transactions and then getting these errors over and over.
InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
171011 13:12:24 [ERROR] Slave SQL: Error 'Got error -1 from storage engine' on query. Default database: 'zabbix'. Query: 'insert into history_log (id,itemid,clock,ns,timestamp,source,severity,value,logeventid) values (2371918228,443643,1507721727,565645171,0,'',0,'11/10/2017 13:30:41.558 [EC_CORE]Processing element \'variable\' with name \'tmpData_PositionFiscale\'.',0)', Error_code: 1030
171011 13:12:24 [Warning] Slave: Got error -1 from storage engine Error_code: 1030
171011 13:12:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.008917' position 447481453
171011 13:12:36 [Note] /usr/sbin/mysqld: Normal shutdown
171011 13:12:36 [Note] Event Scheduler: Purging the queue. 0 events
171011 13:12:36 [Note] Slave I/O thread killed while reading event
171011 13:12:36 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.008919', position 199662133
171011 13:12:36 InnoDB: Starting shutdown...
171011 13:12:38 InnoDB: Shutdown completed; log sequence number 17456777783702
171011 13:12:38 [Note] /usr/sbin/mysqld: Shutdown complete
I then set innodb_force_recovery to 3, which stopped the endless stop-and-start loops. This option stops transaction rollbacks from occurring.
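For reference, that is just the following in my.cnf; per the forcing-innodb-recovery page linked in the error above, level 3 means crash recovery still runs but transaction rollbacks are skipped:
[mysqld]
innodb_force_recovery = 3
As the log itself says, InnoDB refuses user modifications while this option is on (hence the 'Got error -1 from storage engine' the slave SQL thread hit), so it has to be removed again before replication can apply anything.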
- Stopped MySQL on the other secondary
- Unmounted /var/lib/mysql
- Took an LVM snapshot of /var/lib/mysql
- Reformatted /var/lib/mysql on the secondary I'm repairing
Tomorrow, if this fails, I am going to restore the secondary with a backup of the working secondary.