MySQL – Problems when Reinitializing MySQL Replication using LVM on a very large Database


This is MySQL Ver 14.14 Distrib 5.5.57, for debian-linux-gnu (x86_64) using readline 6.3 on Debian 8. This MySQL instance only hosts Zabbix, but it is a very active Zabbix with over 600 hosts constantly sending it metrics. The Zabbix database itself weighs in at over 350GB.

The monitoring team recently failed Zabbix over to a secondary MySQL server that has now become the principal. For some unknown reason, we started getting Primary Key constraint errors, so the decision was made to reinitialize replication from scratch on the old principal.

Here are the steps I'm taking to reinitialize replication:

On the Secondary (SECONDARY SERVER)

service mysql stop


In the first SSH session, connect to MySQL on the primary

  1. flush tables; flush tables with read lock; show master status;
  2. write down the binlog file name and position
    Example: 'mysql-bin.001871', MASTER_LOG_POS = 21103447
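If you'd rather not retype those coordinates, the `SHOW MASTER STATUS\G` output can be parsed mechanically. A minimal sketch, assuming the output was saved from the locked session (the sample capture below stands in for real output, e.g. saved via the client's `tee` command):

```shell
# Hypothetical sketch: parse the binlog file and position out of a saved
# "SHOW MASTER STATUS\G" capture. The heredoc stands in for the real file.
status_file=/tmp/master_status.txt
cat > "$status_file" <<'EOF'
*************************** 1. row ***************************
            File: mysql-bin.001871
        Position: 21103447
EOF
# Pull the two fields out by their labels.
MASTER_LOG_FILE=$(awk '$1 == "File:" {print $2}' "$status_file")
MASTER_LOG_POS=$(awk '$1 == "Position:" {print $2}' "$status_file")
echo "$MASTER_LOG_FILE $MASTER_LOG_POS"
```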

In the second SSH session

  1. Create the snapshot: lvcreate --size 10G --snapshot --name snapMY01 /dev/mapper/vg_01-lv_mysql00

In the first SSH session still connected to MySQL

  1. unlock tables;

In the second SSH session

Copy the snapshot contents over SCP to /var/lib/mysql on the secondary

  1. mount /dev/vg_01/snapMY01 /tmp/mysql_restore/
  2. cd /tmp/mysql_restore/var/lib/mysql/
  3. scp -i ~/.ssh/some_ssh_key -r ./ somelogin@SECONDARY_SERVER:/var/lib/mysql/
    or alternatively you can use netcat


  1. on the secondary, switch to the root account: sudo su -
  2. cd /var/lib/mysql
  3. rm debian-5.5.flag mysqld-relay-bin.* mysql_upgrade_info
  4. service mysql start
  5. Connect to MySQL
  6. stop slave;
  7. change master to MASTER_HOST = 'IP.ADDRESS.PRIMARY.SERVER', MASTER_USER = 'repli', MASTER_PASSWORD = 'repli', MASTER_LOG_FILE = 'mysql-bin.somefilenumber', MASTER_LOG_POS = some_log_pos;
  8. start slave;
  9. show slave status\G
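Before trusting the replica, both replication threads in that status output must say Yes. Checking that can be scripted too; a sketch, with a sample capture standing in for a saved `show slave status\G`:

```shell
# Sketch: grep the two thread states out of a saved "SHOW SLAVE STATUS\G".
# The heredoc below is sample content, not live output.
status=/tmp/slave_status.txt
cat > "$status" <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
EOF
io=$(awk -F': ' '/Slave_IO_Running/ {print $2}' "$status")
sql=$(awk -F': ' '/Slave_SQL_Running/ {print $2}' "$status")
[ "$io" = Yes ] && [ "$sql" = Yes ] && echo "replication healthy"
```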


Clean up the mount

  1. umount /tmp/mysql_restore
  2. lvremove /dev/vg_01/snapMY01
    Do you really want to remove and DISCARD active logical volume snapMY01? [y/n]: y
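One failure mode worth ruling out before the lvremove: with only 10G of copy-on-write space under a very write-heavy 350GB database, the snapshot can fill to 100%, at which point LVM invalidates it and anything copied from it is garbage. A sketch of the check; in real usage the line would come from `lvs --noheadings -o lv_name,data_percent vg_01`, and the sample value here is hypothetical:

```shell
# Sketch: check whether the snapshot's copy-on-write area ever filled up.
# The sample line stands in for real "lvs" output for snapMY01.
lvs_line="  snapMY01   87.52"
snap_pct=$(echo "$lvs_line" | awk '{print int($2)}')
if [ "$snap_pct" -ge 100 ]; then
    echo "snapshot overflowed: data is unusable"
else
    echo "snapshot ok at ${snap_pct}% full"
fi
```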

But at the point MySQL is started, the log fills with InnoDB corruption errors and the server dumps core.

InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: about forcing recovery.
Fatal signal 6 while backtracing

The InnoDB data and log files match in size between the principal and the secondary, so I don't understand where the corruption is coming from. I can start MySQL with innodb_force_recovery = 4 on the secondary, but that's useless to me.

So these are my questions so far:

Is flush tables; flush tables with read lock; sufficient when taking an LVM snapshot of a very active database?

If not, is there any way to take a coherent and viable snapshot of /var/lib/mysql without shutting down MySQL and Zabbix?

I am also considering mylvmbackup, but I don't see what difference that would make since I'm already creating an LVM snapshot.

I have another working replica, so is it possible to chain replicas?

This would result in master -> slave -> slave
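A master -> slave -> slave chain works, but the middle slave only forwards changes if it writes what it replicates into its own binary log, which on 5.5 means enabling log_slave_updates. A minimal my.cnf sketch for the intermediate server; the option names are standard, the server-id value is illustrative:

```ini
[mysqld]
# Write replicated changes to this server's own binlog so a downstream
# slave can read them (required on the middle server of a chain).
log-bin           = mysql-bin
log-slave-updates
# server-id must be unique across all three servers; 3 is illustrative.
server-id         = 3
```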

UPDATE 10/11/2017

2nd attempt: this time I shut down the first secondary and copied its data via SCP to the second server. I am still getting corruption errors on the ibdata and ib_logfile files.

I ran a mysqlcheck on the whole server and can see entries like this:

Warning  : InnoDB: Index 'alerts_1' contains 1525245 entries, should be 1525262.
Warning  : InnoDB: Index 'alerts_2' contains 1525245 entries, should be 1525262.
Warning  : InnoDB: Index 'alerts_3' contains 1525245 entries, should be 1525262.

Running CHECK TABLE alerts; on the secondary I copied from completes without error.

Still no solution at hand to bring back the other secondary server.

UPDATE 10/12/2017

Yesterday I stopped MySQL and took the snapshot after shutdown, then copied the files with the netcat solution.

source: tar cfv - * | netcat source_server_ip 9999

target: netcat -l -p 9999 | tar xfv -

And then started MySQL.
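After a raw copy like this, comparing a checksum of both datadirs is a cheap way to rule the wire out as the corruption source. A sketch, assuming md5sum and GNU find are available; run it as `tree_sum /var/lib/mysql` on both source and target and compare the two hashes:

```shell
# Fingerprint every file under a directory in a stable order; identical
# trees produce identical hashes, so run this on both sides and compare.
tree_sum() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 md5sum | md5sum)
}
```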

Server start resulted in MySQL replaying transactions and then getting these errors over and over.

InnoDB: A new raw disk partition was initialized or
InnoDB: innodb_force_recovery is on: we do not allow
InnoDB: database modifications by the user. Shut down
InnoDB: mysqld and edit my.cnf so that newraw is replaced
InnoDB: with raw, and innodb_force_... is removed.
171011 13:12:24 [ERROR] Slave SQL: Error 'Got error -1 from storage engine' on query. Default database: 'zabbix'. Query: 'insert into history_log (id,itemid,clock,ns,timestamp,source,severity,value,logeventid) values (2371918228,443643,1507721727,565645171,0,'',0,'11/10/2017 13:30:41.558 [EC_CORE]Processing element \'variable\' with name \'tmpData_PositionFiscale\'.',0)', Error_code: 1030
171011 13:12:24 [Warning] Slave: Got error -1 from storage engine Error_code: 1030
171011 13:12:24 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.008917' position 447481453
171011 13:12:36 [Note] /usr/sbin/mysqld: Normal shutdown

171011 13:12:36 [Note] Event Scheduler: Purging the queue. 0 events
171011 13:12:36 [Note] Slave I/O thread killed while reading event
171011 13:12:36 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.008919', position 199662133
171011 13:12:36  InnoDB: Starting shutdown...
171011 13:12:38  InnoDB: Shutdown completed; log sequence number 17456777783702
171011 13:12:38 [Note] /usr/sbin/mysqld: Shutdown complete

I then set innodb_force_recovery = 3, which stopped the endless stop-and-start loop. This level prevents transaction rollbacks from occurring.
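For reference, that setting corresponds to this my.cnf fragment; per the 5.5 documentation, level 3 (SRV_FORCE_NO_TRX_UNDO) skips transaction rollbacks after crash recovery:

```ini
[mysqld]
# Level 3: do not run transaction rollbacks after crash recovery.
innodb_force_recovery = 3
```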

Today I

  1. Stopped MySQL on the other secondary
  2. Unmounted /var/lib/mysql
  3. Took an LVM snapshot of /var/lib/mysql
  4. Reformatted /var/lib/mysql on the secondary I'm repairing.

Tomorrow, if this fails, I am going to restore the secondary with a backup of the working secondary.

Best Answer

LVM snapshots can't reliably restore InnoDB tables and tablespaces

MySQL's documentation has this paragraph:

(But note that table file copying methods do not work if your database contains InnoDB tables. Also, even if the server is not actively updating data, InnoDB may still have modified data cached in memory and not flushed to disk.)

Which is probably why, even after shutting down MySQL, the ibdata and ib_logfile files could never recover after being copied to another, even identical, server.

Using a backup/restore solution

The dilemma is simple: a 350GB database needs to be backed up and restored to a secondary over a 100MB link before the primary runs out of space to store all of the mysql-bin logs needed to resume replication.
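For scale, assuming the "100MB link" means 100 Mbit/s, the raw wire time for 350GB is only a small slice of the 72-hour window; it's the dump and, above all, the apply that consume the rest. A back-of-envelope sketch:

```shell
# Back-of-envelope: raw wire time for 350 GB over a 100 Mbit/s link
# (assumes decimal GB and ignores protocol overhead).
size_bits=$((350 * 1000 * 1000 * 1000 * 8))   # 350 GB in bits
rate_bits_per_s=$((100 * 1000 * 1000))        # 100 Mbit/s
seconds=$((size_bits / rate_bits_per_s))
echo "~$((seconds / 3600)) hours of pure transfer"
```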

Mydumper to the rescue

The primary server stores three days of binlogs before the purge-log housekeeper removes them, which leaves a 72-hour window. Using mysqldump, even with compression, either the backup or the restore was possible within that window, but not both.

  • Backup all the databases on the only working secondary

This example puts mydumper into daemon mode using 8 separate threads to back up all databases. This is key to finishing the backup well within the 72-hour window.

./mydumper --host ip.working.secondary \
-u backup_user -p some_strong_password \
-t 8 \
-D --outputdir=/mnt/some_directory/mydumper

By default, all databases are backed up.

  • Restore the offline secondary

Here, we are restoring all databases on MySQL with the same 8 threads options.

./myloader --host ip.offline.secondary \
-u backup_user -p some_strong_password \
-o --threads=8 \
-d /mnt/some_directory/mydumper

The -o option overwrites/creates the tables; --directory (-d) is where the backup was stored.

The backup and restore started Thursday afternoon and finished Saturday morning.

In the ./0 directory there is a .metadata file that contains both the master and slave log positions. In this case, I just used the mysql-bin file and log position for the primary.
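Pulling those coordinates out of the .metadata file can also be scripted. A sketch; the sample content below mimics the layout mydumper writes (field labels assumed from the 0.9-era format), standing in for the real ./0/.metadata:

```shell
# Sketch: extract the binlog coordinates from mydumper's .metadata file.
# The heredoc is sample content, not a real dump's metadata.
metadata=/tmp/metadata.sample
cat > "$metadata" <<'EOF'
SHOW MASTER STATUS:
    Log: mysql-bin.008935
    Pos: 140611724
EOF
log_file=$(awk '$1 == "Log:" {print $2; exit}' "$metadata")
log_pos=$(awk '$1 == "Pos:" {print $2; exit}' "$metadata")
echo "MASTER_LOG_FILE='$log_file', MASTER_LOG_POS=$log_pos"
```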

change master to MASTER_HOST = 'ip.primary',
MASTER_USER = 'repli', MASTER_PASSWORD = 'repli',
MASTER_LOG_FILE = 'mysql-bin.008935',
MASTER_LOG_POS = 140611724;


And replication is back up, with no more errors about ibdata.