<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ MySQL replication

%TOC%

---++ Usage

   * slurm1 and slurm2 make use of master-master replication to provide high availability.

---++ Requirements

Semisynchronous replication requires MySQL 5.5 or newer. SL6 ships with MySQL 5.1.x, so you will have to download the newer RPMs from the MySQL website. These RPMs have been copied to the Phoenix repo and are available through yum. Note that you must install the MySQL-shared-compat package first, otherwise you will receive an error about conflicts.

---++ Setup

   * slurm1 and slurm2 are both masters
      * Both can be written to, and changes will be replicated to the other node.
   * slurm1 and slurm2 are both semisynchronous masters and slaves
      * When a transaction is performed on either node the change is replicated to the other machine

<img alt="" align="MIDDLE" src="%ATTACHURL%/slurm_replication1.jpg" />

---++ Config

---+++ Install

Download version 5.5 or newer RPMs from http://dev.mysql.com/downloads/mysql/ (version 5.6 is available in the phoenix repo).

Install the compatibility libs:
<verbatim>
yum install MySQL-shared-compat
</verbatim>

Install the client and server:
<verbatim>
yum install MySQL-client MySQL-server
</verbatim>

The first thing we have to do is change the password. The installation of MySQL-server will create the following file with a random password to log into MySQL with:
<verbatim>
/root/.mysql_secret
</verbatim>

Log into MySQL and change the password:
<verbatim>
mysql -p
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');
</verbatim>

Open another terminal and ensure you are able to log in with the new password. Now let's edit the config to define our replication.
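If you want to script the initial password change instead of typing it interactively, it can be sketched roughly as below. This is a sketch only: the exact layout of =/root/.mysql_secret= is an assumption (the random password is taken to be the last field of the file's last non-empty line), and ='newpwd'= is a placeholder.

```shell
#!/bin/sh
# Sketch: pull the generated root password out of /root/.mysql_secret
# so the first login can be scripted.
# ASSUMPTION: the password is the last whitespace-separated field of the
# file's last non-empty line.
extract_secret() {
    awk 'NF { pw = $NF } END { print pw }' "$1"
}

# Usage (run as root after installing MySQL-server; 'newpwd' is a placeholder):
#   OLD_PW=$(extract_secret /root/.mysql_secret)
#   mysql -u root -p"$OLD_PW" \
#     -e "SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpwd');"
```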
The below is from slurm2; on slurm1 the only values that change are the server-id and the bind-address, which would be 2 and 148.187.64.42 respectively.

<verbatim>
vim /etc/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Master-Slave config
bind-address=148.187.64.43
server-id=1 # master
log_bin=/var/log/mysql/mysql-bin.log
binlog_do_db=slurm_acct_db # the slurmdbd database
binlog_ignore_db = mysql
binlog_ignore_db = test
log-bin=mysql-bin

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
</verbatim>

Restart MySQL for the changes to take effect:
<verbatim>
service mysql restart
</verbatim>

---+++ Master slave replication

Now we need to log in and create the database if it does not exist. Note: I upgraded from MySQL 5.1 to 5.6 after the slurm RPMs had been installed.

*WARNING* There is a <a target="_blank" href="http://bugs.mysql.com/bug.php?id=68892" title="bug">bug</a> in 5.6 that breaks replication when a grant command is issued with incorrect syntax.

<verbatim>
mysql -p
CREATE DATABASE slurm_acct_db;
# Note: replace 'password' in the below command with the actual password
GRANT REPLICATION SLAVE ON *.* TO 'root'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
show master status;
+------------------+----------+---------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB  | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+---------------+------------------+-------------------+
| mysql-bin.000018 |      120 | slurm_acct_db | mysql,test       |                   |
+------------------+----------+---------------+------------------+-------------------+
1 row in set (0.00 sec)
</verbatim>

Now we need to install the packages on the other server, edit the my.cnf and set the password as previously described. Let's now set up this server as a slave; first create the database if it does not exist.
<verbatim>
mysql -p
CREATE DATABASE slurm_acct_db;
</verbatim>

Define the other machine as the master, again changing 'password' to the actual password. Note the MASTER_LOG_FILE and MASTER_LOG_POS values are taken from the 'show master status;' output on the other node.
<verbatim>
CHANGE MASTER TO MASTER_HOST='148.187.64.43', MASTER_USER='root',
MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000018',
MASTER_LOG_POS=120;
</verbatim>

Now that we have defined our master, start the slave and ensure that the values are correct. Master_Host, Master_Log_File and Read_Master_Log_Pos are the entries of main interest.
<verbatim>
START SLAVE;
show slave status \G
</verbatim>

Before enabling the other node as a master and defining slave replication in the other direction, let's test the current replication.
<verbatim>
# On slurm2
use slurm_acct_db;
create table testmaster21 (mid int(11) auto_increment, PRIMARY KEY (mid)) Engine=MyISAM;

# On slurm1 check the table exists
use slurm_acct_db;
show tables;
</verbatim>

---+++ Master master replication

If all is good, define master slave replication in the other direction. On slurm1 display the status:
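Checking 'show slave status \G' by eye gets tedious when done on both nodes; a small parser over its output can be sketched as below. This is a sketch, not part of the original setup: it only looks at the Slave_IO_Running and Slave_SQL_Running fields, and the pipeline in the usage comment assumes you run it on the node acting as slave.

```shell
#!/bin/sh
# Sketch: read 'SHOW SLAVE STATUS\G' output on stdin and report whether
# both replication threads are running (exit 0) or not (exit 1).
check_slave_status() {
    awk '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END {
            if (io == "Yes" && sql == "Yes") { print "replication OK"; exit 0 }
            print "replication BROKEN (IO=" io ", SQL=" sql ")"; exit 1
        }'
}

# Usage on either node:
#   mysql -p -e 'SHOW SLAVE STATUS\G' | check_slave_status
```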
<verbatim>
show master status;
+------------------+----------+---------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB  | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+---------------+------------------+-------------------+
| mysql-bin.000010 |      120 | slurm_acct_db | mysql,test       |                   |
+------------------+----------+---------------+------------------+-------------------+
1 row in set (0.00 sec)
</verbatim>

Grant replication permission on the slave:
<verbatim>
GRANT REPLICATION SLAVE ON *.* TO 'root'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
show master status;
</verbatim>

On slurm2 define slurm1 as a master, again using the values from the other node for MASTER_LOG_FILE and MASTER_LOG_POS.
<verbatim>
CHANGE MASTER TO MASTER_HOST='148.187.64.42', MASTER_USER='root',
MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000010',
MASTER_LOG_POS=120;
</verbatim>

Again start the slave and check the status:
<verbatim>
START SLAVE;
show slave status \G
</verbatim>

Ensure replication is working:
<verbatim>
# On slurm1
use slurm_acct_db;
DROP TABLE testmaster21;

# On slurm2 check the table no longer exists
use slurm_acct_db;
show tables;
</verbatim>

---+++ Enabling semisynchronous replication

*WARNING* MySQL 5.6 does not work with slurmdbd at the time of writing (19/08/2013).

Semisynchronous replication is provided by plugins. As slurm1 and slurm2 are both masters and slaves to one another, install both the master and slave plugins:
<verbatim>
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
</verbatim>

By default these plugins are not automatically enabled, so enable them on both systems.
<verbatim>
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
</verbatim>

The replication between master and slave needs to be restarted for the nodes to be recognised as semisynchronous slaves:
<verbatim>
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
</verbatim>

To ensure the semisynchronous plugins remain enabled across restarts of the mysql service, add the following lines on each server under the [mysqld] section of /etc/my.cnf:
<verbatim>
vim /etc/my.cnf

rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000 # 1 second
rpl_semi_sync_slave_enabled=1
</verbatim>

---++ Testing

TBA

---++ Monitoring

There is a check_mysql plugin available that makes our lives easier; here's how to implement it.

   * First install the check:
<verbatim>
yum install nagios-plugins-mysql.x86_64 --enablerepo=epel
</verbatim>
   * We now need to create an unprivileged user to log into the database and perform the checks with.
<verbatim>
# mysql shell, logged in as root
# change 'password' to an actual password
GRANT SELECT, REPLICATION CLIENT ON *.* TO 'nagios'@'localhost' IDENTIFIED BY 'password';
</verbatim>
   * Then update the nrpe and services files in cfengine and we're good to go. Below is an example; the -S flag checks the slave status.
<verbatim>
# Change 'password' to the real password
command[check_mysql]=/usr/libexec/nagios-plugins/check_mysql -u nagios -p password -S -d slurm_acct_db
</verbatim>

---++ HA testing

%ICON{choice-yes}% Tested killing MySQL processes with the -s 9 signal on the primary DB host; sacctmgr commands continued to return correct info on both slurm1 and slurm2

%ICON{choice-yes}% Submitted a job and waited until it was running, then stopped MySQL gracefully on the primary DB host; the job finished OK and squeue returned successfully throughout

%ICON{choice-no}% Brought down the public interface on slurm2 after a job started to run; this caused sacctmgr commands to fail. It appears the slurmdbd daemon is required to be reachable in order for fail over to occur.
   * Killed slurmctld (init script would not stop the service) and started it again; sacctmgr and sinfo returned, however the job did not complete

%ICON{choice-no}% Brought down the public interface on slurm1 to test fail over of slurmctld. Note: it appears StateSaveLocation in slurm.conf needs to refer to a shared filesystem location for this to work.

   * squeue and sinfo ran but took a longer time to return
   * when a job was submitted it did not complete
   * %ICON{choice-yes}% Added a shared location for the StateSaveLocation using the NAS mounted on /slurm. Jobs completed and I could submit jobs with the primary controller down, but sinfo and sacctmgr were still slow to return.

---++ Issues

   * Upon initial install mysql failed to start because /var/log/mysql was not owned by the mysql user.
   * When a grant command is issued with incorrect syntax replication breaks; this is a known bug and is still open at the time of writing (19/08/2013): http://bugs.mysql.com/bug.php?id=68892
   * If the slave status shows that it is having trouble connecting to the master, ensure the firewall is not blocking communication on port 3306 and that grant permissions have been given. If the port is blocked, open it and restart the slave (STOP SLAVE; START SLAVE;)
   * The syntax slurmdbd uses when initially creating the tables in MySQL is not valid in MySQL 5.6 (it works in 5.1). As such we cannot take advantage of semisynchronous replication at this time. TBA: bug report with SchedMD.

---++ Further reading

   * Master master replication: http://www.hbyconsultancy.com/blog/create-a-master-master-mysql-replication-ubuntu-server-10-04-x64.html
   * Semisynchronous replication: http://dev.mysql.com/doc/refman/5.6/en/replication-semisync-installation.html

-- Main.GeorgeBrown - 2013-08-12
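---++ Appendix: port reachability check

For the firewall issue mentioned under Issues, a quick reachability test for the peer's MySQL port can be sketched as below. This is a sketch only: the host address is this cluster's slurm2 value, and it relies on bash's /dev/tcp pseudo-device and the coreutils =timeout= command.

```shell
#!/bin/sh
# Sketch: check whether a replication peer's MySQL port accepts TCP
# connections. Uses bash's /dev/tcp; exits non-zero if the connect fails.
check_port() {
    timeout 2 bash -c "cat < /dev/null > /dev/tcp/$1/$2" 2>/dev/null
}

# Usage (148.187.64.43 is slurm2's address; adjust for the node under test):
#   if check_port 148.187.64.43 3306; then
#       echo "mysql reachable"
#   else
#       echo "mysql NOT reachable - check iptables and GRANT permissions"
#   fi
```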
Topic attachments: slurm_replication1.jpg (18.0 K, 2013-08-13, GeorgeBrown) - slurm replication

Topic revision: r13 - 2013-09-05 - GeorgeBrown