Generally, Lustre should start on its own via heartbeat. Heartbeat is chkconfig'd on on all of the Lustre nodes, so when the nodes reboot, they negotiate between themselves and start all of the services. It is advised to start Lustre using only =service heartbeat start= and then just wait: the act of mounting the partitions loads the Lustre-related modules, and nothing further is needed. ALWAYS REMEMBER - <span style="color: #dc143c;"> *be patient!* </span> It takes about three times longer than you want it to!

   * LCGTier2.LustreDiskChangeProcedures
   * LCGTier2.LustreFreezeUpProcedures

---+++ OST Layout

<table align="center" style="width: 80%;" border="0">
<tbody>
<tr align="center">
  <td></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss11</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss21</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss31</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss41</span></strong></td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md11</span></strong></td>
  <td>OST0000</td> <td>OST0001</td> <td>OST0002</td> <td>OST0003</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md13</span></strong></td>
  <td>OST0008</td> <td>OST0009</td> <td>OST000a</td> <td>OST000b</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md15</span></strong></td>
  <td>OST0010</td> <td>OST0011</td> <td>OST0012</td> <td>OST0013</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md17</span></strong></td>
  <td>OST0018</td> <td>OST0019</td> <td>OST001a</td> <td>OST001b</td>
</tr>
<tr align="center"><td></td><td></td><td></td><td></td><td></td></tr>
<tr align="center">
  <td></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss12</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss22</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss32</span></strong></td>
  <td style="background-color: #0b5ef3"><strong><span style="color: #f0ffff;">oss42</span></strong></td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md10</span></strong></td>
  <td>OST0004</td> <td>OST0005</td> <td>OST0006</td> <td>OST0007</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md12</span></strong></td>
  <td>OST000c</td> <td>OST000d</td> <td>OST000e</td> <td>OST000f</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md14</span></strong></td>
  <td>OST0014</td> <td>OST0015</td> <td>OST0016</td> <td>OST0017</td>
</tr>
<tr align="center">
  <td style="background-color: #83e183"><strong><span style="color: #f0ffff;">md16</span></strong></td>
  <td>OST001c</td> <td>OST001d</td> <td>OST001e</td> <td>OST001f</td>
</tr>
</tbody>
</table>

-- Main.JasonTemple - 2010-04-13

   * [[%ATTACHURL%/CSCS_Lustre_Runbook_v0.8.odt][CSCS_Lustre_Runbook_v0.8.odt]]: This is the Sun-provided Lustre information - how the fs was created, and how to generally use it.

---+++ Lustre FSCK procedure

This is a three-step process.

---++++ Step 1: e2fsck on the MDS

   * Unmount Lustre everywhere, including on the Lustre servers:
      * phoenix lustre service heartbeat stop
   * Start up the raids on the MDS:
      * mdadm --assemble -c /etc/mdadm.conf.local /dev/md10
   * Next, mount GPFS so you have a workspace:
      * phoenix lustre "/usr/lpp/mmfs/bin/mmstartup;sleep 2;/usr/lpp/mmfs/bin/mmmount scratch"
   * Now run the e2fsck:
      * e2fsck -n -v --mdsdb /gpfs/lustre_fsck/mdsdb /dev/md10
   * This outputs a metadata database file (the mdsdb) which you use in the next steps.
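The step-1 sequence can be collected into one script. This is a hedged sketch, not an official runbook script: the =phoenix= wrapper, hostnames, and paths are taken from the text above, while the =run=/=DRY_RUN= guard and the =mds_fsck= name are my additions so the sequence can be previewed without touching heartbeat or the arrays.

```shell
#!/bin/bash
# Sketch of the MDS-side fsck sequence from step 1 (assumed layout,
# not an official script). By default (DRY_RUN=1) each command is
# printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

mds_fsck() {
    # Stop Lustre everywhere; heartbeat unmounts the targets.
    run phoenix lustre service heartbeat stop
    # Assemble the MDS raid.
    run mdadm --assemble -c /etc/mdadm.conf.local /dev/md10
    # Mount GPFS as a workspace for the database files.
    run phoenix lustre "/usr/lpp/mmfs/bin/mmstartup;sleep 2;/usr/lpp/mmfs/bin/mmmount scratch"
    # Read-only e2fsck of the MDT; writes the mdsdb used by steps 2 and 3.
    run e2fsck -n -v --mdsdb /gpfs/lustre_fsck/mdsdb /dev/md10
}
```

Calling =mds_fsck= as-is only prints the four commands in order; =DRY_RUN=0 mds_fsck= would execute them for real.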
---++++ Step 2: e2fsck on the OSS machines

   * First, start up the raids using this script (/gpfs/jason/start_raid.sh):
<verbatim>
#!/bin/bash
OSSNAME=`uname -n | cut -d. -f1`
case ${OSSNAME} in
  oss?1) OSTs="1 3 5 7" ;;
  oss?2) OSTs="0 2 4 6" ;;
  *) echo "Wrong node, exiting!"; exit 1 ;;
esac
for i in $OSTs; do
  mdadm -A -c /etc/mdadm.conf.local /dev/md3$i 2>&1 | /usr/bin/logger -t "initlustre"
  mount /lustre/scratch/bmp1$i && /usr/bin/logger -t "initlustre" mounted bitmap device /dev/md3$i
  mdadm -A -c /etc/mdadm.conf.local /dev/md2$i 2>&1 | /usr/bin/logger -t "initlustre"
  mdadm -A -c /etc/mdadm.conf.local /dev/md1$i 2>&1 | /usr/bin/logger -t "initlustre"
done
</verbatim>
   * You run it like this:
      * dsh -w oss[11-12,21-22,31-32,41-42] /gpfs/jason/start_raid.sh
   * Then run the e2fsck on the servers, like this:
      * dsh -w oss[11-12,21-22,31-32,41-42] sh /gpfs/jason/lustre_e2fsck.sh
   * using this script:
<verbatim>
#!/bin/bash
OSSNAME=`uname -n | cut -d. -f1`
case ${OSSNAME} in
  oss?1) OSTs="1 3 5 7" ;;
  oss?2) OSTs="0 2 4 6" ;;
  *) echo "Wrong node, exiting!"; exit 1 ;;
esac
typeset LOG="/gpfs/lustre_fsck/$OSSNAME.out"
[[ -t 1 ]] && echo "Writing to logfile '$LOG'."
exec > $LOG 2>&1
exec < /dev/null 2<&1
for i in $OSTs; do
  e2fsck -n -v --mdsdb /gpfs/lustre_fsck/mdsdb --ostdb /gpfs/lustre_fsck/${OSSNAME}.ostdb.${i} /dev/md1${i}
done
</verbatim>
   * This generates a logfile for each OSS, as well as an ostdb file for each raid:
<verbatim>
Mar 16 14:30 [root@oss11:~]# ls /gpfs/lustre_fsck/
mdsdb oss11.ostdb.3 oss11.out oss12.ostdb.4 oss21.ostdb.1 oss21.ostdb.7 oss22.ostdb.2 oss22.out oss31.ostdb.5 oss32.ostdb.0 oss32.ostdb.6 oss41.ostdb.3 oss41.out oss42.ostdb.4
mdsdb.mdshdr oss11.ostdb.5 oss12.ostdb.0 oss12.ostdb.6 oss21.ostdb.3 oss21.out oss22.ostdb.4 oss31.ostdb.1 oss31.ostdb.7 oss32.ostdb.2 oss32.out oss41.ostdb.5 oss42.ostdb.0 oss42.ostdb.6
oss11.ostdb.1 oss11.ostdb.7 oss12.ostdb.2 oss12.out oss21.ostdb.5 oss22.ostdb.0 oss22.ostdb.6 oss31.ostdb.3 oss31.out oss32.ostdb.4 oss41.ostdb.1 oss41.ostdb.7 oss42.ostdb.2 oss42.out
</verbatim>

---++++ Step 3: run lfsck from a client

   * Stop the raids on all the servers:
      * ssh mds1 mdadm --stop /dev/md10
      * dsh -w oss[11-12,21-22,31-32,41-42] /gpfs/jason/stop_raid.sh
   * Start Lustre on all the servers:
      * dsh -w oss[11-12,21-22,31-32,41-42] service heartbeat start
      * dsh -w mds[1,2] service heartbeat start
   * Make sure GPFS and Lustre are running on a client and that e2fsprogs is installed, then start the lfsck (I run it from a script, /gpfs/jason/lfsck.sh):
<verbatim>
lfsck -n -v --mdsdb /gpfs/lustre_fsck/mdsdb --ostdb \
  /gpfs/lustre_fsck/oss11.ostdb.1 /gpfs/lustre_fsck/oss11.ostdb.3 /gpfs/lustre_fsck/oss11.ostdb.5 /gpfs/lustre_fsck/oss11.ostdb.7 \
  /gpfs/lustre_fsck/oss12.ostdb.0 /gpfs/lustre_fsck/oss12.ostdb.2 /gpfs/lustre_fsck/oss12.ostdb.4 /gpfs/lustre_fsck/oss12.ostdb.6 \
  /gpfs/lustre_fsck/oss21.ostdb.1 /gpfs/lustre_fsck/oss21.ostdb.3 /gpfs/lustre_fsck/oss21.ostdb.5 /gpfs/lustre_fsck/oss21.ostdb.7 \
  /gpfs/lustre_fsck/oss22.ostdb.0 /gpfs/lustre_fsck/oss22.ostdb.2 /gpfs/lustre_fsck/oss22.ostdb.4 /gpfs/lustre_fsck/oss22.ostdb.6 \
  /gpfs/lustre_fsck/oss31.ostdb.1 /gpfs/lustre_fsck/oss31.ostdb.3 /gpfs/lustre_fsck/oss31.ostdb.5 /gpfs/lustre_fsck/oss31.ostdb.7 \
  /gpfs/lustre_fsck/oss32.ostdb.0 /gpfs/lustre_fsck/oss32.ostdb.2 /gpfs/lustre_fsck/oss32.ostdb.4 /gpfs/lustre_fsck/oss32.ostdb.6 \
  /gpfs/lustre_fsck/oss41.ostdb.1 /gpfs/lustre_fsck/oss41.ostdb.3 /gpfs/lustre_fsck/oss41.ostdb.5 /gpfs/lustre_fsck/oss41.ostdb.7 \
  /gpfs/lustre_fsck/oss42.ostdb.0 /gpfs/lustre_fsck/oss42.ostdb.2 /gpfs/lustre_fsck/oss42.ostdb.4 /gpfs/lustre_fsck/oss42.ostdb.6 \
  /lustre/scratch/
</verbatim>
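Typing out all 32 ostdb paths by hand is error-prone. Since the layout is regular (oss?1 nodes hold OSTs 1 3 5 7, oss?2 nodes hold OSTs 0 2 4 6), the argument list can be generated instead. A sketch, assuming the /gpfs/lustre_fsck naming shown above; =build_ostdb_args= is my name, not something in the runbook:

```shell
#!/bin/bash
# Generate the 32 ostdb paths for the lfsck command from the OSS/OST
# layout, in the same order as the hand-written list above.
build_ostdb_args() {
    local row col ost osts args=""
    for row in 1 2 3 4; do              # oss1x .. oss4x
        for col in 1 2; do              # ossN1 / ossN2
            case $col in
                1) osts="1 3 5 7" ;;    # oss?1 nodes
                2) osts="0 2 4 6" ;;    # oss?2 nodes
            esac
            for ost in $osts; do
                args="$args /gpfs/lustre_fsck/oss${row}${col}.ostdb.${ost}"
            done
        done
    done
    echo $args
}

# The lfsck invocation then shortens to:
#   lfsck -n -v --mdsdb /gpfs/lustre_fsck/mdsdb --ostdb $(build_ostdb_args) /lustre/scratch/
```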
Topic revision: r5 - 2011-03-16 - JasonTemple