<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Diskless Install

%TOC%

---++ Create the image

Make a directory to work in

<verbatim>
mkdir /tmp/diskless
</verbatim>

Install the release package

<verbatim>
yumdownloader sl-release
rpm --root=/tmp/diskless -ivh --nodeps sl-release-6.4-1.x86_64.rpm
</verbatim>

Install the packages required for the system

<verbatim>
yum --installroot=/tmp/diskless -y groupinstall core base --disableexcludes=main
</verbatim>

Note that a non-trivial amount of the image is made up of the yum cache

<verbatim>
du -sh /tmp/diskless/var/cache/yum/
165M    /tmp/diskless/var/cache/yum/
</verbatim>

---+++ Chroot

We can now chroot into the image. Note that if the user you are doing this as has their shell set to something other than bash, it will not work unless that shell has been installed in the chroot directory.

<verbatim>
chroot /tmp/diskless
</verbatim>

We need to create the following symlink in order for the system to boot

<verbatim>
ln -s ./sbin/init ./init
</verbatim>

Set the password for root

<verbatim>
passwd
</verbatim>

Ensure the network is started at boot

<verbatim>
echo NETWORKING=yes > etc/sysconfig/network
</verbatim>

Now configure the network interfaces and SSH so that we can access the machine once it has booted.

<verbatim>
vim etc/sysconfig/network-scripts/ifcfg-eth1
vim etc/sysconfig/network-scripts/ifcfg-ib0
vim root/.ssh/authorized_keys
</verbatim>

Also disable SELinux if required

<verbatim>
vim /etc/selinux/config
</verbatim>

---+++ Chroot optional steps

At this point we could exit the chroot and cpio the image, so the following steps are optional. This is a good time to add anything that modifies the kernel, as booting into a modified kernel later will not be possible without updating the image.

yum will not work inside the chroot yet because it requires /dev/urandom, which does not exist there. To make life easier, run the following from outside the chroot

<verbatim>
mount --bind /dev /tmp/diskless/dev
mount --bind /proc /tmp/diskless/proc
</verbatim>

---++++ Build in Mellanox OFED

Follow the instructions described on the following wiki page

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/MellanoxYumInstall

Ensure the network interfaces are configured for DHCP

<verbatim>
vi /etc/sysconfig/network-scripts/ifcfg-ib0
TYPE=Infiniband
DEVICE=ib0
BOOTPROTO=dhcp
GATEWAY=148.187.64.2
ONBOOT=yes
USERCTL=no
MTU=65520
IPV6INIT=no
DNS1=148.187.3.88
DNS2=148.187.18.88
CONNECTED_MODE=yes
</verbatim>

The MLNX_OFED ALL group install also installs MPI binaries that we don't need. We will remove them as they take up a lot of space

<verbatim>
rpm -e --noscripts mvapich2
rpm -e --noscripts openmpi
rpm -e --noscripts mpitests_openmpi
rm -rf /usr/mpi
</verbatim>

---++++ Build in GPFS

We will also need GPFS. As with the Mellanox install, follow the guide on the wiki

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/OLDServiceGPFS#GPFS_Repo

Note that inside the chroot 'uname -a' returns the kernel version of the host, because we are not running the kernel from the chroot.

---++++ Cleaning YUM

Remember that space taken by the yum cache? Let's remove it.

Current size from outside the chroot

<verbatim>
du -sh /tmp/diskless/var/cache/yum
287M    /tmp/diskless/var/cache/yum
</verbatim>

After running 'yum clean all' from inside the chroot

<verbatim>
du -sh /tmp/diskless/var/cache/yum
189M    /tmp/diskless/var/cache/yum
</verbatim>

Better, but for some reason yum is not clearing all packages from the cache, so from inside the chroot

<verbatim>
rm -rf /var/cache/yum/*
</verbatim>

Now from outside the chroot

<verbatim>
du -sh /tmp/diskless/var/cache/yum/
66M     /tmp/diskless/var/cache/yum/
</verbatim>

---+++ Compress to CPIO

Unmount /dev and /proc from the chroot

<verbatim>
umount /tmp/diskless/dev
umount /tmp/diskless/proc
</verbatim>

Now it's time to compress this into an image. I'm using pigz (available from EPEL), a parallel implementation of gzip, which is much faster. Write the archive outside the image directory so that it does not end up inside itself.

<verbatim>
cd /tmp/diskless/
find . | cpio -ocv | pigz -9 > /tmp/diskless.cpio.gz
</verbatim>

---++ Boot over Infiniband

Previously kickstart provisioning in Phoenix had been done over the Ethernet network; we will use this opportunity to change this and make use of the Infiniband fabric. The following is based around the Mellanox FlexBoot. If your HCA is already a boot option, skip to 'Find the DHCP Client Identifier of the HCA'.

If you are unfamiliar with the workings of the Infiniband fabric, you can find some useful reading material from Mellanox at the following links. However, it should be noted that this material is somewhat dated.

http://www.mellanox.com/related-docs/prod_software/Mellanox_PXE_User_Guide.pdf

http://www.mellanox.com/pdf/BoIB/Boot-over-IB_User_Manual_1_0.pdf

---+++ Enable PXE in the HCA ROM

Firstly we need to identify some details about the card we have in our system.

Find the PCI device for our card; the vendor ID '15b3' identifies Mellanox HCAs

<verbatim>
lspci -n | grep 15b3
04:00.0 0280: 15b3:1003
</verbatim>

We need to know the 'Device ID' and 'PSID'

<verbatim>
mstflint -d 04:00.0 q
Image type:      ConnectX
FW Version:      2.11.1308
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           001e6703000cc1d2 001e6703000cc1d3 001e6703000cc1d4 001e6703000cc1d5
MACs:                             001e670cc1d3     001e670cc1d4
VSD:             n/a
PSID:            INCX-3I355922151
</verbatim>

---++++ Update the firmware

This is a good opportunity to update the firmware of the card. Search the Mellanox site for firmware matching the PSID listed above. In our case we found it at the following link

http://www.mellanox.com/page/firmware_table_Intel

Download, untar and install it with the following command.

<verbatim>
mstflint -d 04:00.0 -i /tmp/fw-ConnectX3-rel-2_11_1308-ConnectX3-Dual_A1-IOM-FDR.bin b
</verbatim>

You will have to reboot the machine for the new firmware to take effect.

---++++ Add PXE option

Now that our firmware is up to date, let's install the FlexBoot option ROM that will allow us to PXE boot. Download the latest version from Mellanox

http://www.mellanox.com/page/products_dyn?product_family=34

Once you have downloaded and unpacked this, you will see there are many versions. The version we want is the one matching the Device ID from the previous output; in our case this is '4099'.

<verbatim>
mstflint -d 04:00.0 brom /tmp/VPI/ConnectX-4099_3.4.142_VPI.mrom
</verbatim>

You will now have to reboot the machine. When I first rebooted I did not see the IB ports offered as a PXE option. I confirmed the ROM had been updated.

<verbatim>
mstflint -d 04:00.0 q
Image type:      ConnectX
FW Version:      2.11.1308
Rom Info:        type=PXE version=3.4.142 devid=4099 proto=VPI
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           001e6703000cc1d2 001e6703000cc1d3 001e6703000cc1d4 001e6703000cc1d5
MACs:                             001e670cc1d3     001e670cc1d4
VSD:             n/a
PSID:            INCX-3I355922151
</verbatim>

I needed to go into the BIOS and ensure the IB card was set as the first network device. This was on an Intel Sandy Bridge system; other platforms may differ.

<verbatim>
Boot Options -> Network Device Order -> MLNX FlexBoot
</verbatim>

Now save the change, reboot and select network boot. You should see the system start the iPXE client on the HCA.

---+++ Find the DHCP Client Identifier of the HCA

If you are using Mellanox FlexBoot, the iPXE firmware will use a traditional MAC address. You can still use the following, but it is probably easier to use the normal MAC.

The main difference when configuring the cluster to boot over Infiniband is that we need to give DHCP the client identifier of the HCAs rather than their MAC addresses. The DHCP Client Identifier can be split into four parts.

   * Prefix
      * ConnectX/ InfiniHost III/ InfiniHost III Ex cards have a prefix of: <verbatim>20:</verbatim>
      * Cards from Intel OEM (aka our WN) and Sun machines have a prefix of: <verbatim>ff:</verbatim>
   * Queue Pair number
      * ConnectX cards have a QP of: <verbatim>00:55:00:41:</verbatim>
      * InfiniHost III/ InfiniHost III Ex cards have a QP of: <verbatim>00:55:04:01:</verbatim>
   * Node GUID
      * This is unique to the HCA and can be found with the following (discard the 0x at the beginning): <verbatim>ibstat | grep Node</verbatim>
   * Port GID
      * This is unique to the port and can be found with the following (discard the 0x at the beginning): <verbatim>ibstat -p</verbatim>

An example DHCP client identifier may be as follows

<verbatim>
20:00:55:00:41:00:1e:67:03:00:39:cf:6c:00:1e:67:03:00:39:cf:6d
</verbatim>

---++++ Use tcpdump to find the ID

If you have any issues or are unsure what the correct ID should be, use the following tcpdump command on your DHCP server.

<verbatim>
tcpdump -i ib0 -vvv -s 1500 '((port 67 or port 68) and (udp[8:1] = 0x1))'
tcpdump: listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 1500 bytes
16:36:07.007580 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request, length 300, htype 32, hlen 0, xid 0x2a6b8504, secs 15, Flags [Broadcast] (0x8000)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Parameter-Request Option 55, length 13:
              Subnet-Mask, BR, Time-Zone, Classless-Static-Route
              Domain-Name, Domain-Name-Server, Hostname, YD
              YS, NTP, MTU, Option 119
              Default-Gateway
            Client-ID Option 61, length 20: hardware-type 255, 00:00:00:00:00:02:00:00:02:c9:00:00:21:28:00:01:3e:8d:a4   <----# This is what we are looking for, note it does not include the prefix.
            END Option 255, length 0
            PAD Option 0, length 0, occurs 19
</verbatim>

---+++ dhcpd.conf

The DHCP over IB specification requires the server to broadcast its replies, and the IB DHCP client needs this in order to acquire its IP address. Turning on the always-broadcast option causes the DHCP server to broadcast the DHCP OFFER reply instead of unicasting it.

<verbatim>
vim /etc/dhcp/dhcpd.conf
...
always-broadcast on;
...
# Use only one of the two filename statements below
# For FlexBoot/ iPXE we can pull from http
filename "http://148.187.64.40/pxelinux.0";
# For syslinux we can only use traditional TFTP
# filename "pxelinux.0";
...
host wn67 {
    fixed-address 148.187.65.67;
    option host-name "wn67.lcg.cscs.ch";
    # Only one of the below is required
    # MAC provided by FlexBoot/ iPXE
    # hardware ethernet 00:1e:67:0c:c1:d3;
    # Standard identifier for infiniband in DHCP
    option dhcp-client-identifier = ff:00:00:00:00:00:02:00:00:02:c9:00:00:1e:67:03:00:0c:c1:d3;
}
</verbatim>

---+++ Make the PXE boot option

Again, assuming you are using the Mellanox FlexBoot, the HCA will start iPXE. If you are not using FlexBoot, or the HCA does not run iPXE, you will be bound to syslinux and most likely will not be able to use http (requires syslinux >= 5.10).

---++++ iPXE

In the DHCP config, point the nodes to an ipxe file

<verbatim>
vim /etc/dhcp/dhcpd.conf
...
filename "http://148.187.64.40/boot.ipxe";
...
</verbatim>

iPXE makes use of its own syntax. To boot a standard vmlinuz and initrd you can simply point to the files provided on the OS ISO. Note the last line, "boot": this is required, otherwise iPXE will take no action after pulling down the files and will simply move on to the next interface. Note that the vmlinuz below was copied from the boot folder in our diskless chroot.

<verbatim>
cat /var/www/html/boot.ipxe
#!ipxe
kernel http://148.187.64.40/vmlinuz
initrd http://148.187.64.40/diskless2.cpio.gz
boot
</verbatim>

---++++ Syslinux

TBA

---++ Other notes

---+++ Logs

We'll want to ensure all syslogs are sent to a remote server and that nothing is stored locally, otherwise logs will eat our RAM.
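
As a minimal sketch of the remote logging idea, the following writes an rsyslog forwarding rule into the image from outside the chroot. It assumes the image uses rsyslog and that its rsyslog.conf includes /etc/rsyslog.d/*.conf. The collector name loghost.example.org and the file name remote.conf are hypothetical placeholders, not part of our setup.

```shell
#!/bin/sh
# Sketch only: forward all syslog messages from the diskless image to a remote collector.
# IMAGE_ROOT defaults to the working directory used earlier on this page.
# LOG_HOST is a hypothetical collector name; replace it with your own server.
IMAGE_ROOT="${IMAGE_ROOT:-/tmp/diskless}"
LOG_HOST="${LOG_HOST:-loghost.example.org}"
DROPIN="$IMAGE_ROOT/etc/rsyslog.d/remote.conf"

mkdir -p "$IMAGE_ROOT/etc/rsyslog.d"

# '@@' forwards over TCP; a single '@' would use UDP instead.
printf '*.* @@%s:514\n' "$LOG_HOST" > "$DROPIN"

# Show what was written so it can be checked before building the cpio image.
cat "$DROPIN"
```

This only adds forwarding. The local rules in the image's /etc/rsyslog.conf that write under /var/log would still need to be commented out if nothing is to be stored locally.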
Alternatively, if this is not possible for you, mount /var on some form of persistent storage.

-- Main.GeorgeBrown - 2013-11-27
Topic revision: r8 - 2014-10-13 - MiguelGila
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.