
Diskless Install

Create the image

Make a directory to work in

mkdir /tmp/diskless

Install the release package

yumdownloader sl-release
rpm --root=/tmp/diskless -ivh --nodeps sl-release-6.4-1.x86_64.rpm

Install packages required for the system

yum --installroot=/tmp/diskless -y groupinstall core base --disableexcludes=main

Note that a non-trivial amount of the image size is made up of the yum cache

du -sh /tmp/diskless/var/cache/yum/
165M    /tmp/diskless/var/cache/yum/

Chroot

We can now chroot into the image. Note that if the user you are doing this with has their shell set to something other than bash, the chroot will not work unless that shell has been installed in the chroot directory.

chroot /tmp/diskless

We need to create the following symlink in order for the system to boot

ln -s ./sbin/init ./init

Set the password for root

passwd

Ensure the network is started at boot

echo NETWORKING=yes > etc/sysconfig/network

Now you need to configure network interfaces and SSH so we can access the machine once booted.

vim etc/sysconfig/network-scripts/ifcfg-eth1
vim etc/sysconfig/network-scripts/ifcfg-ib0

vim root/.ssh/authorized_keys

Also disable SELinux if required

vim /etc/selinux/config
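
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming the file already contains a 'SELINUX=' line and that GNU sed is available. 'disable_selinux' is a hypothetical helper name; its argument is the image root ('/' when run inside the chroot, '/tmp/diskless' from outside).

```shell
# Hypothetical helper: set SELINUX=disabled in <root>/etc/selinux/config.
# Assumes GNU sed (for -i) and an existing SELINUX= line in the file.
disable_selinux() {
    root=${1:-/}
    # Anchored on "SELINUX=" so the SELINUXTYPE= line is left untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$root/etc/selinux/config"
}
```

For example, 'disable_selinux /tmp/diskless' from outside the chroot.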

Chroot optional steps

At this point we could exit the chroot and cpio the image, so the following steps are optional. This is a good time to add anything that modifies the kernel: once the image is booted it will not be possible to boot into a modified kernel, so any such change means updating the image.

yum won't work inside the chroot yet, as it requires /dev/urandom, which does not exist there. To make life easier, run the following outside the chroot

mount --bind /dev /tmp/diskless/dev
mount --bind /proc /tmp/diskless/proc
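
The bind mounts have to be undone again before the image is packed, so it can help to keep both directions together. The following is only a sketch: 'run', 'chroot_mounts_up', 'chroot_mounts_down' and the 'IMAGE_ROOT' and 'DRY_RUN' variables are hypothetical names, not part of any tool used on this page.

```shell
# Hypothetical helpers for the bind mounts. IMAGE_ROOT defaults to the
# working directory used on this page. Set DRY_RUN=1 to only print commands.
IMAGE_ROOT=${IMAGE_ROOT:-/tmp/diskless}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

chroot_mounts_up() {
    for d in dev proc; do
        run mount --bind "/$d" "$IMAGE_ROOT/$d"
    done
}

chroot_mounts_down() {
    # Unmount in reverse order before the image is packed.
    for d in proc dev; do
        run umount "$IMAGE_ROOT/$d"
    done
}
```

Running with DRY_RUN=1 first shows exactly which mount and umount commands would be issued.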

Build in Mellanox OFED

Follow the instructions described on the following wiki page

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/MellanoxYumInstall

Ensure the network interfaces are configured for DHCP

vi /etc/sysconfig/network-scripts/ifcfg-ib0

TYPE=Infiniband
DEVICE=ib0
BOOTPROTO=dhcp
GATEWAY=148.187.64.2
ONBOOT=yes
USERCTL=no
MTU=65520
IPV6INIT=no
DNS1=148.187.3.88
DNS2=148.187.18.88
CONNECTED_MODE=yes

The MLNX_OFED ALL group install also installs MPI binaries we don't need. We will remove them as they take up a lot of space

rpm -e --noscripts mvapich2
rpm -e --noscripts openmpi
rpm -e --noscripts mpitests_openmpi
rm -rf /usr/mpi

Build in GPFS

We'll also need GPFS. As with the Mellanox install, follow the guide on the wiki

https://wiki.chipp.ch/twiki/bin/view/LCGTier2/OLDServiceGPFS#GPFS_Repo

Note that, as we are running in a chroot, 'uname -a' returns the kernel version of the host, because we are not running the kernel from the chroot
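
When a build step needs the kernel release of the image rather than that of the host, one way to find it is to list the module directories inside the image. This is a minimal sketch, assuming kernel modules live under lib/modules in the image root; 'image_kernels' is a hypothetical helper name.

```shell
# Hypothetical helper: list the kernel releases installed in an image tree,
# based on the directories found under <root>/lib/modules.
image_kernels() {
    root=${1:-/tmp/diskless}
    for d in "$root"/lib/modules/*/; do
        # Skip the unexpanded glob when no kernel is installed.
        [ -d "$d" ] || continue
        basename "$d"
    done
}
```

Run it from outside the chroot as 'image_kernels /tmp/diskless', or inside the chroot as 'image_kernels /'.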

Cleaning YUM

Remember that space taken by the yum cache? Let's remove it.

Current size from outside the chroot

du -sh /tmp/diskless/var/cache/yum
287M    /tmp/diskless/var/cache/yum

After running 'yum clean all' from inside the chroot

du -sh /tmp/diskless/var/cache/yum
189M    /tmp/diskless/var/cache/yum

Better, but for some reason yum isn't clearing all packages in the cache, so from inside the chroot

rm -rf /var/cache/yum/*

Now from outside the chroot

du -sh /tmp/diskless/var/cache/yum/
66M     /tmp/diskless/var/cache/yum/

Compress to CPIO

Unmount /dev and /proc from the chroot

umount /tmp/diskless/dev
umount /tmp/diskless/proc

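Now it's time to compress this into an image. I'm using pigz (available from EPEL), a parallel implementation of gzip that is much faster. The archive is written outside the image directory so that it does not end up inside itself.

cd /tmp/diskless/
find . | cpio -ocv | pigz -9 > /tmp/diskless.cpio.gz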
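
The packing step can also be wrapped in a small function. This is a sketch under a few assumptions: GNU cpio is installed, the 'newc' archive format is requested explicitly, and plain gzip is used when pigz is missing. 'pack_image' is a hypothetical helper name.

```shell
# Hypothetical helper: pack an image tree into a gzip-compressed cpio archive.
# Usage: pack_image <image-root> <output-file>
pack_image() {
    src=$1
    out=$2
    # Use pigz when available, otherwise fall back to plain gzip.
    if command -v pigz >/dev/null 2>&1; then
        compress=pigz
    else
        compress=gzip
    fi
    # Run find inside the tree so archived paths are relative to the image
    # root. The output file must live outside <image-root>, or the partly
    # written archive would be picked up by find as well.
    (cd "$src" && find . | cpio -o -H newc --quiet) | "$compress" -9 > "$out"
}
```

For example, 'pack_image /tmp/diskless /tmp/diskless.cpio.gz'. The result can be checked with 'gzip -dc /tmp/diskless.cpio.gz | cpio -it'.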

Boot over Infiniband

Previously, kickstart provisioning in Phoenix had been done over the Ethernet network. We will use this opportunity to change this and make use of the Infiniband fabric.

The following is based around the Mellanox FlexBoot. If your HCA is already a boot option, skip to 'Find the DHCP Client Identifier of the HCA'

You can find some useful reading material from Mellanox at the following link if you are unfamiliar with the workings of the Infiniband fabric. However it should be noted that this is somewhat dated.

http://www.mellanox.com/related-docs/prod_software/Mellanox_PXE_User_Guide.pdf

http://www.mellanox.com/pdf/BoIB/Boot-over-IB_User_Manual_1_0.pdf

Enable PXE in the HCA ROM

First we need to identify some details about the card we have in our system. Find the PCI device for our card; the vendor ID '15b3' identifies Mellanox HCAs

lspci -n | grep 15b3
04:00.0 0280: 15b3:1003

We need to know the 'Device ID' and 'PSID'

mstflint -d 04:00.0 q 
Image type:      ConnectX
FW Version:      2.11.1308
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           001e6703000cc1d2 001e6703000cc1d3 001e6703000cc1d4 001e6703000cc1d5 
MACs:                                 001e670cc1d3     001e670cc1d4
VSD:             n/a
PSID:            INCX-3I355922151
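
Both values can be pulled out of the query output in a script, for instance to pick the matching firmware or ROM file later on. This is a minimal sketch that assumes the 'Name:   value' layout shown above; 'mstflint_field' is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one field from `mstflint ... q`
# output read on stdin, e.g. "Device ID" or "PSID".
mstflint_field() {
    # Split each line on the first colon plus any following spaces and
    # print the value of the first line whose name matches exactly.
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}
```

For example, 'mstflint -d 04:00.0 q | mstflint_field PSID'.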

Update the firmware

This is a good opportunity to update the firmware of the card. Search the Mellanox site for firmware for the PSID previously listed. In our case we found it at the following link

http://www.mellanox.com/page/firmware_table_Intel

Download, un-tar and install with the following command.

mstflint -d 04:00.0 -i /tmp/fw-ConnectX3-rel-2_11_1308-ConnectX3-Dual_A1-IOM-FDR.bin b

You will have to reboot the machine for the new firmware to take effect.

Add PXE option

Now that our firmware is up to date, let's install the FlexBoot option ROM that will allow us to PXE boot.

Download the latest version from Mellanox http://www.mellanox.com/page/products_dyn?product_family=34

Once you have downloaded and unpacked this you will see there are many versions. The version we want is the one matching the Device ID from the previous output; in our case this is '4099'.

mstflint -d 04:00.0 brom /tmp/VPI/ConnectX-4099_3.4.142_VPI.mrom

You will now have to reboot the machine. When I first rebooted I did not see any option for the IB ports as a PXE option. I confirmed the rom had been updated.

 
mstflint -d 04:00.0 q

Image type:      ConnectX
FW Version:      2.11.1308
Rom Info:        type=PXE  version=3.4.142 devid=4099 proto=VPI
Device ID:       4099
Description:     Node             Port1            Port2            Sys image
GUIDs:           001e6703000cc1d2 001e6703000cc1d3 001e6703000cc1d4 001e6703000cc1d5 
MACs:                                 001e670cc1d3     001e670cc1d4
VSD:             n/a
PSID:            INCX-3I355922151

I needed to go into the BIOS and ensure the IB was set as the first network device. This was on an Intel Sandy Bridge system; other platforms may differ.

 
Boot Options -> Network Device Order -> MLNX FlexBoot

Now save the change and reboot and select network boot. You should see the system start the iPXE client on the HCA.

Find the DHCP Client Identifier of the HCA

If you are using Mellanox FlexBoot, the iPXE firmware will use a traditional MAC address. You can still use the following, but it is probably easier to use the normal MAC.

The main difference for configuring the cluster to boot over Infiniband is that we need to give DHCP the client identifier of the HCAs rather than MAC addresses.

The DHCP Client Identifier can be split into four parts.

  • Prefix
    • ConnectX/ InfiniHost III/ InfiniHost III Ex cards have a prefix of;
      20:
    • Cards from Intel OEM (aka our WN) and Sun machines have a prefix of;
      ff:
  • Queue Pair number
  • Node GUID
    • This is unique to the HCA and can be found with the following (discard the 0x at the beginning);
      ibstat | grep Node
  • Port GID
    • This is unique to the port and can be found with the following (discard the 0x at the beginning);
      ibstat -p

An example DHCP client identifier may be as follows

20:00:55:00:41:00:1e:67:03:00:39:cf:6c:00:1e:67:03:00:39:cf:6d
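
The colon-separated form used by DHCP can be derived from a GUID as printed by ibstat. This is a minimal sketch, assuming the GUID is 16 hex digits with an optional leading 0x. 'guid_to_colon' and 'client_id' are hypothetical helper names, and the prefix bytes are supplied by the caller because they depend on the card, as described above.

```shell
# Hypothetical helper: turn a GUID such as 0x001e6703000cc1d3 into
# colon-separated lowercase bytes (00:1e:67:03:00:0c:c1:d3).
guid_to_colon() {
    printf '%s\n' "$1" |
        sed -e 's/^0[xX]//' -e 's/../&:/g' -e 's/:$//' |
        tr 'A-F' 'a-f'
}

# Hypothetical helper: join a caller-supplied prefix and a GUID.
# Usage: client_id <colon-separated-prefix> <guid>
client_id() {
    printf '%s:%s\n' "$1" "$(guid_to_colon "$2")"
}
```

With the prefix and port GUID from the wn67 host entry in the dhcpd.conf section of this page, 'client_id ff:00:00:00:00:00:02:00:00:02:c9:00 0x001e6703000cc1d3' reproduces the identifier used there.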

Use tcpdump to find the ID

If you have any issues, or are unsure what the correct ID should be, use the following tcpdump command from your DHCP server.

tcpdump -i ib0 -vvv -s 1500 '((port 67 or port 68) and (udp[8:1] = 0x1))'

tcpdump: listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 1500 bytes
16:36:07.007580 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request, length 300, htype 32, hlen 0, xid 0x2a6b8504, secs 15, Flags [Broadcast] (0x8000)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Parameter-Request Option 55, length 13:
              Subnet-Mask, BR, Time-Zone, Classless-Static-Route
              Domain-Name, Domain-Name-Server, Hostname, YD
              YS, NTP, MTU, Option 119
              Default-Gateway
            Client-ID Option 61, length 20: hardware-type 255, 00:00:00:00:00:02:00:00:02:c9:00:00:21:28:00:01:3e:8d:a4   <----# This is what we are looking for, note it does not include the prefix.
            END Option 255, length 0
            PAD Option 0, length 0, occurs 19
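
To avoid picking the value out of the output by eye, the Client-ID line can be filtered. This is only a sketch: it assumes tcpdump prints the option in the 'Client-ID Option 61, length N: hardware-type N, ...' form shown above, which may differ between tcpdump versions. 'extract_client_id' is a hypothetical helper name.

```shell
# Hypothetical helper: read `tcpdump -vvv` output on stdin and print the
# colon-separated bytes of every Client-ID (option 61) line it sees.
extract_client_id() {
    sed -n 's/.*Client-ID Option 61, length [0-9]*: hardware-type [0-9]*, \([0-9a-fA-F:]*\).*/\1/p'
}
```

It can be fed by the tcpdump command above with '-l' added, so that tcpdump line-buffers its output when writing to a pipe. As noted in the capture, the printed value does not include the prefix.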

dhcpd.conf

Both the DHCP over IB specification and the IB DHCP client require the DHCP OFFER reply to be broadcast rather than unicast, otherwise the client cannot acquire its IP address. Turning on the always-broadcast option makes the DHCP server do this.

vim /etc/dhcp/dhcpd.conf
...
always-broadcast on;
...
# For FlexBoot/ iPXE we can pull from http
filename "http://148.187.64.40/pxelinux.0";

# For syslinux we can only use traditional TFTP
filename "pxelinux.0";
...
        host wn67 {
                fixed-address 148.187.65.67;
                option host-name "wn67.lcg.cscs.ch";
          
                # Only one of the below is required

                # MAC provided by FlexBoot/ iPXE
                # hardware ethernet 00:1e:67:0c:c1:d3;

                # Standard identifier for infiniband in DHCP
                option dhcp-client-identifier = ff:00:00:00:00:00:02:00:00:02:c9:00:00:1e:67:03:00:0c:c1:d3;
        }
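
With many nodes it may be easier to generate the host entries than to type them. This is a minimal sketch that prints one entry in the same shape as the wn67 example. 'dhcp_host_entry' is a hypothetical helper name, and the default domain is simply taken from that example.

```shell
# Hypothetical helper: print a dhcpd.conf host entry keyed on the
# DHCP client identifier.
# Usage: dhcp_host_entry <name> <ip> <client-id> [domain]
dhcp_host_entry() {
    name=$1
    ip=$2
    id=$3
    domain=${4:-lcg.cscs.ch}
    cat <<EOF
host $name {
        fixed-address $ip;
        option host-name "$name.$domain";
        option dhcp-client-identifier = $id;
}
EOF
}
```

The output is meant to be reviewed and pasted into /etc/dhcp/dhcpd.conf, not written there automatically.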

Make the PXE boot option

Again, assuming you are using the Mellanox FlexBoot, the HCA will start iPXE. If you are not using FlexBoot, or the HCA does not run iPXE, you will be bound to syslinux and most likely not able to use http (requires syslinux >= 5.10)

iPXE

In the DHCP config point the nodes to an ipxe file

vim /etc/dhcp/dhcpd.conf
...
filename "http://148.187.64.40/boot.ipxe";
...

iPXE makes use of its own syntax.

For booting a standard vmlinuz and initrd you can simply point to the files provided from the OS ISO. Note the last line, "boot": this is required, otherwise iPXE will take no action after pulling down the files and will simply move on to the next interface.

Note the vmlinuz below was copied from the boot folder in our diskless chroot.

cat /var/www/html/boot.ipxe 
#!ipxe

kernel http://148.187.64.40/vmlinuz
initrd http://148.187.64.40/diskless2.cpio.gz 
boot
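
If the image name changes between builds, the script can be generated instead of edited. This is a minimal sketch that prints a script in the same shape as the one above. 'ipxe_script' is a hypothetical helper name, and all three arguments are supplied by the caller.

```shell
# Hypothetical helper: print an iPXE script that loads a kernel and an
# initrd from the same base URL and then boots.
# Usage: ipxe_script <base-url> <kernel-file> <initrd-file>
ipxe_script() {
    base=$1
    kernel=$2
    initrd=$3
    cat <<EOF
#!ipxe

kernel $base/$kernel
initrd $base/$initrd
boot
EOF
}
```

For example, 'ipxe_script http://148.187.64.40 vmlinuz diskless2.cpio.gz > /var/www/html/boot.ipxe'.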

Syslinux

TBA

Other notes

Logs

We'll want to ensure all syslogs are sent to a remote server and store nothing locally, otherwise logs will eat our RAM.

Alternatively, if this is not possible for you, mount /var on some form of persistent storage.

-- GeorgeBrown - 2013-11-27

Topic revision: r8 - 2014-10-13 - MiguelGila
 