
Installing a new LCG Machine

The installation process has a fair number of steps, but it's actually pretty straightforward. Things you'll need:

  • the root password for ce03-lcg (sudo isn't good enough in this case)
  • the root password for newly installed systems that haven't gotten their real password file yet
  • the hardware remote console password for the machine you're trying to install (or access to its physical console)

There are also a couple of caveats:

  • This procedure doesn't cover installing Solaris, only Linux
  • If the machine has a bonded network interface, you need to temporarily plug the primary NIC's cable into a non-bonded interface in order to do the install.
  • The procedure isn't well-tested on thumper (x4500) systems.

Currently, this procedure works for the following operating systems:

  • Scientific Linux 3.0.8 (32-bit). Tested on Dell and x2200 nodes only. Does not support more than 4 GB of RAM (the hugemem kernel crashes, at least on x2200 machines).
  • Scientific Linux 4.4 (32-bit). Tested on Dell, x2200, and x4200 nodes. Supports large memory.
  • Scientific Linux 4.4 (64-bit). Tested on x2200 and x4500 nodes. Supports large memory.

Additionally, the netboot portion of the installation procedure should work for Scientific Linux 3.0.8 (64-bit). However, local customizations will not work since the 64-bit version of SLC3 seems to be missing some important RPMs.

Setting up the netboot environment

You need to configure the tftp server to install the correct version of Linux. Note that once this step is done, ANY netbooting of the affected machine will result in its disks being erased and reinstalled without further intervention! Be sure to undo this step at the end!

  • Log into ce03-lcg.projects.cscs.ch.
  • Check to make sure the machine has a proper entry in /etc/dhcpd.conf. If you're reinstalling an existing node, it should already be set up. (But if you're replacing a machine, you'll need to update the MAC address.) The entries should look like this example:

host wn22-lcg {
        hardware ethernet 00:16:36:68:51:AA;
        fixed-address 148.187.33.122;
}
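
If you just want to look at the entry for one machine rather than paging through the whole file, something like the following can help. This is only a sketch: show_dhcp_entry is a made-up helper name, not an existing script on ce03-lcg, and it assumes each host block starts with "host NAME {" on one line and ends with a line containing "}", as in the example above.

```shell
# Print the host block for one machine from a dhcpd.conf-style file.
# Usage: show_dhcp_entry HOSTNAME [CONFIG_FILE]
# (show_dhcp_entry is a hypothetical helper, not part of dhcpd.)
show_dhcp_entry() {
    conf=${2:-/etc/dhcpd.conf}
    # Start printing at the "host NAME {" line and stop after the
    # first line that contains a closing brace.
    awk -v h="$1" '$1 == "host" && $2 == h { show = 1 }
                   show { print }
                   show && /}/ { show = 0 }' "$conf"
}
```

Running "show_dhcp_entry wn22-lcg" on ce03-lcg should print the block for that node, or nothing at all if the machine has no entry yet.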

  • If you have to edit the entry, you'll need to run /etc/init.d/dhcpd restart to make the changes take effect.
  • The machine also needs to be in DNS. If it's not there already, you'll need to file a ticket to get the change taken care of.
  • You also need to configure tftp/kickstart to install the system. In /tftpboot/linux-install/pxelinux.cfg you'll find a number of links pointing to config files. For example:
lrwxrwxrwx    1 root     root           28 Mar  6 09:57 94BB217A -> gridgroup-standard-i386-slc4

94BB217A is just the hex representation of the IP address of the machine you're trying to install, in this case 148.187.33.122 (wn22-lcg.projects.cscs.ch).

  • To make this process easier, the script called /root/bin/ip will give you the IP address and its hex version if you give it a hostname, for example:
[root@ce03 pxelinux.cfg]# /root/bin/ip wn22-lcg
148.187.33.122  94BB217A
[root@ce03 pxelinux.cfg]#
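
If /root/bin/ip isn't available, the conversion is easy to do by hand: each of the four octets becomes two uppercase hex digits. The function below is a minimal sketch of that conversion; ip_to_hex is a hypothetical name used for illustration and is not the script mentioned above.

```shell
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# used for the links in pxelinux.cfg.
# (ip_to_hex is a hypothetical helper, not the /root/bin/ip script.)
ip_to_hex() {
    # Turn the dots into spaces so printf receives four separate numbers,
    # then print each one as two uppercase hex digits.
    printf '%02X%02X%02X%02X\n' $(echo "$1" | tr '.' ' ')
}

ip_to_hex 148.187.33.122   # prints 94BB217A
```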

So what you want to do is to make that hex IP address a link to the profile you want to install. In this case, that machine gets the 32-bit version of slc4. The profiles available as of this writing include:

  • gridgroup-standard-amd64-slc3 - Scientific Linux 3.0.8, 64-bit version (don't use without talking to tg)
  • gridgroup-standard-i386-slc3 - Scientific Linux 3.0.8, 32-bit version
  • gridgroup-standard-amd64-slc4 - Scientific Linux 4.4, 64-bit version
  • gridgroup-standard-i386-slc4 - Scientific Linux 4.4, 32-bit version
  • gridgroup-standard-boothd - This profile requests the machine to boot from the hard disk. It should be the default.
  • gridgroup-thumper-amd64-slc4 - Scientific Linux 4.4, 64-bit version for thumper/x4500 systems. (Not well tested!)

  • Once you've selected a profile to use, remove the old link (if any) from the hex IP address to a profile and link it to the profile you've chosen. For example:
ln -s gridgroup-standard-amd64-slc4 94BB217A

If you need to use a slightly different installation profile, please make a copy of the standard one and link to that rather than editing the standard one.
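
The remove-and-relink step can be wrapped up so that you can't accidentally link to a profile that doesn't exist. This is a sketch under assumptions: set_profile is a made-up function name, and the default directory is the pxelinux.cfg path given above.

```shell
# Point a machine's hex-IP link at an install profile.
# Usage: set_profile HEX_IP PROFILE [PXE_DIR]
# (set_profile is a hypothetical helper for illustration.)
set_profile() {
    dir=${3:-/tftpboot/linux-install/pxelinux.cfg}
    # Refuse to create a dangling link to a profile that does not exist.
    if [ ! -f "$dir/$2" ]; then
        echo "no such profile: $2" >&2
        return 1
    fi
    # Remove the old link (if any), then create the new relative link.
    rm -f "$dir/$1"
    ln -s "$2" "$dir/$1"
    # Show the result so it can be checked by eye.
    ls -l "$dir/$1"
}
```

For example, "set_profile 94BB217A gridgroup-standard-amd64-slc4" would do the same thing as the ln command above, after clearing out any previous link.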

Once you've done this, it's time to netboot and do the installation.

Installing the OS over the network

NOTE THAT THIS STEP DESTROYS EVERYTHING ON THE LOCAL DISK!

  • Assuming you're installing one of the Sun systems, you need to get access to a web browser on a machine that can see the management network so you can use the remote console. If you've got the vmware client installed on your system, you can do this by attaching it to "ce03-lcg.projects.cscs.ch" (log in using your normal user account) and using the Windows XP virtual machine there. Otherwise, you can ssh into ce03-lcg and run a web browser, though this can be a bit slow.
  • If you're installing other hardware and have access to the console, you can skip this step and just go to the console.
  • Point your web browser (either on ce03-lcg or on the virtual XP machine) to the IP of the management interface for the machine you're trying to install. For MOST of the machines, this is 192.168.33._x_, where x is the last octet of the IP address.
  • IMPORTANT: wn40-wn42 are "off by one". For example, wn40 is 148.187.33.141 but its management interface is 192.168.33.140, and so on.
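
The address rule can be written down as a small function. Treat this as a sketch only: mgmt_ip is a made-up name, the hostname patterns assume the wnNN-lcg naming used elsewhere on this page, and since the rule only holds for MOST machines there may be other exceptions it doesn't know about.

```shell
# Guess the management-interface address for a node.
# Usage: mgmt_ip HOSTNAME PUBLIC_IP
# (mgmt_ip is a hypothetical helper; the rule is taken from the notes above.)
mgmt_ip() {
    last=${2##*.}          # last octet of the public address
    case $1 in
        # wn40-wn42 are "off by one": the management octet is one lower.
        wn40*|wn41*|wn42*) last=$((last - 1)) ;;
    esac
    echo "192.168.33.$last"
}

mgmt_ip wn22-lcg 148.187.33.122   # prints 192.168.33.122
mgmt_ip wn40-lcg 148.187.33.141   # prints 192.168.33.140
```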

  • Once you've connected to the machine, the interface will be slightly different depending whether the machine is an x2200 or an x4000-series machine, but the basic concepts are the same.

  • Note: If you're installing a new x2200 for the first time, you'll need to connect to the machine, click on the little "wrench" icon at the top after logging in, and change "hot key setting 1" to "F12". Once this has been done on a machine, you won't have to do it again.
  • Connect to the virtual console by clicking "Launch" on x2200 machines or selecting the "Remote Control" menu on x4000 machines.
  • Once you have the console launched, restart the machine either from within the console or by going back out to the web application and choosing "Power control" (for x2200) or "Remote Power Control" (for x4000) and selecting "reset" or "reboot".
  • Quickly switch back to the remote console.
  • On x2200 systems, wait for the Sun logo to come up and then choose "F12" from under the "Hotkey" menu to select a netboot. (You have to do this because the remote console can't seem to pass the "F12" key when you press it on your keyboard.)
  • On x4000 systems, wait for the AMI BIOS screen to come up, and then press Control-N to netboot.

Once the netboot starts, you should see a DHCP/tftp/Linux startup sequence. It's usually best to keep an eye on it until it starts installing packages. Once you've reached that point, things should proceed without any problems. At the end, the machine will reboot into the new operating system.

Initial Configuration

  • After the installation finishes, I like to log into the machine and do a "yum upgrade" followed by a reboot to make sure all of the latest patches are installed before I start further customizations.
  • Once the operating system is installed, log into the machine as root and do the following:
scp root@ce03-lcg.projects.cscs.ch:/var/cfengine/scripts/newmachine .
./newmachine

You'll be asked to enter the root password for ce03-lcg a bunch of times as it downloads important files for cfengine and copies cfengine and ssh key files to the server. At the end of this process, cfengine should run successfully. If it does, you've got a fairly standard LCG machine ready for use.

Cleanup

  • Make sure to go back onto ce03-lcg and remove the hex->profile link in order to prevent the machine from being accidentally reinstalled.
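
Since a forgotten link means the next netboot wipes the machine, it can be worth checking for leftovers. The sketch below lists every link in the pxelinux.cfg directory that points at something other than the boot-from-disk profile. armed_links is a hypothetical name, and the check assumes that any link whose target ends in "boothd" is harmless.

```shell
# List pxelinux links that still point at an install profile.
# Usage: armed_links [PXE_DIR]
# (armed_links is a hypothetical helper for illustration.)
armed_links() {
    dir=${1:-/tftpboot/linux-install/pxelinux.cfg}
    for link in "$dir"/*; do
        # Only symbolic links matter; skip the profile files themselves.
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Links to the boot-from-disk profile are safe; report the rest.
        case $target in
            *boothd) ;;
            *) echo "${link##*/} -> $target" ;;
        esac
    done
}
```

If "armed_links" prints nothing, no machine is currently set up to be reinstalled on its next netboot.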

Turning it into a worker node

  • First, go to /var/cfengine/inputs/cf.groups on ce03-lcg and make sure the machine is listed as a worker node. (Don't forget that the cfengine input files use RCS.)
  • Then, log into the worker node as root and run cfagent -q again to make sure it has the necessary worker node config files.
  • Finally, run /root/worker_node_setup on the worker node.

If everything went OK, this should be all you need to do.

-- TomGuptill - 07 Mar 2007

Topic revision: r5 - 2011-01-21 - PabloFernandez
 