Tags:
create new tag
view all tags

PreProduction environment setup

General information

The PreProduction cluster is a set of KVM hosts (DSH group KVM_HOSTS: xen01,xen[04-08]) and a bunch of KVM guests (DSH group KVM_GUESTS: pp*, nagios). The set of hosts mount /kvm_guests2 off a NFS share exported by ppnfs and the guests use the files in it as a hard disk image:

Oct 31 09:49 [root@ppnfs:~]# ls -lh /kvm_guests2/
total 114G
drwx------ 2 root root 16K Oct 18 17:20 lost+found
-rw-r--r-- 1 root root 12G Oct 31 09:49 nagios.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppapel.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppargus01.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppbdii01.root
-rw-r--r-- 1 root root 12G Oct 25 09:04 ppcluster.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppcream01.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppcream02.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 pplrms01.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 pplrms02.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 pptopbdii.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppui.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppwms.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppwn01.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppwn02.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppwn03.root
-rw-r--r-- 1 root root 12G Oct 31 09:49 ppwn04.root

The idea was to use the infiniband network to mount the guest's disks and the ethernet for normal ethernet operation, but because not all hosts have an Infiniband network card, some will use the ethernet card also to mount guest's disks.

These guests are fully virtualized, so we can test the same kernel version and packages that the production machines have, without any further compilation or tweak.

Administration console

The best way to manage all these guests and hosts is through the Convirture management console installed in pub.lcg.cscs.ch. To access it, use the link available in the local nagios side bar (which is, by the way, also a KVM guest) or point your web browser to pub. Username and password is the same of the dCache administration interface.

This tool allows you view the machine's screen output and to migrate live guests from host to host. Take special attention to network settings of the guests and hosts!!!

Installing a new host

Since RedHat released RHEL 6, it is much easier to create KVM guests and, therefore, the recommended OS for KVM hosts is SL6.

  1. Install a new machine with SL6 and make sure that the new machine is listed under the category of KVM_HOSTS and cfengine has ran. This does the following changes (among others):
    - Creates /kvm_guests2
    - Makes sure the Convirture software is installed.
    - Adds the mount of /kvm_guests2 from ppnfs.
    - Installs KVM packages
    - Copies the SSH key of the convirture user in pub
  2. If the machine has an IB card, run /root/set_infiniband.sh and add an entry to the infiniband network in /etc/hosts for all machines. Then, reboot.
  3. Install the Convirture dependencies and files:
    cd /root/convirture-tools/install/managed_server/scripts
    ./convirt-tool install_dependencies
    ./convirt-tool --detect_only setup
    ./convirt-tool setup
  4. If required (when machine is not ONLY in infiniband network), add the corresponding route back from ppnfs to the machine just installed. This can be done either by adding routes manually
    route add -host 148.187.67.115 gw 148.187.68.2 # xen05.lcg.cscs.ch via firewall
    route add -host 148.187.67.116 gw 148.187.68.2 # xen06.lcg.cscs.ch via firewall.
    Or by modifying /srv/cfengine/files/ppnfs/etc/sysconfig/network-scripts/route-ib0 accordingly.
  5. Try to connect as user convirt from pub to the new created machine. This would add the ssh key to the keyring of the user convirt in pub.
  6. Mount all the filesystems, and make sure that /kvm_guests2 is accessible.
  7. Add the machine to the Convirture console.

Installing a new guest

To create a new guest from scratch, follow these steps:

  1. Open the Convirture console, select one of the templates under Template Library (SL5-generic-PP is the base one for SL5 grid machines) and click provision.
  2. In the "Select a Target Node" window, choose the destination KVM host to be used.
  3. That opens the "Provision Virtual Machine" interface, which allows us to configure more settings. Special attention must be paid to the VM (the VM name), Memory (MB) boxes in the General tab and the MAC address of the br0 Network device under Networks tab. Usually, the MAC is formed like this: 00:10:3E:64:IP:IP. So, for example, if the machine to create is named ppcluster.lcg.cscs.ch with public IP 148.187.64.207, we would have to set the following:
    General/VM: ppcluster
    General/Memory (MB): 2048
    Networks/br0 MAC: 00:10:3E:64:02:07
    Please, take into account that this procedure might change in the future.
  4. Now it is time to add the corresponding DHCP entries in the dhcp configuration file cfengine.lcg.cscs.ch /srv/cfengine/files/LINUX/etc/dhcpd.conf.lcg.cscs.ch:
            host ppcluster-internal {
                    hardware ethernet 00:10:3E:64:02:07;
                    fixed-address 10.10.64.207;
                    option host-name "ppcluster.lcg.cscs.ch";
            }
  5. Once this is done, go to the dhcp server machine (at the time of writting this: xen03.lcg.cscs.ch), run cfengine and restart the dhcpd service.
  6. At this point, we need to tell the installing server (at the time of wrtting this: xen11.lcg.cscs.ch) what to do with the IP address that the client will be given. So, we need to create the symbolink link with the IP of the hostname in the 10.10 network:
    # cd /tftpboot/pxelinux.cfg/
    # ./ip2hex.sh 10.10.64.207
    0A0A40CF
    # ln -s SL57-service.kvm 0A0A40CF
    # ls -lha |grep 0A0A40CF
    lrwxrwxrwx  1 root root   16 Oct 31 15:28 0A0A40CF -> SL57-service.kvm
    Now it is a good moment to open the file SL57-service.kvm and the kickstart it points to and check if everything is okay with the configuration.
  7. Once all the previous steps are done, we should be ready to boot the machine and set it to install. Right click on the machine name, set the boot device to network and start it.
  8. When the kickstart file is processed, the guest will shut down and you will have to manually start it again to continue the setup process. This is done in this way because it is not possible to change the guest's boot order from the guest itself, so if you set it to boot using pxe, on each reboot it will continue booting using pxe until it is shut down and started again.
  9. After a few reboots, the guest machine should be operative.
  10. Update the machine packages and the kernel.
    yum update
    yum update kernel --disableexcludes=main
  11. Create cfengine key in cfengine:/srv/cfengine/ppkeys
    CFKEY_HOST=ppargus01
    CFKEY_IP=$(host $CFKEY_HOST |awk '{print $4}')
    cd /srv/cfengine/ppkeys/
    cfkey -f root-$CFKEY_IP
  12. Copy the keys to nfs:/export/kickstarts/private/cfengine/
    scp /srv/cfengine/ppkeys/root-$CFKEY_IP* nfs:/export/kickstarts/private/cfengine/
  13. Copy ssh keys to cfengine server.
    cd /srv/cfengine/private/ssh/
    mkdir $CFKEY_HOST
    ls se30|xargs -n1 --replace scp $CFKEY_HOST:/etc/ssh/{} $CFKEY_HOST/
  14. Check in ssh key to svn
    asvn add $CFKEY_HOST
    asvn commit $CFKEY_HOST --username miguelgi -m"New SSH keys for host $CFKEY_HOST"
  15. Create new known_host file.
    /srv/cfengine/scripts/new_known_hosts
  16. Run cfengine in the recently installed machine
    cfagent -q 

Reinstalling a guest

  1. Using the Convirture console, right click on the name of the guest and set the boot device to network. Start it.
  2. The guest should boot using the pxe configuration that the install server (at the time of writting: xen11.lcg.cscs.ch) has for it. For example, ppcream02 is configured as follows:
    # hostname
    xen11.lcg.cscs.ch
    # pwd
    /tftpboot/pxelinux.cfg
    # ./ip2hex.sh 10.10.64.212
    0A0A40D4
    # ls -lha |grep 0A0A40D4
    lrwxrwxrwx  1 root root   16 Sep  8 17:17 0A0A40D4 -> SL57-service.kvm
  3. Once the kickstart file is processed, the guest will shut down and you will have to manually start it again to continue the setup process. This is done in this way because it is not possible to change the guest's boot order from the guest itself, so if you set it to boot using pxe, on each reboot it will continue booting using pxe until it is shut down and started again.
  4. After a few reboots, the guest machine should be operative.
  5. As a good measure, you can always run cfengine once again.
    cfagent -q

Tips and tricks

What to do if the clock of the guests shifts

Sometimes, if the host does not have constant TSC, some tunning must be done to reduce the chances of the guests to shift the system's clock. To find out do

grep constant_tsc /proc/cpuinfo

If any output is given your CPU has the constant_tsc bit. If no output is given, follow the instructions below. For more info, refer to the RHEL 6 Virtualization manual

  1. Enable ntpdate and make sure it is actually updated (should be done if the machine belongs to the group KVM_GUESTS).
  2. Start cpuspeed daemon.
  3. Check the maximun value in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
  4. Add that value to /etc/sysconfig/cpuspeed under MAX_SPEED and MIN_SPEED.
  5. Add processor.max_cstate=1 to the kernel line in /etc/grub.conf
  6. Reboot the host.

When installing a guest, it will fail because the first interface detected is the public one

Problem description: When trying to install a new guest machine, the virtual bios of the guest sees the public ethernet card as the first one to use (00:16:3E: MAC) and never manages to get the tftp part. This happens because at some point during the configuration of the PreProduction environment, some machines were configured with two virtual devices.

Solution: New guests must be configured to use only ONE interface of the host ( br0) with only ONE network card (00:10:3E: MAC). The kickstart file is configured to create a new virtual device on the system that will use the interface configured for both, the public IP and the private IP. This works because the 10.10 network runs in the same physical network of the public ethernet one.

echo "TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=static
IPADDR=148.187.${IP_ADDR}
NETMASK=255.255.252.0
ONBOOT=yes 
USERCTL=no
IPV6INIT=no" > /etc/sysconfig/network-scripts/ifcfg-eth0

echo "DEVICE=eth0:1
BOOTPROTO=static
IPADDR=10.10.${IP_ADDR}
NETMASK=255.255.252.0
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
TYPE=Ethernet" >/etc/sysconfig/network-scripts/ifcfg-eth0:1

What to do if you can't access the console of a guest from the web interface

Problem description: You no longer can access the VM console from the Web interface and you need to manually launch vncviewer.

Solution: This usually happens because the ssh-agent is not well configured or has gone bad for some reason. Follow these steps (taken from http://www.convirture.com/wiki/index.php?title=SSH_Setup) as the user convirt@pub.lcg.cscs.ch

eval 'ssh-agent -s'
# if you get "Could not open a connection to your authentication agent." when running ssh-add -l, do as follows
killall ssh-agent
exec ssh-agent bash
cd convirt
./convirt-ctl stop
./convirt-ctl start

MTU of interfaces within the VM

Problem description: You can not set the MTU of an interface within the virtual machine to 9000.

Solution: Check the MTU of the tap devices associated with the virtual machine, the bridges may be set correctly however if the MTU was different before the VM was started the tap devices will not have been updated.

Adding new disks to a VM

Problem description: You can not set the size of a disk when adding through the web interface.

Solution: You will first need to create the file on the host filesystem using dd, then add to convirture. This will initially show without a value for the size however after closing an re-opening the window it will appear.

-- MiguelGila - 2011-10-31

Topic attachments
I Attachment History Action Size Date Who Comment
JPEGjpg PP_network_diagram.jpg r3 r2 r1 manage 135.0 K 2011-10-31 - 09:46 MiguelGila PP_network_diagram
Edit | Attach | Watch | Print version | History: r9 < r8 < r7 < r6 < r5 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r9 - 2013-08-07 - GeorgeBrown
 
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback