Service Configuration

Service Nodes

General Instructions

  • install the OS: XenSampleImageReplication
  • check kernel version, should be kernel-xen ≥ 2.6.18-194.17.1
    • yum upgrade kernel-xen
  • create cfengine key in cfengine:/srv/cfengine/ppkeys
    • cfkey -f root-IPADDRESS
  • copy the keys to nfs:/export/kickstarts/private/cfengine/
    • scp /srv/cfengine/ppkeys/root-IPADDRESS* nfs:/export/kickstarts/private/cfengine/
  • copy newmachine script from xen03 and run it
    • NOTE: This step takes a long time; wait until it finishes and the machine reboots automatically.
    • scp xen03:/nfs/kickstarts/newmachine /root/ && /root/newmachine
  • copy ssh keys to cfengine server (a consolidated sketch follows at the end of this list):
    • cd /srv/cfengine/private/ssh/
    • mkdir HOSTNAME
    • ls se30|xargs -n1 --replace scp HOSTNAME:/etc/ssh/{} HOSTNAME/   (se30 is an existing host's directory, used as the list of key file names to fetch)
  • check in ssh key to svn
    • asvn add HOSTNAME
    • asvn commit HOSTNAME --username poettl -m'New SSH keys for host HOSTNAME'
  • create new known_hosts file
    • /srv/cfengine/scripts/new_known_hosts
  • run /opt/cscs/sbin/install-glite to configure gLite middleware (or do it by hand step by step...)
  • cfagent -qv
  • reboot
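
For convenience, the SSH-key handling from the list above can be run as one sequence on the cfengine server. This is only a sketch reusing the paths shown above; HOST is an example value standing for the new machine's hostname:

# Sketch only: collect the new host's SSH keys on the cfengine server and check them into svn.
HOST=ppwn02                      # example; replace with the new hostname
cd /srv/cfengine/private/ssh/
mkdir "$HOST"
# se30 is an existing host's directory, used as the list of key file names to fetch
ls se30 | xargs -n1 --replace scp "$HOST":/etc/ssh/{} "$HOST"/
asvn add "$HOST"
asvn commit "$HOST" -m "New SSH keys for host $HOST"   # add --username if needed
# regenerate the known_hosts file
/srv/cfengine/scripts/new_known_hosts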

Service Specific Notes

Pre-production WNs

Once all the previous steps have been done, Lustre has to be loaded in order to successfully run the last part of /opt/cscs/sbin/install-glite. To do that, make sure that the VM guest has two NICs: one with the public IP and one with the 10.10 IP. In the Xen host:
Apr 06 16:00 [root@xen17:xen]# cat /etc/xen/ppwn02 
name = "ppwn02"

vcpus = 2
memory = 4096
disk = ['phy:/dev/vg_root/ppwn02_root,xvda,w']
#vif = ['mac=00:16:3E:64:00:50,bridge=xenbr0','mac=00:16:10:64:00:50,bridge=xenbr2']
vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']

bootloader = "/usr/bin/pygrub"
on_reboot = 'restart'
on_crash = 'destroy'

In the XEN guest, prepare the network:

Apr 06 16:02 [root@ppwn02:~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.64.202
NETMASK=255.255.252.0
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
TYPE=Ethernet
Apr 06 16:02 [root@ppwn02:~]# ifup eth1
Apr 06 16:02 [root@ppwn02:~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:10:67:00:02  
          inet addr:10.10.64.202  Bcast:10.10.67.255  Mask:255.255.252.0
          inet6 addr: fe80::216:10ff:fe67:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:21221791 (20.2 MiB)  TX bytes:236364 (230.8 KiB)

Apr 06 16:04 [root@ppwn02:~]# ping -c 4 10.10.64.201 
PING 10.10.64.201 (10.10.64.201) 56(84) bytes of data.
64 bytes from 10.10.64.201: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.10.64.201: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from 10.10.64.201: icmp_seq=3 ttl=64 time=0.081 ms
64 bytes from 10.10.64.201: icmp_seq=4 ttl=64 time=0.088 ms

--- 10.10.64.201 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.081/0.090/0.112/0.017 ms

Now you need to install the Lustre RPMs for the running kernel and start it up. In the Xen guest:

Apr 06 16:04 [root@ppwn02:~]# mount xen11:/nfs /media
Apr 06 16:06 [root@ppwn02:~]# uname -r
2.6.18-238.5.1.el5xen
Apr 06 16:06 [root@ppwn02:~]#  rpm -ivh /media/rpms/xen_guest_lustre_1.8.4_238/lustre-*
Preparing...                ########################################### [100%]
        package lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
        package lustre-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
Apr 06 16:06 [root@ppwn02:~]# mkdir -p /lustre/scratch
Apr 06 16:07 [root@ppwn02:~]# service lustre start
Apr 06 16:07 [root@ppwn02:~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
scratch-MDT0000_UUID        1.4T        1.8G        1.3T   0% /lustre/scratch[MDT:0]
scratch-OST0000_UUID        3.6T      174.3G        3.2T   4% /lustre/scratch[OST:0]
scratch-OST0001_UUID        3.6T      175.4G        3.2T   4% /lustre/scratch[OST:1]
scratch-OST0002_UUID        3.6T      181.0G        3.2T   4% /lustre/scratch[OST:2]
[...]

At this point you can run the last part of the installation and it will (hopefully) work:

Apr 06 16:07 [root@ppwn02:~]# umount /media 
Apr 06 16:07 [root@ppwn02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n WN -n TORQUE_client

Pre-production CREAM-CEs

In order for the CREAM-CE to work properly, Lustre has to be mounted, so the same steps described above for the pre-production WNs have to be followed.
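
As a reminder, the Lustre bring-up is the same as for the pre-production WNs; a minimal sketch reusing the paths from the example above:

# Sketch only: install the Lustre RPMs matching the running kernel and start Lustre,
# using the same NFS share and RPM directory as in the WN section above.
mount xen11:/nfs /media
rpm -ivh /media/rpms/xen_guest_lustre_1.8.4_238/lustre-*
mkdir -p /lustre/scratch
service lustre start
lfs df -h          # verify that /lustre/scratch is visible
umount /media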

lrms

Compile Torque 2.5.x with HA and create RPMs

  • download the newest version of Torque (the full build sequence is sketched after this list)
  • ./configure --prefix=/usr --with-server-home=/var/spool/pbs --with-default-server=lrms02.lcg.cscs.ch,lrms01.lcg.cscs.ch --enable-high-availability
  • make rpm
  • copy rpms to repo
    • scp /usr/src/redhat/RPMS/x86_64/torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm nfs01:/export/packages/repo
    • on nfs01: cd /export/packages/repo; createrepo .
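
The steps above as one sequence, sketched for reference (the 2.5.2 tarball name is an example; use the newest 2.5.x release):

# Sketch: build Torque with high availability enabled and publish the RPMs to the repo.
tar xzf torque-2.5.2.tar.gz && cd torque-2.5.2
./configure --prefix=/usr --with-server-home=/var/spool/pbs \
    --with-default-server=lrms02.lcg.cscs.ch,lrms01.lcg.cscs.ch \
    --enable-high-availability
make rpm
scp /usr/src/redhat/RPMS/x86_64/torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm \
    nfs01:/export/packages/repo
ssh nfs01 'cd /export/packages/repo && createrepo .'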

lcg-CE

After the reboot the gridmap files have to be created. Either wait for the cron job to run, or execute the following (the same sequence is written out with comments after the list):
  • /opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/dn-grid-mapfile --safe
  • cp /etc/grid-security/dn-grid-mapfile /etc/grid-security/grid-mapfile.tmp; cat /etc/grid-security/voms-grid-mapfile >> /etc/grid-security/grid-mapfile.tmp; mv /etc/grid-security/grid-mapfile.tmp /etc/grid-security/grid-mapfile
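
The same sequence written out with comments (functionally identical to the one-liner above):

# Build the DN-based mapfile, append the VOMS mappings, then move the result into place.
/opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/dn-grid-mapfile --safe
cp /etc/grid-security/dn-grid-mapfile /etc/grid-security/grid-mapfile.tmp
cat /etc/grid-security/voms-grid-mapfile >> /etc/grid-security/grid-mapfile.tmp
mv /etc/grid-security/grid-mapfile.tmp /etc/grid-security/grid-mapfile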

CREAM-CE

References

Storage Element

There are a few steps needed; they are described at DcacheOperations.

Worker Nodes

Site BDII

For a detailed log of the last installation refer to https://webrt.cscs.ch/Ticket/Display.html?id=7962. In short, set:
BDII_REGIONS="SE CE"
BDII_CE_URL="ldap://ce01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
BDII_SE_URL="ldap://storage01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
  • Run the Yaim conf tool: /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n BDII_site
  • wget/configure/make/install LBCD from http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz (the standard sequence is sketched after this list)
  • Check iptables
  • service lbcd start # that's it; the host should appear in the DNS list, but only if DT has included it in the master LBCD node
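
The LBCD build in the second bullet is the standard configure/make/install sequence; a sketch assuming default install paths:

# Sketch: fetch, build and install lbcd 3.3.0.
wget http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz
tar xzf lbcd-3.3.0.tar.gz && cd lbcd-3.3.0
./configure
make
make install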

ui64

You need to run:
yum groupinstall glite-UI
/opt/glite/yaim/bin/yaim -c -s /misc/siteinfo/site-info.def -n UI

nagios

  • wget http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/x86_64/sa1-release-2-1.el5.noarch.rpm
  • rpm -ihv sa1-release-2-1.el5.noarch.rpm
  • yum install httpd
  • yum install libyaml.i386
  • yum install egee-NAGIOS lcg-CA

References

-- PeterOettl - 2010-03-01