Service Configuration
Service Nodes
General Instructions
- install the OS: XenSampleImageReplication
- check the kernel version; it should be kernel-xen ≥ 2.6.18-194.17.1
- create the cfengine key in cfengine:/srv/cfengine/ppkeys
- copy the keys to nfs:/export/kickstarts/private/cfengine/:
  scp /srv/cfengine/ppkeys/root-IPADDRESS* nfs:/export/kickstarts/private/cfengine/
- copy the newmachine script from xen03 and run it:
  scp xen03:/nfs/kickstarts/newmachine /root/ && /root/newmachine
  - NOTE: this step takes a long time; wait until it is done and the machine has rebooted automatically.
- copy the SSH host keys to the cfengine server:
  cd /srv/cfengine/private/ssh/
  mkdir HOSTNAME
  ls se30 | xargs -n1 --replace scp HOSTNAME:/etc/ssh/{} HOSTNAME/
- check the SSH keys into svn:
  asvn add HOSTNAME
  asvn commit HOSTNAME --username poettl -m 'New SSH keys for host HOSTNAME'
- create a new known_hosts file:
  /srv/cfengine/scripts/new_known_hosts
- run /opt/cscs/sbin/install-glite to configure the gLite middleware (or do it by hand, step by step...)
- run cfagent -qv
- reboot
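The key-copy step above uses GNU xargs --replace, which substitutes each input line for {} and runs one scp per file name listed in se30. The substitution pattern can be illustrated locally with dummy file names (the /tmp paths and the two file names here are invented for the demo; HOSTNAME stands for the new host, as above):

```shell
# Demo of the xargs --replace pattern from the key-copy step above.
# --replace (-I{}) runs the command once per input line, substituting
# the line for every {} in the arguments.
mkdir -p /tmp/keydemo/HOSTNAME
cd /tmp/keydemo
printf 'ssh_host_rsa_key.pub\nssh_host_dsa_key.pub\n' > filelist
# Real step: ls se30 | xargs -n1 --replace scp HOSTNAME:/etc/ssh/{} HOSTNAME/
cat filelist | xargs --replace sh -c 'touch HOSTNAME/{}'
ls HOSTNAME
```

In the real command the input is the listing of se30 and the action is an scp per key file instead of touch.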
Service Specific Notes
Pre-production WNs
Once all the previous steps have been done, Lustre has to be loaded in order to successfully run the last part of
/opt/cscs/sbin/install-glite
. To do that, make sure the VM guest has two NICs: one with the public IP and one with the 10.10 private IP.
In the XEN host:
Apr 06 16:00 [root@xen17:xen]# cat /etc/xen/ppwn02
name = "ppwn02"
vcpus = 2
memory = 4096
disk = ['phy:/dev/vg_root/ppwn02_root,xvda,w']
#vif = ['mac=00:16:3E:64:00:50,bridge=xenbr0','mac=00:16:10:64:00:50,bridge=xenbr2']
vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']
bootloader = "/usr/bin/pygrub"
on_reboot = 'restart'
on_crash = 'destroy'
In the XEN guest, prepare the network:
Apr 06 16:02 [root@ppwn02:~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.64.202
NETMASK=255.255.252.0
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
TYPE=Ethernet
Apr 06 16:02 [root@ppwn02:~]# ifup eth1
Apr 06 16:02 [root@ppwn02:~]# ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:16:10:67:00:02
inet addr:10.10.64.202 Bcast:10.10.67.255 Mask:255.255.252.0
inet6 addr: fe80::216:10ff:fe67:2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18531 errors:0 dropped:0 overruns:0 frame:0
TX packets:1134 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:21221791 (20.2 MiB) TX bytes:236364 (230.8 KiB)
Apr 06 16:04 [root@ppwn02:~]# ping -c 4 10.10.64.201
PING 10.10.64.201 (10.10.64.201) 56(84) bytes of data.
64 bytes from 10.10.64.201: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.10.64.201: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from 10.10.64.201: icmp_seq=3 ttl=64 time=0.081 ms
64 bytes from 10.10.64.201: icmp_seq=4 ttl=64 time=0.088 ms
--- 10.10.64.201 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.081/0.090/0.112/0.017 ms
Now you need to install the Lustre RPMs for the running kernel and start Lustre up. In the XEN guest:
Apr 06 16:04 [root@ppwn02:~]# mount xen11:/nfs /media
Apr 06 16:06 [root@ppwn02:~]# uname -r
2.6.18-238.5.1.el5xen
Apr 06 16:06 [root@ppwn02:~]# rpm -ivh /media/rpms/xen_guest_lustre_1.8.4_238/lustre-*
Preparing... ########################################### [100%]
package lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
package lustre-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
Apr 06 16:06 [root@ppwn02:~]# mkdir -p /lustre/scratch
Apr 06 16:07 [root@ppwn02:~]# service lustre start
Apr 06 16:07 [root@ppwn02:~]# lfs df -h
UUID bytes Used Available Use% Mounted on
scratch-MDT0000_UUID 1.4T 1.8G 1.3T 0% /lustre/scratch[MDT:0]
scratch-OST0000_UUID 3.6T 174.3G 3.2T 4% /lustre/scratch[OST:0]
scratch-OST0001_UUID 3.6T 175.4G 3.2T 4% /lustre/scratch[OST:1]
scratch-OST0002_UUID 3.6T 181.0G 3.2T 4% /lustre/scratch[OST:2]
[...]
At this point you can run the last part of the installation and it will (hopefully) work:
Apr 06 16:07 [root@ppwn02:~]# umount /media
Apr 06 16:07 [root@ppwn02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n WN -n TORQUE_client
Pre-production CREAM-CEs
For the CREAM-CE to work, Lustre has to be mounted as well, so follow the same steps as described above.
lrms
Compile Torque 2.5.x with HA and create RPMs
- download the newest version of Torque
- ./configure --prefix=/usr --with-server-home=/var/spool/pbs --with-default-server=lrms02.lcg.cscs.ch,lrms01.lcg.cscs.ch --enable-high-availability
- make rpm
- copy the RPMs to the repo:
  scp /usr/src/redhat/RPMS/x86_64/torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm nfs01:/export/packages/repo
- on nfs01:
  cd /export/packages/repo; createrepo .
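The scp command above relies on shell brace expansion to name all four RPMs in a single argument; a quick local check of what the pattern expands to (version 2.5.2 as in the command above):

```shell
# Brace expansion turns the one pattern into the four package file names
# before scp ever sees them.
bash -c 'echo torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm'
```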
lcg-CE
After the reboot the gridmap files have to be created. Either wait for the cron job to run, or run:
- /opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/dn-grid-mapfile --safe
- cp /etc/grid-security/dn-grid-mapfile /etc/grid-security/grid-mapfile.tmp; cat /etc/grid-security/voms-grid-mapfile >> /etc/grid-security/grid-mapfile.tmp; mv /etc/grid-security/grid-mapfile.tmp /etc/grid-security/grid-mapfile
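The second command above builds the merged mapfile in a temporary file and only then renames it into place, so the live grid-mapfile is never seen half-written. The same pattern with stand-in files under /tmp (the paths and the mapfile entries are invented for the demo):

```shell
# Merge-via-tempfile pattern: build the combined file aside, then
# rename it into place in one step.
D=/tmp/gridsec-demo
mkdir -p "$D"
printf '"/DC=ch/CN=some user" .dteam\n' > "$D/dn-grid-mapfile"
printf '"/dteam/Role=prod" dteamprd\n' > "$D/voms-grid-mapfile"
cp "$D/dn-grid-mapfile" "$D/grid-mapfile.tmp"
cat "$D/voms-grid-mapfile" >> "$D/grid-mapfile.tmp"
mv "$D/grid-mapfile.tmp" "$D/grid-mapfile"   # rename is atomic on one filesystem
cat "$D/grid-mapfile"
```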
CREAM-CE
References
Storage Element
Some steps are needed, but they are described at
DcacheOperations
Worker Nodes
Site BDII
For a detailed log of the last installation, refer to https://webrt.cscs.ch/Ticket/Display.html?id=7962. In short:
BDII_REGIONS="SE CE"
BDII_CE_URL="ldap://ce01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
BDII_SE_URL="ldap://storage01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
- run the YAIM configuration tool: /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n BDII_site
- wget/configure/make/install LBCD from http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz
- check iptables
- service lbcd start # that's it; the host should appear in the DNS list, but only if DT has included it in the master LBCD node
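To check that the site BDII actually publishes data after the YAIM run, an anonymous LDAP query against port 2170 is the usual test. The hostname and the mds-vo-name value below are assumptions for illustration; substitute the real site BDII host and site name:

```shell
# Hypothetical host and site name; adjust to the actual deployment.
ldapsearch -x -LLL -H ldap://bdii.lcg.cscs.ch:2170 -b mds-vo-name=CSCS-LCG2,o=grid
```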
ui64
You need to run:
yum groupinstall glite-UI
/opt/glite/yaim/bin/yaim -c -s /misc/siteinfo/site-info.def -n UI
nagios
References
--
PeterOettl - 2010-03-01