---++!! Service Configuration

%TOC%

---++ Service Nodes

---+++ General Instructions

   * install the OS: LCGTier2.XenSampleImageReplication
   * check the kernel version; it should be kernel-xen ≥ 2.6.18-194.17.1
      * =yum upgrade kernel-xen=
   * create a cfengine key in =cfengine:/srv/cfengine/ppkeys=
      * =cfkey -f root-IPADDRESS=
   * copy the keys to =nfs:/export/kickstarts/private/cfengine/=
      * =scp /srv/cfengine/ppkeys/root-IPADDRESS* nfs:/export/kickstarts/private/cfengine/=
   * copy the =newmachine= script from xen03 and run it
      * %X% NOTE: This step takes a long time; wait until it is done and the machine has been rebooted automatically.
      * =scp xen03:/nfs/kickstarts/newmachine /root/ && /root/newmachine=
   * copy the ssh keys to the cfengine server:
      * =cd /srv/cfengine/private/ssh/=
      * =mkdir HOSTNAME=
      * =ls se30|xargs -n1 --replace scp HOSTNAME:/etc/ssh/{} HOSTNAME/=
   * check the ssh key in to svn
      * =asvn add HOSTNAME=
      * =asvn commit HOSTNAME --username poettl -m'New SSH keys for host HOSTNAME'=
   * create a new known_hosts file
      * =/srv/cfengine/scripts/new_known_hosts=
   * run =/opt/cscs/sbin/install-glite= to configure the gLite middleware (or do it by hand, step by step...)
   * =cfagent -qv=
   * reboot

---+ Service Specific Notes

---++ Worker Nodes

---+++ [PP] WNs

Once all the previous steps have been done, Lustre has to be loaded in order to successfully run the last part of =/opt/cscs/sbin/install-glite=. To do that, make sure that the VM guest has two NICs: one with the public IP and one with the 10.10 IP.
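The Xen guest configs below pair each public MAC (=00:16:3E:...=) with a private MAC (=00:16:10:...=) sharing the same trailing octets. Assuming that pairing is indeed a site convention (it is inferred from the examples on this page, not documented anywhere), a hypothetical helper to emit a consistent =vif= line could look like this:

```shell
#!/bin/sh
# Hypothetical helper, assuming the convention inferred from the configs
# on this page: the public MAC starts with 00:16:3E, the private MAC with
# 00:16:10, and both share the same trailing three octets.
make_vif() {
    suffix="$1"    # trailing octets, e.g. "67:00:02" for ppwn02
    printf "vif = ['mac=00:16:3E:%s,bridge=xenbr1','mac=00:16:10:%s,bridge=xenbr2']\n" "$suffix" "$suffix"
}

make_vif "67:00:02"
# -> vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']
```

The bridge names (=xenbr1=/=xenbr2=) match the ppwn02 example below; the commented-out entry in that same config uses =xenbr0=, so check which bridges exist on the Xen host first.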
In the XEN host:
<verbatim>
Apr 06 16:00 [root@xen17:xen]# cat /etc/xen/ppwn02
name = "ppwn02"
vcpus = 2
memory = 4096
disk = ['phy:/dev/vg_root/ppwn02_root,xvda,w']
#vif = ['mac=00:16:3E:64:00:50,bridge=xenbr0','mac=00:16:10:64:00:50,bridge=xenbr2']
vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']
bootloader = "/usr/bin/pygrub"
on_reboot = 'restart'
on_crash = 'destroy'
</verbatim>

In the XEN guest, prepare the network:
<verbatim>
Apr 06 16:02 [root@ppwn02:~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.64.202
NETMASK=255.255.252.0
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
TYPE=Ethernet

Apr 06 16:02 [root@ppwn02:~]# ifup eth1
Apr 06 16:02 [root@ppwn02:~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:10:67:00:02
          inet addr:10.10.64.202  Bcast:10.10.67.255  Mask:255.255.252.0
          inet6 addr: fe80::216:10ff:fe67:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21221791 (20.2 MiB)  TX bytes:236364 (230.8 KiB)

Apr 06 16:04 [root@ppwn02:~]# ping -c 4 10.10.64.201
PING 10.10.64.201 (10.10.64.201) 56(84) bytes of data.
64 bytes from 10.10.64.201: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.10.64.201: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from 10.10.64.201: icmp_seq=3 ttl=64 time=0.081 ms
64 bytes from 10.10.64.201: icmp_seq=4 ttl=64 time=0.088 ms

--- 10.10.64.201 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.081/0.090/0.112/0.017 ms
</verbatim>

Now you need to install the Lustre RPMs for the running kernel and start it up.
In the XEN guest:
<verbatim>
Apr 06 16:04 [root@ppwn02:~]# mount xen11:/nfs /media
Apr 06 16:06 [root@ppwn02:~]# uname -r
2.6.18-238.5.1.el5xen
Apr 06 16:06 [root@ppwn02:~]# rpm -ivh /media/rpms/xen_guest_lustre_1.8.4_238/lustre-*
Preparing...                ########################################### [100%]
        package lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
        package lustre-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
Apr 06 16:06 [root@ppwn02:~]# mkdir -p /lustre/scratch
Apr 06 16:07 [root@ppwn02:~]# service lustre start
Apr 06 16:07 [root@ppwn02:~]# lfs df -h
UUID                     bytes      Used    Available  Use%  Mounted on
scratch-MDT0000_UUID      1.4T      1.8G         1.3T    0%  /lustre/scratch[MDT:0]
scratch-OST0000_UUID      3.6T    174.3G         3.2T    4%  /lustre/scratch[OST:0]
scratch-OST0001_UUID      3.6T    175.4G         3.2T    4%  /lustre/scratch[OST:1]
scratch-OST0002_UUID      3.6T    181.0G         3.2T    4%  /lustre/scratch[OST:2]
[...]
</verbatim>

At this point you can run the last part of the installation and it will (hopefully) work:
<verbatim>
Apr 06 16:07 [root@ppwn02:~]# umount /media
Apr 06 16:07 [root@ppwn02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n WN -n TORQUE_client
</verbatim>

---++ CREAM-CE

---+++ References

   * [[http://glite.web.cern.ch/glite/packages/R3.2/sl5_x86_64/deployment/glite-CREAM/glite-CREAM.asp][glite-CREAM]]
   * [[http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32][Installation Guide]]
   * [[https://twiki.cern.ch/twiki/bin/view/EGEE/GLiteCREAMCE][Service Reference Card]]
   * [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#cream_CE][YAIM Configuration Variables]]
   * [[https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400#Notes_on_configuring_CREAM][YAIM Configuration Notes]]

---+++ [PP] CREAM-CEs

For a CREAM-CE to work correctly, Lustre has to be mounted, so the same steps described above have to be followed.
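Since both the WN and the CREAM-CE steps silently misbehave when Lustre is not mounted, a quick guard can be run before invoking yaim. This is a sketch under the assumption that the mount point is the =/lustre/scratch= path used in the transcripts above:

```shell
#!/bin/sh
# Sketch: check that a given mount point is a mounted Lustre filesystem
# before running the final yaim step. /lustre/scratch is the mount point
# shown in the examples on this page; adjust for your site.
is_lustre_mounted() {
    mp="$1"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    grep -q " $mp lustre " /proc/mounts
}

if is_lustre_mounted /lustre/scratch; then
    echo "Lustre mounted; safe to run yaim"
else
    echo "Lustre NOT mounted; mount it first" >&2
fi
```

On success you would then run the =yaim= command for the node type in question (e.g. =-n WN -n TORQUE_client= as above).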
---++++ Problem when installing tomcat rpms

*Problem description*: When running =rpm -qa | grep tomcat5= you do not see the =tomcat5= package itself installed, only its libraries:
<verbatim>
Apr 12 10:34 [root@ppcream02:~]# rpm -qa |grep tomcat5
tomcat5-server-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jasper-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jsp-2.0-api-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-common-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-servlet-2.4-api-5.5.23-0jpp.17.el5_6.x86_64
</verbatim>

And when you try to install it you get errors:
<verbatim>
Loaded plugins: kernel-module
Excluding Packages in global exclude list
Finished
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package tomcat5.x86_64 0:5.5.23-0jpp.17.el5_6 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin

Dependencies Resolved

================================================================================
 Package         Arch        Version                  Repository          Size
================================================================================
Installing:
 tomcat5         x86_64      5.5.23-0jpp.17.el5_6     sl-security        362 k

Transaction Summary
================================================================================
Install       1 Package(s)
Update        0 Package(s)
Remove        0 Package(s)

Total size: 362 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : tomcat5                                                  1/1
Error unpacking rpm package tomcat5-5.5.23-0jpp.17.el5_6.x86_64
warning: /etc/tomcat5/server.xml created as /etc/tomcat5/server.xml.rpmnew
warning: /etc/tomcat5/tomcat5.conf created as /etc/tomcat5/tomcat5.conf.rpmnew
error: unpacking of archive failed on file /usr/share/tomcat5/webapps: cpio: rename
</verbatim>

And/or you have broken links in =/usr/share/tomcat5= and/or =/var/lib/tomcat5=.

*Solution*: Completely erase all files within =/usr/share/tomcat5= and =/var/lib/tomcat5=, then run yum and yaim again:
<verbatim>
Apr 12 10:37 [root@ppcream02:~]# yum install tomcat5-5.5.23-0jpp.17.el5_6.x86_64   # Replace the tomcat5 version with the relevant one!
Apr 12 10:37 [root@ppcream02:~]# rpm -qa |grep tomcat
tomcat5-server-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jasper-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jsp-2.0-api-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-common-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-servlet-2.4-api-5.5.23-0jpp.17.el5_6.x86_64
Apr 12 10:38 [root@ppcream02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n creamCE -n TORQUE_utils
</verbatim>

---++++ Problem when submitting jobs

*Problem description*: When submitting a job from the UI you get the following message:
<verbatim>
Apr 12 10:33 [pablof@ui64:test_ppcream01]$ glite-ce-job-submit -a -r ppcream02/cream-pbs-atlas $PWD/jobs/hostname.jdl
2011-04-12 10:46:40,635 FATAL - Received NULL fault; the error is due to another cause:
FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection refused]
</verbatim>

And when you look into =/var/lib/tomcat5/webapps/= you only see this:
<verbatim>
Apr 12 10:46 [root@ppcream02:~]# ls -lh /var/lib/tomcat5/webapps/
total 4.4M
-rw-r--r-- 1 root root 4.4M Apr 12 10:45 ce-cream.war
</verbatim>

*Solution*: Copy the directory =/var/lib/tomcat5/webapps/*= from another running instance of the CREAM-CE and restart gLite:
<verbatim>
Apr 12 10:48 [root@ppcream02:~]# scp -r ppcream01:/usr/share/tomcat5/webapps/ce-crea* /usr/share/tomcat5/webapps/
Apr 12 10:49 [root@ppcream02:~]# ls -lh /var/lib/tomcat5/webapps/
total 4.4M
drwxr-xr-x 5 root root 4.0K Apr 12 10:49 ce-cream
-rw-r--r-- 1 root root 4.4M Apr 12 10:49 ce-cream.war
Apr 12 10:49 [root@ppcream02:~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier:                                   [FAILED]
Shutting down BUpdaterPBS:                                 [FAILED]
*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done
*** tomcat5:
Stopping tomcat5:                                          [  OK  ]
STARTING SERVICES
*** tomcat5:
Starting tomcat5:                                          [  OK  ]
*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
[20453] Initializing...
[20453] Parse messages for correctness... [yes]
[20453] Send messages also to inter-logger... [yes]
[20453] Messages will be stored with the filename prefix "/var/glite/log/dglogd.log".
[20453] Server running with certificate: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Zuerich/L=Zuerich/O=ETH Zuerich/CN=ppcream02.lcg.cscs.ch
[20453] Listening on port 9002
[20453] Running as daemon... [yes]
done
Starting glite-lb-interlogd ... done
*** glite-ce-blahparser:
Starting BNotifier:                                        [  OK  ]
Starting BUpdaterPBS:                                      [  OK  ]
</verbatim>

---++ lrms

---+++ Compile Torque 2.5.x with HA and create RPMs

   * download the newest version of [[http://www.clusterresources.com/downloads/torque/][torque]]
   * =./configure --prefix=/usr --with-server-home=/var/spool/pbs --with-default-server=lrms02.lcg.cscs.ch,lrms01.lcg.cscs.ch --enable-high-availability=
   * =make rpm=
   * copy the rpms to the repo
      * =scp /usr/src/redhat/RPMS/x86_64/torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm nfs01:/export/packages/repo=
   * on nfs01: =cd /export/packages/repo; createrepo .=

---++ lcg-CE

After the reboot, the gridmap files have to be created.
Either wait for the cron job to run, or run:

   * =/opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/dn-grid-mapfile --safe=
   * =cp /etc/grid-security/dn-grid-mapfile /etc/grid-security/grid-mapfile.tmp; cat /etc/grid-security/voms-grid-mapfile >> /etc/grid-security/grid-mapfile.tmp; mv /etc/grid-security/grid-mapfile.tmp /etc/grid-security/grid-mapfile=

---++ Storage Element

Some steps are needed, but they are described at LCGTier2.DcacheOperations.

---++ BDII

---+++ Site BDII

For a detailed log of the last installation refer to https://webrt.cscs.ch/Ticket/Display.html?id=7962. In short:

   * ensure you have =/etc/yum.repos.d=, either by cfengine or from http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-BDII.repo
   * run =yum-with-glite install glite-BDII=
   * check =/opt/cscs/siteinfo/site-info.def=; you need to have: <verbatim>BDII_REGIONS="SE CE"
BDII_CE_URL="ldap://ce01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
BDII_SE_URL="ldap://storage01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
</verbatim>
   * run the YAIM configuration tool: =/opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n BDII_site=
   * wget/configure/make/install LBCD, from http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz
   * check iptables
   * =service lbcd start= # that's it; the node should appear in the DNS list, IFF DT has included it in the master LBCD node

---+++ [PP] Top BDII

Make sure that you have run cfengine and that the following files are installed on your system:

   * =/etc/glite/glite-info-update-endpoints.conf=: tells the BDII which file has the configuration for extra sites. Should look like this:<verbatim>[configuration]
EGI = True
OSG = True
manual = True
manual_file = /opt/cscs/etc/glite-info-update-extra-endpoints
output_file = /opt/glite/etc/gip/top-urls.conf
cache_dir = /var/cache/glite/glite-info-update-endpoints</verbatim>
   * =/opt/cscs/etc/glite-info-update-extra-endpoints=: specifies which extra sites must be queried (in our case, the preproduction BDII). Should look like this:<verbatim>PPCSCS-LCG2 ldap://ppbdii01.lcg.cscs.ch:2170/mds-vo-name=ppcscs-lcg2,o=grid</verbatim>

---++ ui64

You need to do:
<verbatim>
yum groupinstall glite-UI
/opt/glite/yaim/bin/yaim -c -s /misc/siteinfo/site-info.def -n UI
</verbatim>

---+++ nagios

   * =wget http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/x86_64/sa1-release-2-1.el5.noarch.rpm=
   * =rpm -ihv sa1-release-2-1.el5.noarch.rpm=
   * =yum install httpd=
   * =yum install libyaml.i386=
   * =yum install egee-NAGIOS lcg-CA=

-- Main.PeterOettl - 2010-03-01
Topic revision: r36 - 2011-05-09 - MiguelGila