---++!! Service Configuration
%TOC%

---++ Service Nodes
---+++ General Instructions
   * install the OS: LCGTier2.XenSampleImageReplication
   * check kernel version, should be kernel-xen ≥ 2.6.18-194.17.1
      * =yum upgrade kernel-xen=
   * create cfengine key in =cfengine:/srv/cfengine/ppkeys=
      * =cfkey -f root-IPADDRESS=
   * copy the keys to =nfs:/export/kickstarts/private/cfengine/=
      * =scp /srv/cfengine/ppkeys/root-IPADDRESS* nfs:/export/kickstarts/private/cfengine/=
   * copy =newmachine= script from xen03 and run it
      * %X% NOTE: This step takes a long time. Wait until it is done and the machine has rebooted automatically.
      * =scp xen03:/nfs/kickstarts/newmachine /root/ && /root/newmachine=
   * copy ssh keys to cfengine server:
      * =cd /srv/cfengine/private/ssh/=
      * =mkdir HOSTNAME=
      * =ls se30|xargs -n1 --replace scp HOSTNAME:/etc/ssh/{} HOSTNAME/=
   * check in ssh key to svn
      * =asvn add HOSTNAME=
      * =asvn commit HOSTNAME --username poettl -m'New SSH keys for host HOSTNAME'=
   * create new known_hosts file
      * =/srv/cfengine/scripts/new_known_hosts=
   * run =/opt/cscs/sbin/install-glite= to configure gLite middleware (or do it by hand step by step...)
   * =cfagent -qv=
   * reboot

---+ Service Specific Notes
---++ Pre-production WNs
Once all the previous steps have been done, Lustre has to be loaded before the last part of =/opt/cscs/sbin/install-glite= can run successfully. To do that, make sure that the VM guest has two NICs: one with the public IP and one with the 10.10 IP.
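The two-NIC requirement can be checked against the guest's Xen config before booting it. The following is only a minimal sketch: =count_vifs= is a hypothetical helper (not part of any CSCS tooling), and it assumes every NIC on the =vif= line carries exactly one =mac== entry, as in the ppwn02 config.

```shell
#!/bin/sh
# Hypothetical helper: count the NICs declared on the active vif line of a
# Xen guest config read from stdin. Commented-out vif lines are ignored.
# Assumption: each NIC is declared with exactly one "mac=" entry.
count_vifs() {
    grep -E '^[[:space:]]*vif[[:space:]]*=' | grep -o 'mac=' | wc -l | tr -d ' '
}

# Sample input: the active vif line from /etc/xen/ppwn02.
sample="vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']"
nics=$(printf '%s\n' "$sample" | count_vifs)

if [ "$nics" -eq 2 ]; then
    echo "OK: guest has two NICs"
else
    echo "WARNING: expected 2 NICs, found $nics"
fi
```

On a real Xen host the function would presumably be fed the config file directly, for example =count_vifs < /etc/xen/ppwn02=.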
In the XEN host:
<verbatim>
Apr 06 16:00 [root@xen17:xen]# cat /etc/xen/ppwn02
name = "ppwn02"
vcpus = 2
memory = 4096
disk = ['phy:/dev/vg_root/ppwn02_root,xvda,w']
#vif = ['mac=00:16:3E:64:00:50,bridge=xenbr0','mac=00:16:10:64:00:50,bridge=xenbr2']
vif = ['mac=00:16:3E:67:00:02,bridge=xenbr1','mac=00:16:10:67:00:02,bridge=xenbr2']
bootloader = "/usr/bin/pygrub"
on_reboot = 'restart'
on_crash = 'destroy'
</verbatim>

In the XEN guest, prepare the network:
<verbatim>
Apr 06 16:02 [root@ppwn02:~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.64.202
NETMASK=255.255.252.0
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
TYPE=Ethernet
Apr 06 16:02 [root@ppwn02:~]# ifup eth1
Apr 06 16:02 [root@ppwn02:~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:16:10:67:00:02
          inet addr:10.10.64.202  Bcast:10.10.67.255  Mask:255.255.252.0
          inet6 addr: fe80::216:10ff:fe67:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:21221791 (20.2 MiB)  TX bytes:236364 (230.8 KiB)

Apr 06 16:04 [root@ppwn02:~]# ping -c 4 10.10.64.201
PING 10.10.64.201 (10.10.64.201) 56(84) bytes of data.
64 bytes from 10.10.64.201: icmp_seq=1 ttl=64 time=0.112 ms
64 bytes from 10.10.64.201: icmp_seq=2 ttl=64 time=0.082 ms
64 bytes from 10.10.64.201: icmp_seq=3 ttl=64 time=0.081 ms
64 bytes from 10.10.64.201: icmp_seq=4 ttl=64 time=0.088 ms

--- 10.10.64.201 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.081/0.090/0.112/0.017 ms
</verbatim>

Now you need to install lustre RPMs for the running kernel and start it up.
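Because the Lustre modules are built for one specific kernel, it can help to confirm that the RPM matches the running kernel before starting the service. The sketch below is only an illustration: =lustre_matches_kernel= is a hypothetical helper, and it assumes the package name embeds the kernel release with =-= replaced by =_= (as in =lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032=).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a lustre-modules package name was built
# for the given kernel release.
# Assumption: the package name embeds the kernel release with '-' turned
# into '_', followed by '_<build timestamp>'.
lustre_matches_kernel() {
    pkg=$1
    kernel=$2
    tag=$(printf '%s' "$kernel" | tr '-' '_')
    case "$pkg" in
        *"-${tag}_"*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Values taken from the ppwn02 session; on a real guest they would come
# from `rpm -q lustre-modules` and `uname -r`.
pkg="lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64"
kernel="2.6.18-238.5.1.el5xen"

if lustre_matches_kernel "$pkg" "$kernel"; then
    echo "lustre-modules matches kernel $kernel"
else
    echo "WARNING: lustre-modules was not built for kernel $kernel"
fi
```

A mismatch would suggest picking a different RPM directory on the NFS share before running =service lustre start=.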
In the XEN guest:
<verbatim>
Apr 06 16:04 [root@ppwn02:~]# mount xen11:/nfs /media
Apr 06 16:06 [root@ppwn02:~]# uname -r
2.6.18-238.5.1.el5xen
Apr 06 16:06 [root@ppwn02:~]# rpm -ivh /media/rpms/xen_guest_lustre_1.8.4_238/lustre-*
Preparing...                ########################################### [100%]
        package lustre-modules-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
        package lustre-1.8.4-2.6.18_238.5.1.el5xen_201104061032.x86_64 is already installed
Apr 06 16:06 [root@ppwn02:~]# mkdir -p /lustre/scratch
Apr 06 16:07 [root@ppwn02:~]# service lustre start
Apr 06 16:07 [root@ppwn02:~]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
scratch-MDT0000_UUID        1.4T        1.8G        1.3T   0% /lustre/scratch[MDT:0]
scratch-OST0000_UUID        3.6T      174.3G        3.2T   4% /lustre/scratch[OST:0]
scratch-OST0001_UUID        3.6T      175.4G        3.2T   4% /lustre/scratch[OST:1]
scratch-OST0002_UUID        3.6T      181.0G        3.2T   4% /lustre/scratch[OST:2]
[...]
</verbatim>

At this point you can run the last part of the installation (the YAIM configuration) and it will (hopefully) work:
<verbatim>
Apr 06 16:07 [root@ppwn02:~]# umount /media
Apr 06 16:07 [root@ppwn02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n WN -n TORQUE_client
</verbatim>

---++ Pre-production CREAM-CEs
For the CREAM-CE to work properly, Lustre has to be mounted, so follow the same steps as for the pre-production WNs.

---+++ Problem when installing tomcat rpms
*Problem description*: When running =rpm -qa | grep tomcat5=, the base =tomcat5= rpm is not listed as installed; only its sub-packages are.
<verbatim>
Apr 12 10:34 [root@ppcream02:~]# rpm -qa |grep tomcat5
tomcat5-server-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jasper-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jsp-2.0-api-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-common-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-servlet-2.4-api-5.5.23-0jpp.17.el5_6.x86_64
</verbatim>

And when you try to install it you get some errors:
<verbatim>
Loaded plugins: kernel-module
Excluding Packages in global exclude list
Finished
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package tomcat5.x86_64 0:5.5.23-0jpp.17.el5_6 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin

Dependencies Resolved

================================================================================
 Package       Arch        Version                   Repository          Size
================================================================================
Installing:
 tomcat5       x86_64      5.5.23-0jpp.17.el5_6      sl-security        362 k

Transaction Summary
================================================================================
Install       1 Package(s)
Update        0 Package(s)
Remove        0 Package(s)

Total size: 362 k
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : tomcat5                                                 1/1
Error unpacking rpm package tomcat5-5.5.23-0jpp.17.el5_6.x86_64
warning: /etc/tomcat5/server.xml created as /etc/tomcat5/server.xml.rpmnew
warning: /etc/tomcat5/tomcat5.conf created as /etc/tomcat5/tomcat5.conf.rpmnew
error: unpacking of archive failed on file /usr/share/tomcat5/webapps: cpio: rename
</verbatim>

And/or you have broken links in
=/usr/share/tomcat5= and/or =/var/lib/tomcat5=

*Solution*: You have to completely erase all files within =/usr/share/tomcat5= and =/var/lib/tomcat5= and run yum and yaim again:
<verbatim>
Apr 12 10:37 [root@ppcream02:~]# yum install tomcat5-5.5.23-0jpp.17.el5_6.x86_64   # Replace the tomcat5 version with the relevant one!!!!
Apr 12 10:37 [root@ppcream02:~]# rpm -qa |grep tomcat
tomcat5-server-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jasper-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-jsp-2.0-api-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-common-lib-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-5.5.23-0jpp.17.el5_6.x86_64
tomcat5-servlet-2.4-api-5.5.23-0jpp.17.el5_6.x86_64
Apr 12 10:38 [root@ppcream02:~]# /opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n creamCE -n TORQUE_utils
</verbatim>

---+++ Problem when submitting jobs
*Problem description*: When submitting a job from the UI you get the following message:
<verbatim>
Apr 12 10:33 [pablof@ui64:test_ppcream01]$ glite-ce-job-submit -a -r ppcream02/cream-pbs-atlas $PWD/jobs/hostname.jdl
2011-04-12 10:46:40,635 FATAL - Received NULL fault; the error is due to another cause: FaultString=[connection error] - FaultCode=[SOAP-ENV:Client] - FaultSubCode=[SOAP-ENV:Client] - FaultDetail=[Connection refused]
</verbatim>

And when you look into =/var/lib/tomcat5/webapps/= you only see the =.war= file, not the unpacked =ce-cream= directory:
<verbatim>
Apr 12 10:46 [root@ppcream02:~]# ls -lh /var/lib/tomcat5/webapps/
total 4.4M
-rw-r--r-- 1 root root 4.4M Apr 12 10:45 ce-cream.war
</verbatim>

*Solution*: You need to copy the contents of =/var/lib/tomcat5/webapps/= from another running instance of the CREAM-CE (the session below goes through the =/usr/share/tomcat5/webapps/= path) and then restart the gLite services:
<verbatim>
Apr 12 10:48 [root@ppcream02:~]# scp -r ppcream01:/usr/share/tomcat5/webapps/ce-crea* /usr/share/tomcat5/webapps/
Apr 12 10:49 [root@ppcream02:~]# ls -lh /var/lib/tomcat5/webapps/
total 4.4M
drwxr-xr-x 5 root root 4.0K Apr 12 10:49 ce-cream
-rw-r--r-- 1 root root 4.4M Apr 12 10:49 ce-cream.war
Apr 12 10:49 [root@ppcream02:~]# service gLite restart
STOPPING SERVICES
*** glite-ce-blahparser:
Shutting down BNotifier: [FAILED]
Shutting down BUpdaterPBS: [FAILED]
*** glite-lb-locallogger:
Stopping glite-lb-logd ... done
Stopping glite-lb-interlogd ... done
*** tomcat5:
Stopping tomcat5: [ OK ]
STARTING SERVICES
*** tomcat5:
Starting tomcat5: [ OK ]
*** glite-lb-locallogger:
Starting glite-lb-logd ...This is LocalLogger, part of Workload Management System in EU DataGrid & EGEE.
[20453] Initializing...
[20453] Parse messages for correctness... [yes]
[20453] Send messages also to inter-logger... [yes]
[20453] Messages will be stored with the filename prefix "/var/glite/log/dglogd.log".
[20453] Server running with certificate: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Zuerich/L=Zuerich/O=ETH Zuerich/CN=ppcream02.lcg.cscs.ch
[20453] Listening on port 9002
[20453] Running as daemon... [yes]
 done
Starting glite-lb-interlogd ... done
*** glite-ce-blahparser:
Starting BNotifier: [ OK ]
Starting BUpdaterPBS: [ OK ]
</verbatim>

---++ lrms
---+++ Compile Torque 2.5.x with HA and create RPMs
   * download newest version of [[http://www.clusterresources.com/downloads/torque/][torque]]
   * =./configure --prefix=/usr --with-server-home=/var/spool/pbs --with-default-server=lrms02.lcg.cscs.ch,lrms01.lcg.cscs.ch --enable-high-availability=
   * =make rpm=
   * copy rpms to repo
      * =scp /usr/src/redhat/RPMS/x86_64/torque{,-server,-mom,-client}-2.5.2-1cri.x86_64.rpm nfs01:/export/packages/repo=
      * on nfs01: =cd /export/packages/repo; createrepo .=

---++ lcg-CE
After the reboot the gridmap files have to be created.
Either wait for the cron job to run, or run:
   * =/opt/edg/sbin/edg-mkgridmap --output=/etc/grid-security/dn-grid-mapfile --safe=
   * =cp /etc/grid-security/dn-grid-mapfile /etc/grid-security/grid-mapfile.tmp; cat /etc/grid-security/voms-grid-mapfile >> /etc/grid-security/grid-mapfile.tmp; mv /etc/grid-security/grid-mapfile.tmp /etc/grid-security/grid-mapfile=

---++ CREAM-CE
---+++ References
   * [[http://glite.web.cern.ch/glite/packages/R3.2/sl5_x86_64/deployment/glite-CREAM/glite-CREAM.asp][glite-CREAM]]
   * [[http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32][Installation Guide]]
   * [[https://twiki.cern.ch/twiki/bin/view/EGEE/GLiteCREAMCE][Service Reference Card]]
   * [[https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#cream_CE][YAIM Configuration Variables]]
   * [[https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400#Notes_on_configuring_CREAM][YAIM Configuration Notes]]

---++ Storage Element
Some additional steps are needed; they are described at LCGTier2.DcacheOperations

---++ Worker Nodes

---++ Site BDII
For a detailed log of the last installation refer to: https://webrt.cscs.ch/Ticket/Display.html?id=7962

In short:
   * Make sure the glite-BDII repo file is present in =/etc/yum.repos.d=, either via cfengine or from http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.2/glite-BDII.repo
   * Then run =yum-with-glite install glite-BDII=
   * Check =/opt/cscs/siteinfo/site-info.def=, you need to have: <verbatim>
BDII_REGIONS="SE CE"
BDII_CE_URL="ldap://ce01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
BDII_SE_URL="ldap://storage01.lcg.cscs.ch:2170/mds-vo-name=resource,o=grid"
</verbatim>
   * Run the Yaim conf tool: =/opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n BDII_site=
   * wget/configure/make/install LBCD, from http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz
   * Check iptables
   * =service lbcd start= # that's it, it should appear in the DNS list, IFF DT has included it in the master LBCD node

---++ ui64
You need to do
<verbatim>
yum groupinstall glite-UI
/opt/glite/yaim/bin/yaim -c -s /misc/siteinfo/site-info.def -n UI
</verbatim>

---+++ nagios
   * <verbatim>wget http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/x86_64/sa1-release-2-1.el5.noarch.rpm</verbatim>
   * =rpm -ihv sa1-release-2-1.el5.noarch.rpm=
   * =yum install httpd=
   * =yum install libyaml.i386=
   * =yum install egee-NAGIOS lcg-CA=

---+++ References

-- Main.PeterOettl - 2010-03-01
Topic revision: r35 - 2011-04-12 - MiguelGila