<!--
keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
   #* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Service Card for BDII

%TOC%

---++ Definition

Our site BDII host is *bdii.lcg.cscs.ch*, which is a DNS alias for sbdii[01-03]. The three site BDIIs are set up in High Availability mode using DNS load balancing. While the *lbcd* daemon is running on a machine, it publishes that machine's load to our DNS server, which then directs queries to the least busy machine. If lbcd is stopped on a machine, the DNS server no longer sends queries to it. This mechanism makes rolling upgrades and reinstallations easy.

As of today, the site BDII hosts are configured as follows:
   * sbdii[01,02,03]:
      * Scientific Linux 6.4 x86_64
      * EMI-3 site BDII (bdii-5.2.22-1)

There is no top BDII deployed at CSCS-LCG2 at the moment.

---++ Operations

Normally this service does not require any operation. However, when the BDII service fails for some reason, it is important to take that machine out of the DNS rotation by stopping the lbcd service on it. This stops =bdii.lcg.cscs.ch= from resolving to that particular machine.

<verbatim>
$ service lbcd stop
</verbatim>

To restart the =bdii= service itself, use either *grid-service2 restart* or *service bdii restart*.

---+++ Client tools

---+++ Testing

The best way to test the service is with ldapsearch. Here are some examples:

<verbatim>
$ ldapsearch -x -LLL -h ppcream01 -p 2170 -b "o=grid"          # tests the BDII of ppcream01
$ ldapsearch -x -LLL -h bdii.lcg.cscs.ch -p 2170 -b "o=grid"   # tests the site BDII through the DNS alias
</verbatim>

Each of these queries should return a long document in LDIF format.
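The same query can also be run against each node individually. This helps spot a single misbehaving machine hidden behind the DNS alias. The sketch below is illustrative only. It assumes the node FQDNs are =sbdii01.lcg.cscs.ch= to =sbdii03.lcg.cscs.ch= and that the installed =ldapsearch= accepts the =-h=/=-p= options used above. The function names =count_ldif_entries= and =check_all_nodes= are hypothetical and do not belong to any installed tool. Entries are counted by their =dn:= lines, because every LDIF entry starts with one.

<verbatim>
#!/bin/sh
# Hypothetical helpers (not an installed tool): query each site BDII node
# directly, bypassing the bdii.lcg.cscs.ch DNS alias.
# Assumption: the nodes are reachable as sbdii0[1-3].lcg.cscs.ch.

# Count LDIF entries read from stdin. Every entry starts with a "dn:" line
# ("dn::" when the DN is base64-encoded). grep -c prints 0 but exits 1 when
# nothing matches, so that case is not treated as an error.
count_ldif_entries() {
    grep -c '^dn:' || true
}

# Print the number of entries published by each node.
# Returns 1 if any node returned no entries at all.
check_all_nodes() {
    rc=0
    for host in sbdii01.lcg.cscs.ch sbdii02.lcg.cscs.ch sbdii03.lcg.cscs.ch; do
        # -x: simple anonymous bind; -LLL: plain LDIF without comments/version
        n=$(ldapsearch -x -LLL -h "$host" -p 2170 -b "o=grid" 2>/dev/null | count_ldif_entries)
        if [ "$n" -gt 0 ]; then
            echo "$host: $n entries"
        else
            echo "$host: no entries (BDII down or empty?)"
            rc=1
        fi
    done
    return $rc
}
</verbatim>

Source the snippet and run =check_all_nodes=. A non-zero return value means at least one node returned nothing. That node is a candidate for =service lbcd stop= until it is fixed.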
A useful tool for checking published values is the _GLUE validator_. It is already installed on all =sbdii*= machines and is otherwise available from the official EMI-3 Updates repository. It can be run against a site BDII or even a single resource BDII (e.g. a CE) to check that the published data conform to the GLUE schema (version 2.0 by default):

<verbatim>
[sbdii03] # glue-validator -H bdii.lcg.cscs.ch -p 2170 -b o=glue -k
CRITICAL - errors 38, warnings 8, info 258 | errors=38;warnings=8;info=258
</verbatim>

In this case there are several critical errors. They can be investigated further by increasing the verbosity:

<verbatim>
[sbdii03] # glue-validator -H sbdii01.lcg.cscs.ch -p 2170 -b o=glue -k -v2
CRITICAL - errors 38, warnings 8, info 258 | errors=38;warnings=8;info=258
Summary per type of error, warning and info message:
E002 - Obsolete entry (GLUE2EntityValidity): 38
I007 - Unknown WLCG Name (GLUE2EntityOtherInfo): 2
I032 - Default value published (GLUE2ComputingShareMaxTotalJobs): 36
I033 - Default value published (GLUE2ComputingShareMaxRunningJobs): 36
I034 - Default value published (GLUE2ComputingShareMaxWaitingJobs): 36
I043 - Memory higher than 100,000 MB (GLUE2ComputingShareMaxMainMemory): 36
I045 - Memory higher than 100,000 MB (GLUE2ComputingShareMaxVirtualMemory): 36
I091 - Total share capacity size less than 1000 GB (GLUE2StorageShareCapacityTotalSize): 4
I096 - Default value published (GLUE2ComputingShareMaxMainMemory): 36
I097 - Default value published (GLUE2ComputingShareMaxVirtualMemory): 36
W023 - Incoherent attribute range (GLUE2ComputingShareMaxUserRunningJobs): 6
W025 - Incoherent number of total jobs (GLUE2ComputingShareTotalJobs): 2
</verbatim>

Using =glue-validator= to check the data published by a single resource:

<verbatim>
[sbdii03] # glue-validator -H cream01.lcg.cscs.ch -p 2170 -b o=glue -k
OK - errors 0, warnings 0, info 84 | errors=0;warnings=0;info=84
</verbatim>

---+++ Failover check

---+++ Checking logs

---++ Set up

---+++ Dependencies (other services, mount points, ...)

This service does not depend on any other system. It does, however, need access to the BDII port (=2170= in EMI/gLite, =2135= in ARC) of the other machines defined in the *siteinfo*.

---+++ Redundancy notes

As stated above, three machines provide the service behind BDII DNS load balancing.

---+++ Installation

---++++ Site BDII (EMI/UMD release)

After you bring the VM up, run cfengine once. Then run:

<verbatim>
yum update --enablerepo=epel
yum install emi-bdii-site --enablerepo=cscs,epel
cfagent -q
/opt/glite/yaim/bin/yaim -c -s /opt/cscs/siteinfo/site-info.def -n BDII_site
cfagent -q
grid-service restart
ls -l lbcd-3.3.0.tar.gz   # you should have got this 75K tarball via cfengine
# OR
wget http://archives.eyrie.org/software/system/lbcd-3.3.0.tar.gz
tar -zxvf lbcd-3.3.0.tar.gz
cd lbcd-3.3.0
./configure && make && make install
# By now, you must check that bdii is in chkconfig, is UP and CORRECT!!!
chkconfig ntpd off   # this should not be needed normally
cfagent -qv ; reboot
# wait for the machine to come back and bring it in production
service iptables stop
service lbcd start
</verbatim>

---+++ Upgrade

Stop the services (including =lbcd=), update the packages, run YAIM and start the services again. At least two instances of =lbcd= must be running on two different servers at all times for DNS load balancing to work. To perform a _rolling_ update, stop =lbcd= on one of the three =sbdii[01-03]= nodes, update the node, test it and start =lbcd= again. Then repeat on the other two nodes, one node at a time.

---++ Monitoring

The best way to see whether the service works correctly is to run the ldapsearch command given above. It is also important to check the status of the site in [[http://gstat.egi.eu/gstat/site/CSCS-LCG2/treeview/bdii_site/bdii.lcg.cscs.ch/][GSTAT]].

---+++ Nagios

A few specific checks have been implemented to check the status of =slapd=, =bdii-update= and =lbcd=.
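The actual Nagios checks are not documented on this page. The sketch below is an illustration only. It shows a minimal process check for the three daemons that follows the standard Nagios plugin convention: the first output line is the status, and the exit code is 0 for OK and 2 for CRITICAL. It assumes the daemons appear under the process names =slapd=, =bdii-update= and =lbcd=. If =bdii-update= is started through the Python interpreter, it shows up as =python= instead and would need =pgrep -f= matching. =check_bdii_procs= is a hypothetical name.

<verbatim>
#!/bin/sh
# Hypothetical Nagios-style process check for the site BDII daemons.
# Nagios convention: first line of output is the status; return code
# 0 = OK, 2 = CRITICAL.
# Assumption: the daemons run under these exact process names (pgrep -x).
check_bdii_procs() {
    missing=""
    for p in slapd bdii-update lbcd; do
        # pgrep exits non-zero if no process with this exact name exists
        if ! pgrep -x "$p" >/dev/null 2>&1; then
            missing="$missing $p"
        fi
    done
    if [ -n "$missing" ]; then
        echo "CRITICAL - not running:$missing"
        return 2
    fi
    echo "OK - all BDII daemons running"
    return 0
}
</verbatim>

When used as a plugin, the script would end with =check_bdii_procs; exit $?= so that Nagios receives the return code.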
---+++ Ganglia

The usual monitoring is deployed. No specific checks have been implemented.

---+++ Self Sanity / revival?

---+++ Other?

---++ Manuals

   * [[https://twiki.cern.ch/twiki/bin/view/EMI/GenericInstallationConfigurationEMI1#Installations][EMI Generic installation configuration]]

---++ Issues

Information about issues found with this service, and how to deal with them.

---+++ BDII dies without notification

Sometimes, when the BDII has to handle a large number of entries, the RAM disk it uses fills up and the service dies without any notification. If you have chosen to use the RAM disk for performance, you usually need to create a bigger RAM disk than the default. Originally, =/etc/init.d/bdii= contains something like this:

<verbatim>
# Create RAM Disk
if [ "${BDII_RAM_DISK}" = "yes" ]; then
    mount -t tmpfs -o size=1500M,mode=0744 tmpfs ${SLAPD_DB_DIR}
fi
</verbatim>

This needs to be changed to something like:

<verbatim>
# Create RAM Disk
if [ "${BDII_RAM_DISK}" = "yes" ]; then
    mount -t tmpfs -o size=3000M,mode=0744 tmpfs ${SLAPD_DB_DIR}
fi
</verbatim>

For a top BDII, you also need to add these settings to =/etc/bdii/DB_CONFIG_top=:

<verbatim>
[...]
# test values
set_cachesize 1 0 1
set_flags DB_CDB_ALLDB
set_flags DB_LOG_AUTOREMOVE
#set_flags DB_LOG_INMEMORY
#set_flags DB_TXN_NOSYNC
set_lk_max_locks 10000
set_tas_spins 100
</verbatim>

<verbatim>
tmpfs /dev/shm tmpfs defaults,size=3G 0 0
</verbatim>

---+++ Issue2

---+++ Issue3

---++ References

   * [[https://twiki.cern.ch/twiki/bin/view/EGEE/Glite-BDII][Service Reference Card]]
   * [[https://webrt.cscs.ch/Ticket/Display.html?id=7962][Ticket for setting up LBCD]]
   * [[https://webrt.cscs.ch/Ticket/Display.html?id=7187][How to make hostname bdii.lcg.cscs.ch]]
   * [[http://gridinfo.web.cern.ch/][Grid Information System]]
   * [[http://gridinfo.web.cern.ch/glue/glue-validator-guide][GLUE validator guide]]
   * [[https://twiki.cern.ch/twiki/bin/view/EGEE/GLUEValidatorErrorCodes][GLUE validator error codes]]

-- Main.FotisGeorgatos - 2010-09-23
| *ServiceCardForm* ||
| Service name | BDII-site |
| Machines this service is installed in | sbdii[01-03] |
| Is Grid service | Yes |
| Depends on the following services | - |
| Expert | Gianni Ricciardi |
| CM | CfEngine |
| Provisioning | none |
Topic revision: r20 - 2014-11-25 - MiguelGila
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.