<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Swiss WLCG Operations Meeting on 2012-09-06

   * *Date and time*: first Thursday of the month, at 14:00
   * *Place*: EVO, password: chipp
   * *External link / EVO*: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=vsvivIese2IsIvaiaMItas

---++ Agenda

---+++ CSCS Status (Reports Miguel)

   1. Storage Element:
      * dCache extension: the storage extension has been chosen and is being purchased; it should arrive in November (AFAIK, Miguel).
      * dCache upgrade to 1.9.12: the process has started on preproduction, but since it is a complicated matter, we are being extra careful to ensure no data is lost in the process.
   1. WN:
      * WN hardware: 12 extra Sandy Bridge nodes (384 job slots) are physically installed, but we have had no time to configure them yet. Will do ASAP.
      * WN software: as of today, wn[01-46] are gLite 3.2 WNs and wn[47-59] are UMD 1 WNs. During the next maintenance we plan to upgrade all nodes to UMD 1.
   1. Network:
      * Ethernet network replacement: the Cisco switches have arrived, and the network administrator at CSCS is preparing the infrastructure and configuration required for them.
   1. Problems:
      * Yesterday's problem with Argus: an error on Argus caused all CREAM-CEs to stop accepting jobs. A mail has been sent to argus-support; we are waiting for a reply.
      * We have a problem with our KVM management solution: Convirture is unable to work due to database corruption, so we have to shut down all KVM VMs during one maintenance window in order to re-add them (see the libvirt sketch after this section). We are thinking about a permanent solution, either commercial or open source, but it must be rock-solid.
      * Sun hardware is failing at an alarming rate. This week the old MDT connected to =puppet= lost 3 disks of a RAID-6. We were able to recover with some filesystem corruption, but if this hardware is failing, other hardware from the same batch might start failing too (critical ones: the dCache head nodes). It is unclear yet whether this has affected our ability to install machines (kickstart files).
      * We have detected a problem with the NGI-DE/CH TopBDII: at times it is very slow answering queries and, therefore, the status of CSCS appears degraded in the NGI checks (see the latency probe sketch after this section). We have seen DESY using their own internal TopBDII, so we are thinking about doing the same internally for CSCS, *NOT* for the whole NGI_CH cloud. At the moment the BDII at CERN is being used as the primary BDII, but this is a temporary solution: =lcg-bdii.cern.ch:2170,bdii-fzk.gridka.de:2170=
   1. AOB:
      * Atlasvobox: we have seen that it is possible to use the Squid provided by Scientific Linux (currently used for CVMFS) to also host the atlasvobox. The process seems simple, but some work needs to be done and a lot of testing is important. We are working on it.
      * Fabio requested access to our preproduction cluster to test some changes on the CREAM-CE. Please submit a ticket, so we can get to work on it ASAP.
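Regarding the Convirture recovery above: before the shutdown, the guest definitions can be saved so the VMs can be re-registered afterwards. Below is a minimal sketch using the libvirt Python bindings; the connection URI =qemu:///system= and the dump directory are our assumptions, not something agreed at the meeting.

<verbatim>
# Minimal sketch (assumption: libvirt Python bindings installed, hypervisor
# reachable as qemu:///system; adjust the URI for the actual setup).
# Dumps every domain's XML definition to disk so the guests can later be
# re-registered with `virsh define <file>` once the database is rebuilt.
import os
import libvirt

DUMP_DIR = "/var/tmp/vm-definitions"   # hypothetical backup location

def dump_vm_definitions(uri="qemu:///system", dump_dir=DUMP_DIR):
    if not os.path.isdir(dump_dir):
        os.makedirs(dump_dir)
    conn = libvirt.open(uri)
    try:
        names = conn.listDefinedDomains()            # inactive (defined) guests
        names += [conn.lookupByID(i).name()          # running guests
                  for i in conn.listDomainsID()]
        for name in names:
            dom = conn.lookupByName(name)
            path = os.path.join(dump_dir, name + ".xml")
            with open(path, "w") as f:
                f.write(dom.XMLDesc(0))              # full domain definition as XML
            print("saved %s" % path)
    finally:
        conn.close()

if __name__ == "__main__":
    dump_vm_definitions()
</verbatim>

After the maintenance, each saved file can be re-registered with =virsh define <file>=.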
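To put numbers on the TopBDII slowness, one can simply time connections to each candidate endpoint. The sketch below uses only the Python standard library and measures the TCP connect time on port 2170; a full check would additionally run an LDAP query against base "o=grid", e.g. with =ldapsearch=. The endpoint list reproduces the failover string above; the timeout and output format are our choices.

<verbatim>
# Minimal sketch: time a TCP connect to each BDII endpoint on port 2170.
# The connect time alone is already enough to flag the "very slow to
# answer" behaviour seen in the NGI checks.
import socket
import time

# The failover list quoted in the minutes; order = preference.
BDII_ENDPOINTS = ["lcg-bdii.cern.ch:2170", "bdii-fzk.gridka.de:2170"]

def probe(endpoint, timeout=10.0):
    """Return the connect time in seconds, or None if unreachable."""
    host, port = endpoint.rsplit(":", 1)
    start = time.time()
    try:
        sock = socket.create_connection((host, int(port)), timeout)
        sock.close()
        return time.time() - start
    except (socket.error, socket.timeout):
        return None

if __name__ == "__main__":
    for ep in BDII_ENDPOINTS:
        elapsed = probe(ep)
        status = "%.3fs" % elapsed if elapsed is not None else "UNREACHABLE"
        print("%-30s %s" % (ep, status))
</verbatim>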
---+++ PSI Status (Reports Fabio)

   * Designing a *fast, HA, SAN 10 TB /home* based on *GPFS* with:
      * two servers, e.g. 2U HP Proliant + [[http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r50-3.gpfs300.doc%2Fbl1ins_nodqtieb.htm][GPFS 3.5 - Node quorum with tiebreaker disks]]
      * well-tested dual-port Qlogic FC 8 Gbit/s cards
      * a 2U 24-bay IBM [[http://www-03.ibm.com/systems/storage/disk/ds3500/specifications.html][DS3524]] *or* a 2U 24-bay [[http://www.sgi.com/products/storage/raid/5000.html][SGI IS5000]]
      * 6 Gbps SAS 2.5" 900 GB 10k disks, but I would like to put the GPFS metadata on SSDs or 15k disks in RAID 1 (opinions?)
      * total cost with 10k disks: ~50k CHF
   * *BTW*: it is still missing features like snapshots, but *GlusterFS*, now called [[https://access.redhat.com/knowledge/docs/Red_Hat_Storage/][Red Hat Storage 2.0]], can also implement a *cheap HA /home with 2 NAS* boxes.
   * Because of several WN kernel panics, we introduced SGE queue memory limits: the default is 3 GB per job, and users can request up to 6 GB.
   * We introduced a recent/all hierarchical [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/CMSTier3Log27][SGE accounting file]] to speed up =qacct= response times (a minimal sketch of the idea is at the bottom of this page).
   * Testing dCache 1.9.12 inside our VMware testbed.

---+++ UNIBE Status (Reports Gianfranco)

   * Xxx

---+++ UNIGE Status (Reports Szymon)

   * Xxx

---++ Other topics

   * Topic1
   * Topic2

---++ Next meeting date

---++ AOB

---++ Attendants

   * CSCS: Miguel
   * CMS: Fabio, Daniel
   * ATLAS:
   * LHCb:
   * EGI:

---++ Action items

   * Item1
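---++ Appendix: sketch of the recent/all SGE accounting split

As mentioned in the PSI report, splitting the GridEngine accounting file into a small "recent" file plus the full history speeds up =qacct= considerably, because =qacct -f= can be pointed at the small file. The sketch below only illustrates the idea: the accounting path, the end_time field position (11th colon-separated field in the standard format) and the 30-day cutoff are our assumptions; the actual procedure is documented in the linked CMSTier3Log27 page.

<verbatim>
# Minimal sketch of a recent/all split of a GridEngine accounting file.
# Assumptions: standard colon-separated records, end_time (epoch seconds)
# as the 11th field, 30-day window. Usage afterwards:
#   qacct -f /gridware/sge/default/common/accounting.recent ...
import time

ACCOUNTING = "/gridware/sge/default/common/accounting"  # hypothetical path
RECENT = ACCOUNTING + ".recent"
CUTOFF = time.time() - 30 * 86400   # keep the last 30 days

def split_recent(src=ACCOUNTING, dst=RECENT, cutoff=CUTOFF):
    kept = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith("#"):          # header/comment lines
                continue
            fields = line.split(":")
            try:
                end_time = int(fields[10])    # end_time column
            except (IndexError, ValueError):
                continue
            if end_time == 0 or end_time >= cutoff:
                fout.write(line)              # still running or recent
                kept += 1
    print("kept %d recent records in %s" % (kept, dst))

if __name__ == "__main__":
    split_recent()
</verbatim>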