<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Swiss Grid Operations Meeting on 2013-10-31

   * *Date and time*: first Thursday of the month, at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 9227296)
   * *External link*: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=Nrq24qRR4V1u
   * *Phone gate*: from Switzerland: +41227671400 (portal) + 9227296 (extension) + # (pound sign); for more details see [[http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone][the CERN info page]]
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)

---++ Agenda

Status

   * *CSCS* (reports Miguel):
      * Scratch filesystem: after this week's events, the system seems to be stable.
         1 We have cleaned more than 50 million inodes and deployed new GPFS policies to prevent this from happening again: <verbatim>
Filesystem   Inodes     IUsed      IFree     IUse%  Mounted on
/dev/gpfs    150626304  51891413   98734891  35%    /gpfs

virident1    293937152  1005  Yes  No   25172992 ( 9%)  40936480 (14%)
virident2    293937152  1006  Yes  No   25139200 ( 9%)  40991648 (14%)
ssd1         390711360  1007  Yes  No  335058944 (86%)   6415328 ( 2%)
ssd2         390711360  1008  Yes  No  335084544 (86%)   6396320 ( 2%)
</verbatim>
         1 New =cp_1.sh= (prolog) and =cp_3.sh= (epilog) scripts have been deployed to make jobs run under =/tmpdir_slurm/$CREAMCENAME/$JOBID= instead of under =$HOME=: <verbatim>
#!/bin/bash
# cp_1.sh: this is _sourced_ by the CREAM job wrapper
export TMPDIR="/gpfs/tmpdir_slurm/${SLURM_SUBMIT_HOST}/${SLURM_JOB_ID}"
export MYJOBDIR=${TMPDIR}
mkdir -p ${TMPDIR}
cd ${TMPDIR}

#!/bin/bash
# cp_3.sh: this is _sourced_ by the CREAM job wrapper
# This gets executed at the end of the job
rmdir ${MYJOBDIR}
</verbatim>
         1 The scratch filesystem runs on very old hardware (~4 years) and needs to be decommissioned ASAP. We are working on finding a good solution in terms of performance/price.
      * SLURM migration status:
         1 The EMI-3 CREAM and ARC CEs work fine and have been accepting/running jobs for a while. No major issues found (except for some mismatches with the information system).
         1 The UMD-2 CREAM and ARC CEs are in downtime and will be migrated to EMI-3 next week.
         1 WNs: 9 nodes remain on Torque/PBS (UMD-2). By the end of next week all will be migrated to SLURM (EMI-3).
         1 Accounting (APEL): the old UMD-2 APEL has been shut down and the new one is fetching data from the EMI-3 CREAM CEs. We are still waiting for the APEL team (John Gordon) to give us the green light to publish to the new APEL system. SLURM accounting for October may be lost because the official APEL migration process is unclear (not working!). ARC will publish directly, without passing through the APEL server at CSCS.
         1 BDII: still running UMD-2; needs to be upgraded to EMI-3 to solve issues with GLUE2 publication. Will be done ASAP.
      * CVMFS upgraded to 2.1.15.
      * dCache migration:
         * Testing 2.6 in pre-production. So far it seems to be a fairly simple migration with minimal changes.
         * We will also use this opportunity to upgrade to PostgreSQL 9.3.
      * Working on a new monitoring system: http://ganglia.lcg.cscs.ch/ganglia3/ (views for GPFS/dCache I/O stats).
   * *PSI* (reports Fabio):
      * Collected HW offers to request funds for the next 2 years.
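The GPFS cleanup policies mentioned in the CSCS report above are site-specific and not quoted in the minutes; as an illustration only, an ILM purge rule in GPFS policy SQL could look like the following sketch (the path pattern and the 30-day threshold are assumptions, not the actual CSCS policy):

```sql
/* Hypothetical purge rule: delete scratch files not accessed for 30 days.
   The fileset/path and threshold are illustrative assumptions. */
RULE 'purge_old_scratch' DELETE
  WHERE PATH_NAME LIKE '/gpfs/tmpdir_slurm/%'
    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```

A rule file of this kind would typically be applied (or first tested with =-I test=) via =mmapplypolicy /gpfs -P purge.pol=.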
      * Basically we will need:
         * more WNs
         * a new NetApp
         * a couple of [[http://www.oracle.com/us/products/servers-storage/servers/x86/x4-2l/overview/index.html][2u Oracle NAS]] boxes to relocate the current replicated and shared =/shome= onto new HW (today we use 2 SUN Thumpers)
         * Alternatively, we might build =/shome= as a bunch of 10 x 2 TB disks hosted inside the future new NetApp, [[http://docs.oracle.com/cd/E19965-01/E22493/z4000c271005163.html#scrolltoc][SAS]]-connected and [[http://docs.oracle.com/cd/E19253-01/820-1931/agkar/index.html][MPxIO-managed]], attached to an active/passive pair of [[http://www.oracle.com/us/products/servers-storage/servers/x86/x4-2/overview/index.html][1u Oracle]] servers: the active node formats the 10 x 2 TB disks as a ZFS pool and serves the resulting filesystem over NFSv3; if the active node crashes, we simply import the ZFS pool on the passive node and switch NFSv3 on there. This configuration is cheaper than a pair of replicated [[http://www.oracle.com/us/products/servers-storage/servers/x86/x4-2l/overview/index.html][2u NAS]] boxes (fewer disks, cheaper nodes).
         * maybe a [[http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps10110/data_sheet_c78-507093.html][Cisco N7K extender]] with 32 10 Gbit/s ports; not urgent, but at some point 10 Gbit/s Ethernet will become the default.
      * Updated the CMS Frontier Squid to the [[http://frontier.cern.ch/dist/rpms/RPMS/x86_64/frontier-squid-2.7.STABLE9-16.1.x86_64.rpm][latest version]].
      * Scheduled a downtime on 8th Nov to upgrade to dCache 2.6 and to move our [[http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/emi-ui-3.0.2-1.el5.x86_64.rpm][SL5 UIs]] to UMD3; not sure about the SL5 WNs because of the [[http://repository.egi.eu/mirrors/EMI/tarball/production/sl5/emi3-emi-wn/][lack of the new tarball]].
      * dCache 2.6 upgrade:
         * This may also be useful for CSCS if confirmed: with dCache 2.6 you can avoid having a separate gPlazma1 configuration for the Xrootd door and simply use the common gPlazma2 cell.
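The active/passive ZFS takeover described above boils down to a few commands on the passive node. The following is only a sketch: the pool name =shome= and the service names are assumptions, not the real PSI configuration, and a dry-run guard makes the script echo the commands instead of executing them.

```shell
#!/bin/bash
# Hypothetical sketch of the manual /shome takeover on the passive node.
# "shome" as the ZFS pool name is an assumption, not the real PSI name.
RUN=echo   # set RUN="" on the real passive node to actually run the commands

$RUN zpool import -f shome       # import the SAS-shared pool (-f: active node is dead)
$RUN zfs set sharenfs=on shome   # re-enable the NFS export of the pool
$RUN service nfs restart         # make the NFSv3 service pick up the export
```

With MPxIO both nodes see the same SAS disks, so the takeover is a plain =zpool import= rather than any data copy; the cost is that it is a manual (or scripted) failover, not a transparent one.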
           I wrote up my configuration [[https://twiki.cern.ch/twiki/bin/view/Main/DcacheXrootd#Xrootd_gPlazma2_and_dcache_2_6_1][here]] and asked for confirmation.
         * Since spring we have been trying to partition the T3 users into primary groups representing the disjoint T3 subgroups: this allows us to easily compute the =/pnfs= group space usage by running a simple query against the Chimera DB; such a partitioning can also feed Ganglia plots and the like. I hope to introduce this change on 8th Nov. In the future CSCS could consider adopting the same partitioning, being aware that it increases both complexity and security requirements.
         * The Xrootd dCache pool plugin was incompatible with dCache 2.6.11; I asked for a [[http://linuxsoft.cern.ch/wlcg/sl6/x86_64/dcache26-plugin-xrootd-monitor-5.0.0-2.noarch.rpm][compatible one]] to be produced.
   * *UNIBE* (reports Gianfranco):
      * ce01.lhep cluster:
         * Still stable after moving Lustre to the Infiniband layer. Still one occurrence of the Thumper NIC/PCI lockup a couple of weeks back. Lustre is now under heavier load and working well so far. Will try to commission it for analysis soon.
         * The latest version of CVMFS is said to cure the cache-full issue we suffer from. Will upgrade soon.
         * The new NFSv4 user-mapping defaults broke file ownership on an NFS share on the ce01.lhep cluster. This prevented the SW validation jobs from writing the SW tags to the shared area. The fix consists in explicitly declaring the =Domain= in =/etc/idmapd.conf=.
      * c202.lhep cluster:
         * All ROCKS images (MDS, OSS, WN) are ready; mass installation is under way.
         * ARC not yet installed.
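The "simple query vs Chimera DB" mentioned above is not quoted in the minutes; once users sit in disjoint primary groups, per-group usage can be obtained along these lines (a sketch against Chimera's =t_inodes= table; the column names and the =itype= encoding for regular files are assumptions that should be checked against the actual Chimera schema version):

```sql
-- Hypothetical sketch: total /pnfs space per primary group (GID).
-- t_inodes columns and itype value are assumptions about the Chimera schema.
SELECT igid       AS gid,
       count(*)   AS files,
       sum(isize) AS bytes_used
FROM   t_inodes
WHERE  itype = 32768        -- regular files only (assumed encoding)
GROUP  BY igid
ORDER  BY bytes_used DESC;
```

The per-GID totals map directly to the disjoint T3 subgroups, which is what makes them easy to push into Ganglia plots.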
         * We hope to run test jobs tomorrow.
   * *UNIGE* (reports Szymon):
      * Xxx
   * *UZH* (reports Sergio):
      * Xxx
   * *Switch* (reports Alessandro):
      * Xxx

Other topics
   * Topic1
   * Topic2

Next meeting date:

AOB

---++ Attendants
   * CSCS: Miguel Gila, George Brown
   * CMS: Fabio, Daniel
   * ATLAS:
   * LHCb: Roland Bernet
   * EGI:

---++ Action items
   * Item1
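For reference, the NFSv4 fix mentioned in the UNIBE report amounts to declaring the same domain on server and clients in =/etc/idmapd.conf=, otherwise unknown owners get mapped to =nobody=. The domain below is a placeholder, not necessarily the real UNIBE value:

```ini
# /etc/idmapd.conf -- [General] Domain must match on the NFSv4 server
# and all clients; "lhep.unibe.ch" is a placeholder domain.
[General]
Domain = lhep.unibe.ch

[Mapping]
Nobody-User = nobody
Nobody-Group = nobody
```

On SL5/SL6 this is followed by a restart of the =rpcidmapd= service so the new mapping takes effect.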
Topic revision: r15 - 2013-10-31 - FabioMartinelli