
Swiss Grid Operations Meeting on 2015-04-02

Site status

CSCS

  • dCache upgrade to 2.10 planned before summer: some site-specific tools do not work out of the box and will need to be modified (Fabio's experience will be very handy in this case);
  • new WNs installed and put into production during the last maintenance, and all WNs re-installed using Foreman:
    • when they all run at full throttle, we observed some load issues on the WNs and GPFS servers
    • situation mitigated by implementing a swap policy and improving the SLURM NodeHealthCheck
    • currently the situation looks quite stable, with the load under control
    • also considering increasing a GPFS timeout parameter server-side during the next maintenance
  • GGUS ticket about the ARC CEs still open: it looked like a trivial issue at the beginning but is getting more complicated than expected. User mapping is involved, as are the site-specific ARC scripts developed in house to make ARC work with SLURM (on-going investigation).
  • To be planned:
    • next Emergency Training for Phoenix: a Doodle poll to be set up for this. Is June/July a feasible period for everyone (the F2F meeting will most likely happen in August)?
    • 6 IBM System x3650 servers to be shipped to UNIBE
    • help required with Dino's and Dario's certificates: many thanks to Gianfranco for dealing with this issue

PSI

  • Mainly upgraded dCache:
    • upgraded PostgreSQL from 9.3 to the latest 9.4
    • upgraded dCache from 2.6 to 2.10
    • updated the T3 monitoring systems to gather information from the new dCache 2.10 JSON interface; regrettably not all the information is published, for instance the movers per pool are missing
    • updated the dCache xrootd monitoring software from 5.0.7 to dcache26-plugin-xrootd-monitor-5.0.8-0.noarch
    • added a new dCache xrootd door to the T3 (not Internet-exposed, read/write enabled, GSI auth/authz protected) to work around the defective CMSSW/dcap integration; the dcap, gsidcap and new xrootd traffic all use the same dCache I/O queue, "regular". I believe that at a certain point PSI will recommend the xrootd doors as its primary file access doors, both for LAN and WAN access; running the dcap, gsidcap, gsiftp, SRM, xrootd and NFS doors all at once confuses the T3 users (which protocol to use, and when? which client?)
    • updated the Puppet recipes to reflect the new dCache 2.10 environment
    • our dCache 2.10 configurations are at https://bitbucket.org/fabio79ch/t3_ch_psi/src
    • I'm updating Derek's dCache tools, so CSCS can avoid doing the same; I'll let you know when it's done.
  • I'm satisfied with the zfsnap+rsync replacement for my inexplicably defective ZFS send/receive; by the way, I notice some big sites putting ZFS on Linux into production, so it's a Linux filesystem to keep in mind.
  • Partially and remotely attended:
  • All of us have to update to cvmfs-2.1.20-1.el6.x86_64; I monitor its updates with Nagios:
    • command[check_cvmfs_updates]=/opt/nagios/check_yum -vvv --disablerepo=* --enablerepo=cernvm --warn-on-any-update
  • Discovered the Xrootd Cache Proxy
  • The Pakiti cron took 10 minutes to install everywhere and is eventually very useful, but it seems to discard the reports from my Scientific Linux 6.0 hosts: even if I erase these hosts on the portal and update their /etc/redhat-release to Scientific Linux 6.4 (Carbon), it still reports nothing about them and still lists them as Scientific Linux 6.0; I'll report this to the Pakiti devs.
  • I've opened a 'PSI T3 Admin' Skype account to better follow the T3 users; by e-mail it takes ages both to hear their issues and to explain my solutions. The primary means to reach me will remain the T3 mailing list, to keep a record of their requests, but for time-consuming tasks I want to talk with them and see their logs, or even their screen.
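The cvmfs version requirement above can also be checked by hand; a minimal sketch using version sort, with the installed version hard-coded for illustration (on a real node it would come from rpm):

```shell
# Compare an installed cvmfs version against the required 2.1.20 using
# sort -V (version sort). 'installed' is hard-coded here for illustration;
# on a real WN take it from: rpm -q --qf '%{VERSION}\n' cvmfs
required=2.1.20
installed=2.1.19
oldest=$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)
if [ "$oldest" = "$required" ]; then
    echo "cvmfs OK ($installed >= $required)"
else
    echo "cvmfs needs update ($installed < $required)"
fi
```

With the values above this prints "cvmfs needs update (2.1.19 < 2.1.20)"; the same comparison could back a Nagios check alongside the yum-based one.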

UNIBE-LHEP

UNIBE-ID

  • Xxx

UNIGE

  • Account review and cleanup of space (12 accounts closed, 71 active right now)
  • Firewall and config changes on the SE to allow WebDAV access
  • A grid UI for the T2K VO “installed locally”
    • using the software from /cvmfs like ATLAS
    • only the directories needed for the VOMS stuff (X509_VOMS_DIR, VOMS_USERCONF and X509_VOMSES) are really locally installed
  • Hardware issues:
    • overheating hardware RAID on an IBM x3630 M3 (radiator falling off; we are used to that one), quickly repaired
    • badly broken hardware RAID on an IBM x3630 M4 (corrupted config information written to the disks; the RAID config had to be re-created): no data loss, but the recovery was very tricky and took one week
    • a virtual disk added to the VM that runs the head node of the SE to have more space for logs
  • Renewed host certificates for all 17 machines that need them
    • this buys more time until a new solution is needed
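The "installed locally" UI approach above can be sketched as follows; all paths and the setup-script name are illustrative assumptions, not the actual UNIGE layout:

```shell
# Hypothetical sketch: grid client software comes from /cvmfs (as for
# ATLAS), only the VOMS lookup data lives on local disk. Paths below
# are illustrative, not the real UNIGE ones.
export X509_VOMS_DIR=/opt/grid-ui/vomsdir    # per-VO *.lsc files
export X509_VOMSES=/opt/grid-ui/vomses       # VO endpoint definitions
export VOMS_USERCONF=$X509_VOMSES            # some clients read this variable

# Everything else would be taken from cvmfs, e.g. (illustrative path):
# source /cvmfs/grid.cern.ch/etc/profile.d/setup-ui.sh
```

Only the three directories named by these variables need to exist locally; the clients themselves are never installed on the node.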

NGI_CH

  • Multicore accounting for EGI
  • Pakiti made easy https://pakiti.egi.eu/client.php?site=UNIBE-LHEP (simple cron job on all WNs - requires access to the CAs)
    • Site Security Officers can check their own site: https://pakiti.egi.eu/
    • This is very likely to become a requirement in a month or so.
  • Issue with certificates in CH following SWITCH's withdrawal from the service as of 31 Aug 2015
    • EGI catch-all CA not suitable as long term solution
    • Offer from GRNET
    • Maybe check again with NGI-DE
  • EGI Conference 2015: http://conf2015.egi.eu - 18-22 May 2015
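The "simple cron job on all WNs" mentioned for Pakiti could look roughly like this; the script path, option and report URL are illustrative assumptions (use the client actually distributed by EGI):

```shell
# Hypothetical /etc/cron.d/pakiti entry for each WN; client name, option
# and URL are illustrative. Note the WNs need outbound HTTPS access to
# the Pakiti server, plus the CA bundle to verify it.
0 3 * * * root /usr/local/sbin/pakiti-client --url https://pakiti.example.eu/feed/
```

The client essentially reports the installed package list, which the portal compares against known vulnerabilities per distribution.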

Other topics

  • Topic1
  • Topic2
Next meeting date:

A.O.B.

Attendants

  • CSCS: Dario, Dino, Gianni
  • CMS: Fabio Martinelli, Daniel Meister
  • ATLAS: Gianfranco Sciacca, Szymon Gadomski
  • LHCb: Roland Bernet
  • EGI: Gianfranco Sciacca

Action items

  • Item1

Short presentation about the Xrootd Project and its usage in CMS, given by Fabio at PSI

What is the Xrootd Project?

The Xrootd Project (http://xrootd.org) offers a readily deployable framework in which to construct large, fault-tolerant, high-performance data access configurations using commodity hardware with a minimum amount of administrative overhead.

The Xrootd protocol is open; indeed, there is an implementation of it by dCache that is extensively used at the PSI T3.

All the LHC experiments at CERN are using Xrootd.

When should I use Xrootd?

A typical use case for Xrootd:

  • you have TBs of WORM files, regardless of their format; let's assume the CMS .root files
  • located on several file servers, and on several filesystems inside each of them (fs01:/data01, fs01:/data02, fs02:/data01, ...)
  • with your 'hot' files hosted on more than one file server (to be both fault tolerant and provide high performance)
  • and you want to add/remove file servers in this cluster with a minimum amount of effort
  • and glue all the filesystems together into a single abstract namespace (e.g. /data)
  • but possibly without maintaining a DB for this namespace
  • and you want POSIX access to this /data
  • and to mount /data via FUSE
  • and also X509, password, or Kerberos protection of the /data files
  • and to run your applications without crashes if a file is not hosted in /data but can be accessed at another site (CMS uses this feature)
  • and also to keep these extra-site files in a cache, to avoid re-downloading them tens of times
  • and, ultimately, to run a Linux batch farm to analyze this /data

In CMS, Xrootd is typically installed on top of GPFS, Lustre, HDFS, or dCache, but that's not a must.
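The single-namespace idea above can be illustrated with the standard xrootd client tools; the server name and file path below are illustrative, and the commands of course need a reachable redirector:

```shell
# All paths are resolved through the redirector, regardless of which
# data server actually holds the file (hostnames are illustrative):
xrdfs root://redirector.example.ch ls /data              # browse the glued namespace
xrdfs root://redirector.example.ch locate /data/f.root   # which server(s) host the file
xrdcp root://redirector.example.ch//data/f.root /tmp/    # read/copy through the same entry point
```

The client never needs to know the fs01/fs02 layout; adding or removing a data server changes nothing on the client side.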

Xrootd as a local cluster

In the following picture there is only one Local Redirector, but there can be more than one for HA:
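How a data server joins such a cluster can be sketched with a minimal config fragment; the hostname and paths are illustrative, and a real site config carries many more directives:

```shell
# Hypothetical fragment of /etc/xrootd/xrootd-clustered.cfg on a data
# server (hostname and paths are illustrative):
all.role server
all.manager redirector.example.ch:3121   # subscribe to the local redirector (cmsd port)
all.export /data                         # the abstract namespace to expose
oss.localroot /data01                    # where the exported files actually live
```

The redirector runs the same software with all.role manager; clients only ever talk to it, and it forwards each open to a data server that has the file.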

(diagram: zS6Qi.jpg)

Xrootd as a federation of local clusters

In the following picture there is only one Global Redirector, but there can be more than one for HA; in CMS there are three, masked by the single entry point xrootd-cms.infn.it:

$ host xrootd-cms.infn.it
xrootd-cms.infn.it has address 193.205.76.83
xrootd-cms.infn.it has address 90.147.66.75
xrootd-cms.infn.it has address 134.158.132.31

(diagram: OHcrA.jpg)

How to download a file from an Xrootd federation?

$ xrdcp -d 2 -f root://xrootd-cms.infn.it//store/user/martinelli_f/mt2_1.root /dev/null 
[2015-04-01 10:16:08.037912 +0200][Debug  ][Utility           ] CopyProcess: 2 jobs to prepare
[2015-04-01 10:16:08.038145 +0200][Debug  ][Utility           ] Creating a classic copy job, from root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root to file:///dev/null
[2015-04-01 10:16:08.038200 +0200][Debug  ][Utility           ] Monitor library name not set. No monitoring
[2015-04-01 10:16:08.038286 +0200][Debug  ][Utility           ] Opening root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root for reading
[2015-04-01 10:16:08.038368 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] Sending an open command
[2015-04-01 10:16:08.038437 +0200][Debug  ][Poller            ] Available pollers: built-in
[2015-04-01 10:16:08.038454 +0200][Debug  ][Poller            ] Attempting to create a poller according to preference: built-in,libevent
[2015-04-01 10:16:08.038474 +0200][Debug  ][Poller            ] Creating poller: built-in
[2015-04-01 10:16:08.038492 +0200][Debug  ][Poller            ] Creating and starting the built-in poller...
[2015-04-01 10:16:08.038670 +0200][Debug  ][TaskMgr           ] Starting the task manager...
[2015-04-01 10:16:08.038754 +0200][Debug  ][TaskMgr           ] Task manager started
[2015-04-01 10:16:08.038778 +0200][Debug  ][JobMgr            ] Starting the job manager...
[2015-04-01 10:16:08.038935 +0200][Debug  ][JobMgr            ] Job manager started, 3 workers
[2015-04-01 10:16:08.038961 +0200][Debug  ][TaskMgr           ] Registering task: "FileTimer task" to be run at: [2015-04-01 10:16:08 +0200]
[2015-04-01 10:16:08.039023 +0200][Debug  ][PostMaster        ] Creating new channel to: xrootd-cms.infn.it:1094 1 stream(s)
[2015-04-01 10:16:08.039075 +0200][Debug  ][PostMaster        ] [xrootd-cms.infn.it:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2015-04-01 10:16:08.039178 +0200][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: xrootd-cms.infn.it:1094" to be run at: [2015-04-01 10:16:23 +0200]
[2015-04-01 10:16:08.051807 +0200][Debug  ][PostMaster        ] [xrootd-cms.infn.it:1094] Found 3 address(es): [::ffff:90.147.66.75]:1094, [::ffff:134.158.132.31]:1094, [::ffff:193.205.76.83]:1094
[2015-04-01 10:16:08.051864 +0200][Debug  ][AsyncSock         ] [xrootd-cms.infn.it:1094 #0.0] Attempting connection to [::ffff:90.147.66.75]:1094
[2015-04-01 10:16:08.051912 +0200][Debug  ][Poller            ] Adding socket 0xb11110 to the poller
[2015-04-01 10:16:08.100511 +0200][Debug  ][AsyncSock         ] [xrootd-cms.infn.it:1094 #0.0] Async connection call returned
[2015-04-01 10:16:08.100573 +0200][Debug  ][XRootDTransport   ] [xrootd-cms.infn.it:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2015-04-01 10:16:08.149043 +0200][Debug  ][XRootDTransport   ] [xrootd-cms.infn.it:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 300)
[2015-04-01 10:16:08.149090 +0200][Debug  ][XRootDTransport   ] [xrootd-cms.infn.it:1094 #0.0] kXR_protocol successful (type: manager [], protocol version 300)
[2015-04-01 10:16:08.149244 +0200][Debug  ][XRootDTransport   ] [xrootd-cms.infn.it:1094 #0.0] Sending out kXR_login request, username: martinel, cgi: ?xrd.cc=ch&xrd.tz=1&xrd.appname=xrdcp&xrd.info=&xrd.hostname=t3ui18.psi.ch, dual-stack: false, private IPv4: false, private IPv6: false
[2015-04-01 10:16:08.198558 +0200][Debug  ][XRootDTransport   ] [xrootd-cms.infn.it:1094 #0.0] Logged in, session: 5f570000791a0000fe00000068570000
[2015-04-01 10:16:08.198597 +0200][Debug  ][PostMaster        ] [xrootd-cms.infn.it:1094 #0] Stream 0 connected.
[2015-04-01 10:16:08.247956 +0200][Debug  ][PostMaster        ] Creating new channel to: t3se01.psi.ch:1095 1 stream(s)
[2015-04-01 10:16:08.248025 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1095 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2015-04-01 10:16:08.248140 +0200][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: t3se01.psi.ch:1095" to be run at: [2015-04-01 10:16:23 +0200]
[2015-04-01 10:16:08.248350 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1095] Found 1 address(es): [::ffff:192.33.123.24]:1095
[2015-04-01 10:16:08.248390 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1095 #0.0] Attempting connection to [::ffff:192.33.123.24]:1095
[2015-04-01 10:16:08.248437 +0200][Debug  ][Poller            ] Adding socket 0xe00013b0 to the poller
[2015-04-01 10:16:08.248751 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1095 #0.0] Async connection call returned
[2015-04-01 10:16:08.248810 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1095 #0.0] Sending out the initial hand shake + kXR_protocol
[2015-04-01 10:16:08.249199 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1095 #0.0] Got the server hand shake response (type: server [], protocol version 300)
[2015-04-01 10:16:08.249238 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1095 #0.0] kXR_protocol successful (type: server [], protocol version 300)
[2015-04-01 10:16:08.249380 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1095 #0.0] Sending out kXR_login request, username: martinel, cgi: ?xrd.cc=ch&xrd.tz=1&xrd.appname=xrdcp&xrd.info=&xrd.hostname=t3ui18.psi.ch, dual-stack: false, private IPv4: false, private IPv6: false
[2015-04-01 10:16:08.249654 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1095 #0.0] Logged in, session: 4b1a0000ea5b0000070000004b1a0000
[2015-04-01 10:16:08.249681 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1095 #0] Stream 0 connected.
[2015-04-01 10:16:08.250166 +0200][Debug  ][PostMaster        ] Creating new channel to: t3se01.psi.ch:1094 1 stream(s)
[2015-04-01 10:16:08.250238 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2015-04-01 10:16:08.250348 +0200][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: t3se01.psi.ch:1094" to be run at: [2015-04-01 10:16:23 +0200]
[2015-04-01 10:16:08.250559 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1094] Found 1 address(es): [::ffff:192.33.123.24]:1094
[2015-04-01 10:16:08.250600 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1094 #0.0] Attempting connection to [::ffff:192.33.123.24]:1094
[2015-04-01 10:16:08.250654 +0200][Debug  ][Poller            ] Adding socket 0xe40013b0 to the poller
[2015-04-01 10:16:08.251016 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1094 #0.0] Async connection call returned
[2015-04-01 10:16:08.251065 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Sending out the initial hand shake + kXR_protocol
[2015-04-01 10:16:08.253667 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Got the server hand shake response (type: manager [], protocol version 289)
[2015-04-01 10:16:08.254770 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] kXR_protocol successful (type: manager [], protocol version 289)
[2015-04-01 10:16:08.254951 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Sending out kXR_login request, username: martinel, cgi: ?xrd.cc=ch&xrd.tz=1&xrd.appname=xrdcp&xrd.info=&xrd.hostname=t3ui18.psi.ch, dual-stack: false, private IPv4: false, private IPv6: false
[2015-04-01 10:16:08.256334 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Logged in, session: d5334db196a67ef59396e4e4820c4c29
[2015-04-01 10:16:08.256360 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Authentication is required: &P=gsi,v:10200,c:ssl,ca:e72045ce
[2015-04-01 10:16:08.256380 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Sending authentication data
[2015-04-01 10:16:08.258439 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Trying to authenticate using gsi
[2015-04-01 10:16:08.290040 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Sending more authentication data for gsi
[2015-04-01 10:16:08.337462 +0200][Debug  ][XRootDTransport   ] [t3se01.psi.ch:1094 #0.0] Authenticated with gsi.
[2015-04-01 10:16:08.337508 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1094 #0] Stream 0 connected.
[2015-04-01 10:16:08.367209 +0200][Debug  ][PostMaster        ] Creating new channel to: 192.33.123.54:22217 1 stream(s)
[2015-04-01 10:16:08.367265 +0200][Debug  ][PostMaster        ] [192.33.123.54:22217 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Widnow: 1800
[2015-04-01 10:16:08.367365 +0200][Debug  ][TaskMgr           ] Registering task: "TickGeneratorTask for: 192.33.123.54:22217" to be run at: [2015-04-01 10:16:23 +0200]
[2015-04-01 10:16:08.367495 +0200][Debug  ][PostMaster        ] [192.33.123.54:22217] Found 1 address(es): [::ffff:192.33.123.54]:22217
[2015-04-01 10:16:08.367528 +0200][Debug  ][AsyncSock         ] [192.33.123.54:22217 #0.0] Attempting connection to [::ffff:192.33.123.54]:22217
[2015-04-01 10:16:08.367568 +0200][Debug  ][Poller            ] Adding socket 0xd8000f30 to the poller
[2015-04-01 10:16:08.367735 +0200][Debug  ][AsyncSock         ] [192.33.123.54:22217 #0.0] Async connection call returned
[2015-04-01 10:16:08.367789 +0200][Debug  ][XRootDTransport   ] [192.33.123.54:22217 #0.0] Sending out the initial hand shake + kXR_protocol
[2015-04-01 10:16:08.369938 +0200][Debug  ][XRootDTransport   ] [192.33.123.54:22217 #0.0] Got the server hand shake response (type: server [], protocol version 289)
[2015-04-01 10:16:08.369998 +0200][Debug  ][XRootDTransport   ] [192.33.123.54:22217 #0.0] kXR_protocol successful (type: server [], protocol version 289)
[2015-04-01 10:16:08.370150 +0200][Debug  ][XRootDTransport   ] [192.33.123.54:22217 #0.0] Sending out kXR_login request, username: martinel, cgi: ?xrd.cc=ch&xrd.tz=1&xrd.appname=xrdcp&xrd.info=&xrd.hostname=t3ui18.psi.ch, dual-stack: false, private IPv4: false, private IPv6: false
[2015-04-01 10:16:08.371141 +0200][Debug  ][XRootDTransport   ] [192.33.123.54:22217 #0.0] Logged in, session: 89020000010000002000000000000000
[2015-04-01 10:16:08.371170 +0200][Debug  ][PostMaster        ] [192.33.123.54:22217 #0] Stream 0 connected.
[2015-04-01 10:16:08.371933 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] Open has returned with status [SUCCESS] 
[2015-04-01 10:16:08.371963 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] successfully opened at 192.33.123.54:22217, handle: 0x0, session id: 1
[2015-04-01 10:16:08.372043 +0200][Debug  ][Utility           ] Opening /dev/null for writing
[2015-04-01 10:16:08.372108 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] Sending a read command for handle 0x0 to 192.33.123.54:22217
[510B/510B][100%][==================================================][0B/s]  [2015-04-01 10:16:08.372762 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] Sending a close command for handle 0x0 to 192.33.123.54:22217
[2015-04-01 10:16:08.373145 +0200][Debug  ][File              ] [0xb0be00@root://xrootd-cms.infn.it:1094//store/user/martinelli_f/mt2_1.root] Close returned from 192.33.123.54:22217 with: [SUCCESS] 
[510B/510B][100%][==================================================][0B/s]  
[2015-04-01 10:16:08.373368 +0200][Debug  ][JobMgr            ] Stopping the job manager...
[2015-04-01 10:16:08.373760 +0200][Debug  ][JobMgr            ] Job manager stopped
[2015-04-01 10:16:08.373783 +0200][Debug  ][TaskMgr           ] Stopping the task manager...
[2015-04-01 10:16:08.373939 +0200][Debug  ][TaskMgr           ] Task manager stopped
[2015-04-01 10:16:08.373967 +0200][Debug  ][Poller            ] Stopping the poller...
[2015-04-01 10:16:08.374132 +0200][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: 192.33.123.54:22217"
[2015-04-01 10:16:08.374160 +0200][Debug  ][AsyncSock         ] [192.33.123.54:22217 #0.0] Closing the socket
[2015-04-01 10:16:08.374180 +0200][Debug  ][Poller            ] <[::ffff:192.33.123.138]:38028><--><[::ffff:192.33.123.54]:22217> Removing socket from the poller
[2015-04-01 10:16:08.374221 +0200][Debug  ][PostMaster        ] [192.33.123.54:22217 #0] Destroying stream
[2015-04-01 10:16:08.374241 +0200][Debug  ][AsyncSock         ] [192.33.123.54:22217 #0.0] Closing the socket
[2015-04-01 10:16:08.374265 +0200][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: t3se01.psi.ch:1094"
[2015-04-01 10:16:08.374283 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1094 #0.0] Closing the socket
[2015-04-01 10:16:08.374301 +0200][Debug  ][Poller            ] <[::ffff:192.33.123.138]:58415><--><[::ffff:192.33.123.24]:1094> Removing socket from the poller
[2015-04-01 10:16:08.374335 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1094 #0] Destroying stream
[2015-04-01 10:16:08.374354 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1094 #0.0] Closing the socket
[2015-04-01 10:16:08.374376 +0200][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: t3se01.psi.ch:1095"
[2015-04-01 10:16:08.374401 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1095 #0.0] Closing the socket
[2015-04-01 10:16:08.374425 +0200][Debug  ][Poller            ] <[::ffff:192.33.123.138]:45942><--><[::ffff:192.33.123.24]:1095> Removing socket from the poller
[2015-04-01 10:16:08.374458 +0200][Debug  ][PostMaster        ] [t3se01.psi.ch:1095 #0] Destroying stream
[2015-04-01 10:16:08.374477 +0200][Debug  ][AsyncSock         ] [t3se01.psi.ch:1095 #0.0] Closing the socket
[2015-04-01 10:16:08.374499 +0200][Debug  ][TaskMgr           ] Requesting unregistration of: "TickGeneratorTask for: xrootd-cms.infn.it:1094"
[2015-04-01 10:16:08.374516 +0200][Debug  ][AsyncSock         ] [xrootd-cms.infn.it:1094 #0.0] Closing the socket
[2015-04-01 10:16:08.374535 +0200][Debug  ][Poller            ] <[::ffff:192.33.123.138]:46151><--><[::ffff:90.147.66.75]:1094> Removing socket from the poller
[2015-04-01 10:16:08.374565 +0200][Debug  ][PostMaster        ] [xrootd-cms.infn.it:1094 #0] Destroying stream
[2015-04-01 10:16:08.374585 +0200][Debug  ][AsyncSock         ] [xrootd-cms.infn.it:1094 #0.0] Closing the socket

CMS requirements for Xrootd

  • Reliability: The end-user should never see an I/O error or failure propagated up to their application unless no USCMS site can serve the file. Failures should be caught as early as possible and I/O retried or rerouted to a different site (possibly degrading the service slightly).
  • Transparency: All actions of the underlying system should be automatic for the user – catalog lookups, redirections, reconnections. There should not be a different workflow for accessing the data "close by" versus halfway around the world. This implies the system serves user requests almost instantly; opening files should be a "lightweight" operation.
  • Usability: All CMS application frameworks (CMSSW, FWLite, bare ROOT) must natively integrate with any proposed solution. The proposed solution must not degrade the event processing rate significantly.
  • Global: A CMS user should be able to get at any CMS file through the Xrootd service.
To achieve these goals, we will be pursuing a distributed architecture based upon the Xrootd protocol and software developed by SLAC. The proposed architecture is also similar to the current data management architecture of the ALICE experiment. Note that we specifically did not put scalability here: we already have an existing infrastructure that scales just fine, and we have no intention of replacing the current CMS data access methods for production. We believe these goals will greatly reduce the difficulty of data access for physicists on the small or medium scale.

This new architecture has four deliverables for CMS:

  • A production-quality, global xrootd infrastructure.
  • Fallback data access for jobs running at T2 sites.
  • Interactive access for CMS physicists.
  • A disk-free data access system for T3 sites.
CMS usage of Xrootd

(diagram: u6wGS.jpg)

Xrootd and dCache at the PSI T3

(diagram: XroodDcacheIntegrationV2.png)

Nice CMS / Xrootd talk

http://www.desy.de/dvsem/WS1112/donvito_talk.pdf

Latest Xrootd workshop, 2015

https://indico.cern.ch/event/330212/

Topic revision: r13 - 2015-04-02 - DanielMeister
 