/tmp/slurmd was not created by SLURM. This turned the node into a job black hole that the health check system did not detect.
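A check along the following lines could flag a node in this state before it starts draining jobs. This is only a minimal sketch: it assumes the spool directory merely needs to exist and be writable, and the function name `spool_dir_ok` is hypothetical, not part of SLURM or of the site's health check system.

```python
import os


def spool_dir_ok(path="/tmp/slurmd"):
    """Return True if `path` is an existing directory that is writable and searchable."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # A health check script could mark the node as unhealthy on failure.
    print("OK" if spool_dir_ok() else "FAIL: /tmp/slurmd missing or not writable")
```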
/tmp (local) instead of /tmpdir_slurm (GPFS).
lcg-cp is outdated and WLCG now uses gfal-copy. The first run of gfal-copy on SL5 crashed.
absent
/cms/chcms that can be administered by Christoph and Daniel.
    04 Mar 2014 17:18:00 (gPlazma) [Xrootd-t3se01 Login AUTH voms] Certificate verification: Verifying certificate 'DC=ch,DC=cern,OU=computers,CN=voms.cern.ch'
    04 Mar 2014 17:18:00 (gPlazma) [Xrootd-t3se01 Login MAP vorolemap] VOMS authorization successful for user with DN: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=silveira/CN=705497/CN=Gustavo Gil Da Silveira and FQAN: /cms for user name: cmsuser.
    04 Mar 2014 17:18:00 (gPlazma) [Xrootd-t3se01 Login] Login attempt failed; detailed explanation follows:
    LOGIN FAIL
    |   in: X509 Certificate chain:
    |   |
    |   +--CN=1758990799,CN=1394689704,CN=Gustavo Gil Da Silveira,CN=705497,CN=silveira,OU=Users,OU=Organic Units,DC=cern,DC=ch [1758990799]
    |   |   |
    |   |   +--Issuer: CN=1394689704,CN=Gustavo Gil Da Silveira,CN=705497,CN=silveira,OU=Users,OU=Organic Units,DC=cern,DC=ch
    |   |   +--Validity: OK for 16 hours, 54 minutes and 51.3 seconds
    |   |   +--Algorithm: SHA-1 with RSA
    |   |   +--Public key: RSA 1024 bits
    |   |   +--Key usage: digital signature, key encipherment, data encipherment
    |   |
    |   +--CN=1394689704,CN=Gustavo Gil Da Silveira,CN=705497,CN=silveira,OU=Users,OU=Organic Units,DC=cern,DC=ch [1455016582116403591211701295546859998117066254174]
    |   |   |
    |   |   +--Issuer: CN=Gustavo Gil Da Silveira,CN=705497,CN=silveira,OU=Users,OU=Organic Units,DC=cern,DC=ch
    |   |   +--Validity: OK for 7 days, 16 hours, 22 minutes and 19.2 seconds
    |   |   +--Algorithm: SHA-1 with RSA
    |   |   +--Public key: RSA 1024 bits
    |   |   +--Attribute certificates:
    |   |   |   |
    |   |   |   +--DC=ch,DC=cern,OU=computers,CN=voms.cern.ch
    |   |   |       +--Validity: OK for 7 days, 16 hours, 22 minutes and 19.2 seconds
    |   |   |       +--Algorithm: SHA-1 with RSA
    |   |   |       +--FQANs: /cms, /cms/becms
    |   |   +--Key usage: digital signature, key encipherment, data encipherment
    |   |
    |   +--CN=Gustavo Gil Da Silveira,CN=705497,CN=silveira,OU=Users,OU=Organic Units,DC=cern,DC=ch [315385076395555361510222]
    |       |
    |       +--Issuer: CN=CERN Trusted Certification Authority,DC=cern,DC=ch
    |       +--Validity: OK for 28 days, 22 hours, 41 minutes and 41.2 seconds
    |       +--Algorithm: SHA-1 with RSA
    |       +--Public key: RSA 2048 bits
    |       +--Subject alternative names:
    |       |    otherName: 302a060a2b060104018237140203a01ca01a0c186775737461766f2e73696c7665697261406365726e2e6368
    |       |    email: gustavo.silveira@cern.ch
    |       +--Key usage: digital signature, key encipherment, SSL client, email protection, Microsoft EPS
    |
    |
    +--AUTH OK
    |   |   added: FQANPrincipal[/cms,primary]
    |   |          FQANPrincipal[/cms/becms]
    |   |          /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=silveira/CN=705497/CN=Gustavo Gil Da Silveira
    |   |
    |   +--x509 OPTIONAL:OK => OK
    |   |   added: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=silveira/CN=705497/CN=Gustavo Gil Da Silveira
    |   |
    |   +--voms OPTIONAL:OK => OK
    |       added: FQANPrincipal[/cms,primary]
    |              FQANPrincipal[/cms/becms]
    |
    +--MAP OK
    |   |   added: GidPrincipal[500,primary]
    |   |          UidPrincipal[501]
    |   |          UserNamePrincipal[cmsuser]
    |   |          GroupNamePrincipal[cmsuser,primary]
    |   |
    |   +--vorolemap REQUISITE:OK => OK
    |   |   added: GroupNamePrincipal[cmsuser,primary]
    |   |
    |   +--authzdb REQUISITE:OK => OK
    |       added: GidPrincipal[500,primary]
    |              UidPrincipal[501]
    |              UserNamePrincipal[cmsuser]
    |
    +--ACCOUNT FAIL
    |   |
    |   +--banfile REQUISITE:FAIL (user banned) => FAIL (ends the phase)
    |
    +--(SESSION) skipped
    |
    +--(VALIDATION) skipped
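In the trace above the AUTH and MAP phases succeed and the login is rejected in the ACCOUNT phase by the banfile plugin. Pulling that summary out of a long trace by eye is tedious, so here is a minimal sketch that does it with regular expressions. It assumes the phase and plugin lines look exactly like the ones in this log, and the function name `summarize_login_trace` is hypothetical, not part of dCache or gPlazma.

```python
import re

# Phase lines look like "+--ACCOUNT FAIL". Skipped phases such as
# "+--(SESSION) skipped" are deliberately not matched.
PHASE_RE = re.compile(r"\+--(AUTH|MAP|ACCOUNT|SESSION) (OK|FAIL)")

# Plugin lines look like
# "+--banfile REQUISITE:FAIL (user banned) => FAIL (ends the phase)".
PLUGIN_RE = re.compile(r"\+--(\w+) ([A-Z]+):(OK|FAIL)(?: \(([^)]*)\))?")


def summarize_login_trace(trace):
    """Return (phase results, failing plugins) found in a login trace string."""
    phases = dict(PHASE_RE.findall(trace))
    failed = [
        (name, reason)
        for name, _control, result, reason in PLUGIN_RE.findall(trace)
        if result == "FAIL"
    ]
    return phases, failed
```

Applied to the trace above, this reports AUTH and MAP as OK, ACCOUNT as FAIL, and a single failing plugin, banfile, with the reason "user banned". The patterns match anywhere in the string, so they work on the trace whether or not its line breaks are preserved.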
    [root@t3admin01 ~]# ipython
    Python 2.7.6 |Anaconda 1.9.0 (64-bit)| (default, Jan 17 2014, 10:13:17)
    Type "copyright", "credits" or "license" for more information.
    IPython 1.1.0 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.
    In [1]: import salt.client
    In [2]: saltclient = salt.client.LocalClient()  # I connect to the salt master
    In [3]: mydict = saltclient.cmd('nodename:t3wn*', 'ps.top', expr_form='grain' )  # I run the Python top program only on the WNs
    # For each WN, the 5 top processes
    In [4]: print mydict
    {'t3wn34': [
      {'status': 0, 'mem.vms': 1043345408, 'cmd': ['cmsRun', '/shome/cgalloni/TestSim/CMSSW_5_3_2_patch4/src/SIM/MR_M1000_SIM_cfi.py', 'maxEvents=500', 'skipEvents=228000', 'seed=3936709'], 'pid': 9985, 'cpu.user': 10463.36, 'cpu.system': 3.38, 'create_time': 1394091559.43, 'user': 'cgalloni', 'mem.rss': 725270528},
      {'status': 0, 'mem.vms': 1024819200, 'cmd': ['cmsRun', '/shome/cgalloni/TestSim/CMSSW_5_3_2_patch4/src/SIM/MR_M1000_SIM_cfi.py', 'maxEvents=500', 'skipEvents=228500', 'seed=7239021'], 'pid': 10243, 'cpu.user': 10431.48, 'cpu.system': 3.3, 'create_time': 1394091591.95, 'user': 'cgalloni', 'mem.rss': 703696896},
      {'status': 0, 'mem.vms': 1021722624, 'cmd': ['cmsRun', '/shome/cgalloni/TestSim/CMSSW_5_3_2_patch4/src/SIM/MR_M1000_SIM_cfi.py', 'maxEvents=500', 'skipEvents=224000', 'seed=2991033'], 'pid': 8030, 'cpu.user': 10838.1, 'cpu.system': 2.81, 'create_time': 1394091186.8, 'user': 'cgalloni', 'mem.rss': 702164992},
      {'status': 0, 'mem.vms': 1019928576, 'cmd': ['cmsRun', '/shome/cgalloni/TestSim/CMSSW_5_3_2_patch4/src/SIM/MR_M1000_SIM_cfi.py', 'maxEvents=500', 'skipEvents=234000', 'seed=9386177'], 'pid': 11313, 'cpu.user': 10170.57, 'cpu.system': 2.74, 'create_time': 1394091852.93, 'user': 'cgalloni', 'mem.rss': 703258624},
      {'status': 1, 'mem.vms': 4165632, 'cmd': ['mdadm', '--monitor', '--scan', '-f', '--pid-file=/var/run/mdadm/mdadm.pid'], 'pid': 20269, 'cpu.user': 0.0, 'cpu.system': 0.01, 'create_time': 1394036233.94, 'user': 'root', 'mem.rss': 401408}
     ],
     't3wn33': .........
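The raw dictionary is hard to read at a glance, so a small post-processing step helps. The sketch below assumes the result has the shape printed above (node name mapped to a list of process dictionaries with `user`, `cmd` and `mem.rss` keys) and reduces it to the largest process per node. The function name `heaviest_process_per_node` is hypothetical, and the code does not call Salt itself.

```python
def heaviest_process_per_node(result):
    """Map each node to (user, command name, resident memory in MiB) of its largest process."""
    summary = {}
    for node, processes in result.items():
        if not processes:
            continue  # node returned no process list
        top = max(processes, key=lambda p: p["mem.rss"])
        summary[node] = (top["user"], top["cmd"][0], round(top["mem.rss"] / 2.0**20, 1))
    return summary


if __name__ == "__main__":
    # Two entries taken from the t3wn34 output above, trimmed to the keys used here.
    sample = {
        "t3wn34": [
            {"user": "cgalloni", "cmd": ["cmsRun"], "mem.rss": 725270528},
            {"user": "root", "cmd": ["mdadm", "--monitor"], "mem.rss": 401408},
        ]
    }
    print(heaviest_process_per_node(sample))
```

In the session above the same function could be called directly on `mydict`.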
| Attachment | Size | Date | Who | Comment |
|---|---|---|---|---|
| xrootd-queued-requests.png | 10.9 K | 2014-03-04 14:41 | FabioMartinelli | T3 PSI xrootd queued requests |