[...]
Tue Apr 14 20:22:52.964 2015: Recovered 1 nodes for file system phoenix_scratch.
Tue Apr 14 20:29:13.455 2015: Accepted and connected to 148.187.65.62 wn62 <c0n14>
*** glibc detected *** /usr/lpp/mmfs/bin//mmfsd: invalid fastbin entry (free): 0x00007fbf202829b0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x76166)[0x7fbf4fc59166]
/usr/lpp/mmfs/bin//mmfsd(_ZN10MsgDataBuf8freeDataEv+0x18)[0x90e5b8]
/usr/lpp/mmfs/bin//mmfsd(_ZN10MsgDataBufD1Ev+0x9)[0x910469]
/usr/lpp/mmfs/bin//mmfsd(_ZN7TcpConn9deleteMsgEP6RcvMsg+0x4c)[0x918cac]
/usr/lpp/mmfs/bin//mmfsd(_ZN10NsdRequest14processRequestEP9NsdBufferP8NsdQueue+0x385)[0x10f0d65]
/usr/lpp/mmfs/bin//mmfsd[0x10f17ba]
/usr/lpp/mmfs/bin//mmfsd(_ZN6Thread8callBodyEPS_+0x66)[0x5a4676]
/usr/lpp/mmfs/bin//mmfsd(_ZN6Thread15callBodyWrapperEPS_+0x79)[0x5963f9]
/lib64/libpthread.so.0(+0x79d1)[0x7fbf5070c9d1]
/lib64/libc.so.6(clone+0x6d)[0x7fbf4fccbb6d]
======= Memory map: ========
00400000-0134c000 r-xp 00000000 fd:01 2757550 /usr/lpp/mmfs/bin/mmfsd
0144b000-01490000 rwxp 00f4b000 fd:01 2757550 /usr/lpp/mmfs/bin/mmfsd
01490000-014f7000 rwxp 00000000 00:00 0
0237d000-025a2000 rwxp 00000000 00:00 0 [heap]

Both metadata servers were affected, about 8 hours apart. This also left one of the metadata servers out of sync (its SSD disks were expelled from GPFS).
arc01: re-installed with nordugrid-arc 5.0.0-2.
cgroups: using cgroups in order to contain jobs would require a major SLURM upgrade. The presentation given at HEPiX on this matter is very interesting.
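For reference, a minimal sketch of the cgroup-based containment a newer SLURM would enable. These fragments are illustrative assumptions, not our production configuration:

```shell
# /etc/slurm/slurm.conf (illustrative fragment)
TaskPlugin=task/cgroup          # place each job step in its own cgroup
ProctrackType=proctrack/cgroup  # track processes via cgroups, not PIDs

# /etc/slurm/cgroup.conf (illustrative fragment)
CgroupAutomount=yes
ConstrainCores=yes              # pin tasks to the allocated cores
ConstrainRAMSpace=yes           # enforce the job's memory request
```

With `ConstrainRAMSpace=yes`, a job exceeding its memory request is confined by the kernel rather than degrading the whole worker node.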
dCache 2.6 will be out of support soon.
[wn] cat /etc/sysconfig/sgeexecd
export SGE_CGROUP_DIR=/dev/cpuset/sge
[wn] grep -Hn setup-cgroups-etc /etc/init.d/sgeexecd.p6444
/etc/init.d/sgeexecd.p6444:441: /opt/sge/util/resources/scripts/setup-cgroups-etc start
[wn] qconf -sconf | grep -Hn CGR
(standard input):28:execd_params USE_SMAPS=true KEEP_ACTIVE=true USE_CGROUPS=true ENABLE_BINDING=true \
[submission_host] grep -v \# /opt/sge/default/common/sge_request | strings
-binding set linear
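With the settings above in place, confinement can be spot-checked on a worker node. A hedged example, assuming SGE_CGROUP_DIR=/dev/cpuset/sge as configured above and at least one running job:

```shell
# One sub-directory per confined job is expected under the cgroup dir
[wn] ls /dev/cpuset/sge
# The tasks file of a job's cpuset lists the PIDs confined to it
[wn] cat /dev/cpuset/sge/*/tasks
```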
SRM client 2.10.7 (rpm) is broken when used with the -copyjobfile option, as PhEDEx does; the dCache team acknowledged this bug:
$ srmls -debug=false -x509_user_proxy=/home/phedex/gridcert/proxy.cert -retry_num=0 'srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/mc/RunIIWinter15GS/RSGravToWWToLNQQ_kMpl01_M-4000_TuneCUETP8M1_13TeV-pythia8/GEN-SIM/MCRUN2_71_V1-v1/10000/2898A22B-62B0-E411-B1D4-002590D600EE.root'
srm client error: java.lang.IllegalArgumentException: Multiple entries with same key: x509_user_proxy=/home/phedex/gridcert/proxy.cert and x509_user_proxy=/tmp/x509up_u205
$ srm-advisory-delete -x509_user_proxy=${X509_USER_PROXY} -retry_num=0
srm client error: java.lang.IllegalArgumentException: Multiple entries with same key: x509_user_proxy=-retry_num=0 and x509_user_proxy=/tmp/x509up_u205
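The error message suggests the client merges option settings from several sources (command line, job file, discovered default proxy in /tmp) into a map that rejects duplicate keys. A minimal Python sketch of that failure mode; this is illustrative, not the actual srm client code:

```python
def build_options(*sources):
    """Merge option dicts, refusing duplicate keys as the srm client appears to."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            if key in merged:
                # Mirrors the client's "Multiple entries with same key" error
                raise ValueError(
                    "Multiple entries with same key: "
                    f"{key}={merged[key]} and {key}={value}")
            merged[key] = value
    return merged

# The proxy given on the command line collides with the default one in /tmp:
cli = {"x509_user_proxy": "/home/phedex/gridcert/proxy.cert"}
default = {"x509_user_proxy": "/tmp/x509up_u205"}
try:
    build_options(cli, default)
except ValueError as e:
    print(e)
```

A duplicate-tolerant merge (last source wins) would avoid the crash, which is presumably what the acknowledged fix amounts to.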