Tags:
tag this topic
create new tag
view all tags
<!-- keep this as a security measure: #uncomment if the subject should only be modifiable by the listed groups * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup #uncomment this if you want the page only be viewable by the listed groups # * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup --> %TOC% %ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M% ---+ 21. 08. 2013 t3fs13 dCache pools found unresponsive Today since ~19:30 the =t3fs13= were unresponsive, I noticed it because the [[https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?navbarsearch=1&host=t3cmsvobox02][SRM tests constantly executed by Nagios vs each pool]] were in error just concerning the =t3fs13= pools; a =show pinboard= command in one of those pool was exiting with =Request timed out after 10 seconds while sending message to [>t3fs13_cms_6@local]= ; the =t3fs13= cells responsiveness was unusually slow as here reported: <pre> ... t3fs09_ops t3fs09-Domain-pool 0 37 21 msec 03/28 14:51:33 t3fs10_cms t3fs10-Domain-pool 0 74 18 msec 03/28 14:54:34 t3fs10_cms_1 t3fs10-Domain-pool 0 45 21 msec 03/28 14:54:3 t3fs10_ops t3fs10-Domain-pool 0 37 21 msec 03/28 14:54:39 t3fs11_cms t3fs11-Domain-pool 0 48 22 msec 08/08 08:31:20 t3fs11_cms_1 t3fs11-Domain-pool 0 43 22 msec 08/08 08:31:2 t3fs11_ops t3fs11-Domain-pool 0 37 22 msec 08/08 08:31:28 t3fs13_cms t3fs13-Domain-pool 0 92 48538 msec 08/16 16:44:14 <--------- look the ms delay t3fs13_cms_0 t3fs13-Domain-pool 0 45 49971 msec 08/16 16:44:21 t3fs13_cms_1 t3fs13-Domain-pool 0 81 48531 msec 08/16 16:44:17 t3fs13_cms_10 t3fs13-Domain-pool 6 61 27393 msec 08/16 16:44:21 t3fs13_cms_11 t3fs13-Domain-pool 8 78 27191 msec 08/16 16:44:21 t3fs13_cms_2 t3fs13-Domain-pool 4 78 27192 msec 08/16 16:44:18 t3fs13_cms_3 t3fs13-Domain-pool 11 103 27191 msec 08/16 16:44:19 t3fs13_cms_4 t3fs13-Domain-pool 6 95 18211 msec 08/16 16:44:19 t3fs13_cms_5 t3fs13-Domain-pool 5 70 27190 msec 08/16 16:44:20 t3fs13_cms_6 t3fs13-Domain-pool 8 69 18213 msec 08/16 16:44:20 t3fs13_cms_7 t3fs13-Domain-pool 0 72 8618 msec 08/16 16:44:20 t3fs13_cms_8 t3fs13-Domain-pool 13 60 18212 msec 08/16 16:44:20 t3fs13_cms_9 t3fs13-Domain-pool 13 63 18211 msec 08/16 16:44:2 t3fs13_ops t3fs13-Domain-pool 0 44 27190 msec 08/16 16:44:21 t3fs13_ops_1 t3fs13-Domain-pool 3 44 27394 msec 08/16 16:44:21 t3fs13_ops_10 t3fs13-Domain-pool 0 44 48537 msec 08/16 16:44:23 t3fs13_ops_11 t3fs13-Domain-pool 0 44 48737 msec 08/16 16:44:23 t3fs13_ops_2 t3fs13-Domain-pool 0 44 49961 msec 08/16 16:44:22 t3fs13_ops_3 t3fs13-Domain-pool 0 44 27190 msec 08/16 16:44:22 t3fs13_ops_4 t3fs13-Domain-pool 0 44 49963 msec 08/16 16:44:22 t3fs13_ops_5 t3fs13-Domain-pool 0 44 49963 msec 08/16 16:44:22 t3fs13_ops_6 t3fs13-Domain-pool 0 44 49962 msec 08/16 16:44:22 t3fs13_ops_7 t3fs13-Domain-pool 0 44 45155 msec 08/16 16:44:23 t3fs13_ops_8 t3fs13-Domain-pool 0 44 49964 msec 08/16 16:44: t3fs13_ops_9 t3fs13-Domain-pool 0 44 49965 msec 08/16 16:44: <--------------- t3fs14_cms t3fs14-Domain-pool 0 64 25 msec 08/16 16:52:05 t3fs14_cms_0 t3fs14-Domain-pool 0 45 24 msec 08/16 16:52:13 t3fs14_cms_1 t3fs14-Domain-pool 0 58 24 msec 08/16 16:52:08 t3fs14_cms_10 t3fs14-Domain-pool 0 57 25 msec 08/16 16:52:12 2.2.9(18131) ... </pre> to fix I ran: =[root@t3fs13 dcache]# dcache restart t3fs13-Domain-pool= </br> *but then the problem appeared again 1h later*: this time I've found this exception in the logs and all the pools went in =OFFLINE= mode: <pre> [root@t3fs13 dcache]# grep --color java.nio.HeapByteBuffer t3fs13-Domain-pool.log -A 14 ... 21 Aug 2013 22:24:35 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Problem in command block : RequestBlock [Size=24 Code=11 Buffer=java.nio.HeapByteBuffer[pos=24 lim=24 cap=16384] java.lang.IllegalArgumentException: null at java.nio.Buffer.limit(Buffer.java:247) ~[na:1.6.0_37] at org.dcache.pool.movers.DCapProtocol_3_nio.doTheRead(DCapProtocol_3_nio.java:1083) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.movers.DCapProtocol_3_nio.runIO(DCapProtocol_3_nio.java:606) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.classic.PoolIOReadTransfer.transfer(PoolIOReadTransfer.java:58) [dcache-core-2.2.9.jar:2.2.9] at org.dcache.pool.classic.LegacyMoverExecutorService$MoverTask.run(LegacyMoverExecutorService.java:66) [dcache-core-2.2.9.jar:2.2.9] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_37] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_37] at java.lang.Thread.run(Thread.java:662) [na:1.6.0_37] 21 Aug 2013 22:24:35 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] (Transfer finished : 0 bytes in 1 seconds) 21 Aug 2013 22:24:37 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] DCapProtocol_3 (nio) created $Id: CMSTier3Log43.txt,v 1.2 2013/08/21 22:07:34 fabiom Exp $ 21 Aug 2013 22:24:37 (t3fs13_cms_6) [DCap-t3se01-<unknown>-1531462 PoolDeliverFile 00008AFD66ACE04949189937FF8632D4A229] DCapProtocol_3 (nio) created $Id: CMSTier3Log43.txt,v 1.2 2013/08/21 22:07:34 fabiom Exp $ 21 Aug 2013 22:24:37 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Setup : Defaults Buffer Sizes : send=262144;recv=262144;io=262144 21 Aug 2013 22:24:37 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Setup : Max Buffer Sizes : send=1048576;recv=1048576;io=1048576 -- 21 Aug 2013 22:24:37 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Problem in command block : RequestBlock [Size=24 Code=11 Buffer=java.nio.HeapByteBuffer[pos=24 lim=24 cap=16384] java.lang.IllegalArgumentException: null at java.nio.Buffer.limit(Buffer.java:247) ~[na:1.6.0_37] at org.dcache.pool.movers.DCapProtocol_3_nio.doTheRead(DCapProtocol_3_nio.java:1083) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.movers.DCapProtocol_3_nio.runIO(DCapProtocol_3_nio.java:606) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.classic.PoolIOReadTransfer.transfer(PoolIOReadTransfer.java:58) [dcache-core-2.2.9.jar:2.2.9] at org.dcache.pool.classic.LegacyMoverExecutorService$MoverTask.run(LegacyMoverExecutorService.java:66) [dcache-core-2.2.9.jar:2.2.9] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_37] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_37] at java.lang.Thread.run(Thread.java:662) [na:1.6.0_37] 21 Aug 2013 22:24:39 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] (Transfer finished : 0 bytes in 1 seconds) 21 Aug 2013 22:24:41 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] DCapProtocol_3 (nio) created $Id: CMSTier3Log43.txt,v 1.2 2013/08/21 22:07:34 fabiom Exp $ 21 Aug 2013 22:24:41 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Setup : Defaults Buffer Sizes : send=262144;recv=262144;io=262144 21 Aug 2013 22:24:41 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Setup : Max Buffer Sizes : send=1048576;recv=1048576;io=1048576 21 Aug 2013 22:24:41 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] ioError = -1 -- 21 Aug 2013 22:24:41 (t3fs13_cms_3) [DCap-t3se01-<unknown>-1531425 PoolDeliverFile 00007629A65D9DA64BC8BEC959BB96C19D3E] Problem in command block : RequestBlock [Size=24 Code=11 Buffer=java.nio.HeapByteBuffer[pos=24 lim=24 cap=16384] java.lang.IllegalArgumentException: null at java.nio.Buffer.limit(Buffer.java:247) ~[na:1.6.0_37] at org.dcache.pool.movers.DCapProtocol_3_nio.doTheRead(DCapProtocol_3_nio.java:1083) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.movers.DCapProtocol_3_nio.runIO(DCapProtocol_3_nio.java:606) ~[dcache-dcap-2.2.9.jar:2.2.9] at org.dcache.pool.classic.PoolIOReadTransfer.transfer(PoolIOReadTransfer.java:58) [dcache-core-2.2.9.jar:2.2.9] at org.dcache.pool.classic.LegacyMoverExecutorService$MoverTask.run(LegacyMoverExecutorService.java:66) [dcache-core-2.2.9.jar:2.2.9] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_37] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_37] at java.lang.Thread.run(Thread.java:662) [na:1.6.0_37] </pre> [root@t3fs13 dcache]# elinks http://t3se01:2288 <pre> ... t3fs11_ops t3fs11-Domain-pool 0 37 19 msec 08/08 08:31:28 2.2.9(18131) t3fs13_cms t3fs13-Domain-pool OFFLINE t3fs13_cms_0 t3fs13-Domain-pool 0 45 19373 msec 08/21 22:15:27 2.2.9(18131) t3fs13_cms_1 t3fs13-Domain-pool OFFLINE t3fs13_cms_10 t3fs13-Domain-pool OFFLINE t3fs13_cms_11 t3fs13-Domain-pool OFFLINE t3fs13_cms_2 t3fs13-Domain-pool 0 51 19372 msec 08/21 22:15:23 2.2.9(18131) t3fs13_cms_3 t3fs13-Domain-pool OFFLINE t3fs13_cms_4 t3fs13-Domain-pool OFFLINE t3fs13_cms_5 t3fs13-Domain-pool OFFLINE t3fs13_cms_6 t3fs13-Domain-pool OFFLINE t3fs13_cms_7 t3fs13-Domain-pool OFFLINE t3fs13_cms_8 t3fs13-Domain-pool OFFLINE t3fs13_cms_9 t3fs13-Domain-pool OFFLINE t3fs13_ops t3fs13-Domain-pool 0 44 19373 msec 08/21 22:15:28 2.2.9(18131) t3fs13_ops_1 t3fs13-Domain-pool 0 44 19373 msec 08/21 22:15:28 2.2.9(18131) t3fs13_ops_10 t3fs13-Domain-pool 0 44 19372 msec 08/21 22:15:34 2.2.9(18131) t3fs13_ops_11 t3fs13-Domain-pool 0 44 19373 msec 08/21 22:15:36 2.2.9(18131) t3fs13_ops_2 t3fs13-Domain-pool 0 44 19373 msec 08/21 22:15:29 2.2.9(18131) t3fs13_ops_3 t3fs13-Domain-pool 0 44 18567 msec 08/21 22:15:29 2.2.9(18131) t3fs13_ops_4 t3fs13-Domain-pool 0 44 18568 msec 08/21 22:15:29 2.2.9(18131) t3fs13_ops_5 t3fs13-Domain-pool 0 44 25278 msec 08/21 22:15:29 2.2.9(18131) t3fs13_ops_6 t3fs13-Domain-pool 0 44 18568 msec 08/21 22:15:30 2.2.9(18131) t3fs13_ops_7 t3fs13-Domain-pool 0 44 18568 msec 08/21 22:15:30 2.2.9(18131) t3fs13_ops_8 t3fs13-Domain-pool 0 44 18568 msec 08/21 22:15:30 2.2.9(18131) t3fs13_ops_9 t3fs13-Domain-pool 0 44 18568 msec 08/21 22:15:31 2.2.9(18131) t3fs14_cms t3fs14-Domain-pool 0 65 24 msec 08/16 16:52:05 2.2.9(18131) ... </pre> [root@t3fs13 dcache]# dcache status <pre> DOMAIN STATUS PID USER t3fs13-Domain-pool running 946 dcache t3fs13-Domain-dcap running 5267 dcache t3fs13-Domain-gridftp running 5339 dcache t3fs13-Domain-gsidcap running 5406 dcache A heap dump file, /var/log/dcache/t3fs13-Domain-pool-oom.hprof, was found for domain t3fs13-Domain-pool. The file was generated at 2013-08-21 22:14:23.615147040 +0200 as a result of an out of memory failure in the domain. The dump contains debugging information that may help a developer determine the cause of high memory usage. Please note that the dump may contain confidential information. As long as the file exists no other dumps will be generated on out of memory failures. Please move or delete /var/log/dcache/t3fs13-Domain-pool-oom.hprof. Consider increasing the dcache.java.memory.heap property. The current value is 1024m. </pre> I doubled =dcache.java.memory.heap= in =/etc/dcache/dcache.conf= and completely restarted dcache on =t3fs13= ---------------- %ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M%
E
dit
|
A
ttach
|
Watch
|
P
rint version
|
H
istory
: r2
<
r1
|
B
acklinks
|
V
iew topic
|
Ra
w
edit
|
M
ore topic actions
Topic revision: r2 - 2013-08-21
-
FabioMartinelli
CmsTier3
Log In
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
User Pages
Main Page
Policies
Monitoring Storage Space
Monitoring Slurm Usage
Physics Groups
Steering Board Meetings
Admin Pages
AdminArea
Cluster Specs
Home
Site map
CmsTier3 web
LCGTier2 web
PhaseC web
Main web
Sandbox web
TWiki web
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
RSS Feed
Statistics
Preferences
P
View
Raw View
Print version
Find backlinks
History
More topic actions
Edit
Raw edit
Attach file or image
Edit topic preference settings
Set new parent
More topic actions
Account
Log In
E
dit
A
ttach
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback