<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
KeyWords: SysAdmin

---+ Documentation of PNFS related problems on our dCache installation

We began to look more closely at the problem of root owned directories in PNFS, i.e. directories that were created during normal user requests but for some reason ended up with the ownership root.root, so that user requests trying to write to these areas fail. The problem became particularly bad shortly after upgrading from dCache 1.8.0-15p to 1.9.3, but we cannot say whether the worsening has any connection to the new version. When additional service problems became apparent, we decided to rush into a migration from pnfs to chimera, which exposed a number of other problems. All of this prompted us to examine our PNFS a little closer.

---++ One case where root owned directories appear, and their behavior

<pre>
seq 9|xargs -P9 -n1 --replace srmmkdir srm://storage01.lcg.cscs.ch:8443/pnfs/lcg.cscs.ch/cms/local_tests/dircreate-{}

Return code: SRM_FAILURE
Explanation: srm://storage01.lcg.cscs.ch:8443/pnfs/lcg.cscs.ch/cms/local_tests/dircreate-9
Failed to create, got error return code from pnfs: path /pnfs/fs/usr/cms/local_tests/dircreate-9 not found ( .(id)(dircreate-9) )

$ ls -ld dircreate-9
drwxr-xr-x 1 root root 512 Sep 8 16:49 dircreate-9

$ cat /pnfs/lcg.cscs.ch/cms/local_tests/".(id)(dircreate-9)"
cat: /pnfs/lcg.cscs.ch/cms/local_tests/.(id)(dircreate-9): No such file or directory

$ chown cmsprd.cms /pnfs/lcg.cscs.ch/cms/local_tests/dircreate-9

$ ls -ld dircreate-9
drwxr-xr-x 1 cmsprd cms 512 Sep 8 16:49 dircreate-9

$ cat /pnfs/lcg.cscs.ch/cms/local_tests/".(id)(dircreate-9)"
cat: /pnfs/lcg.cscs.ch/cms/local_tests/.(id)(dircreate-9): No such file or directory
</pre>

Strangely enough, when I checked the directory 2 hours later, it had got a pnfs ID.

<pre>
$ cat /pnfs/lcg.cscs.ch/cms/local_tests/".(id)(dircreate-9)"
000200000000000002BCD9E0
</pre>

How can that be? I see two possibilities:
   * the original creation process had been stuck, and finished at some point
   * there is a repair process going over the filesystem at intervals (this would be strange)

Checking the pnfs log, I can identify two entries:

<pre>
09/08/09 16:49:44 127.0.0.1-0-0(0,1,2,3,4,6,10,) - mkdir dir 0002000000000000000010C0-0000000000000000 name dircreate-9 : %GREEN%000200000000000002BCD9E0%ENDCOLOR% (0) -> 0
09/08/09 16:52:38 127.0.0.1-0-0(0,1,2,3,4,6,10,) - setattr %GREEN%000200000000000002BCD9E0%ENDCOLOR%-0000000000000000 uid=4199;gid=4001;size=-1;mode=37777777777;a=ffffffff;m=ffffffff (0) -> 0
</pre>

The setattr entry probably derives from my manual ownership change.
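
The manual `ls -ld` and `cat ".(id)(...)"` checks above can be repeated for a whole test area with a small script. The following is only a sketch: it assumes that the `.(id)(name)` dot command can be read like a normal file (as `cat` does above), and the helper names `read_pnfs_id`, `survey` and `suspicious` are made up for this page.

```python
import os


def read_pnfs_id(parent, name):
    """Return the content of the '.(id)(name)' dot command file, or None if it cannot be read."""
    magic = os.path.join(parent, ".(id)(%s)" % name)
    try:
        with open(magic) as handle:
            return handle.read().strip()
    except OSError:
        return None


def survey(parent):
    """Return a (name, uid, pnfs_id) tuple for each subdirectory of parent."""
    rows = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.isdir(path):
            rows.append((name, os.stat(path).st_uid, read_pnfs_id(parent, name)))
    return rows


def suspicious(rows):
    """Keep the directories that are owned by uid 0 or have no resolvable pnfs ID."""
    return [row for row in rows if row[1] == 0 or row[2] is None]


if __name__ == "__main__":
    import sys
    # Directory to inspect is taken from the command line (hypothetical usage)
    if len(sys.argv) > 1:
        for name, uid, pnfs_id in suspicious(survey(sys.argv[1])):
            print("%-20s uid=%-6d id=%s" % (name, uid, pnfs_id))
```

Run against `/pnfs/lcg.cscs.ch/cms/local_tests` right after a test sequence and again some hours later, this would show whether the missing IDs turn up over time, as observed for dircreate-9.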

Let's compare with the entries of one of the successfully created directories, dircreate-1:

<pre>
09/08/09 16:49:43 127.0.0.1-0-0(0,1,2,3,4,6,10,) - mkdir dir 0002000000000000000010C0-0000000000000000 name dircreate-1 : %GREEN%000200000000000002BCD9B8%ENDCOLOR% (0) -> 0
09/08/09 16:49:44 127.0.0.1-0-0(0,1,2,3,4,6,10,) - create dir 000000000000000000001040-0000000000000000 name .(pset)(%GREEN%000200000000000002BCD9B8%ENDCOLOR%)(attr)(0)(100775:4199:4001:4aa66f08:4aa66f08:4aa66f08) uid=0;gid=-1;size=-1;mode=100644;a=ffffffff;m=ffffffff;id=%GREEN%000200000000000002BCD9B8%ENDCOLOR%;;level=0;;line=100775:4199:4001:4aa66f08:4aa66f08:4aa66f08; : 000200000000000002BCD9B9-0000001B00000000 (0) -> 0
09/08/09 16:49:44 127.0.0.1-0-0(0,1,2,3,4,6,10,) - create dir 000000000000000000001040-0000000000000000 name .(pset)(%GREEN%000200000000000002BCD9B8%ENDCOLOR%)(attr)(1)(100775:4199:4001:4aa66f08:4aa66f08:4aa66f08) uid=0;gid=-1;size=-1;mode=100644;a=ffffffff;m=ffffffff;id=%GREEN%000200000000000002BCD9B8%ENDCOLOR%;;level=1;;line=100775:4199:4001:4aa66f08:4aa66f08:4aa66f08; : 000200000000000002BCD9B9-0000001B00000001 (0) -> 0
</pre>

The *pset* lines seem to be exact copies of each other, except for the *level=X* part (the attr index and the last digit of the returned ID follow the level). In the case of the faulty "root owned" directory dircreate-9 above, the pset line is missing completely.

I then ran a second sequence of the whole test. This time two directories ended up root owned, but both of them had pnfs IDs. So it seems that these two symptoms are not necessarily coupled (maybe the creation process dies at different places).
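
To find such cases without reading the log by eye, one can collect the IDs from all mkdir entries and report those for which no `.(pset)(ID)` entry exists. The sketch below assumes that the log lines look like the excerpts on this page; the regular expressions and the function name `dirs_without_pset` are my own and have not been checked against other pnfs versions. The `%GREEN%` markup is only wiki highlighting, so it is treated as optional.

```python
import re

# "... - mkdir dir <parent id> name <dirname> : <new pnfs id> (0) -> 0"
MKDIR_RE = re.compile(r" - mkdir dir .*? name (\S+) : (?:%GREEN%)?([0-9A-F]{24})")
# "... name .(pset)(<pnfs id>)(attr)(<level>)(...)"
PSET_RE = re.compile(r"\.\(pset\)\((?:%GREEN%)?([0-9A-F]{24})")


def dirs_without_pset(lines):
    """Return {pnfs_id: dirname} for mkdir entries that have no pset entry."""
    created = {}        # pnfs ID -> directory name, taken from mkdir entries
    with_pset = set()   # pnfs IDs that appear in at least one pset entry
    for line in lines:
        match = MKDIR_RE.search(line)
        if match:
            created[match.group(2)] = match.group(1)
            continue
        match = PSET_RE.search(line)
        if match:
            with_pset.add(match.group(1))
    return {pid: name for pid, name in created.items() if pid not in with_pset}
```

A possible use is `dirs_without_pset(open(logfile))`, where `logfile` stands for the path of the pnfs log on the server. Note that this only detects the missing pset entry; as the second test run shows, a directory can be root owned without matching this pattern.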

Another test again created a single problematic directory with a missing pnfs ID:

<pre>
$ cat /pnfs/fs/usr/cms/local_tests/".(id)(dircreateC-5)"
cat: /pnfs/fs/usr/cms/local_tests/.(id)(dircreateC-5): No such file or directory
$ date
Tue Sep 8 21:48:09 CEST 2009
</pre>

Corresponding log entry:

<pre>
09/08/09 21:47:44 127.0.0.1-0-0(0,1,2,3,4,6,10,) - mkdir dir 0002000000000000000010C0-0000000000000000 name dircreateC-5 : 000200000000000002BCDFE0 (0) -> 0
</pre>

Even though the name to ID resolution fails, the ID to name resolution (using the ID from the log) works:

<pre>
$ cat /pnfs/lcg.cscs.ch/cms/".(name)(000200000000000002BCDFE0)"
dircreateC-5
</pre>

This directory showed the same behavior as the other such cases: in the morning of the next day, the ID was printed correctly when invoking the dot(id) command. The directory still belonged to root.root. In the pnfs log file, the "mkdir" line was still the only one that I could associate with this directory. No other line contained the matching pnfs ID.

<pre>
Wed Sep 9 09:26:44 CEST 2009

$ cat /pnfs/fs/usr/cms/local_tests/".(id)(dircreateC-5)"
000200000000000002BCDFE0

$ ls -ld /pnfs/fs/usr/cms/local_tests/dircreateC-5
drwxr-xr-x 1 root root 512 Sep 8 21:47 /pnfs/fs/usr/cms/local_tests/dircreateC-5
</pre>

<!--
---++ Readers' comments
COMMENT{type="below"}
-->
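
The comparison of the two lookup directions (name to ID and ID to name) for an ID taken from the log can be sketched as follows. It assumes that both dot commands are readable as files in the same pnfs directory, which is a simplification: above, the `.(name)` lookup was done one level higher. The helpers `read_magic` and `check_resolution` are hypothetical names.

```python
import os


def read_magic(directory, command, argument):
    """Read the dot command file '.(command)(argument)' in directory, or return None."""
    path = os.path.join(directory, ".(%s)(%s)" % (command, argument))
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def check_resolution(directory, name, logged_id):
    """Compare name -> ID and ID -> name lookups against the ID found in the pnfs log."""
    forward = read_magic(directory, "id", name)         # like cat ".(id)(name)"
    reverse = read_magic(directory, "name", logged_id)  # like cat ".(name)(ID)"
    return {
        "id_from_name": forward,
        "name_from_id": reverse,
        "consistent": forward == logged_id and reverse == name,
    }
```

For dircreateC-5 on the evening of Sep 8, such a check would be expected to give `id_from_name` as `None` and `name_from_id` as `dircreateC-5`, i.e. an inconsistent state that had resolved itself by the next morning.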
Topic revision: r4 - 2009-09-09 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.