Tags:
view all tags
<!-- keep this as a security measure: #uncomment if the subject should only be modifiable by the listed groups * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup #uncomment this if you want the page only be viewable by the listed groups # * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup --> ---+!! %TOPIC% %TOC% ---++ Symptoms Summary: %FORMFIELD{"Symptom summary"}% ---++ Occurrences At what times did this problem occur (used to estimate frequency): | 2011-06-16 | ---++ Observations <!-- #collect here the information which may help to better understand the state of the system or services, e.g. #log excerpts, strace output, etc. #this also may help to identify the problem if similar conditions arise again --> *NOTE:* We are running dcache golden release 1.9.5-16 with pnfs-postgresql-3.1.10-7 and postgresql-8.3.3 ---+++ srmcp and dccp failures Failure of an srmcp points to underlying reason <pre> srmcp -2 -streams_num=1 -globus_tcp_port_range=20000,25000 -debug srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root file:////tmp/derek-srmcp1 ... Fri Jun 17 12:08:47 CEST 2011: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed : at Fri Jun 17 12:08:42 CEST 2011 state Pending : created RequestFileStatus#-2141693797 failed with error:[ at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)] Fri Jun 17 12:08:47 CEST 2011: GetFileRequest[srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root] status=SRM_INVALID_PATH explanation=Failed Fri Jun 17 12:08:47 CEST 2011: java.io.IOException: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed : at Fri Jun 17 12:08:42 CEST 2011 state Pending : created RequestFileStatus#-2141693797 failed with error:[ at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)] </pre> Failure of dccp contains less useful output <pre> dccp -d 15 -p 20000:25000 dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root /tmp/derek-dccp ... Sending control message: 2 0 client open "dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root" r t3ui03.psi.ch 20000 -timeout=-1 -onerror=default -uid=3896 Server error message for [2]: "No such file or directory" (err code: 2, errno: (null)). dcap_pool: POLLIN on control line [3] id=1 Polling data for destination[4] queueID[2]. [-1] unpluging node Removing unneeded queue [2] [-1] destroing node Failed open file in the dCache. Can't open source file : "No such file or directory" System error: Input/output error </pre> ---+++ Files have missing OSM info. Storageinfoof command fails Listing of the storageinfo in the adminshell fails. <pre> echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root|dc_get_ID_from_pnfsnamelist.sh |dc_get_storageinfo_from_IDlist.sh 0002000000000000027647F0 storageinfoof failed : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----) </pre> Note, that we can get other info on the file and that we can also confirm that the local file on the pool exists! <pre> echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root|dc_get_ID_from_pnfsnamelist.sh |dc_get_cacheinfo_from_IDlist.sh 0002000000000000027647F0 t3fs09_cms_1 </pre> ---+++ Missing pnfs tags for directories All files in the directory =/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6= I tested, failed to copy with the identical error. It turns out that the directory seems to have a corrupted tag for the OSMTemplate value. All other tags which should have been inherited also cannot be read: <pre> [root@t3se01 store]# dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(OSMTemplate): No such file or directory .(tag)(OSMTemplate)= cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(sGroup): No such file or directory .(tag)(sGroup)= cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(AccessLatency): No such file or directory .(tag)(AccessLatency)= cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(RetentionPolicy): No such file or directory .(tag)(RetentionPolicy)= </pre> Testing the directory above which is ok shows the correct values: <pre> dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done .(tag)(OSMTemplate)=StoreName cms .(tag)(sGroup)=cms .(tag)(AccessLatency)=ONLINE .(tag)(RetentionPolicy)=OUTPUT </pre> I ran a find over the complete namespace to locate all corrupted directories <pre> for dir in $(find . -type d); do info=$(ls -ld $dir); echo -n "$info "; cat $dir/'.(tag)(OSMTemplate)'; done &> ~/OSMTemplate-corrupt-dirs-20110617-lst </pre> Out of 13802 directories, the following 40 were found with erroneous OSMTemplate tag: <pre> grep -v StoreName ~/OSMTemplate-corrupt-dirs-20110617-lst | sed -e 's#cat: \(.*\)/\..*#\1#' | xargs -i ls -ld {} drwxrwxr-x 1 cmsuser cms 512 Apr 14 16:04 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_PromptReco_Run2010B_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA drwxrwxr-x 1 cmsuser cms 512 Apr 14 15:51 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_17thSepReReco_Run2010A_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA drwxrwxr-x 1 cmsuser cms 512 Dec 16 2010 ./user/bortigno/WJets/FASTSIM_bug drwxrwxr-x 1 cmsuser cms 512 Apr 21 20:22 ./user/susy/trash-me drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:02 ./user/susy/trash-me/QCD_TuneD6T_HT-500To1000_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:46 ./user/susy/trash-me/GJets_TuneD6T_HT-200_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:45 ./user/susy/trash-me/TToBLNu_TuneZ2_tW-channel_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:44 ./user/susy/trash-me/GJets_TuneD6T_HT-100To200_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:42 ./user/susy/trash-me/DYJetsToLL_TuneZ2_M-50_7TeV-madgraph-tauola drwxrwxr-x 1 cmsuser cms 512 Apr 19 07:26 ./user/susy/trash-me/VVJetsTo4L_TuneD6T_7TeV-madgraph-tauola drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:19 ./user/susy/trash-me/LM12_SUSY_sftsht_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM13_SUSY_sftsht_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 19 07:24 ./user/susy/trash-me/TToBLNu_TuneZ2_s-channel_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:37 ./user/susy/trash-me/GJets_TuneD6T_HT-40To100_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM3_SUSY_sftsht_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:36 ./user/susy/trash-me/TToBLNu_TuneZ2_t-channel_7TeV-madgraph drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM4_SUSY_sftsht_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 Mar 22 16:45 ./user/susy/ntuples/mc/V01-11-02/GMSBLongLivedNeutralino_CTau-2000_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 Mar 19 17:46 ./user/susy/ntuples/mc/V01-11-02/WWtoAnything_TuneZ2_7TeV-pythia6-tauola drwxrwxr-x 1 cmsuser cms 512 Mar 19 23:51 ./user/susy/ntuples/mc/V01-11-02/WZtoAnything_TuneZ2_7TeV-pythia6-tauola drwxrwxr-x 1 cmsuser cms 512 Mar 20 00:30 ./user/susy/ntuples/mc/V01-11-02/ZZtoAnything_TuneZ2_7TeV-pythia6-tauola drwxrwxr-x 1 cmsuser cms 512 Mar 19 19:44 ./user/susy/ntuples/mc/V01-11-02/WWTo2L2Nu_TuneZ2_7TeV-pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 12:07 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_5to15_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 20:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_120to170_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 12:06 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 15:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_470to600_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 02:17 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_30to50_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 15:27 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_80to120_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 23:45 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_600to800_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 2 02:57 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_800to1000_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 05:10 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_50to80_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 01:18 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1400to1800_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 30 22:42 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1000to1400_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 30 23:12 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_170to300_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:28 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_15to30_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 May 1 03:32 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1800_TuneZ2_7TeV_pythia6 drwxrwxr-x 1 cmsuser cms 512 Apr 30 13:39 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v2-AOD drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:38 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v1-AOD drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:15 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v2-AOD drwxrwxr-x 1 cmsuser cms 512 Apr 30 14:20 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v1-AOD </pre> ---++ Solution or Workaround We got a fast reply from Patrick Fuhrmann through dcache support. He proposed the following way to fix the problem, which turned out to work fine with the one file I tested with. Great! 1 please create a ./user/susy/trash-me-new <pre> cd /pnfs/psi.ch/cms/trivcat/store/user/susy mkdir trash-me-ne </pre> 1 make sure the new directory resides on the same database as the old (first 4 digits of the pnfs id must be identical. This is trivially true for our installation here at PSI, since we only use one DB for all CMS) <pre> cat ".(id)(trash-me)" 00020000000000000244AAA0 cat ".(id)(trash-me-new)" 000200000000000002F47FE0 </pre> 1 if not already the case, adjust the tags in trash-me-new so that they are identical to trash-me and adjust the permissions of trash-me-new. * I cannot read the tags on the old directory, but the trash-me-new directory has the tags I would expect it to have <pre> dir=trash-me-new; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done .(tag)(OSMTemplate)=StoreName cms .(tag)(sGroup)=cms .(tag)(AccessLatency)=ONLINE .(tag)(RetentionPolicy)=OUTPUT dir=trash-me; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done cat: trash-me/.(tag)(OSMTemplate): No such file or directory .(tag)(OSMTemplate)= cat: trash-me/.(tag)(sGroup): No such file or directory .(tag)(sGroup)= cat: trash-me/.(tag)(AccessLatency): No such file or directory .(tag)(AccessLatency)= cat: trash-me/.(tag)(RetentionPolicy): No such file or directory .(tag)(RetentionPolicy)= </pre> 1 try to move (mv) a file from trash-me to trash-me-new. * confirming first that I indeed fail to copy the old file <pre> srmcp -2 -streams_num=1 -globus_tcp_port_range=20000,25000 -debug srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/trash-me/NTupleProducer_41X_MC_39_1_rMW.root file:////tmp/derek-srmcp1 ... RequestFileStatus#-2141685188 failed with error:[ at Fri Jun 17 15:50:41 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000244AAA0)(type=--I--d-----)] Fri Jun 17 15:50:45 CEST 2011: stopping copier going to stop.... Exception in thread "main" java.lang.Exception: stopped at gov.fnal.srm.util.Copier.run(Copier.java:244) at java.lang.Thread.run(Thread.java:636) Fri Jun 17 15:50:45 CEST 2011: stopping copier </pre> * moving the file in pnfs<pre> mv trash-me/NTupleProducer_41X_MC_39_1_rMW.root trash-me-new/ </pre> * trying to read the storageinfo of the moved file from the admin shell %Y%<pre> echo /pnfs/psi.ch/cms/trivcat/store/user/susy/trash-me-new/NTupleProducer_41X_MC_39_1_rMW.root | dc_get_ID_from_pnfsnamelist.sh | dc_get_storageinfo_from_IDlist.sh 00020000000000000244AB88 size=83155077;new=false;stored=false;sClass=cms:cms;cClass=-;hsm=osm;accessLatency=ONLINE;retentionPolicy=OUTPUT;flag-h=no;flag-rp=OUTPUT;flag-c=1:c8c5f6eb;StoreName=cms;flag-al=ONLINE;flag-l=83155077;store=cms;group=cms;bfid=<Unknown>; </pre> 1 try to read the file by srmcp<pre> srmcp -2 -streams_num=1 -globus_tcp_port_range=20000,25000 -debug srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/trash-me-new/NTupleProducer_41X_MC_39_1_rMW.root file:////tmp/derek-srmcp1 </pre> *WORKS!!!!!* :-) Now we need to do this in an automated way for all the problematic files/directories. ---++ Monitoring for this condition <!-- #how can this condition be recognized automatically, if at all? --> -- Main.DerekFeichtinger - 2011-06-17
IssueForm
Affected Service
dcache reading
Symptom summary
User cannot retrieve certain files which clearly exist on our fileservers and also are correctly listed in the pnfs namespace
Reason Understood
yes
Solution Exists
yes
Obsolete
no
Edit
|
Attach
|
Watch
|
P
rint version
|
H
istory
:
r3
<
r2
<
r1
|
B
acklinks
|
V
iew topic
|
Raw edit
|
More topic actions...
Topic revision: r2 - 2011-06-17
-
DerekFeichtinger
CmsTier3
Log In
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
Statistics
Preferences
User Pages
Main Page
Policies
Monitoring Storage Space
Monitoring Slurm Usage
Physics Groups
Steering Board Meetings
Admin Pages
AdminArea
Cluster Specs
Home
Site map
CmsTier3 web
LCGTier2 web
PhaseC web
Main web
Sandbox web
TWiki web
CmsTier3 Web
Create New Topic
Index
Search
Changes
Notifications
RSS Feed
Statistics
Preferences
P
View
Raw View
Print version
Find backlinks
History
More topic actions
Edit
Raw edit
Attach file or image
Edit topic preference settings
Set new parent
More topic actions
Account
Log In
Edit
Attach
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki?
Send feedback