<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page to be viewable only by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->
---+!! %TOPIC%
%TOC%

---++ Symptoms
Summary: %FORMFIELD{"Symptom summary"}%

---++ Occurrences
At what times did this problem occur (used to estimate frequency):
| 2011-06-16 |

---++ Observations
<!--
#collect here the information which may help to better understand the state of the system or services, e.g.
#log excerpts, strace output, etc.
#this also may help to identify the problem if similar conditions arise again
-->

*NOTE:* We are running dCache golden release 1.9.5-16 with pnfs-postgresql-3.1.10-7 and postgresql-8.3.3.

---+++ srmcp and dccp failures
The srmcp failure points to the underlying reason:
<pre>
srmcp -2 -streams_num=1 -globus_tcp_port_range=20000,25000 -debug srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root file:////tmp/derek-srmcp1
...
Fri Jun 17 12:08:47 CEST 2011: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed : at Fri Jun 17 12:08:42 CEST 2011 state Pending : created RequestFileStatus#-2141693797 failed with error:[ at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)]
Fri Jun 17 12:08:47 CEST 2011: GetFileRequest[srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root] status=SRM_INVALID_PATH explanation=Failed
Fri Jun 17 12:08:47 CEST 2011: java.io.IOException: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed : at Fri Jun 17 12:08:42 CEST 2011 state Pending : created RequestFileStatus#-2141693797 failed with error:[ at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)]
</pre>

The dccp failure produces less useful output:
<pre>
dccp -d 15 -p 20000:25000 dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root /tmp/derek-dccp
...
Sending control message: 2 0 client open "dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root" r t3ui03.psi.ch 20000 -timeout=-1 -onerror=default -uid=3896
Server error message for [2]: "No such file or directory" (err code: 2, errno: (null)).
dcap_pool: POLLIN on control line [3] id=1
Polling data for destination[4] queueID[2].
[-1] unpluging node
Removing unneeded queue [2]
[-1] destroing node
Failed open file in the dCache.
Can't open source file : "No such file or directory"
System error: Input/output error
</pre>

---+++ Files have missing OSM info: the storageinfoof command fails
Listing the storage info in the admin shell fails:
<pre>
echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root | dc_get_ID_from_pnfsnamelist.sh | dc_get_storageinfo_from_IDlist.sh
0002000000000000027647F0 storageinfoof failed : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)
</pre>

Note that other information on the file can still be retrieved, and that the file's copy on the pool can be confirmed to exist:
<pre>
echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root | dc_get_ID_from_pnfsnamelist.sh | dc_get_cacheinfo_from_IDlist.sh
0002000000000000027647F0 t3fs09_cms_1
</pre>

---+++ Missing pnfs tags for directories
All files I tested in the directory =/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6= failed to copy with the same error. The directory appears to have a corrupted =OSMTemplate= tag.
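A single directory can be tested for this condition with a small shell function. This is a minimal sketch, assuming the pnfs namespace is mounted and that a healthy directory returns a =StoreName ...= line from its =OSMTemplate= tag (as the parent directory =V02-01-01= does); =check_osm_tag= is a hypothetical helper, not a dCache or pnfs tool.

```shell
#!/bin/bash
# Sketch: report whether a pnfs directory has a readable OSMTemplate tag.
# Assumption: tags are exposed through the pnfs magic file name
# '.(tag)(OSMTemplate)'; check_osm_tag is a hypothetical helper name.
check_osm_tag() {
    local dir=$1 value
    # On the corrupted directories, reading the tag fails with
    # "No such file or directory".
    if ! value=$(cat "$dir/.(tag)(OSMTemplate)" 2>/dev/null); then
        echo "MISSING $dir"
        return 1
    fi
    # A healthy directory returns a line such as "StoreName cms".
    case $value in
        StoreName*) echo "OK $dir $value"; return 0 ;;
        *)          echo "BAD $dir $value"; return 1 ;;
    esac
}
```

For the affected directory, =check_osm_tag /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6= would be expected to print a =MISSING= line and return 1.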
All other tags, which should have been inherited from the parent directory, cannot be read either:
<pre>
[root@t3se01 store]# dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(OSMTemplate): No such file or directory
.(tag)(OSMTemplate)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(sGroup): No such file or directory
.(tag)(sGroup)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(AccessLatency): No such file or directory
.(tag)(AccessLatency)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(RetentionPolicy): No such file or directory
.(tag)(RetentionPolicy)=
</pre>

Running the same test on the parent directory, which is fine, shows the correct values:
<pre>
dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done
.(tag)(OSMTemplate)=StoreName cms
.(tag)(sGroup)=cms
.(tag)(AccessLatency)=ONLINE
.(tag)(RetentionPolicy)=OUTPUT
</pre>

I ran a find over the complete namespace to locate all corrupted directories:
<pre>
for dir in $(find . -type d); do info=$(ls -ld $dir); echo -n "$info "; cat $dir/'.(tag)(OSMTemplate)'; done &> ~/OSMTemplate-corrupt-dirs-20110617-lst
</pre>

Out of 13802 directories, the following 40 were found with an erroneous OSMTemplate tag:
<pre>
grep -v StoreName ~/OSMTemplate-corrupt-dirs-20110617-lst | sed -e 's#.*cat: \(.*\)/\..*#\1#' | xargs -I{} ls -ld {}
drwxrwxr-x 1 cmsuser cms 512 Apr 14 16:04 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_PromptReco_Run2010B_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA
drwxrwxr-x 1 cmsuser cms 512 Apr 14 15:51 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_17thSepReReco_Run2010A_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA
drwxrwxr-x 1 cmsuser cms 512 Dec 16  2010 ./user/bortigno/WJets/FASTSIM_bug
drwxrwxr-x 1 cmsuser cms 512 Apr 21 20:22 ./user/susy/trash-me
drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:02 ./user/susy/trash-me/QCD_TuneD6T_HT-500To1000_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:46 ./user/susy/trash-me/GJets_TuneD6T_HT-200_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:45 ./user/susy/trash-me/TToBLNu_TuneZ2_tW-channel_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:44 ./user/susy/trash-me/GJets_TuneD6T_HT-100To200_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:42 ./user/susy/trash-me/DYJetsToLL_TuneZ2_M-50_7TeV-madgraph-tauola
drwxrwxr-x 1 cmsuser cms 512 Apr 19 07:26 ./user/susy/trash-me/VVJetsTo4L_TuneD6T_7TeV-madgraph-tauola
drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:19 ./user/susy/trash-me/LM12_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM13_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 19 07:24 ./user/susy/trash-me/TToBLNu_TuneZ2_s-channel_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:37 ./user/susy/trash-me/GJets_TuneD6T_HT-40To100_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM3_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 20 00:36 ./user/susy/trash-me/TToBLNu_TuneZ2_t-channel_7TeV-madgraph
drwxrwxr-x 1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM4_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 Mar 22 16:45 ./user/susy/ntuples/mc/V01-11-02/GMSBLongLivedNeutralino_CTau-2000_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 Mar 19 17:46 ./user/susy/ntuples/mc/V01-11-02/WWtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x 1 cmsuser cms 512 Mar 19 23:51 ./user/susy/ntuples/mc/V01-11-02/WZtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x 1 cmsuser cms 512 Mar 20 00:30 ./user/susy/ntuples/mc/V01-11-02/ZZtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x 1 cmsuser cms 512 Mar 19 19:44 ./user/susy/ntuples/mc/V01-11-02/WWTo2L2Nu_TuneZ2_7TeV-pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 12:07 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_5to15_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 20:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_120to170_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 12:06 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 15:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_470to600_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 02:17 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_30to50_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 15:27 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_80to120_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 23:45 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_600to800_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  2 02:57 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_800to1000_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 05:10 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_50to80_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 01:18 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1400to1800_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 30 22:42 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1000to1400_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 30 23:12 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_170to300_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:28 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_15to30_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 May  1 03:32 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1800_TuneZ2_7TeV_pythia6
drwxrwxr-x 1 cmsuser cms 512 Apr 30 13:39 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v2-AOD
drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:38 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v1-AOD
drwxrwxr-x 1 cmsuser cms 512 Apr 30 15:15 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v2-AOD
drwxrwxr-x 1 cmsuser cms 512 Apr 30 14:20 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v1-AOD
</pre>

---++ Solution or Workaround

---++ Monitoring for this condition
<!--
#how can this condition be recognized automatically, if at all?
-->

-- Main.DerekFeichtinger - 2011-06-17
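The namespace scan described on this page can also be written so that it prints only the affected directory paths, which might serve as a starting point for the monitoring question. This is a minimal sketch, assuming it runs on a host with the pnfs namespace mounted; =find_bad_osm_dirs= is a hypothetical name, and it uses the same criterion as the =grep -v StoreName= step (a tag that is unreadable or does not start with =StoreName= counts as bad).

```shell
#!/bin/bash
# Sketch: print every directory below a root whose OSMTemplate tag is
# unreadable or does not start with "StoreName".
# Assumptions: pnfs is mounted and tags are readable as '.(tag)(OSMTemplate)';
# find_bad_osm_dirs is a hypothetical helper name, not an existing tool.
find_bad_osm_dirs() {
    local root=$1
    # Reading one path per line and quoting "$dir" keeps names with spaces
    # intact (names containing newlines are not handled).
    find "$root" -type d | while IFS= read -r dir; do
        # An unreadable tag (the corruption seen here) yields an empty value.
        value=$(cat "$dir/.(tag)(OSMTemplate)" 2>/dev/null) || value=""
        case $value in
            StoreName*) ;;                  # healthy, e.g. "StoreName cms"
            *) printf '%s\n' "$dir" ;;      # missing or unexpected tag
        esac
    done
}
```

A cron job could, for example, run =find_bad_osm_dirs /pnfs/psi.ch/cms/trivcat/store= and raise an alert whenever it prints anything. Since it reads one tag per directory, a full scan of a large namespace may take a while.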
| *IssueForm* ||
| Affected Service | dcache reading |
| Symptom summary | Users cannot retrieve certain files that clearly exist on our file servers and are correctly listed in the pnfs namespace |
| Reason Understood | yes |
| Solution Exists | no |
| Obsolete | no |
Topic revision: r1 - 2011-06-17 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.