IssueDcacheCorruptOSM

Symptoms

Summary: User cannot retrieve certain files which clearly exist on our fileservers and also are correctly listed in the pnfs namespace

Occurrences

At what times did this problem occur (used to estimate frequency):

2011-06-16

Observations

NOTE: We are running dcache golden release 1.9.5-16 with pnfs-postgresql-3.1.10-7 and postgresql-8.3.3

srmcp and dccp failures

Failure of an srmcp points to underlying reason
srmcp -2 -streams_num=1 -globus_tcp_port_range=20000,25000 -debug srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root file:////tmp/derek-srmcp1

...
Fri Jun 17 12:08:47 CEST 2011: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed :  at Fri Jun 17 12:08:42 CEST 2011 state Pending : created
RequestFileStatus#-2141693797 failed with error:[  at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)]

Fri Jun 17 12:08:47 CEST 2011: GetFileRequest[srm://t3se01.psi.ch:8443/srm/managerv2?SFN=//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root] status=SRM_INVALID_PATH explanation=Failed
Fri Jun 17 12:08:47 CEST 2011: java.io.IOException: srmPrepareToGet update failed, status : SRM_FAILURE explanation= at Fri Jun 17 12:08:47 CEST 2011 state Failed :  at Fri Jun 17 12:08:42 CEST 2011 state Pending : created
RequestFileStatus#-2141693797 failed with error:[  at Fri Jun 17 12:08:43 CEST 2011 state Failed : file not found : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)]

Failure of dccp contains less useful output

dccp -d 15 -p 20000:25000 dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root /tmp/derek-dccp

...
Sending control message: 2 0 client open "dcap://t3se01.psi.ch:22125//pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root" r t3ui03.psi.ch 20000 -timeout=-1 -onerror=default -uid=3896
Server error message for [2]: "No such file or directory" (err code: 2, errno: (null)).
dcap_pool:  POLLIN on control line [3] id=1
Polling data for destination[4] queueID[2].
[-1] unpluging node
Removing unneeded queue [2]
[-1] destroing node
Failed open file in the dCache.
Can't open source file : "No such file or directory"
System error: Input/output error

Files have missing OSM info. Storageinfoof command fails

Listing of the storageinfo in the adminshell fails.

echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root|dc_get_ID_from_pnfsnamelist.sh |dc_get_storageinfo_from_IDlist.sh 

0002000000000000027647F0 storageinfoof failed : OSM info not found in /pnfs/fs/.(access)(00020000000000000273F0C0)(type=--I--d-----)

Note, that we can get other info on the file and that we can also confirm that the local file on the pool exists!

echo /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/NTupleProducer_41X_MC_97_1_iAx.root|dc_get_ID_from_pnfsnamelist.sh |dc_get_cacheinfo_from_IDlist.sh 

0002000000000000027647F0 t3fs09_cms_1

Missing pnfs tags for directories

All files in the directory /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6 I tested, failed to copy with the identical error. It turns out that the directory seems to have a corrupted tag for the OSMTemplate value. All other tags which should have been inherited also cannot be read:

[root@t3se01 store]# dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(OSMTemplate): No such file or directory
.(tag)(OSMTemplate)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(sGroup): No such file or directory
.(tag)(sGroup)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(AccessLatency): No such file or directory
.(tag)(AccessLatency)=
cat: /pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6/.(tag)(RetentionPolicy): No such file or directory
.(tag)(RetentionPolicy)=

Testing the directory above which is ok shows the correct values:

dir=/pnfs/psi.ch/cms/trivcat/store/user/susy/ntuples/mc/V02-01-01/; for n in $(cat $dir/'.(tags)()'); do r=$(cat $dir/$n); echo "$n=$r"; done
.(tag)(OSMTemplate)=StoreName cms
.(tag)(sGroup)=cms
.(tag)(AccessLatency)=ONLINE
.(tag)(RetentionPolicy)=OUTPUT

I ran a find over the complete namespace to locate all corrupted directories

 for dir in $(find . -type d); do info=$(ls -ld $dir); echo -n "$info    "; cat $dir/'.(tag)(OSMTemplate)'; done &> ~/OSMTemplate-corrupt-dirs-20110617-lst

Out of 13802 directories, the following 40 were found with erroneous OSMTemplate tag:

 grep -v StoreName ~/OSMTemplate-corrupt-dirs-20110617-lst | sed -e 's#cat: \(.*\)/\..*#\1#' | xargs -i ls -ld {}

drwxrwxr-x  1 cmsuser cms 512 Apr 14 16:04 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_PromptReco_Run2010B_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA
drwxrwxr-x  1 cmsuser cms 512 Apr 14 15:51 ./user/pablom/TnP/ElectronsNew/data/mt2/JOB_PD_EG_17thSepReReco_Run2010A_TnP_ELE_JZB_ANTIPANIC_ANTIMULTITRG_DATA
drwxrwxr-x  1 cmsuser cms 512 Dec 16  2010 ./user/bortigno/WJets/FASTSIM_bug
drwxrwxr-x  1 cmsuser cms 512 Apr 21 20:22 ./user/susy/trash-me
drwxrwxr-x  1 cmsuser cms 512 Apr 15 15:02 ./user/susy/trash-me/QCD_TuneD6T_HT-500To1000_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:46 ./user/susy/trash-me/GJets_TuneD6T_HT-200_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:45 ./user/susy/trash-me/TToBLNu_TuneZ2_tW-channel_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:44 ./user/susy/trash-me/GJets_TuneD6T_HT-100To200_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:42 ./user/susy/trash-me/DYJetsToLL_TuneZ2_M-50_7TeV-madgraph-tauola
drwxrwxr-x  1 cmsuser cms 512 Apr 19 07:26 ./user/susy/trash-me/VVJetsTo4L_TuneD6T_7TeV-madgraph-tauola
drwxrwxr-x  1 cmsuser cms 512 Apr 15 15:19 ./user/susy/trash-me/LM12_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM13_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 19 07:24 ./user/susy/trash-me/TToBLNu_TuneZ2_s-channel_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:37 ./user/susy/trash-me/GJets_TuneD6T_HT-40To100_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM3_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 20 00:36 ./user/susy/trash-me/TToBLNu_TuneZ2_t-channel_7TeV-madgraph
drwxrwxr-x  1 cmsuser cms 512 Apr 15 15:29 ./user/susy/trash-me/LM4_SUSY_sftsht_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 Mar 22 16:45 ./user/susy/ntuples/mc/V01-11-02/GMSBLongLivedNeutralino_CTau-2000_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 Mar 19 17:46 ./user/susy/ntuples/mc/V01-11-02/WWtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x  1 cmsuser cms 512 Mar 19 23:51 ./user/susy/ntuples/mc/V01-11-02/WZtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x  1 cmsuser cms 512 Mar 20 00:30 ./user/susy/ntuples/mc/V01-11-02/ZZtoAnything_TuneZ2_7TeV-pythia6-tauola
drwxrwxr-x  1 cmsuser cms 512 Mar 19 19:44 ./user/susy/ntuples/mc/V01-11-02/WWTo2L2Nu_TuneZ2_7TeV-pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 12:07 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_5to15_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 20:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_120to170_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 12:06 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_300to470_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 15:13 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_470to600_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 02:17 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_30to50_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 15:27 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_80to120_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 23:45 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_600to800_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  2 02:57 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_800to1000_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 05:10 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_50to80_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 01:18 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1400to1800_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 30 22:42 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1000to1400_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 30 23:12 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_170to300_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 30 15:28 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_15to30_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 May  1 03:32 ./user/susy/ntuples/mc/V02-01-01/QCD_Pt_1800_TuneZ2_7TeV_pythia6
drwxrwxr-x  1 cmsuser cms 512 Apr 30 13:39 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v2-AOD
drwxrwxr-x  1 cmsuser cms 512 Apr 30 15:38 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v1-AOD
drwxrwxr-x  1 cmsuser cms 512 Apr 30 15:15 ./user/susy/ntuples/data/V02-01-01/MuEG-Run2011A-PromptReco-v2-AOD
drwxrwxr-x  1 cmsuser cms 512 Apr 30 14:20 ./user/susy/ntuples/data/V02-01-01/DoubleMu-Run2011A-PromptReco-v1-AOD

Solution or Workaround

Monitoring for this condition

-- DerekFeichtinger - 2011-06-17

IssueForm
Affected Service dcache reading
Symptom summary User cannot retrieve certain files which clearly exist on our fileservers and also are correctly listed in the pnfs namespace
Reason Understood yes
Solution Exists no
Obsolete no
Edit | Attach | Watch | Print version | History: r3 < r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r1 - 2011-06-17 - DerekFeichtinger
 
  • Edit
  • Attach
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback