<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSSiteLogXX][previous page]] / [[CMSSiteLogXX][next page]] of CMS site log %M%

---+ 04. 09. 2008 !PhEDEx problem with exports of a dataset to FZK

*Summary:* <br>
For several days, the dataset /QCD_Pt_80_120/CMSSW_1_6_7-JobRobot-1201523639/GEN-SIM-DIGI-RECO failed to be copied from CSCS, DESY and RWTH to FZK.

Mail from Armin Scheurer (originally in German):

%TWISTY{showlink="Show mail text" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%
Hi all,

I found your email addresses listed as !PhEDEx contacts in siteDB. I hope they are all still current.

For a few days now there seem to have been problems with transfers to FZK, mainly of one particular dataset, and from all sites. The dataset is routed from DESY, CSCS and RWTH to FZK and then either expires or dies with an "agent lost the transfer". Unfortunately, there are no logs for the expired transfers on the !PhEDEx page, so I would like to ask you to look through your agent logs for these transfers. Perhaps there is a hint there as to what is going wrong.

We have been trying to copy the dataset for days now. It is only 24 files, but nothing happens. Other transfers from your sites, however, went through reasonably well, although some of those also aborted with "agent lost the transfer".

The dataset is the following:

/QCD_Pt_80_120/CMSSW_1_6_7-JobRobot-1201523639/GEN-SIM-DIGI-RECO

It was "accidentally" deleted at all T1s during a central CERN deletion campaign. Unfortunately, it is used by the SAM/JobRobot tests.

In any case, thank you in advance for your help.

Regards,
Armin
%ENDTWISTY%

---++ Collecting information

---+++ DBS

<pre>
find dataset where dataset like %QCD_Pt_80_120/CMSSW_1_6_7-JobRobot-1201523639%
</pre>

Then use the *plain* link to get the list of files.

%TWISTY{showlink="Show query result" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%
<pre>
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/027ED940-C8CD-DC11-A65C-000423D94D68.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/1872ED1D-C8CD-DC11-ACC1-000423D999AA.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/2A16A572-6DCF-DC11-BBB9-001617C3B6EC.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/2A1BB053-6DCF-DC11-A0CF-000423D655A2.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/3ED30F5F-6DCF-DC11-A28C-001617C3B5F6.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/4264615F-6DCF-DC11-83EC-000423D98658.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/44566191-CBCD-DC11-95A1-001617C3B71A.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/4ADE1A5C-C8CD-DC11-B683-001617C3B708.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/526B2C6A-C8CD-DC11-8E2A-000423D6A77C.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/5AD6F488-C8CD-DC11-8F65-000423D64922.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/703EFD48-C8CD-DC11-9E29-000423DCF0D8.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/7441D9A6-C8CD-DC11-839D-000423D6B1CC.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/864C305F-6DCF-DC11-B854-000423D30AF2.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/887A7DCE-C8CD-DC11-A7F5-003048563216.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/924E1A4B-C7CD-DC11-9289-000423D94E48.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/A8AF3E13-C8CD-DC11-A639-000423D6B1CC.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/AA0D6424-C8CD-DC11-B049-001617DBCF46.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/AA25E1AC-C8CD-DC11-9C0F-000423D998E6.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/C6F319C6-C8CD-DC11-967A-000423D94D68.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/CC5ABBA8-C8CD-DC11-9311-000423DCF0D8.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/D2DDC75E-C8CD-DC11-A9E2-001617DBCF94.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/D80D8D5E-C8CD-DC11-B558-000423D986B0.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/E8C4470A-C8CD-DC11-BA7C-001617E30D4C.root
/store/mc/2008/1/28/JobRobot-QCD_Pt_80_120-1201523639/0034/EAC224A4-18CF-DC11-9445-0030487CF434.root
</pre>
%ENDTWISTY%

---+++ Are the files ok on dCache?

The Trivial File Catalog rule for our dCache is just a prefix of =/pnfs/lcg.cscs.ch/cms/trivcat=, so mapping this to local filenames is just

<pre>
for n in `cat files.lst`; do echo /pnfs/lcg.cscs.ch/cms/trivcat$n; done > pnfs.lst
</pre>

Using our DcacheShellutils, I can see that the files look ok (will add this part later).

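Before inspecting the files themselves, it may be worth checking that the mapping step produced what was intended. The following is only a minimal sketch: the helper names =lfn_to_pfn= and =check_mapping= are hypothetical and not part of any !PhEDEx or dCache tool. Only the prefix and the file names =files.lst= and =pnfs.lst= are taken from the text above.

```shell
#!/bin/sh
# Prefix taken from the Trivial File Catalog rule quoted above.
PREFIX=/pnfs/lcg.cscs.ch/cms/trivcat

# lfn_to_pfn (hypothetical helper): prepend the TFC prefix to one
# logical file name and print the resulting local file name.
lfn_to_pfn() {
    printf '%s%s\n' "$PREFIX" "$1"
}

# check_mapping (hypothetical helper): succeed only if both lists have
# the same number of lines and every mapped name starts with the prefix.
check_mapping() {
    lfns=$1
    pfns=$2
    # Arithmetic expansion strips the padding some wc versions print.
    n_lfn=$(( $(wc -l < "$lfns") ))
    n_pfn=$(( $(wc -l < "$pfns") ))
    [ "$n_lfn" -eq "$n_pfn" ] || return 1
    while IFS= read -r p; do
        case $p in
            "$PREFIX"/*) ;;      # mapped name carries the prefix
            *) return 1 ;;       # anything else is a mapping error
        esac
    done < "$pfns"
    return 0
}
```

With the lists from above, =check_mapping files.lst pnfs.lst && echo "mapping ok"= would be one way to use it. Since the mail mentions 24 files, =wc -l files.lst= is a quick additional cross-check.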
---+++ Using the !PhEDEx error query tools

Let's look at all transfer errors saved in the central DB for transfers from CSCS in the last 48 hours (all of them turn out to be transfers to FZK):

<pre>
/home/phedex/PHEDEX/Utilities/ErrorSiteQuery --db /home/phedex/config/DBParam.CSCS:Prod/CSCS --src "%CSCS%" -m 1000 -s "-48 hours"
2008-09-04 09:46:07: ErrorSiteQuery[923]: (re)connecting to database
2008-09-04 09:46:09: ErrorSiteQuery[923]: disconnected from database
Results starting from date 1220348767 Tue Sep 2 11:46:07 2008
Number of results: 100 (of max 1000)
**** from T2_CH_CSCS to T1_DE_FZK_Buffer:
100 agent lost the transfer
</pre>

Now we look at how many centers suffered from this error mode in the last 48 hours to get a more general picture:

<pre>
/home/phedex/PHEDEX/Utilities/ErrorQuery --db ~/config/DBParam.CSCS:Prod/CSCS -s "-48 hours" -e "%agent lost the transfer%" -x -m 1000 --sort dst
2008-09-04 10:07:13: ErrorQuery[2063]: (re)connecting to database
2008-09-04 10:07:19: ErrorQuery[2063]: disconnected from database
#Number of results: 448 (of max 1000. Primary search retrieved 448)
#
#count src                  dst               backend stech  dtech  fts  channel nfiles
17     T1_US_FNAL_Buffer    T1_CH_CERN_Buffer n.a.    11     castor n.a. n.a.    n.a.
1      T1_FR_CCIN2P3_Buffer T1_DE_FZK_Buffer  n.a.    pnfs   pnfs   n.a. n.a.    n.a.
100    T2_CH_CSCS           T1_DE_FZK_Buffer  n.a.    pnfs   pnfs   n.a. n.a.    n.a.
100    T2_DE_DESY           T1_DE_FZK_Buffer  n.a.    pnfs   pnfs   n.a. n.a.    n.a.
1      T2_CN_Beijing        T1_DE_FZK_Buffer  n.a.    pnfs   pnfs   n.a. n.a.    n.a.
2      T1_US_FNAL_Buffer    T1_IT_CNAF_Buffer n.a.    11     castor n.a. n.a.    n.a.
10     T1_CH_CERN_Buffer    T1_US_FNAL_Buffer n.a.    castor 11     n.a. n.a.    n.a.
27     T2_DE_RWTH           T1_US_FNAL_Buffer n.a.    pnfs   11     n.a. n.a.    n.a.
2      T0_CH_CERN_Export    T1_US_FNAL_Buffer n.a.    castor 11     n.a. n.a.    n.a.
25     T1_US_FNAL_Buffer    T2_CH_CAF         n.a.    11     castor n.a. n.a.    n.a.
59     T1_US_FNAL_Buffer    T2_DE_DESY        n.a.    11     pnfs   n.a. n.a.    n.a.
4      T1_CH_CERN_Buffer    T2_US_Nebraska    n.a.    castor pnfs   n.a. n.a.    n.a.
100    T1_DE_FZK_Buffer     T2_US_Nebraska    n.a.    pnfs   pnfs   n.a. n.a.    n.a.
</pre>

The output of "n.a." in many columns tells me that FZK is using an old !PhEDEx version which does not record all data correctly in the central database. This in itself may present a problem, since the update to !PhEDEx 3.0.4 was compulsory.

-- Main.DerekFeichtinger - 04 Sep 2008

----------------

%ICON{arrowleft}% Go to [[CMSSiteLogXX][previous page]] / [[CMSSiteLogXX][next page]] of CMS site log %M%
Topic revision: r1 - 2008-09-04 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.