<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ !!CMS Site Log for PHOENIX Cluster

%TOC%

%ICON{arrowleft}% Go to [[CMSSiteLog3][previous page]] / [[CMSSiteLog5][next page]] of CMS site log %M%

---++ 10. 2. 2007 Cleaning of data sets from SE

I cleaned several data sets from our DPM, since no more space was left (one shared pool from LHCb, which we had opened to all, had never been used; this seems to be a DPM problem). A number of production jobs had failed in writing to our storage, but the problem was only detected when the production team ran the merge jobs. The files exist in the nameserver of DPM, but there are no connected SFNs for them. They can be detected easily, since they show up as files with 0 bytes file length on commands like =edg-gridftp-ls --verbose= or =rfdir=. Gave this feedback to Alexander Flossdorf from DESY.

---++ 12. 2. 2007 Test of !PhEDEx 2_5_0_1

Tested the Dev instance of the new !PhEDEx distribution. Subscribed to the =/phedex/monarctest_CERN/1= data set. The files were routed from PIC and the FTS at FZK seemed to work fine. After half of the files had completed, the typical error appeared again for many transfers.

<verbatim>
Failed Transfer failed. ERROR the server sent an error response: 425 425 Can't open data connection. timed out() failed.
</verbatim>

The !PhEDEx monitoring page showed that T1_PIC transfers to a number of other sites were also problematic, but I found a different error for every other site.
<br />
<img src="%ATTACHURLPATH%/cpu_report-20070212-1123.gif" alt="cpu_report-20070212-1123.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070212-1123.gif" alt="network_report-20070212-1123.gif" width='391' height='147' />

Since we are currently busy getting the new *dCache SE* up and our old DPM SE has not much space available, I just started the !PhEDEx Prod instance, but did not subscribe to new data right now.

---++ 20. 2. 2007 started Cycle-1 / Week-2 exercise

Following the plan laid out in [[http://www.cnaf.infn.it/~dbonacorsi/LoadTest07.htm][Daniele Bonacorsi's pages]] I signed up T2_CSCS for a number of MONARC test samples in the *Dev* instance, after making sure that FZK had the same subscriptions (FZK had no subscriptions in Prod, so I chose Dev). After fixing a trivial error due to myproxy, we began to get the files at almost 30 MB/s from IN2P3.

<br />
<img src="%ATTACHURLPATH%/cpu_report-20070220-1100.gif" alt="cpu_report-20070220-1100.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070220-1100.gif" alt="network_report-20070220-1100.gif" width='391' height='147' />

---++ 21./22. 2. 2007 myproxy renewal problem / no source files from IN2P3

Due to a myproxy renewal problem on our side the transfers of last night failed. There have been multiple attempts to transfer from FZK. All the following routings picked IN2P3 as a source, even though the requested files had already been erased there (so there was an inconsistency between TMDB and what the center had on disk). The problem was corrected the following day.

I cancelled all subscriptions in the Dev instance and subscribed to a new set of MONARC test files in the Prod instance. The data sets were:
   * /phedex_prod/monarctest_CERN/6
   * /phedex_prod/monarctest_CERN/7
   * /phedex_prod/monarctest_CERN/8

Each has a size of ~484 GB.
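
As a rough plausibility check on these numbers, the following Python sketch adds up the three subscribed data sets and estimates how long they would take at a sustained rate. The per-set size (~484 GB) and the 30 MB/s figure come from the entries above. The decimal unit convention (1 GB = 1000 MB) and the names =DATASETS=, =DATASET_SIZE_GB= and =transfer_hours= are assumptions made only for this illustration.

```python
# Back-of-the-envelope estimate; not part of any PhEDEx tooling.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.

DATASETS = [
    "/phedex_prod/monarctest_CERN/6",
    "/phedex_prod/monarctest_CERN/7",
    "/phedex_prod/monarctest_CERN/8",
]
DATASET_SIZE_GB = 484  # approximate size of each data set, from the log entry


def transfer_hours(total_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move total_gb at a sustained rate of rate_mb_s."""
    seconds = total_gb * 1000 / rate_mb_s
    return seconds / 3600


total_gb = DATASET_SIZE_GB * len(DATASETS)
hours_at_30 = transfer_hours(total_gb, 30)

print(f"total volume: {total_gb} GB")
print(f"time at a sustained 30 MB/s: {hours_at_30:.1f} h")
```

Under these assumptions the three sets add up to 1452 GB (about 1.45 TB), which would need roughly 13.4 hours at a sustained 30 MB/s. Real transfers include failures and retries, so this is only a lower bound on the wall-clock time.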
These graphs show load generated by transfers from FNAL:

<img src="%ATTACHURLPATH%/cpu_report-20070222-1721.gif" alt="cpu_report-20070222-1721.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070222-1721.gif" alt="network_report-20070222-1721.gif" width='391' height='147' />
<br />

In the evening transfers from FZK began to kick in with reasonable efficiency:

<img src="%ATTACHURLPATH%/cpu_report-20070222-2100.gif" alt="cpu_report-20070222-2100.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070222-2100.gif" alt="network_report-20070222-2100.gif" width='391' height='147' />

---++ 23. 2. 2007 Good transfers from FZK

Finally very good throughput from FZK (considering that it still only goes to a single file server on our site!). The three test sets were completed around noon, so ~1.5 TB in 2 days with some routing problems.

<img src="%ATTACHURLPATH%/cpu_report-20070223-0927.gif" alt="cpu_report-20070223-0927.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070223-0927.gif" alt="network_report-20070223-0927.gif" width='391' height='147' />

Based on the [[http://www.switch.ch/network/stat/weather/weathermap.html][SWITCH traffic weather map]] the traffic seems not to come via BelWue net, but rather via Lausanne and CERN.
A traceroute yields exactly this route via geant2:

<verbatim>
traceroute to f01-110-106-e.gridka.de (192.108.45.151), 30 hops max, 38 byte packets
 1  148.187.33.2 (148.187.33.2)  0.877 ms  0.438 ms  0.732 ms
 2  148.187.32.4 (148.187.32.4)  0.728 ms  0.480 ms  0.770 ms
 3  swima2.cscs.ch (148.187.20.2)  0.444 ms  0.697 ms  0.957 ms
 4  swiEL2-10GE-1-4.switch.ch (130.59.37.77)  3.938 ms  3.420 ms  3.728 ms
 5  swiCE3-10GE-1-3.switch.ch (130.59.37.65)  4.675 ms  4.520 ms  4.723 ms
 6  swiCE2-10GE-1-4.switch.ch (130.59.36.209)  4.111 ms  4.759 ms  4.478 ms
 7  switch.rt1.gen.ch.geant2.net (62.40.124.21)  4.525 ms  4.710 ms  4.371 ms
 8  so-7-2-0.rt1.fra.de.geant2.net (62.40.112.22)  12.588 ms  12.851 ms  13.050 ms
 9  dfn-gw.rt1.fra.de.geant2.net (62.40.124.34)  12.592 ms  19.913 ms  12.552 ms
10  xr-fzk1-te2-3.x-win.dfn.de (188.1.145.50)  15.632 ms  15.776 ms  15.342 ms
11  kr-fzk.x-win.dfn.de (188.1.38.222)  16.896 ms  16.719 ms  15.432 ms
12  * * *
13  f01-110-106-e.gridka.de (192.108.45.151)  17.772 ms  19.327 ms  20.521 ms
</verbatim>

<img src="%ATTACHURLPATH%/Switch-traffic-20070223mod.png" alt="Switch-traffic-20070223mod.png" width='707' height='487' />

---++ 27. 2. 2007 Test with FZK MONARC sample

Did a further test with FZK using a MONARC sample that had been prepared there: =/phedex_monarctest/FZK-DISK1/MonarcTest_FZK-DISK1=. Transfers started with high throughput ranging from 20-40 MB/s, but towards the end of the data set the famous =425 425 Can't open data connection= error repeatedly caused failures for one particular file. The error was not tied to that file, though: other sites had the same problem with other files. Based on a message from Doris Ressmann from FZK, these seem to be timeout issues due to overload of their dCache gridftp doors.

%ICON{arrowleft}% Go to [[CMSSiteLog3][previous page]] / [[CMSSiteLog5][next page]] of CMS site log %M%

-- Main.DerekFeichtinger - 28 Feb 2007
Topic revision: r12 - 2007-03-14 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.