<!--
keep this as a security measure:
#uncomment if the topic should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if the page should only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
%INCLUDE{"ListNews" NEWSROWS="1"}%

---+!! Monitoring

%CALC{"$SET(jobint,$IF($EXACT(%URLPARAM{"jobint"}%,),day,%URLPARAM{"jobint"}%))"}%
%CALC{"$SET(freestorageint,$IF($EXACT(%URLPARAM{"freestorageint"}%,),week,%URLPARAM{"freestorageint"}%))"}%
%CALC{"$SET(netwint,$IF($EXACT(%URLPARAM{"netwint"}%,),day,%URLPARAM{"netwint"}%))"}%
%CALC{"$SET(zfs,$IF($EXACT(%URLPARAM{"zfs"}%,),day,%URLPARAM{"zfs"}%))"}%

<a href="http://t3mon.psi.ch/ganglia/?c=PSI%20Tier3%20fileservers&m=&r=hour&s=descending&hc=4"><img src="http://t3mon.psi.ch/ganglia/PSIT3-custom/PSI%2520Tier3%2520fileservers-pie.png"/></a>
<a href="http://t3mon.psi.ch/ganglia/?c=PSI%20Tier3%20services&m=&r=hour&s=descending&hc=4"><img src="http://t3mon.psi.ch/ganglia/PSIT3-custom/PSI%2520Tier3%2520services-pie.png"/></a>
<a href="http://t3mon.psi.ch/ganglia/?c=PSI%20Tier3%20workers&m=&r=hour&s=descending&hc=4"><img src="http://t3mon.psi.ch/ganglia/PSIT3-custom/PSI%2520Tier3%2520workers-pie.png"/></a>

%TOC%

---++ Batch jobs (queuing system)

[[http://t3mon.psi.ch/ganglia/PSIT3-custom/qstat.txt][Current queue]] */* [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][accounting]]

Number of running and queued jobs:
<form name="formJobint" action="%TOPICURL%?#Batch_jobs_queuing_system" method=GET>
<select name="jobint" onchange="formJobint.submit()">
<option %CALC{"$IF($EXACT($GET(jobint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(jobint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(jobint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(jobint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(jobint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="freestorageint" value=%CALC{"$GET(freestorageint)"}%>
<input type="hidden" name="netwint" value=%CALC{"$GET(netwint)"}%>
</form>
<img src="%GANGLIABASE%/PSIT3-custom/running-%CALC{"$GET(jobint)"}%.gif" />
<br><img src="%GANGLIABASE%/PSIT3-custom/waiting-%CALC{"$GET(jobint)"}%.gif" />

[[%GANGLIABASE%/?c=PSI%20Tier3%20workers&m=&r=day&s=descending&hc=4][Ganglia worker-node page]]

<img src="%GANGLIABASE%/graph.php?g=load_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(jobint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=cpu_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(jobint)"}%&s=descending&hc=4&st=now" />

---++ Storage

---+++ =/pnfs= dir

Show space graphs for:
<form name="formFreestorageint" action="%TOPICURL%?#Storage_Element" method=GET>
<select name="freestorageint" onchange="formFreestorageint.submit()">
<option %CALC{"$IF($EXACT($GET(freestorageint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="jobint" value=%CALC{"$GET(jobint)"}%>
<input type="hidden" name="netwint" value=%CALC{"$GET(netwint)"}%>
</form>

Links:
   * List all [[https://cmsweb.cern.ch/das/request?view=list&limit=500&instance=cms_dbs_prod_global&input=dataset+site%3DT3_CH_PSI][hosted datasets]] / [[https://cmsweb.cern.ch/das/request?view=plain&limit=500&instance=prod%2Fglobal&input=dataset+site%3DT3_CH_PSI][hosted datasets (plain text)]] / [[https://cmsweb.cern.ch/phedex/prod/Request::View?type=any&nodes=T3_CH_PSI&state=any&.submit=Submit][requests]] */* [[https://cmsweb.cern.ch/phedex/prod/Reports::SiteUsage?node=T3_CH_PSI][accounting per physics group]]
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/v_pnfs_top_dirs.txt][/pnfs dirs ordered by size]]
   * <pre>$ curl http://t3mon.psi.ch/ganglia/PSIT3-custom/v_pnfs_top_dirs.txt 2>/dev/null</pre>
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/transfers.txt][/pnfs current file transfers]]
   * [[SearchingIntoSlashPNFS][/pnfs search with the dc_find tool]]
<!--
   * show user/dataset [[http://t3mon.psi.ch/ganglia/PSIT3-custom/sespace.txt][space usage]]
-->

<img src="%GANGLIABASE%/PSIT3-custom/sespace-%CALC{"$GET(freestorageint)"}%.gif" />
<img src="%GANGLIABASE%/graph.php?c=PSI%20Tier3%20services&h=%SEHOST%&m=t3se01_free_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&vl=GB&st=now" />

---+++ =/pnfs= dir I/O queues

   * =regular= I/O queue movers = *dcap/gsidcap/LAN xrootd* movers (heavy random I/O for *internal* analysis); MAX 100 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any others are %ORANGE%QUEUED%ENDCOLOR%
   * =wan= I/O queue movers = *SRM/gridftp* movers (whole-file transfers, also from outside); MAX 2 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any others are %ORANGE%QUEUED%ENDCOLOR%
   * =xrootd= I/O queue movers = [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookXrootdService][WAN xrootd movers]]; MAX 2 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any others are %ORANGE%QUEUED%ENDCOLOR%
   * To check the I/O queues from the command line, run =watch -n 1 -d lynx --dump --width=200 'http://t3dcachedb:2288/queueInfo'= on a UI. This is useful, for example, when your jobs are not progressing: the cause may be a file server with *too many queued movers*. In that case, inform the T3 users by email (the T3 admins will receive it too).

%GREEN%ACTIVE%ENDCOLOR% movers:
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_movers_regular_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_movers_wan_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?c=PSI%20Tier3%20services&h=t3se01.psi.ch&v=5&m=t3se01_movers_xrootd_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>

%ORANGE%QUEUED%ENDCOLOR% movers (the associated I/O queue exceeds the maximum number of allowed %GREEN%ACTIVE%ENDCOLOR% movers):
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_moversQ_regular_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_moversQ_wan_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&v=0&m=t3se01_moversQ_xrootd_cms&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now"/>

%RED%PENDING%ENDCOLOR% requests (hanging file transfers; if they persist, this is almost always an error state):
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_pending_requests&r=%CALC{"$GET(netwint)"}%&z=medium&jr=&js=&st=now&z=medium"/>

---+++ =/mnt/t3nfs01/data01/{shome,swshare}= dirs

%N% [[http://t3mon.psi.ch/ganglia/PSIT3-custom/space.report][User Space Report]]
<br/><br/>
<form name="formZFSint" action="%TOPICURL%?#ZFS" method=GET>
<select name="zfs" onchange="formZFSint.submit()">
<option %CALC{"$IF($EXACT($GET(zfs),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(zfs),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(zfs),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(zfs),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(zfs),year),selected,)"}%>year</option>
</select>
</form>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&v=5582083579608&m=t3nfs01_data01_available&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&v=5582083579608&m=t3nfs01_data01_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&v=5582083579608&m=t3nfs01_data01_shome_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&v=5582083579608&m=t3nfs01_data01_swshare_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>

---++ Networking and File Transfers (+ PhEDEx)

Links:
   * [[%GANGLIABASE%/?m=network_report&r=day&s=descending&c=PSI+Tier3+fileservers&h=&sh=1&hc=4][Ganglia fileserver page]]
   * [[https://fts3.cern.ch:8449/fts3/ftsmon/#/?vo=&source_se=&dest_se=srm:%2F%2Ft3se01.psi.ch&time_window=24][FTS3 jobs of the last 24h triggered by CRAB3]] %N%
   * [[https://twiki.cern.ch/twiki/bin/view/CMS/PhedexDraftDocumentation][PhEDEx Doc]]
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/phedex-statistics.txt][PhEDEx local stats]]
   * [[http://t3serv001.mit.edu/~cmsprod/ConsistencyChecks/results.html?site=T3_CH_PSI][PhEDEx external stats]] (_check the timestamp of the report!_)
   * [[http://cmsweb.cern.ch/phedex/prod/Components::Links?from_filter=T3_CH_PSI&andor=or&to_filter=T3_CH_PSI&Update=Update#][PhEDEx links between T3_CH_PSI and other centers]]
   * [[https://cmsweb.cern.ch/phedex/prod/Request::View?type=delete&nodes=T3_CH_PSI&state=any&.submit=Submit][PhEDEx deletion request state]]
   * [[https://cmsweb.cern.ch/phedex/datasvc/xml/prod/transferrequests?node=T3_CH_PSI][PhEDEx transfer requests as XML]]
   * <pre>$ curl --capath /etc/grid-security/certificates -E "$X509_USER_PROXY" --cacert "$X509_USER_PROXY" 'https://cmsweb.cern.ch/phedex/datasvc/xml/prod/transferrequests?node=T3_CH_PSI' 2>/dev/null | xmllint --format -</pre>
   * [[https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicasummary?node=T3_CH_PSI][PhEDEx datasets as XML]]
   * <pre>$ curl --capath /etc/grid-security/certificates -E "$X509_USER_PROXY" --cacert "$X509_USER_PROXY" 'https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicasummary?node=T3_CH_PSI' 2>/dev/null | xmllint --format -</pre>
   * [[https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicasummary?node=T3_CH_PSI][PhEDEx datasets as JSON]]

Plotting interval:
<form name="formNetwint" action="%TOPICURL%?#Networking_and_File_Transfers_Ph" method=GET>
<select name="netwint" onchange="formNetwint.submit()">
<option %CALC{"$IF($EXACT($GET(netwint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(netwint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(netwint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(netwint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(netwint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="jobint" value=%CALC{"$GET(jobint)"}%>
<input type="hidden" name="freestorageint" value=%CALC{"$GET(freestorageint)"}%>
</form>
<br><img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20fileservers&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20services&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />

---++ Availability reports

These tests are run by the central Grid monitoring services; they determine whether the T3 or the T2 is considered to be working correctly:
   * CMS Nagios: [[https://etf-cms-prod.cern.ch/etf/nagios/cgi-bin/status.cgi?host=t3se01.psi.ch][T3]] | Gstat: [[http://goc.grid.sinica.edu.tw/gstat/T3_CH_PSI/][T3]]

---++ Computer Room Temperatures

*private link*<br/>
<img src="https://ganglia03.psi.ch/ganglia/graph.php?g=temperature_report&c=rztemp&h=T_18&r=&z=medium&st=now"><img src="https://ganglia03.psi.ch/ganglia/graph.php?g=temperature_report&c=rztemp&h=T_19&r=&z=medium&st=now">
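The plain-text reports linked on this page (current queue, accounting, /pnfs dirs ordered by size, /pnfs file transfers, User Space Report, PhEDEx local stats) all live under =http://t3mon.psi.ch/ganglia/PSIT3-custom/=. A minimal bash sketch for reading them from a UI, generalizing the =curl= command shown for =v_pnfs_top_dirs.txt=. The helpers =t3_report_url= and =t3_report= are hypothetical, not an existing T3 tool, and the accepted names are only the reports linked here.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the text reports linked on the
# T3 monitoring page. The accepted names are just the ones linked there.

T3_REPORT_BASE="http://t3mon.psi.ch/ganglia/PSIT3-custom"

# Print the URL of a named report; reject names the page does not link to.
t3_report_url() {
  case "$1" in
    qstat.txt|accounting.txt|v_pnfs_top_dirs.txt|transfers.txt|space.report|phedex-statistics.txt)
      printf '%s/%s\n' "$T3_REPORT_BASE" "$1" ;;
    *)
      echo "unknown report: $1" >&2
      return 1 ;;
  esac
}

# Download and print a report.
# -s hides the progress meter (like the page's 2>/dev/null),
# -f makes curl fail on HTTP errors instead of printing an error page.
t3_report() {
  local url
  url=$(t3_report_url "$1") || return 1
  curl -sf "$url"
}

# Example (needs network access to t3mon.psi.ch):
#   t3_report v_pnfs_top_dirs.txt | head -n 20
#   t3_report qstat.txt
```

Restricting the names to a fixed list keeps a typo from silently fetching a 404 page; drop the =case= if you want to read other files from the same directory.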
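Every graph on this page is a Ganglia =graph.php= URL whose =r= parameter comes from one of the interval drop-downs, and the CALC lines at the top fall back to a default (=day=, or =week= for =freestorageint=) when the URL parameter is empty. A bash sketch of the same logic for building graph URLs outside the wiki, assuming the page's Ganglia base variable expands to =http://t3mon.psi.ch/ganglia= (the base used by the pie-chart links). The helper names are hypothetical, and the URLs are simplified: the page's extra =jr=, =js=, =v=, =s= and =hc= parameters are omitted.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: the Ganglia base URL is http://t3mon.psi.ch/ganglia
GANGLIA_BASE="http://t3mon.psi.ch/ganglia"

# Like the page's CALC expressions: an empty value falls back to a default.
# Unlike them, also reject values the drop-down menus do not offer.
plot_interval() {
  local value="${1:-$2}"
  case "$value" in
    hour|day|week|month|year) printf '%s\n' "$value" ;;
    *) echo "invalid interval: $value" >&2; return 1 ;;
  esac
}

# Encode spaces only, as the page does for cluster names (PSI Tier3 ...).
ganglia_escape() { printf '%s' "${1// /%20}"; }

# Single-metric graph of one host, e.g. t3se01_movers_wan_cms on t3se01.psi.ch.
ganglia_metric_url() {
  local cluster="$1" host="$2" metric="$3" range="$4"
  printf '%s/graph.php?c=%s&h=%s&m=%s&r=%s&z=medium&st=now\n' \
    "$GANGLIA_BASE" "$(ganglia_escape "$cluster")" "$host" "$metric" "$range"
}

# Cluster-wide report graph: load_report, cpu_report or network_report.
ganglia_report_url() {
  local cluster="$1" report="$2" range="$3"
  printf '%s/graph.php?g=%s&z=medium&c=%s&r=%s&st=now\n' \
    "$GANGLIA_BASE" "$report" "$(ganglia_escape "$cluster")" "$range"
}

# Example: worker-node load graph for an interval that may be unset.
#   ganglia_report_url 'PSI Tier3 workers' load_report "$(plot_interval "$jobint" day)"
```

The escaping is done on the argument rather than in the =printf= format string, so the literal =%20= is never mistaken for a format directive.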
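The per-file-server limits in the I/O queue list mean that, for each queue, at most a fixed number of movers run at once and any further requests wait as QUEUED. A toy bash sketch of that arithmetic, which can help when reading the ACTIVE and QUEUED mover graphs side by side. It hard-codes the limits as stated in this revision of the page (they may have changed since) and does not model how dCache actually orders or schedules movers; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model only: per-file-server ACTIVE mover limits as stated on this page.
mover_limit() {
  case "$1" in
    regular) echo 100 ;;  # dcap/gsidcap/LAN xrootd movers
    wan)     echo 2 ;;    # SRM/gridftp movers
    xrootd)  echo 2 ;;    # WAN xrootd movers
    *) echo "unknown I/O queue: $1" >&2; return 1 ;;
  esac
}

# Split N mover requests on one file server into "ACTIVE QUEUED".
mover_split() {
  local queue="$1" requested="$2" limit active
  case "$requested" in
    ''|*[!0-9]*) echo "not a non-negative integer: $requested" >&2; return 1 ;;
  esac
  limit=$(mover_limit "$queue") || return 1
  if [ "$requested" -gt "$limit" ]; then
    active=$limit
  else
    active=$requested
  fi
  echo "$active $((requested - active))"
}
```

For example, =mover_split regular 130= prints =100 30= (100 movers active, 30 queued on that file server), while =mover_split wan 5= prints =2 3=: with a limit of only 2, a handful of SRM/gridftp transfers is already enough to produce queued movers.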
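The two =curl | xmllint= commands under Networking and File Transfers differ only in the data-service API name (=transferrequests= or =blockreplicasummary=). A bash sketch that builds the data-service URL for either format and wraps the same =curl= options, with the proxy path and the URL quoted. The helpers are hypothetical. They assume a valid grid proxy in =$X509_USER_PROXY= (e.g. from =voms-proxy-init --voms cms=) and that the PhEDEx data service is still reachable.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrappers around the PhEDEx data-service commands.

# Build a data-service URL; format is xml or json, node defaults to T3_CH_PSI.
phedex_url() {
  local format="$1" api="$2" node="${3:-T3_CH_PSI}"
  case "$format" in
    xml|json) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  if [ -z "$api" ]; then
    echo "missing API name" >&2
    return 1
  fi
  printf 'https://cmsweb.cern.ch/phedex/datasvc/%s/prod/%s?node=%s\n' \
    "$format" "$api" "$node"
}

# Fetch an XML answer and pretty-print it, using the same curl options as
# the commands on this page (proxy as client cert, grid CA directory).
phedex_fetch_xml() {
  local url
  url=$(phedex_url xml "$1" "$2") || return 1
  if [ -z "${X509_USER_PROXY:-}" ]; then
    echo "X509_USER_PROXY is not set" >&2
    return 1
  fi
  curl -s --capath /etc/grid-security/certificates \
       -E "$X509_USER_PROXY" --cacert "$X509_USER_PROXY" "$url" \
    | xmllint --format -
}

# Examples (need a valid proxy and network access):
#   phedex_fetch_xml transferrequests
#   phedex_fetch_xml blockreplicasummary T3_CH_PSI
```

Quoting the URL stops the shell from treating the =?= as a glob character, and checking =X509_USER_PROXY= first gives a clear message instead of an opaque TLS error.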
Topic revision: r95 - 2016-07-03 - FabioMartinelli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.