<!--
keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
#   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
#   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if the page should only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
%INCLUDE{"ListNews" NEWSROWS="1"}%

---+!! Monitoring
%CALC{"$SET(jobint,$IF($EXACT(%URLPARAM{"jobint"}%,),day,%URLPARAM{"jobint"}%))"}%
%CALC{"$SET(freestorageint,$IF($EXACT(%URLPARAM{"freestorageint"}%,),day,%URLPARAM{"freestorageint"}%))"}%
%CALC{"$SET(netwint,$IF($EXACT(%URLPARAM{"netwint"}%,),day,%URLPARAM{"netwint"}%))"}%
%CALC{"$SET(zfs,$IF($EXACT(%URLPARAM{"zfs"}%,),day,%URLPARAM{"zfs"}%))"}%

[[http://t3mon.psi.ch/?c=PSI%20Tier3%20fileservers&m=&r=hour&s=descending&hc=4][Overview of PSI Tier3 fileservers]]
<!--<img src="http://t3mon.psi.ch/PSIT3-custom/PSI%2520Tier3%2520fileservers-pie.png"/></a> -->
[[http://t3mon.psi.ch/?c=PSI%20Tier3%20services&m=&r=hour&s=descending&hc=4][Overview of PSI Tier3 services]]
<!--<img src="http://t3mon.psi.ch/PSIT3-custom/PSI%2520Tier3%2520services-pie.png"/></a> -->
[[http://t3mon.psi.ch/?c=PSI%20Tier3%20workers&m=&r=hour&s=descending&hc=4][Overview of PSI Tier3 workers]]
<!--<img src="http://t3mon.psi.ch/PSIT3-custom/PSI%2520Tier3%2520workers-pie.png"/></a>-->

%TOC%

---++ Batch jobs (queuing system)
[[http://t3mon.psi.ch/PSIT3-custom/qstat.txt][Current queue]] */* [[http://t3mon.psi.ch/PSIT3-custom/accounting.txt][accounting]]

Number of running and queued jobs:
<form name="formJobint" action="%TOPICURL%?#Batch_jobs_queuing_system" method=GET>
<select name="jobint" onchange="formJobint.submit()">
<option %CALC{"$IF($EXACT($GET(jobint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(jobint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(jobint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(jobint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(jobint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="freestorageint" value=%CALC{"$GET(freestorageint)"}%>
<input type="hidden" name="netwint" value=%CALC{"$GET(netwint)"}%>
</form>
<img src="%GANGLIABASE%/PSIT3-custom/running-%CALC{"$GET(jobint)"}%.gif" />
<br><img src="%GANGLIABASE%/PSIT3-custom/waiting-%CALC{"$GET(jobint)"}%.gif" />

[[%GANGLIABASE%/?c=PSI%20Tier3%20workers&m=&r=day&s=descending&hc=4][Ganglia WN page]]
<img src="%GANGLIABASE%/graph.php?g=load_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(jobint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=cpu_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(jobint)"}%&s=descending&hc=4&st=now" />

---++ Storage
<!-- Show space graphs for
<form name="formFreestorageint" action="%TOPICURL%?#Storage_Element" method=GET>
<select name="freestorageint" onchange="formFreestorageint.submit()">
<option %CALC{"$IF($EXACT($GET(freestorageint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="jobint" value=%CALC{"$GET(jobint)"}%>
<input type="hidden" name="netwint" value=%CALC{"$GET(netwint)"}%>
</form>
-->

---+++ =/pnfs= dir
Links:
<!-- * List all [[https://cmsweb.cern.ch/das/request?view=list&limit=500&instance=cms_dbs_prod_global&input=dataset+site%3DT3_CH_PSI][hosted datasets]] / -->
   * [[https://cmsweb.cern.ch/das/request?view=plain&limit=500&instance=prod%2Fglobal&input=dataset+site%3DT3_CH_PSI][hosted datasets plaintext]]
   * [[https://cmsweb.cern.ch/phedex/prod/Request::View?type=any&nodes=T3_CH_PSI&state=any&.submit=Submit][requests]]
   * [[https://cmsweb.cern.ch/phedex/prod/Reports::SiteUsage?node=T3_CH_PSI][accounting per phys. group]]
   * [[http://t3mon.psi.ch/PSIT3-custom/v_pnfs_top_dirs.txt][/pnfs dirs ordered by size]]; read it with: =curl http://t3mon.psi.ch/PSIT3-custom/v_pnfs_top_dirs.txt 2>/dev/null=
   * [[http://t3mon.psi.ch/PSIT3-custom/transfers.txt][/pnfs current file transfers]]
   * [[SearchingIntoSlashPNFS][/pnfs searching with the dc_find tool]]
<!-- * show user/dataset [[http://t3mon.psi.ch/PSIT3-custom/sespace.txt][space usage]] -->

Show storage space graphs for
<form name="formFreestorageint" action="%TOPICURL%?#Storage_Element" method=GET>
<!--<form action="%TOPICURL%?#Storage_Element" method=GET>-->
<select name="freestorageint" onchange="formFreestorageint.submit()">
<option %CALC{"$IF($EXACT($GET(freestorageint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(freestorageint),year),selected,)"}%>year</option>
</select>
<!-- <input type="hidden" name="jobint" value=%CALC{"$GET(jobint)"}%> <input type="hidden" name="netwint" value=%CALC{"$GET(netwint)"}%> -->
</form>
<img src="%GANGLIABASE%/PSIT3-custom/sespace-%CALC{"$GET(freestorageint)"}%.gif" />
<!-- <img src="%GANGLIABASE%/graph.php?c=PSI%20Tier3%20services&h=%SEHOST%&m=t3se01_free_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&vl=GB&st=now" /> -->

---+++ =/pnfs= dir I/O queues
   * =regular= I/O queue movers = *dcap/gsidcap/LAN xrootd* movers (heavy random I/O for *internal* analysis); MAX 100 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any further movers are %ORANGE%QUEUED%ENDCOLOR%
   * =wan= I/O queue movers = *SRM/gridftp* movers (whole-file transfers, including from outside); MAX 2 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any further movers are %ORANGE%QUEUED%ENDCOLOR%
   * =xrootd= I/O queue movers = [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookXrootdService][WAN xrootd movers]]; MAX 2 %GREEN%ACTIVE%ENDCOLOR% movers per file server, any further movers are %ORANGE%QUEUED%ENDCOLOR%
   * To check the I/O queues from the command line, run on a UI: =watch -n 1 -d lynx --dump --width=200 'http://t3dcachedb:2288/queueInfo'= <br/> For example, if your jobs are not progressing, a file server may have *too many queued movers*; in that case, you can inform the T3 users by email (the T3 admins will receive it too).
<!-- %GREEN%ACTIVE%ENDCOLOR% movers:
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_movers_regular_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_movers_wan_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?c=PSI%20Tier3%20services&h=t3se01.psi.ch&v=5&m=t3se01_movers_xrootd_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
%ORANGE%QUEUED%ENDCOLOR% movers (the associated I/O queue exceeds the maximum number of allowed %GREEN%ACTIVE%ENDCOLOR% movers):
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_moversQ_regular_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_moversQ_wan_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&v=0&m=t3se01_moversQ_xrootd_cms&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now"/>
%RED%PENDING%ENDCOLOR% requests (these are hanging file transfers, almost always an error state if they persist):
<img
src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3se01.psi.ch&m=t3se01_pending_requests&r=%CALC{"$GET(freestorageint)"}%&z=medium&jr=&js=&st=now&z=medium"/> -->

---+++ =/mnt/t3nfs01/data01/{shome,swshare}= dirs
%N% [[http://t3mon.psi.ch/PSIT3-custom/space.report][User Space Report]] <br/><br/>
<form name="formZFSint" action="%TOPICURL%?#ZFS" method=GET>
<select name="zfs" onchange="formZFSint.submit()">
<option %CALC{"$IF($EXACT($GET(zfs),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(zfs),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(zfs),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(zfs),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(zfs),year),selected,)"}%>year</option>
</select>
</form>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&m=t3nfs01.psi.ch_data01_available&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&m=t3nfs01.psi.ch_data01_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&m=t3nfs01.psi.ch_data01_shome_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>
<img src="%GANGLIABASE%/graph.php?&c=PSI%20Tier3%20services&h=t3nfs01.psi.ch&m=t3nfs01.psi.ch_data01_swshare_used&r=%CALC{"$GET(zfs)"}%&z=medium&jr=&js=&st=now"/>

---++ Networking and File Transfers (+ PhEDEx)
Links:
   * [[%GANGLIABASE%/?m=network_report&r=day&s=descending&c=PSI+Tier3+fileservers&h=&sh=1&hc=4][Ganglia fileserver page]]
   * [[https://cmsweb.cern.ch/phedex/prod/Reports::SiteUsage?node=T3_CH_PSI#][PhEDEx usage at T3]] | [[https://cmsweb.cern.ch/phedex/prod/Reports::SiteUsage?node=T2_CH_CSCS#][PhEDEx usage at T2]]
   * [[https://fts3.cern.ch:8449/fts3/ftsmon/#/?vo=&source_se=&dest_se=srm:%2F%2Ft3se01.psi.ch&time_window=24][Last day of FTS3 jobs triggered by CRAB3, transfer details]] | [[https://fts3.cern.ch:8449/fts3/ftsmon/#/statistics/volume?source_se=&dest_se=srm:%2F%2Ft3se01.psi.ch&vo=&time_window=24][Last day of FTS3 jobs triggered by CRAB3, total GB]]
   * [[https://cmsweb.cern.ch/phedex/prod/Activity::RatePlots?graph=quantity_rates&entity=src&src_filter=&dest_filter=T3_CH_PSI&no_mss=true&period=l7d&upto=][7d avg MB/s]] | [[https://cmsweb.cern.ch/phedex/prod/Activity::RatePlots?graph=quantity_rates&entity=src&src_filter=&dest_filter=T3_CH_PSI&no_mss=true&period=l24h&upto=][24h avg MB/s]]
   * [[https://cmsweb.cern.ch/phedex/prod/Activity::QualityPlots?graph=quality_all&entity=src&src_filter=&dest_filter=T3_CH_PSI&no_mss=true&period=l7d&upto=][7d transfer quality]] | [[https://cmsweb.cern.ch/phedex/prod/Activity::QualityPlots?graph=quality_all&entity=src&src_filter=&dest_filter=T3_CH_PSI&no_mss=true&period=l24h&upto=][24h transfer quality]]
   * [[https://cmsweb.cern.ch/phedex/prod/Activity::ErrorInfo?tofilter=T3_CH_PSI&fromfilter=.*&report_code=.*&xfer_code=.*&to_pfn=.*&from_pfn=.*&log_detail=.*&log_validate=.*&.submit=Update#][PhEDEx errors]]
   * [[http://cmsweb.cern.ch/phedex/prod/Components::Links?from_filter=T3_CH_PSI&andor=or&to_filter=T3_CH_PSI&Update=Update#][PhEDEx links between T3_CH_PSI and other centers]]
   * [[https://twiki.cern.ch/twiki/bin/view/CMS/PhedexDraftDocumentation][PhEDEx Doc]]
   * [[http://t3mon.psi.ch/PSIT3-custom/phedex-statistics.txt][PhEDEx local stats]]
   * [[https://cmsweb.cern.ch/phedex/prod/Request::View?type=delete&nodes=T3_CH_PSI&state=any&.submit=Submit][PhEDEx deletion request state]]
   * [[https://cmsweb.cern.ch/phedex/datasvc/xml/prod/transferrequests?node=T3_CH_PSI][PhEDEx transfer requests as XML]]
      * <pre>$ curl --capath /etc/grid-security/certificates -E $X509_USER_PROXY --cacert $X509_USER_PROXY 'https://cmsweb.cern.ch/phedex/datasvc/xml/prod/transferrequests?node=T3_CH_PSI' 2>/dev/null | xmllint --format -</pre>
   * [[https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicasummary?node=T3_CH_PSI][PhEDEx datasets as XML]]
      * <pre>$ curl --capath /etc/grid-security/certificates -E $X509_USER_PROXY --cacert $X509_USER_PROXY 'https://cmsweb.cern.ch/phedex/datasvc/xml/prod/blockreplicasummary?node=T3_CH_PSI' 2>/dev/null | xmllint --format -</pre>
   * [[https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicasummary?node=T3_CH_PSI][PhEDEx datasets as JSON]]

Plotting interval:
<form name="formNetwint" action="%TOPICURL%?#Networking_and_File_Transfers_Ph" method=GET>
<select name="netwint" onchange="formNetwint.submit()">
<option %CALC{"$IF($EXACT($GET(netwint),hour),selected,)"}%>hour</option>
<option %CALC{"$IF($EXACT($GET(netwint),day),selected,)"}%>day</option>
<option %CALC{"$IF($EXACT($GET(netwint),week),selected,)"}%>week</option>
<option %CALC{"$IF($EXACT($GET(netwint),month),selected,)"}%>month</option>
<option %CALC{"$IF($EXACT($GET(netwint),year),selected,)"}%>year</option>
</select>
<input type="hidden" name="jobint" value=%CALC{"$GET(jobint)"}%>
<input type="hidden" name="freestorageint" value=%CALC{"$GET(freestorageint)"}%>
</form>
<br><img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20workers&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20fileservers&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />
<img src="%GANGLIABASE%/graph.php?g=network_report&z=medium&c=PSI%20Tier3%20services&m=&r=%CALC{"$GET(netwint)"}%&s=descending&hc=4&st=now" />

---++ Availability reports
These tests are run by the central Grid monitoring services and determine whether the T3 (or the T2) is considered to be working correctly:
   * Nagios: [[https://etf-cms-prod.cern.ch/etf/nagios/cgi-bin/status.cgi?hostgroup=T3_CH_PSI&style=detail][CMS]] | [[https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?host=t3se01.psi.ch&style=detail][EGI1]] | [[https://operations-portal.egi.eu/rodDashboard/site/T3_CH_PSI/tab/overview/filter/operators/page/overview/vo?entity_type=site&entity_name=T3_CH_PSI][EGI2]] | [[https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?hostgroup=all&style=overview][PSI]]
<!-- ---++ Computer Room Temps *private link* <br/>
<img src=https://ganglia03.psi.ch/ganglia/graph.php?g=temperature_report&c=rztemp&h=T_18&r=&z=medium&st=now><img src=https://ganglia03.psi.ch/ganglia/graph.php?g=temperature_report&c=rztemp&h=T_19&r=&z=medium&st=now> -->
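
The =/pnfs= I/O queue limits listed under Storage boil down to a simple rule per file server: up to the queue's limit of movers are ACTIVE and the rest wait as QUEUED. The Python sketch below only illustrates that arithmetic. It assumes each queue on a file server is independent and ignores how dCache actually schedules movers; the names =MAX_ACTIVE=, =QueueState= and =split_movers= are hypothetical and not part of any dCache API.

```python
from dataclasses import dataclass

# Per-file-server limits on ACTIVE movers for each dCache I/O queue, taken
# from the list on this page; anything beyond the limit waits as QUEUED.
MAX_ACTIVE = {"regular": 100, "wan": 2, "xrootd": 2}


@dataclass(frozen=True)
class QueueState:
    active: int
    queued: int


def split_movers(queue: str, requested: int) -> QueueState:
    """Split the movers requested on one file server's I/O queue into
    ACTIVE (up to the queue's limit) and QUEUED (the remainder).

    Simplified model (assumption): queues are treated independently and
    dCache's real scheduling (ordering, timeouts, pool selection) is ignored.
    Raises KeyError for an unknown queue name.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    limit = MAX_ACTIVE[queue]
    active = min(requested, limit)
    return QueueState(active=active, queued=requested - active)
```

For example, =split_movers("wan", 5)= gives 2 active and 3 queued movers, so even a handful of concurrent SRM/gridftp transfers to a single file server will already show queued movers, while the =regular= queue only starts queuing beyond 100.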
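
The interval selectors on this page all follow the same pattern: an empty or missing URL parameter defaults to =day= (the =$IF($EXACT(...),day,...)= inside each =%CALC%=), and the chosen value is passed as the =r= parameter of Ganglia's =graph.php=. A minimal Python sketch of that mapping, for fetching the same cluster-wide report graphs from a script, follows. The helper names are hypothetical, and since the value of =%GANGLIABASE%= is not stated on this page, the base URL is left as a parameter.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def resolve_interval(param: Optional[str]) -> str:
    """Mirror the page's %CALC% default: an empty or missing URL
    parameter becomes "day"; any other value is passed through as-is."""
    return param if param else "day"


def graph_url(base: str, cluster: str, interval: str,
              report: str = "load_report") -> str:
    """Build a Ganglia graph.php URL with the same query parameters as the
    cluster-wide report images on this page (hypothetical helper)."""
    query = {
        "g": report,         # report type: load_report, cpu_report, network_report
        "z": "medium",       # graph size
        "c": cluster,        # Ganglia cluster, e.g. "PSI Tier3 workers"
        "m": "",             # metric left empty, as on the page
        "r": interval,       # time range: hour, day, week, month or year
        "s": "descending",
        "hc": "4",
        "st": "now",
    }
    # quote (instead of the default quote_plus) encodes spaces as %20,
    # matching the URLs written into this page.
    return base.rstrip("/") + "/graph.php?" + urlencode(query, quote_via=quote)
```

For instance, =graph_url("http://t3mon.psi.ch", "PSI Tier3 workers", resolve_interval(""), "cpu_report")= should reproduce the CPU graph URL of the Batch jobs section at the default interval, assuming =%GANGLIABASE%= expands to =http://t3mon.psi.ch= (the page's direct links to =t3mon.psi.ch= suggest this, but it is an assumption). Like the page, =resolve_interval= does not check the value against hour/day/week/month/year.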
Topic revision: r115 - 2018-05-09 - NinaLoktionova
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.