<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+!! Phoenix Ganglia configuration

%TOC%

Note: All configuration files, non-standard startup files and _gmetric_ scripts needed for this ganglia cluster are kept in the CSCS SVN at %SVNBASE%/monitoring/ganglia. The deployment of the configuration files is done mainly through cfengine. Please consult the CfEngine#Ganglia page to learn how to implement changes.

---++ Ganglia main server: mon.lcg.cscs.ch

This node runs the central ganglia services. There are a number of collector processes for the principal node groups, and a daemon that stores the information in round robin databases. It also runs a web server for displaying the information.

---+++ gmond

There are three =gmond= processes running as collectors. They listen for UDP transmissions from the various nodes to be monitored. Each of these nodes runs a =gmond= process configured to send UDP packets to the respective collector port. The collector =gmonds= on mon.lcg.cscs.ch use these configuration files:
   * =/etc/gmond-wn-collector.conf=
   * =/etc/gmond-service-collector.conf=
   * =/etc/gmond-fileserver-collector.conf=

The standard service startup file =/etc/init.d/gmond= is modified to start/stop all three of these services (the init file can be found at %SVNBASE%/monitoring/ganglia/ganglia-config/mon-box/init.d/gmond).

---+++ gmetad

=gmetad= records the history of ganglia monitoring information in round robin databases (located under =/var/lib/ganglia/rrds=). Its configuration file =/etc/gmetad.conf= contains directives for polling the three =gmond= collectors. The =gmetad= web pages reside under =/var/www/html/ganglia= and also contain a small configuration file, =conf.php=.
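For illustration, the polling directives in =/etc/gmetad.conf= would consist of one =data_source= line per collector. The sketch below is an assumption for orientation only; the cluster names and port numbers are made up, and the authoritative files live in the SVN:

<verbatim>
# /etc/gmetad.conf -- illustrative sketch, NOT the actual CSCS configuration
# data_source "<cluster name>" [polling interval] <host:port> ...
data_source "wn-cluster"         localhost:8651
data_source "service-cluster"    localhost:8652
data_source "fileserver-cluster" localhost:8653
</verbatim>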
*Note:* In order to get pie charts, you need to have the *php-gd* package installed!

---+++ ramdisk / tmpfs for RRD files

=gmetad= writes the ganglia monitoring information to RRD (_round robin database_) files, using one file per sensor. For a large cluster this leads to a high frequency of I/O operations on a large number of files, always in the same disk area. The CPU tends to spend most of its time in I/O-wait states, and people have reported rapid degradation of the hard disk. Therefore the RRD files are hosted in a tmpfs area in memory (earlier we used a ram disk), and the contents of this area are synchronized to a disk area every few minutes to prevent information loss in case of a system breakdown.

*Note: The standard location for the ganglia RRDs is a symbolic link to the tmpfs area*:<br> =/var/lib/ganglia/rrds -> /dev/shm/ganglia/rrds=

The ram disk is started as a service with the custom =/etc/init.d/tmpfs-sync-area= init script. The script resides in the CSCS SVN at %SVNBASE%/monitoring/ganglia/ganglia-config/mon-box/init.d. It is started before =gmetad= and does the following:
   * upon start
      * initializes the tmpfs area with the contents of the disk area
      * does sanity checks on every important operation
      * installs a cron job that synchronizes the tmpfs area to the disk area at regular intervals
   * upon stop
      * makes sure that dependent services (gmetad) are stopped first
      * synchronizes the tmpfs area to the disk area
      * uninstalls the cron job

Note: The same functionality is available in the form of a ramdisk based service (%SVNBASE%/monitoring/ganglia/ganglia-config/mon-box/init.d/ramdisk). The scripts are quite generic and use configuration information from an appropriate file in =/etc/sysconfig/=.

---+++ httpd

Needs to allow running *php* scripts in the ganglia web directory.

---+++ CSCS custom graphs

There is a script at =/root/CSCS_custom_graphs/custom_rrd_cscs.pl= that produces the CSCS custom graphs; it is executed by the cron job =/etc/cron.d/CSCS_custom_graphs=.
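The cron job might look like the sketch below. The ten-minute schedule is an assumption for illustration; the actual file is deployed through cfengine:

<verbatim>
# /etc/cron.d/CSCS_custom_graphs -- illustrative sketch; the real schedule may differ
*/10 * * * * root /root/CSCS_custom_graphs/custom_rrd_cscs.pl > /dev/null 2>&1
</verbatim>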
The custom graphs are stored in the =/var/www/html/ganglia/CSCS-custom= directory. They are used for the PhoenixMonOverview page and other statistics pages. A similar script pulls down the pie charts for the subclusters for display on the monitoring page. The sources for the scripts can be found under %SVNBASE%/monitoring/ganglia/custom_graphs.

---++ Client nodes

---+++ gmond

For every class of node there exists a specific =gmond.conf= configuration file. These can be found in our SVN at %SVNBASE%/monitoring/ganglia/ganglia-config. The files need to be copied to =/etc/gmond.conf= on the node in order to work with the standard init procedure. There are three classes of nodes:
   * *worker-nodes*
   * *fileservers*: the dCache pool servers
   * *service-nodes*: all remaining nodes, including the dCache head and database nodes

---+++ gmetric scripts

Some nodes send additional information by using ganglia's =gmetric= utility. Every node making use of this feature has the same kind of basic configuration:
   * =/root/gmetric=: contains the specific scripts issuing the =gmetric= command lines. This is a direct checkout from the corresponding %SVNBASE%/monitoring/ganglia/gmetric-scripts subdirectory. Keep this up to date when you make changes.
   * =/etc/cron.d/gmetric=: cron job to regularly run the scripts

The following nodes have gmetric scripts:
   * *CE*: sends queue length and running jobs per VO
   * *SE head node*: collects information from dCache. NOTE: =dCache_gmetric.py= is now located in =/opt/cscs/libexec/gmetric-scripts/dcache=!
   * *WN*: (not installed) There is a script that collects the job ID and user for each node and displays them as a string.
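As a rough illustration of how such a wrapper in =/root/gmetric= works, the sketch below counts queued jobs from =qstat=-style output and pushes the value with =gmetric=. The metric name, the parsing, and the use of =qstat= are assumptions for illustration, not the actual CSCS scripts:

```shell
#!/bin/bash
# Hypothetical sketch of a gmetric wrapper script (not the actual CSCS script).

# count_queued: read qstat-style job listings on stdin and count
# jobs in state "qw" (queued/waiting); prints 0 on empty input.
count_queued() {
    grep -c ' qw '
}

# Number of currently queued jobs (0 if qstat is unavailable on this host)
QUEUED=$(qstat 2>/dev/null | count_queued)

# Report the value to ganglia; only runs where gmetric is installed.
if command -v gmetric >/dev/null 2>&1; then
    gmetric --name=queued_jobs --value="${QUEUED:-0}" --type=uint32 --units=jobs
fi
```

A cron entry in =/etc/cron.d/gmetric= would then invoke such scripts at regular intervals, as described above.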
---++ Ganglia 3.1.7 on ganglia.lcg.cscs.ch

---+++ Build
   * =yum install apr apr-devel pango pango-devel pcre-devel=
   * download [[http://savannah.nongnu.org/download/confuse/confuse-2.7.tar.gz][confuse]]
      * =./configure CFLAGS=-fPIC --disable-nls=
      * =make && make install=
   * download [[http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.4.3.tar.gz][rrdtool]]
      * =./configure --prefix=/usr=
      * =make && make install=
   * configure Ganglia: =./configure --with-gmetad --with-librrd=/usr/lib --sysconfdir=/etc/ganglia=
   * =cp gmond/gmond.init /etc/init.d/gmond=
   * =cp gmetad/gmetad.init /etc/init.d/gmetad=
   * =cp -r web/ /var/www/ganglia=

---++ Aggregate graphs

Newer versions of ganglia support aggregate views, for example:

http://ganglia.lcg.cscs.ch/ganglia3/?r=hour&cs=&ce=&tab=v&vn=GPFS-clients

Views are stored under =/var/lib/ganglia-web/conf/= as JSON files. A short example configuration is shown below. The key parts are =metric_regex= and =host_regex=. For =graph_type= we can use =stack= or =line=. If we want to display more than one graph, we simply add another block of JSON within the ="items"= list.

<verbatim>
{ "view_name": "GPFS-metadata",
  "default_size": "medium",
  "items": [
    { "aggregate_graph": "true",
      "metric_regex": [ { "regex": "gpfs_disk_reads" } ],
      "host_regex": [ { "regex": "(mds1|mds2)" } ],
      "graph_type": "stack",
      "vertical_label": "MB\/s",
      "title": "GPFS-metadata reads MB\/s",
      "glegend": "show"
    }
  ],
  "view_type": "standard"
}
</verbatim>

-- Main.PeterOettl - 2010-05-04

-- Main.DerekFeichtinger - 30 Jan 2008
Topic revision: r18 - 2014-01-08 - GeorgeBrown