<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Service Card for GPFS

%TOC%

---++ Disk failure/replacement

If a disk fails, a cron script (running every five minutes) should detect it and remove it automatically with =mmdeldisk=. When a disk is removed this way, the filesystem rebalances itself so that there are two copies of every file everywhere again, so there is no risk if a second disk fails afterwards.

---+++ Gathering information

   * To see the list of disks that are part of GPFS (and their status): <verbatim>mmlsdisk gpfs</verbatim>
   * To see the occupation of each disk (and of the metadata): <verbatim>mmdf gpfs</verbatim>
   * To list the disks that are free (i.e. deleted with =mmdeldisk=): <verbatim>mmlsnsd -F</verbatim>
   * If you suspect problems with the NSD descriptor, this prints its raw contents (compare it with other disks): <verbatim>mmfsadm test readdescraw /dev/j1r6c5</verbatim>

---+++ Recovering a deleted disk

To bring the disk back, you need to:

   * Delete the disk (done automatically by the cron script) and then delete its NSD from GPFS.
   * Create a new NSD (with the same name) and then add the disk to GPFS again.

Before doing so:

   * *Find its failure group* by looking at its colleagues in =mmlsdisk= (disks from the same enclosure share the same failure group).
   * *Determine which server is primary and which is secondary.* The NSD name will tell you that.
   * *Make sure all GPFS nodes, including the worker nodes, accept each other as authorized hosts.*

After you have gathered all the necessary information, do:
<verbatim>
# Check the free (deleted) disks:
mmlsnsd -F

# Delete the NSD
mmdelnsd oss11j1r6c5

# If the disk is not new, then also do (from the corresponding OSS):
dd if=/dev/zero of=/dev/j1r6c5 bs=1024k count=10k

# And then create the NSD and the disk
# (format = device:primaryserver,secondaryserver::dataOnly:failuregroup:nsdname:)
echo "j1r6c5:oss11(pri).ib.lcg.cscs.ch,oss12(sec).ib.lcg.cscs.ch::dataOnly:FAILURE_GROUP:oss11j1r6c5:" > /tmp/new_nsd
mmcrnsd -F /tmp/new_nsd
mmadddisk gpfs -F /tmp/new_nsd   # may take some time
</verbatim>

---+++ Rebalancing the filesystem

After adding a new disk you may want to rebalance the filesystem, so that all disks have the same free space. This is an expensive operation (an hour or two) and it is not really needed in production, since every file that is written is placed in a balanced way, and our scratch has a very quick file turnover. But in any case, this is how you do it:
<verbatim>mmrestripefs tmpgpfs -b</verbatim>

---++ Metadata server reboot

If, for any reason, a Virident card is taken out of the filesystem (it is only visible from one of the machines), you need to re-enable it. This may also happen when an MDS is rebooted:
<verbatim>mmchdisk tmpgpfs start -d virident3</verbatim>
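Before re-enabling anything it helps to see exactly which disks are affected. A minimal sketch, assuming the filesystem is called =tmpgpfs= as above; =mmlsdisk -e= and =mmchdisk start -a= are standard GPFS options:
<verbatim>
# list only the disks whose status/availability is not OK
mmlsdisk tmpgpfs -e

# if several disks are down, retry them all at once instead of naming each with -d
mmchdisk tmpgpfs start -a
</verbatim>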
---++ Starting a down disk

If, for any reason, a Virident card is taken offline (marked =down=), here is how to get it back:
<verbatim>
[root@mds1:gen]# mmstartup
Fri Nov 11 10:44:18 CET 2011: mmstartup: Starting GPFS ...
[root@mds1:gen]# mmlsdisk gpfs -d "virident1"
disk         driver   sector failure holds    holds                            storage
name         type       size group   metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
virident1    nsd         512    1009 yes      no    ready         down         system
[root@mds1:gen]# mmchdisk gpfs start -d "virident1"
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Fri Nov 11 10:45:06 2011
Scan completed successfully.
[root@mds1:gen]# mmlsdisk gpfs -d "virident1"
disk         driver   sector failure holds    holds                            storage
name         type       size group   metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
virident1    nsd         512    1009 yes      no    ready         up           system
</verbatim>

---++ Procedure to follow when GPFS is blocked

To obtain all the information available from GPFS, we need to create a dump for IBM:
<verbatim>
# mmfsadm dump all
# mmfsadm dump waiters
# mmfsadm dump kthreads
</verbatim>

---++ Clean all CREAM stalled files on tmpdir_slurm

Running the following on a WN will manually delete all directories on =/gpfs/tmpdir_slurm/CREAM_FQDN= that belong to jobs not currently running on the system for that CREAM CE:
<verbatim>
# CREAM_TAG=cre02
# CREAM=cream02.lcg.cscs.ch
# squeue -t R --noheader | grep ${CREAM_TAG} | awk '{print $1}' | sort > /tmp/running_jobs.${CREAM_TAG}.txt
# cd /gpfs/tmpdir_slurm/${CREAM}/
# ls | egrep -vf /tmp/running_jobs.${CREAM_TAG}.txt | xargs -n 1 echo rm -rf > /tmp/delete_stalled_${CREAM_TAG}.txt
# bash -x /tmp/delete_stalled_${CREAM_TAG}.txt
</verbatim>
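If more than one CREAM CE writes into =tmpdir_slurm=, the same steps can be wrapped in a loop. A minimal sketch; the =cre01=/=cream01.lcg.cscs.ch= pair is an assumption, adjust the list to your CEs:
<verbatim>
#!/bin/bash
# hedged sketch: clean stalled CREAM directories for several CEs in one go
# (the cre01:cream01 pair is assumed; cre02:cream02 is the documented one)
for PAIR in cre01:cream01.lcg.cscs.ch cre02:cream02.lcg.cscs.ch; do
    CREAM_TAG=${PAIR%%:*}
    CREAM=${PAIR##*:}
    squeue -t R --noheader | grep ${CREAM_TAG} | awk '{print $1}' | sort > /tmp/running_jobs.${CREAM_TAG}.txt
    # safety: an empty pattern file would make 'egrep -v' select every directory
    [ -s /tmp/running_jobs.${CREAM_TAG}.txt ] || continue
    cd /gpfs/tmpdir_slurm/${CREAM}/ || continue
    ls | egrep -vf /tmp/running_jobs.${CREAM_TAG}.txt | xargs -n 1 echo rm -rf > /tmp/delete_stalled_${CREAM_TAG}.txt
    bash -x /tmp/delete_stalled_${CREAM_TAG}.txt
done
</verbatim>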
---++ What to do when receiving memory errors when doing ls

If you see errors about 'Insufficient memory' *when doing ls*, you most likely ran out of metadata space, even if =df -i= shows that there is space available on GPFS. Look at this example:
<verbatim>
# mmdf gpfs
[...]
virident1         293937152     1005 yes      no              0 (  0%)        21504 ( 0%)
virident2         293937152     1006 yes      no              0 (  0%)        20800 ( 0%)
ssd1              390711360     1007 yes      no           1024 (  0%)        44256 ( 0%)
ssd2              390711360     1008 yes      no              0 (  0%)        41568 ( 0%)
               -------------                  --------------------  -------------------
(pool total)    187932952384                  168898436096 ( 90%)    4428987840 (  2%)
               =============                  ====================  ===================
(data)          186563655360                  168898435072 ( 91%)    4428859712 (  2%)
(metadata)        1369297024                          1024 (  0%)        128128 (  0%)
               =============                  ====================  ===================
(total)         187932952384                  168898436096 ( 90%)    4428987840 (  2%)

Inode Information
-----------------
Number of used inodes:        94736998
Number of free inodes:        55889306
Number of allocated inodes:  150626304
Maximum number of inodes:    150626304
</verbatim>
As you can see, there is no metadata space available although the number of used inodes has not reached the limit. This happens because the space calculation was not done right, and more inodes than actually fit were assigned to the filesystem. There are only two ways of solving this:

   1. Clean the filesystem via massive =rm= or GPFS policies.
   1. Physically add more metadata disks to the system and then add the proper NSDs and disks to GPFS:
<verbatim>
# cat nsdssd3
/dev/sdc:mds2.lcg.cscs.ch::metadataOnly:1008:ssd3:
/dev/sdc:mds1.lcg.cscs.ch::metadataOnly:1007:ssd4:
# mmcrnsd -F nsdssd3 -v no
# Running mmcrnsd will modify this file and make it ready to add the disks just by using the file.
# vim /var/mmfs/etc/nsddevices     # and make sure /dev/sdc (in this case) is allowed
# mmlsnsd -M
# mmadddisk gpfs -F ./nsdssd3
# mmlsdisk gpfs
# mmdf gpfs
</verbatim>

---++ Filesystem Creation

To create a new GPFS filesystem, do the following.

First, make sure that you can ssh in both directions to and from all servers and clients - mainly this involves typing "yes" everywhere. We do everything through the IB network, so make sure that the hosts file has the =name.ib= addresses resolvable.
<verbatim>
mmcrcluster -N gpfs01.ib.lcg.cscs.ch:manager-quorum,gpfs02.ib.lcg.cscs.ch:manager-quorum,gpfs03.ib.lcg.cscs.ch:quorum \
  -p gpfs01.ib.lcg.cscs.ch -s gpfs02.ib.lcg.cscs.ch -r /usr/bin/ssh -R /usr/bin/scp -C scratch
mmchlicense server --accept -N gpfs01.ib.lcg.cscs.ch,gpfs02.ib.lcg.cscs.ch,gpfs03.ib.lcg.cscs.ch
cp -f orig/sdk.dsc.all sdk.dsc.all
mmcrnsd -F sdk.dsc.all -v no
mmstartup -a

# edit this to make a different filesystem
#   -A  automount
#   -B  blocksize
#   -E  exact mtime (yes|no)
#   -j  cluster, just keep this way
#   -k  all, just keep
#   -K  replication, keep 1
#   -m  number of replicated copies, keep 1
#   -M  replicated something else, keep 2
#   -n  number of nodes in cluster, keep 25
#   -T  where to mount it, leave as /gpfs
#mmcrfs scratch -F sdk.dsc.all -A yes -B 1M -D posix -E no -j cluster -k all -K whenpossible -m 1 -M 2 -n 25 -T /gpfs
mmcrfs scratch -F sdk.dsc.all -A yes -B 1M -D posix -E no -j cluster -k all -K no -m 1 -M 1 -n 25 -T /gpfs

mmaddnode -N nodefile
mmchlicense client --accept -N nodefile
</verbatim>
The file =sdk.dsc.all= contains the following:
<verbatim>
gpfsdata00:gpfs01.ib.lcg.cscs.ch:gpfs02.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun0:
gpfsdata01:gpfs01.ib.lcg.cscs.ch:gpfs03.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun1:
gpfsdata02:gpfs01.ib.lcg.cscs.ch:gpfs02.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun2:
gpfsdata03:gpfs01.ib.lcg.cscs.ch:gpfs03.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun3:
gpfsdata04:gpfs02.ib.lcg.cscs.ch:gpfs01.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun4:
gpfsdata05:gpfs02.ib.lcg.cscs.ch:gpfs03.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun5:
gpfsdata06:gpfs02.ib.lcg.cscs.ch:gpfs01.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun6:
gpfsdata07:gpfs02.ib.lcg.cscs.ch:gpfs03.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun7:
gpfsdata08:gpfs03.ib.lcg.cscs.ch:gpfs01.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun8:
gpfsdata09:gpfs03.ib.lcg.cscs.ch:gpfs02.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun9:
gpfsdata10:gpfs03.ib.lcg.cscs.ch:gpfs01.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun10:
gpfsdata11:gpfs03.ib.lcg.cscs.ch:gpfs02.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun11:
</verbatim>
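For reference, each line of =sdk.dsc.all= is a disk descriptor in the traditional colon-separated format; the annotation below is a sketch based on the standard field order (a failure group of -1 means the disk shares no point of failure with any other disk, and an empty pool field defaults to the =system= pool):
<verbatim>
# DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredNSDName:StoragePool
gpfsdata00:gpfs01.ib.lcg.cscs.ch:gpfs02.ib.lcg.cscs.ch:dataAndMetadata:-1:storage3lun0:
# ^device   ^primary NSD server   ^backup NSD server   ^usage          ^fg ^NSD name   ^pool (empty = system)
</verbatim>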
SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:3", NAME="gpfsdata03" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:4", NAME="gpfsdata04" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:5", NAME="gpfsdata05" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:6", NAME="gpfsdata06" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:7", NAME="gpfsdata07" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:8", NAME="gpfsdata08" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:9", NAME="gpfsdata09" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:10", NAME="gpfsdata10" ACTION=="add", SUBSYSTEM=="block", IMPORT{program}="/lib/udev/rename_device", ID=="7:0:0:11", NAME="gpfsdata11" </verbatim> The following tweaks were made to tweak the disk access parameters <verbatim> echo "Updating params for $dev ..." echo 4 > /sys/block/${dev}/queue/nr_requests echo noop > /sys/block/${dev}/queue/scheduler echo 1024 > /sys/block/${dev}/queue/max_sectors_kb echo 64 > /sys/block/${dev}/device/queue_depth echo 512 > /sys/block/${dev}/queue/read_ahead_kb </verbatim> These were added to the /etc/sysctl.conf file to optimize memory usage <verbatim> kernel.shmall = 4294967296 vm.mmap_min_addr=65536 vm.min_free_kbytes=16901008 </verbatim> This is the current working configuration that I did to the filesystem: <verbatim> Aug 17 10:03 [root@gpfs01:gpfs]# mmlsconfig Configuration data for cluster scratch.ib.lcg.cscs.ch: ------------------------------------------------------ clusterName scratch.ib.lcg.cscs.ch clusterId 10717238835674925567 autoload yes minReleaseLevel 3.3.0.2 dmapiFileHandleSize 32 pagepool 2048M nsdbufspace 30 nsdMaxWorkerThreads 36 maxMBpS 1600 maxFilesToCache 10000 worker1Threads 48 subnets 148.187.70.0 148.187.71.0 prefetchThreads 72 verbsRdma enable verbsPorts mlx4_0 nsdThreadsPerDisk 3 minMissedPingTimeout 240 adminMode central File systems in cluster scratch.ib.lcg.cscs.ch: ----------------------------------------------- /dev/scratch </verbatim> You also will want to add this to the ib configuration: <verbatim> RENICE_IB_MAD=yes </verbatim> This prevents gpfs from overwhelming the ib communication. If the kernel is too busy to respond to ib pings, then the gpfs server will assume that the node is dead and kick it out of the fs, even when it isn't dead. 
---+++ NOTES
<verbatim>
dsh -w wn[120-136] -w wn[139-142] mmstartup
dsh -w wn[120-136] -w wn[139-142] mmmount tmpgpfs
dsh -w wn[120-136] -w wn[139-142] mmumount tmpgpfs
dsh -w wn[120-136] -w wn[139-142] mmshutdown

on oss11: mmchconfig minMissedPingTimeout=60

/var/adm/ras/mmfs.log.latest

mmchdisk virident1 start -N oss12
mmlsdisk tmpgpfs
mmlsnsd
mmlsnsd -M

http://141.85.107.254/Documentatie/Sisteme_Paralele_si_Distribuite/IBM_HPC/GPFS/a7604134.pdf
</verbatim>

---++ GPFS Policies

The cleanup policy is located in =/opt/cscs/libexec/gpfs-policies/empty_tmpdir_slurm_usertmp_home.policy= and deletes all files older than *6 days* on =tmpdir_slurm=, =gridhome= and =usertmp=:
<verbatim>
RULE 'gpfswipe' DELETE
  FROM POOL 'system'
  WHERE (PATH_NAME like '/gpfs/tmpdir_slurm/%'
      OR PATH_NAME like '/gpfs/home/%'
      OR PATH_NAME like '/gpfs/usertmp/%')
    AND (CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '6' DAYS)
</verbatim>
The policy runs each time on a single node and is scheduled as follows:

   * =wn65= runs it every Sunday at 12:20
   * =wn23= runs it every Tuesday at 00:20
   * =wn18= runs it every Thursday at 12:20

NOTE: Please be aware that this policy does not delete directories; these need to be removed by hand!

---++ How to refresh a reinstalled client to be back in the GPFS cluster

   1 Copy =/var/mmfs/gen/mmsdrfs= from any other node to the reinstalled system: <verbatim>[root@wn11:/] scp wn45:/var/mmfs/gen/mmsdrfs /var/mmfs/gen/</verbatim>
   1 Run =mmrefresh -f= on the reinstalled node: <verbatim>[root@wn11:gen]# mmrefresh -f
[root@wn11:gen]# mmgetstate

 Node number  Node name  GPFS state
------------------------------------------
      11      wn11       down
[root@wn11:gen]# mmstartup</verbatim>

---++ CLIENT INSTALLATION

   1 Stop all GRID services: <verbatim>grid-service stop</verbatim>
   1 Remove all the grid users from the system: <verbatim>/opt/cscs/sbin/clean_grid_accounts.bash</verbatim>
   1 Install the kernel *2.6.18-274.3.1.el5*: <verbatim>yum install kernel-2.6.18-274.3.1.el5 kernel-headers-2.6.18-274.3.1.el5 --disableexcludes=main</verbatim>
   1 Install the RPMs available in =xen11:/nfs/gpfs_clean=: <verbatim>mount xen11:/nfs /media
cd /media/gpfs_clean
./gpfs_cleanslate.sh
cd -
umount /media</verbatim>
   1 *Reboot* the machine.
   1 Add the *hostname.10* IP addresses to =/etc/hosts=.
   1 From mds1: <verbatim>scp /var/mmfs/gen/mmsdrfs $client:/var/mmfs/gen/
mmdelnode -N $client
mmaddnode $client.lcg.cscs.ch
mmchlicense client --accept -N $client.lcg.cscs.ch</verbatim>
   1 On mds1, check the cluster: <verbatim>mmlscluster</verbatim>
   1 Start up GPFS on each client: <verbatim>mmstartup</verbatim>

---++++ GPFS Repo

I have set up a repo on puppet to install GPFS using only yum, the base URLs for which are http://148.187.64.40:81/gpfs/$releasever/base/ and http://148.187.64.40:81/gpfs/$releasever/base/

First you need to install the initial release of the base package. This is because updated base packages only provide a delta and do not list the initial release package as a dependency, but rather check for it with a pre-install script in the RPM. Thanks IBM...
<verbatim>
yum localinstall http://phoenix1.lcg.cscs.ch:81/gpfs/el6/base/gpfs.base-3.5.0-0.x86_64.rpm
</verbatim>
OK, let's install the other packages we need:
<verbatim>
yum install gpfs.docs gpfs.msg gpfs.base
</verbatim>
Now all we need is the kernel module, if it is not already available:
<verbatim>
cd /usr/lpp/mmfs/src
make LINUX_DISTRIBUTION=REDHAT_AS_LINUX Autoconfig
make World
make rpm
rpm -ivh /root/rpmbuild/RPMS/x86_64/gpfs.gplbin-*.rpm
modprobe mmfs26
</verbatim>
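Before publishing the RPM, it is worth a quick sanity check that the portability layer actually matches the running kernel (plain RPM and module commands, nothing GPFS-specific):
<verbatim>
uname -r                   # running kernel
rpm -qa 'gpfs.gplbin*'     # the gplbin package should carry the same kernel version
lsmod | grep mmfs          # mmfs26 and mmfslinux should be listed after the modprobe
</verbatim>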
Let's make this available via yum for future installs. Note that this was built with the GPFS-3.5.0-10 packages.
<verbatim>
scp /root/rpmbuild/RPMS/x86_64/gpfs.gplbin-*.rpm phoenix1:/cm/www/html/gpfs/el6/updates/
# On phoenix1
cd /cm/www/html/gpfs/el6/updates/
createrepo --update -p .
</verbatim>
We can now install via yum on the client; obviously, make sure the correct kernel version is used:
<verbatim>
yum clean all
yum install gpfs.gplbin-$(uname -r)
</verbatim>

---+++ Other information - 2 GPFS CLUSTERS (obsolete)

We have now created two clusters: the I/O servers and the clients. Here's what I did.

Servers - create the cluster the same way as last time (I just left the old one):
<verbatim>
May 08 11:26 [root@oss11:gpfs_fs_creation]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfs.lcg.cscs.ch
  GPFS cluster id:           10717232453390325744
  GPFS UID domain:           gpfs.lcg.cscs.ch
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    mds1.lcg.cscs.ch
  Secondary server:  mds2.lcg.cscs.ch

 Node  Daemon node name   IP address     Admin node name    Designation
-----------------------------------------------------------------------------------------------
   1   mds1.lcg.cscs.ch   148.187.66.34  mds1.lcg.cscs.ch   quorum-manager
   2   mds2.lcg.cscs.ch   148.187.66.35  mds2.lcg.cscs.ch   quorum-manager
   3   oss11.lcg.cscs.ch  148.187.66.3   oss11.lcg.cscs.ch  quorum
   4   oss12.lcg.cscs.ch  148.187.66.4   oss12.lcg.cscs.ch  quorum
   5   oss21.lcg.cscs.ch  148.187.66.9   oss21.lcg.cscs.ch  quorum
   6   oss22.lcg.cscs.ch  148.187.66.10  oss22.lcg.cscs.ch  quorum
   7   oss31.lcg.cscs.ch  148.187.66.15  oss31.lcg.cscs.ch  quorum
   8   oss32.lcg.cscs.ch  148.187.66.16  oss32.lcg.cscs.ch
   9   oss41.lcg.cscs.ch  148.187.66.21  oss41.lcg.cscs.ch
  10   oss42.lcg.cscs.ch  148.187.66.22  oss42.lcg.cscs.ch
</verbatim>
<verbatim>
mmcrcluster -N mds1.lcg.cscs.ch:manager-quorum,mds2.lcg.cscs.ch:manager-quorum,oss11.lcg.cscs.ch:quorum,oss12.lcg.cscs.ch:quorum,oss21.lcg.cscs.ch:quorum,oss22.lcg.cscs.ch:quorum,oss31.lcg.cscs.ch:quorum,oss32.lcg.cscs.ch:quorum,oss41.lcg.cscs.ch:quorum,oss42.lcg.cscs.ch:quorum \
  -p mds1.lcg.cscs.ch -s mds2.lcg.cscs.ch -r /usr/bin/ssh -R /usr/bin/scp -C gpfs
mmchlicense server --accept -N mds1.lcg.cscs.ch,mds2.lcg.cscs.ch,oss11.lcg.cscs.ch,oss12.lcg.cscs.ch,oss21.lcg.cscs.ch,oss22.lcg.cscs.ch,oss31.lcg.cscs.ch,oss32.lcg.cscs.ch,oss41.lcg.cscs.ch,oss42.lcg.cscs.ch
mmcrfs gpfs -F sdk.dsc.all -A yes -B 1M -D posix -E no -j scatter -k all -K always -m 2 -M 2 -r 2 -R 2 -n 200 -v no -T /gpfs
</verbatim>
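Once the cluster exists, start GPFS on all its nodes and confirm that every node has joined; a minimal check with standard commands:
<verbatim>
mmstartup -a     # start GPFS on every node of the new cluster
mmgetstate -a    # each node should eventually report 'active'
</verbatim>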
Clients - create a new cluster:
<verbatim>
mmcrcluster -N wn46.lcg.cscs.ch:manager-quorum,wn36.lcg.cscs.ch:manager-quorum,wn26.lcg.cscs.ch:quorum,wn16.lcg.cscs.ch:quorum,wn13.lcg.cscs.ch:quorum,wn40.lcg.cscs.ch:quorum,wn28.lcg.cscs.ch:quorum \
  -p wn46.lcg.cscs.ch -s wn36.lcg.cscs.ch -r /usr/bin/ssh -R /usr/bin/scp -C gpfsclients
mmchlicense server --accept -N wn46.lcg.cscs.ch,wn36.lcg.cscs.ch,wn26.lcg.cscs.ch,wn13.lcg.cscs.ch,wn40.lcg.cscs.ch,wn28.lcg.cscs.ch,wn16.lcg.cscs.ch
mmaddnode -N nodes
mmchlicense client --accept -N nodes
</verbatim>
<verbatim>
May 08 11:29 [root@wn46:~]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsclients.lcg.cscs.ch
  GPFS cluster id:           10717231405418994821
  GPFS UID domain:           gpfsclients.lcg.cscs.ch
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    wn46.lcg.cscs.ch
  Secondary server:  wn36.lcg.cscs.ch

 Node  Daemon node name     IP address     Admin node name      Designation
-----------------------------------------------------------------------------------------------
   1   wn46.lcg.cscs.ch     148.187.65.46  wn46.lcg.cscs.ch     quorum-manager
   2   wn36.lcg.cscs.ch     148.187.65.36  wn36.lcg.cscs.ch     quorum-manager
   3   wn26.lcg.cscs.ch     148.187.65.26  wn26.lcg.cscs.ch     quorum
   4   wn16.lcg.cscs.ch     148.187.65.16  wn16.lcg.cscs.ch     quorum
   5   wn13.lcg.cscs.ch     148.187.65.13  wn13.lcg.cscs.ch     quorum
   6   wn40.lcg.cscs.ch     148.187.65.40  wn40.lcg.cscs.ch     quorum
   7   wn28.lcg.cscs.ch     148.187.65.28  wn28.lcg.cscs.ch     quorum
   8   wn03.lcg.cscs.ch     148.187.65.3   wn03.lcg.cscs.ch
   9   wn04.lcg.cscs.ch     148.187.65.4   wn04.lcg.cscs.ch
  10   wn07.lcg.cscs.ch     148.187.65.7   wn07.lcg.cscs.ch
  11   wn08.lcg.cscs.ch     148.187.65.8   wn08.lcg.cscs.ch
  12   wn09.lcg.cscs.ch     148.187.65.9   wn09.lcg.cscs.ch
  13   wn11.lcg.cscs.ch     148.187.65.11  wn11.lcg.cscs.ch
  14   wn12.lcg.cscs.ch     148.187.65.12  wn12.lcg.cscs.ch
  15   wn14.lcg.cscs.ch     148.187.65.14  wn14.lcg.cscs.ch
  16   wn15.lcg.cscs.ch     148.187.65.15  wn15.lcg.cscs.ch
  17   wn17.lcg.cscs.ch     148.187.65.17  wn17.lcg.cscs.ch
  18   wn18.lcg.cscs.ch     148.187.65.18  wn18.lcg.cscs.ch
  19   wn19.lcg.cscs.ch     148.187.65.19  wn19.lcg.cscs.ch
  20   wn20.lcg.cscs.ch     148.187.65.20  wn20.lcg.cscs.ch
  21   wn21.lcg.cscs.ch     148.187.65.21  wn21.lcg.cscs.ch
  22   wn22.lcg.cscs.ch     148.187.65.22  wn22.lcg.cscs.ch
  23   wn23.lcg.cscs.ch     148.187.65.23  wn23.lcg.cscs.ch
  24   wn24.lcg.cscs.ch     148.187.65.24  wn24.lcg.cscs.ch
  25   wn25.lcg.cscs.ch     148.187.65.25  wn25.lcg.cscs.ch
  26   wn27.lcg.cscs.ch     148.187.65.27  wn27.lcg.cscs.ch
  27   wn29.lcg.cscs.ch     148.187.65.29  wn29.lcg.cscs.ch
  28   wn30.lcg.cscs.ch     148.187.65.30  wn30.lcg.cscs.ch
  29   wn31.lcg.cscs.ch     148.187.65.31  wn31.lcg.cscs.ch
  30   wn32.lcg.cscs.ch     148.187.65.32  wn32.lcg.cscs.ch
  31   wn33.lcg.cscs.ch     148.187.65.33  wn33.lcg.cscs.ch
  32   wn34.lcg.cscs.ch     148.187.65.34  wn34.lcg.cscs.ch
  33   wn35.lcg.cscs.ch     148.187.65.35  wn35.lcg.cscs.ch
  34   wn37.lcg.cscs.ch     148.187.65.37  wn37.lcg.cscs.ch
  35   wn38.lcg.cscs.ch     148.187.65.38  wn38.lcg.cscs.ch
  36   wn39.lcg.cscs.ch     148.187.65.39  wn39.lcg.cscs.ch
  37   wn41.lcg.cscs.ch     148.187.65.41  wn41.lcg.cscs.ch
  38   wn42.lcg.cscs.ch     148.187.65.42  wn42.lcg.cscs.ch
  39   wn43.lcg.cscs.ch     148.187.65.43  wn43.lcg.cscs.ch
  40   wn44.lcg.cscs.ch     148.187.65.44  wn44.lcg.cscs.ch
  41   wn45.lcg.cscs.ch     148.187.65.45  wn45.lcg.cscs.ch
  42   cream01.lcg.cscs.ch  148.187.66.43  cream01.lcg.cscs.ch
  43   cream02.lcg.cscs.ch  148.187.66.44  cream02.lcg.cscs.ch
  44   arc01.lcg.cscs.ch    148.187.67.10  arc01.lcg.cscs.ch
  45   arc02.lcg.cscs.ch    148.187.66.40  arc02.lcg.cscs.ch
</verbatim>
Now set it up to connect to the remote cluster, on both sides.

Client side:
<verbatim>
mmauth genkey new
mmchconfig cipherList=AUTHONLY
scp /var/mmfs/ssl/id_rsa.pub oss11:/root/gpfs_fs_creation/id_rsa.gpfsclients.pub
mmremotecluster add gpfs.lcg.cscs.ch -k id_rsa.gpfs.pub -n mds1.lcg.cscs.ch,mds2.lcg.cscs.ch
mmremotefs add gpfs -f /dev/gpfs -C gpfs.lcg.cscs.ch -T /gpfs

May 08 11:30 [root@wn46:~]# mmremotecluster show all
Cluster name:    gpfs.lcg.cscs.ch
Contact nodes:   mds1.lcg.cscs.ch,mds2.lcg.cscs.ch
SHA digest:      6d732758ebedeedd0c73eb87cf0d00cf1df9ef0f
File systems:    gpfs (gpfs)
</verbatim>
Server side:
<verbatim>
mmauth genkey new
scp /var/mmfs/ssl/id_rsa.pub wn46:~/id_rsa.gpfs.pub
mmremotecluster add gpfsclients.lcg.cscs.ch -k id_rsa.gpfsclients.pub -n wn46.lcg.cscs.ch,wn36.lcg.cscs.ch
mmauth add gpfsclients.lcg.cscs.ch -k id_rsa.gpfsclients.pub
mmauth grant gpfsclients.lcg.cscs.ch -f gpfs
</verbatim>
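To check on the server side that the grant is in place, =mmauth show= can be used (a hedged sketch; the exact output format varies with the GPFS release):
<verbatim>
mmauth show all    # should list gpfsclients.lcg.cscs.ch with access granted to the 'gpfs' filesystem
</verbatim>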
---++ GPFS Metrics

We can use the =mmpmon= command to gather metrics about GPFS. A simple use would be as follows; note that these counters are cumulative! We echo into the command because, if we don't provide an input file, we actually get a prompt within mmpmon.
<verbatim>
Oct 31 14:55 [root@wn01:~]# echo io_s | mmpmon -s
mmpmon node 148.187.65.1 name wn01 io_s OK
timestamp:      1383227732/587764
bytes read:     27845668669
bytes written:  19746297868
opens:          56460
closes:         55757
reads:          13513
writes:         10478999
readdir:        7276233
inode updates:  837741
</verbatim>
For command-line usage you can use the following to gather interactive metrics; these counters are relative.
<verbatim>
Oct 31 14:54 [root@wn01:~]# gpfs_getio_s.ksh
Started: Thu Oct 31 14:54:21 CET 2013   Sample Interval: 2 Seconds
Timestamp   ReadMB/s  WriteMB/s  F_open  f_close  reads  writes  rdir  inode
1383227663       0.0        0.0       0        0      0       0     0      0
1383227665       0.0        0.0       0        0      0       0     0      0
</verbatim>
For metrics there is a script (=/opt/cscs/libexec/gmetric-scripts/gpfs/gpfs_stats.sh=) that runs every minute and feeds data to ganglia.

Some notes about mmpmon:

   * Timestamps are in EPOCH.
   * When using an input file (=-i= flag), the separator for options is a newline.
   * You can use =fs_io_s= rather than =io_s= to gather per-filesystem metrics, which is useful if you mount more than one GPFS filesystem on a host.
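As an example of the =-i= flag mentioned above, a minimal sketch that samples both the aggregate and the per-filesystem counters in one call (the input file name is arbitrary):
<verbatim>
cat > /tmp/mmpmon.in <<EOF
io_s
fs_io_s
EOF
mmpmon -i /tmp/mmpmon.in
</verbatim>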
-- Main.JasonTemple - 2011-08-17

| *ServiceCardForm* ||
| Service name | gpfs |
| Machines this service is installed in | mds[1-2], oss[1-4][1-2] |
| Is Grid service | No |
| Depends on the following services | |
| Expert | Miguel Gila |