Difference: NodeTypedCachet3fs13t3fs14 (32 vs. 33)

Revision 33 2016-11-04 - FabioMartinelli

Line: 1 to 1
 
META TOPICPARENT name="AdminArea"
Changed:
<
<

Checking HW failure by Nagios

Nagios will notice a HP HW failure :
>
>

Checking failures by Nagios

Nagios will notice a SW/HW failure :
 
Changed:
<
<
accordingly open a case on the HP Support WebSite
>
>
if it's a HW failure open a case on the HP Support WebSite
  These CLI tools show the status of HP components:
  • hpasmcli: Status about the HP HW
Line: 532 to 534
 RAID OK: Smart Array P410i in Slot 0 (Embedded) array A logicaldrive 1 (279.4 GB, RAID 1, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK) [Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK]
Changed:
<
<

10Gbit/s failure NEW

>
>

10Gbit/s failure - OLD, 10Gbs was on Fibre at that time

 We actually hit this case in 2014.
If the 10Gb card stops working you have to:
Changed:
<
<
  • Unplug the Fibre
>
>
  • Unplug the Fibre ( nowadays it's a 10GbE Copper )
 
  • Try unplugging the transceiver, waiting 20s and plugging it back in; this was enough to fix it
  • If that doesn't work, try a clean server stop, wait 60s, then restart the server
  • If the 10Gb port is really broken, use the other 10Gbit port by moving the transceiver and the server IP from eth0 to eth1
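
Before working through the steps above it can help to confirm which port has really lost its link. A minimal sketch, assuming `ethtool` is installed and the interfaces are named eth0 and eth1 as in the list; the helper name `link_detected` is hypothetical. It reads `ethtool` output on stdin, so it can also be tried on saved output.

```shell
# Hypothetical helper: succeed (exit 0) when the ethtool output read from
# stdin reports a live link, fail otherwise.
link_detected() {
    grep -q 'Link detected: yes'
}

# Example use on a real host (interface names are assumptions):
#   for i in eth0 eth1; do
#       if ethtool "$i" | link_detected; then echo "$i up"; else echo "$i DOWN"; fi
#   done
```

The helper only looks at the `Link detected:` field, so a port that is up but negotiated at the wrong speed would still need a manual look at the full `ethtool` output.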
Line: 543 to 545
 

Installation

Changed:
<
<
Puppet coordinates: Fabio uses these aliases + Puppet recipes are in puppetdirnodes
>
>
Puppet coordinates: Fabio uses these aliases + Puppet recipes are in /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests
 

Added:
>
>
alias ROOT='. /afs/cern.ch/sw/lcg/external/gcc/4.8/x86_64-slc6/setup.sh && . /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.26/x86_64-slc6-gcc48-opt/root/bin/thisroot.sh'
alias cscsela='ssh -AX fmartine@ela.cscs.ch'
alias cscslogin='ssh -AX fmartine@login.lcg.cscs.ch'
alias cscspub='ssh -AX fmartinelli@pub.lcg.cscs.ch'
 alias dcache='ssh -2 -l admin -p 22224 t3dcachedb.psi.ch'
Added:
>
>
alias dcache04='ssh -2 -l admin -p 22224 t3dcachedb04.psi.ch'
alias gempty='git commit --allow-empty-message -m '\'''\'''
alias kscustom54='cd /afs/psi.ch/software/linux/dist/scientific/54/custom'
alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom60='cd /afs/psi.ch/software/linux/dist/scientific/60/custom'
 alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
Added:
>
>
alias kscustom66='cd /afs/psi.ch/software/linux/dist/scientific/66/x86_64/custom'
 alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
Changed:
<
<
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
>
>
alias ksprepostdir='cd /afs/psi.ch/software/linux/dist/scientific/60/kickstart/bin'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=tty'
alias mc='. /usr/libexec/mc/mc-wrapper.sh'
alias pdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias pdirf='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/FabioDevelopment/'
alias pdirmanifests='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/'
alias pdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias pdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias vi='vim'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
 alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
Added:
>
>
alias yumdir7='cd /afs/psi.ch/software/linux/dist/scientificlinux/7x/x86_64/Tier3/all'
alias yumdir7old='cd /afs/psi.ch/software/linux/dist/scientific/70.PLEASE_DO_NOT_USE_AND_DO_NOT_RENAME/scripts'
 

dCache nowadays runs as the user dcache and no longer as the user root, so you might hit a "permission denied" error.
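
A minimal sketch for spotting files that can trigger this error: it lists everything under a directory that is not owned by a given user. The function name `not_owned_by` and the example path are hypothetical; only the user name dcache comes from the note above.

```shell
# Hypothetical helper: print every entry under DIR that is NOT owned by USER.
# Usage: not_owned_by DIR USER
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example (the path is an assumption, adjust it to the real pool directory):
#   not_owned_by /data/pool dcache
```

An empty result means every file and directory under the given path already belongs to that user.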

Line: 585 to 608
  you need to reboot the server to activate the new FW.

OS installation

Changed:
<
<
The servers are installed as SL6 64-bit by Kickstart + Puppet; see the Puppet SL6_dcache_fs210_fs13fs14.pp file in puppetdirnodes
>
>
The servers are installed as SL6 64-bit by Kickstart + Puppet; see the Puppet SL6_dcache_fs215_fs13fs14.pp file in /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests
 Be aware of the LVM installation upon the HW RAID 1.
[root@t3fs14 ~]# mount  | grep  ext4

Line: 605 to 628
 [root@t3fs14 ~]# find /sys | grep ql

LSI RDAC - Redundant Dual Active Controller

Changed:
<
<
The HP DL380 G7 servers own 2 dual-port QLogic 8Gbit/s FC HBAs, type QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02), 4 ports in total, as shown in the following picture; 2 ports are connected to the NetApp E5400 and 2 ports are connected to the SGI IS5500
>
>
The HP DL380 G7 servers feature 2 dual-port QLogic 8Gbit/s FC HBAs, type QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02), 4 ports in total, as shown in the following picture; 2 ports are connected to the NetApp E5400 and 2 ports are connected to the SGI IS5500
 QlogicDualChannel8GbitFC.JPG
To let the XFS filesystems exploit the 2 distinct paths to the LUNs, Linux has to be configured to aggregate the paths into a single virtual path, monitor whether one of them is down and, if so, exclude it from I/O. As far as I know there are 2 major tools on Red Hat to create the virtual path: the RHEL6 Multipath Daemon and the LSI RDAC driver. Because the LSI RDAC driver is the official tool provided by LSI, which is also the manufacturer of the IS5500 RAID controllers, I decided to use that one. I also understand that the RHEL6 multipath daemon in turn uses the LSI RDAC driver but offers a driver-independent configuration interface; since I don't mind being exposed to the LSI driver details, and I do care about using the latest stable version of the LSI driver, I decided to avoid another software layer and to compile and use the LSI RDAC driver directly.

Line: 1105 to 1128
 mpp 1:0:0:3: rejecting I/O to offline device
 mpp 1:0:0:3: rejecting I/O to offline device
 mpp 1:0:0:4: rejecting I/O to offline device
Changed:
<
<
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
>
>
...
 94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Selection Retry count exhausted
 7 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0 Path Failed
 496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 No new path: fall to failover controller case. vcmnd SN 154546623 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
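
The repeated "rejecting I/O to offline device" messages in a log like this one are easier to read as a count per device. A minimal sketch, assuming the message format shown in this log; the helper name `offline_counts` is hypothetical. It reads kernel log text on stdin and prints one line per SCSI address with the number of rejections.

```shell
# Hypothetical helper: count "rejecting I/O to offline device" messages per
# SCSI address (host:channel:target:lun) in kernel log text read from stdin.
offline_counts() {
    # -o prints each match on its own line, even if several share a log line.
    grep -o 'mpp [0-9:]*: rejecting I/O to offline device' |
        awk '{ sub(/:$/, "", $2); n[$2]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# Example use on a real host:
#   dmesg | offline_counts
```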
Line: 1223 to 1232
 To query the HW status (fans, temps, ...) some RPMs must be installed; they are available in /afs/psi.ch/software/linux/dist/scientific/6/others/all/.
[root@t3fs13 ~]# rpm -qa | grep hp 

Added:
>
>
hp-health-9.25-1551.9.rhel6.x86_64
 hpsmh-6.3.1-23.x86_64 hpacucli-9.30-15.0.x86_64
Changed:
<
<
hp-health-9.25-1551.9.rhel6.x86_64
>
>
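
A minimal sketch for checking that the HP RPMs listed above are present, assuming the package names hp-health, hpsmh and hpacucli from that listing; the helper name `missing_rpms` is hypothetical. It reads `rpm -qa` output on stdin and prints each required package that has no installed version.

```shell
# Hypothetical helper: read `rpm -qa` output on stdin and print each required
# package name (given as arguments) that has no matching installed package.
missing_rpms() {
    installed=$(cat)
    for pkg in "$@"; do
        # An installed package shows up as NAME-VERSION..., e.g. hpsmh-6.3.1-23.x86_64
        printf '%s\n' "$installed" | grep -q "^$pkg-[0-9]" || echo "$pkg"
    done
}

# Example use on a real host:
#   rpm -qa | missing_rpms hp-health hpsmh hpacucli
```

No output means all the named packages are installed.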
  Services:

Line: 1816 to 1826
 
FORM FIELD Services Services dcache pool cells, gridftp, dcap, gsidcap
FORM FIELD Hardware Hardware HP Proliant DL380 G7
FORM FIELD Install Profile InstallProfile fs13fs14
Changed:
<
<
FORM FIELD Guarantee/maintenance until Guaranteemaintenanceuntil 2015-01-14
>
>
FORM FIELD Guarantee/maintenance until Guaranteemaintenanceuntil 31-07-2018
 
META FILEATTACHMENT attachment="QlogicDualChannel8GbitFC.JPG" attr="" comment="HP G7 DL380 Qlogic Dual Channel 8Gbit/s FC" date="1332422442" name="QlogicDualChannel8GbitFC.JPG" path="QlogicDualChannel8GbitFC.JPG" size="2096788" user="fabiom" version="1"
META FILEATTACHMENT attachment="ols2009-pages-169-1842.pdf" attr="" comment="IBM Paper about 10Gbit/s links and Linux." date="1332431965" name="ols2009-pages-169-1842.pdf" path="ols2009-pages-169-1842.pdf" size="196046" user="fabiom" version="1"
META FILEATTACHMENT attachment="13201_div.pdf" attr="" comment="Manual of - Smart Array Controller FBWC 1GB Flash Backed Cache 8 only to P410i" date="1350378493" name="13201_div.pdf" path="13201_div.pdf" size="932236" user="fabiom" version="1"
 