Node Type: dCache (t3fs13, t3fs14)
Firewall requirements
| local port | open to | reason |
| 2811/tcp | * | gridftp control connection |
| 22125/tcp | 192.33.123.0/24 | unauthenticated dcap (read only) |
| 22128/tcp | 192.33.123.0/24 | gsidcap (GSI authenticated dcap) |
| 20000-25000/tcp | * | Globus port range for gridftp/xrootd data streams |
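As an illustration only, the table above can be turned into iptables-style rules with a few lines of Python. This is a sketch: the rule layout (plain `iptables -A INPUT` lines) and the helper name `iptables_rule` are assumptions, and the real firewall on these nodes may be managed differently, for example through Puppet.

```python
# Hypothetical helper: render the firewall table above as iptables commands.
# The RULES list is copied from the table; everything else is an assumption.

RULES = [
    # (local port or range, allowed source, reason)
    ("2811", "*", "gridftp control connection"),
    ("22125", "192.33.123.0/24", "unauthenticated dcap (read only)"),
    ("22128", "192.33.123.0/24", "gsidcap (GSI authenticated dcap)"),
    ("20000-25000", "*", "Globus port range for gridftp/xrootd data streams"),
]


def iptables_rule(port, source):
    """Build one 'iptables -A INPUT ...' line for a TCP port or port range."""
    parts = ["iptables", "-A", "INPUT", "-p", "tcp"]
    if source != "*":
        # Restrict the rule to the given source network
        parts += ["-s", source]
    # iptables writes port ranges as first:last
    parts += ["--dport", port.replace("-", ":"), "-j", "ACCEPT"]
    return " ".join(parts)


if __name__ == "__main__":
    for port, source, reason in RULES:
        print("# " + reason)
        print(iptables_rule(port, source))
```

The script only prints the commands, so they can be reviewed before anything is applied.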
Regular Maintenance work
Once in production the configuration is basically frozen, so simply keep an eye on the Nagios pages for t3fs13 and t3fs14. There you will also find the pnp4nagios graphs of the XFS filesystem usage.
Emergency Measures
HP warranties
- t3fs13: Serial number CZ31513SBR, Product number 583914-B21
- t3fs14: Serial number CZ31513SBT, Product number 583914-B21
- Product description: HP ProLiant DL380 G7 Server
- Date of warranty check : 2011-12-16
- Entitlement type: Base Warranty
- Start date: 2011-12-16
- Title: Wty: HP HW Maintenance Onsite Support
- Status: Active
- Start date: Dec 16, 2011
- End date: Jan 14, 2015
- Service level: Standard Parts Logistics
- Deliverables: Onsite Support Parts and Material provided Hardware Problem Diagnosis
- Title: Wty: HP Support for Initial Setup
- Status: Active
- Start date: Dec 16, 2011
- End date: Apr 13, 2012
- Service level: Unlimited Named Callers
- Deliverables: Initial Setup Assistance
Generic HW failure
If the hardware fails you will receive an e-mail from t3nagios; depending on the kind of failure, open a case on the HP Support website.
These tools report the status of the HP components:
- hpasmcli: status of the server hardware
- hpacucli: specific to the RAID controllers
10Gbit/s failure
This actually happened to us in 2014.
If the 10Gb card stops working:
- Remove the fibre.
- Unplug the transceiver, wait 20s and plug it back in; in our case this was enough to fix the problem.
- If that doesn't work, try a clean server stop, wait 60s, then restart the server.
- If the port really seems broken, use the other 10Gbit port by moving the transceiver and the server IP from eth0 to eth1.
- If the whole 10Gb board is broken rather than a single port, connect 4 cables from the 4 onboard Gigabit ports to the switch, create a Linux bonding device with mode=6 and move the server IP onto it.
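For the last fallback, the bonding configuration could look roughly like the files generated by the sketch below. It assumes the usual RHEL6/SL6 `ifcfg-*` layout; the interface names (`eth2` to `eth5`), the IP address and the netmask are placeholders, not values taken from t3fs13/t3fs14, and `bonding_ifcfg` is a hypothetical helper.

```python
# Sketch only: interface names, IP address and netmask are placeholders.

def bonding_ifcfg(ip, netmask, slaves, bond="bond0"):
    """Return {filename: content} for RHEL6-style ifcfg files of a mode=6 bond."""
    files = {}
    files["ifcfg-" + bond] = "\n".join([
        "DEVICE=" + bond,
        "IPADDR=" + ip,
        "NETMASK=" + netmask,
        "ONBOOT=yes",
        "BOOTPROTO=none",
        "USERCTL=no",
        # mode=6 is balance-alb; miimon=100 checks the link state every 100 ms
        'BONDING_OPTS="mode=6 miimon=100"',
    ]) + "\n"
    for nic in slaves:
        # Each onboard Gigabit port becomes a slave of the bonding device
        files["ifcfg-" + nic] = "\n".join([
            "DEVICE=" + nic,
            "MASTER=" + bond,
            "SLAVE=yes",
            "ONBOOT=yes",
            "BOOTPROTO=none",
            "USERCTL=no",
        ]) + "\n"
    return files


if __name__ == "__main__":
    cfg = bonding_ifcfg("192.0.2.10", "255.255.255.0",
                        ["eth2", "eth3", "eth4", "eth5"])
    for name in sorted(cfg):
        print("# /etc/sysconfig/network-scripts/" + name)
        print(cfg[name])
```

The output is meant to be reviewed and copied by hand; check the real names of the onboard ports on the server before using any of it.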
Installation
Puppet coordinates: Fabio uses the aliases below; the Puppet recipes are in the directory behind the puppetdirnodes alias.
alias dcache='ssh -2 -l admin -p 22224 t3dcachedb.psi.ch'
alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
IMPORTANT: dCache now runs as the user dcache, no longer as the user root, so you might run into "permission denied" errors.
The 2 HP ProLiant DL380 G7 servers t3fs[13,14] are the dCache 10Gbit/s gateways to the 6+6 XFS 22TB filesystems offered by our IS5500 360TB storage. Before studying the software details it is worth getting an overview of the hardware; see the HP website and the HP bulletin about these servers. These servers are protected by HW RAID 1.
HW Raid controller
Each server has 8 external 2.5" hot-plug disk slots managed by a Smart Array P410i RAID controller (PCI-e x8, 1GB RAM); please read its manual.
Below is an example of a RAID controller firmware update performed on 27 Dec 2012:
[root@t3fs14 CP017907.xmlhpsetupLdrImage.binlo100flashlo100.sh]# ./hpsetup
HP Enclosure ROM Flash.
Flash Engine Version: 2.06.10
Copyright (c) 2006-2009 Hewlett-Packard Development Company L.P.
Device [P410i]: FW Ver [ Current:5.14 | Apply:5.70 ?]Flash this device? [NO, yes, quit] yes
Preparing to flash devices on the array controller...
Requesting flash - this could take up to 15 minutes...
Flash complete.
The array flash operation succeeded.
Device [P410i]: FW Ver [ Current:5.14 | Apply:5.70 ?]Flash this device? [NO, yes, quit] yes
Preparing to flash devices on the array controller...
Requesting flash - this could take up to 15 minutes...
Flash complete.
The array flash operation succeeded.
You need to reboot the server to activate the new firmware.
OS installation
The servers are installed as SL6 64-bit by our common Kickstart + Puppet installation method; see the Puppet t3fs13fs14 file to see what is installed in terms of RPMs and configuration files.
An important detail to be aware of is the use of LVM on top of the HW RAID 1, which allows a more flexible allocation of space, now and in the future. At the moment there are still ~200GB of space that can be allocated.
[root@t3fs14 ~]# mount | grep ext4
/dev/sda2 on / type ext4 (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_local_raid1-opt on /opt type ext4 (rw,nosuid,nodev,noatime,barrier)
/dev/mapper/vg_local_raid1-var on /var type ext4 (rw,noexec,nosuid,nodev,noatime,nobarrier)
/dev/mapper/vg_local_raid1-tmp on /tmp type ext4 (rw,noexec,nosuid,nodev,noatime,nobarrier)
qla2xxx driver
QLogic drivers; see the LSI RDAC chapter below for the details:
[root@t3fs14 ~]# lsmod | grep ql
qla2xxx 366369 50
scsi_transport_fc 52241 1 qla2xxx
[root@t3fs14 ~]# find /sys | grep ql
LSI RDAC - Redundant Dual Active Controller
*Need to adapt this discussion to the new NetApp E5400*
Each HP server has 2 dual-port QLogic 8Gbit/s FC HBAs of type "QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)", 4 ports in total, as shown in the following picture.
To let the XFS filesystems exploit these 4 distinct paths to the LUNs offered by the IS5500, Linux must be configured to aggregate the paths into a single virtual path, to monitor whether one of the paths is down, and to exclude it from I/O. As far as I know there are 2 major tools on Red Hat to create the virtual path: the RHEL6 Multipath Daemon and the LSI RDAC driver. Because the LSI RDAC driver is the official tool provided by LSI, which is also the manufacturer of the IS5500 RAID controllers, I decided to use it. As I understand it, the RHEL6 multipath daemon uses its own version of LSI RDAC behind a driver-independent configuration interface; nevertheless I decided to avoid another software layer and use LSI RDAC natively.
To install LSI RDAC you need the kernel-devel RPM, and you must create a special initrd such as /boot/mpp-2.6.32-220.4.1.el6.x86_64.img by compiling the software inside /root/rdac/linuxrdac-09.03.0C08.0535. This also means that whenever a kernel update/upgrade is needed, a new initrd file produced by LSI RDAC is needed as well. Furthermore, SGI will release new versions of the LSI RDAC driver itself, so during a kernel update/upgrade it is worth checking whether a new LSI RDAC is available and using the scheduled server downtime to update the kernel and LSI RDAC at the same time.
That said, the online documentation on this important topic is poor; this is one of the best overviews that I have found.
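Since every kernel needs its own RDAC initrd, a small check like the following can help before a reboot. It is a minimal sketch that assumes the initrd is always named `mpp-<kernel release>.img` under /boot, a pattern inferred only from the single example path given above; the function names are hypothetical.

```python
import os
import platform


def mpp_initrd_path(kernel_release, boot_dir="/boot"):
    """Expected RDAC initrd for a kernel, assuming the mpp-<release>.img pattern."""
    return f"{boot_dir}/mpp-{kernel_release}.img"


def missing_mpp_initrds(kernel_releases, boot_dir="/boot"):
    """Return the kernel releases that have no matching mpp initrd in boot_dir."""
    return [rel for rel in kernel_releases
            if not os.path.isfile(mpp_initrd_path(rel, boot_dir))]


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    for rel in missing_mpp_initrds([running]):
        print("WARNING: no RDAC initrd for kernel " + rel
              + ", expected " + mpp_initrd_path(rel))
```

To check a freshly installed kernel before rebooting into it, pass its release string to `missing_mpp_initrds` instead of the running one.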
mpp drivers
[root@t3fs14 ~]# lsmod | grep mpp
mppVhba 138253 12
mppUpper 156950 1 mppVhba
[root@t3fs14 ~]# find /sys | grep mpp
LUN numbers to SCSI disk labels correspondence
To easily map an sd* Linux label to the related NetApp LUN, use /opt/mpp/lsvdev:
[root@t3fs13 ~]# /opt/mpp/lsvdev
Array Name Lun sd device
-------------------------------------
T3_CMS_SGI_STORAGE 1 -> /dev/sdb
T3_CMS_SGI_STORAGE 2 -> /dev/sdc
T3_CMS_SGI_STORAGE 3 -> /dev/sdd
T3_CMS_SGI_STORAGE 4 -> /dev/sde
T3_CMS_SGI_STORAGE 5 -> /dev/sdf
T3_CMS_SGI_STORAGE 6 -> /dev/sdg
T3_CMS_SGI_STORAGE 7 -> /dev/sdh
T3_CMS_SGI_STORAGE 8 -> /dev/sdi
T3_CMS_SGI_STORAGE 9 -> /dev/sdj
T3_CMS_SGI_STORAGE 10 -> /dev/sdk
T3_CMS_SGI_STORAGE 11 -> /dev/sdl
T3_CMS_SGI_STORAGE 12 -> /dev/sdm
T3_CMS_E5460_01 1 -> /dev/sdn
T3_CMS_E5460_01 2 -> /dev/sdo
T3_CMS_E5460_01 3 -> /dev/sdp
T3_CMS_E5460_01 4 -> /dev/sdq
T3_CMS_E5460_01 5 -> /dev/sdr
T3_CMS_E5460_01 6 -> /dev/sds
T3_CMS_E5460_01 7 -> /dev/sdt
T3_CMS_E5460_01 8 -> /dev/sdu
T3_CMS_E5460_01 9 -> /dev/sdv
T3_CMS_E5460_01 10 -> /dev/sdw
T3_CMS_E5460_01 11 -> /dev/sdx
T3_CMS_E5460_01 12 -> /dev/sdy
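When the mapping is needed inside a script, the lsvdev listing can be parsed with a short regular expression. The sketch below assumes the output format is exactly the one shown above; `parse_lsvdev` is a hypothetical helper and the sample is a shortened copy of that listing.

```python
import re

# Matches lines such as "T3_CMS_SGI_STORAGE 1 -> /dev/sdb"
LSVDEV_LINE = re.compile(r"^\s*(\S+)\s+(\d+)\s+->\s+(/dev/\S+)\s*$")


def parse_lsvdev(text):
    """Map each /dev/sd* device to its (array name, LUN number)."""
    mapping = {}
    for line in text.splitlines():
        m = LSVDEV_LINE.match(line)
        if m:
            array, lun, dev = m.groups()
            mapping[dev] = (array, int(lun))
    return mapping


# Shortened copy of the listing above
SAMPLE = """\
Array Name Lun sd device
-------------------------------------
T3_CMS_SGI_STORAGE 1 -> /dev/sdb
T3_CMS_SGI_STORAGE 12 -> /dev/sdm
T3_CMS_E5460_01 1 -> /dev/sdn
"""

if __name__ == "__main__":
    for dev, (array, lun) in sorted(parse_lsvdev(SAMPLE).items()):
        print(dev, array, lun)
```

The header and separator lines are skipped because they do not contain the `->` arrow after a LUN number.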
mppUtil utility
The mppUtil tool can be used to interact with the LSI RDAC drivers; run man mppUtil to read its manual. Some example outputs follow:
[root@t3fs13 ~]# mppUtil -a
Hostname = t3fs13.psi.ch
Domainname = (none)
Time = GMT 11/28/2013 10:06:32
---------------------------------------------------------------
Info of Array Module's seen by this Host.
---------------------------------------------------------------
ID WWN Type Name
---------------------------------------------------------------
0 60080e50001f98f0000000004f20e355 FC T3_CMS_SGI_STORAGE
1 60080e50001fe1500000000051f234a6 FC T3_CMS_E5460_01
---------------------------------------------------------------
connected LUNs
Hostname = t3fs13.psi.ch
Domainname = (none)
Time = GMT 11/28/2013 10:07:14
MPP Information:
----------------
ModuleName: T3_CMS_SGI_STORAGE SingleController: N
VirtualTargetID: 0x000 ScanTriggered: N
ObjectCount: 0x000 AVTEnabled: N
WWN: 60080e50001f98f0000000004f20e355 RestoreCfg: N
ModuleHandle: none Page2CSubPage: Y
FirmwareVersion: 7.86.33.xx FailoverMethod: C
ScanTaskState: 0x00000000
LBPolicy: LeastQueueDepth
ProtectionType: 0
Controller 'A' Status:
-----------------------
ControllerHandle: none ControllerPresent: Y
UTMLunExists: N Failed: N
NumberOfPaths: 1 FailoverInProg: N
ServiceMode: N
Path #1
---------
DirectoryVertex: present Present: Y
PathState: OPTIMAL
PathId: 77010000 (hostId: 1, channelId: 0, targetId: 0)
ProtCapability: 0
Controller 'B' Status:
-----------------------
ControllerHandle: none ControllerPresent: Y
UTMLunExists: N Failed: N
NumberOfPaths: 1 FailoverInProg: N
ServiceMode: N
Path #1
---------
DirectoryVertex: present Present: Y
PathState: OPTIMAL
PathId: 77030000 (hostId: 3, channelId: 0, targetId: 0)
ProtCapability: 0
Lun Information
---------------
Lun #1 - WWN: 60080e50001f98f000000652523271b2
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #2 - WWN: 60080e50001f9920000001014f28bdc4
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #3 - WWN: 60080e50001f98f00000014f4f28bd6d
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #4 - WWN: 60080e50001f9920000001044f28bdd1
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #5 - WWN: 60080e50001f98f0000002264f30e909
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #6 - WWN: 60080e50001f9920000001074f28bddf
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #7 - WWN: 60080e50001f98f00000065052327180
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #8 - WWN: 60080e50001f9920000004d14f4c7cdd
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #9 - WWN: 60080e50001f98f0000005084f4c7cfc
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #10 - WWN: 60080e50001f9920000004d34f4c7d10
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #11 - WWN: 60080e50001f98f00000050a4f4c7d33
----------------
LunObject: present CurrentOwningPath: A
RemoveEligible: N BootOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Lun #12 - WWN: 60080e50001f9920000004d54f4c7d5c
----------------
LunObject: present CurrentOwningPath: B
RemoveEligible: N BootOwningPath: B
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
ReportedMissing: N
NeedsReservationCheck: N
TASBitSet: Y
NotReady: N
Busy: N
Quiescent: N
VD_Ownership_Transfer_Attempt_Count: 0
ProtectionType: 0
Controller 'A' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
Controller 'B' Path
--------------------
NumLunObjects: 1 RoundRobinIndex: 0
Path #1: LunPathDevice: present
DevState: OPTIMAL
RemoveState: 0x0 StartState: 0x1 PowerState: 0x0
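A report this long is tedious to read by eye. The sketch below scans a saved copy of it for the two things that usually matter: any DevState or PathState other than OPTIMAL, and any LUN whose current owning controller differs from its preferred one. It relies only on the field names visible in the report above; `check_mpputil` is a hypothetical helper, and the second LUN in the sample was deliberately altered to show a mismatch.

```python
import re

LUN_HEADER = re.compile(r"Lun #(\d+) - WWN: (\S+)")
OWNING = re.compile(r"\bCurrentOwningPath:\s*(\S+)")
PREFERRED = re.compile(r"\bPreferredPath:\s*(\S+)")
STATE = re.compile(r"\b(?:DevState|PathState):\s*(\S+)")


def check_mpputil(text):
    """Return (luns, problems) extracted from a saved mppUtil report."""
    luns, problems, current = {}, [], None
    for line in text.splitlines():
        m = LUN_HEADER.search(line)
        if m:
            current = int(m.group(1))
            luns[current] = {"wwn": m.group(2)}
            continue
        m = STATE.search(line)
        if m and m.group(1) != "OPTIMAL":
            where = "array" if current is None else "LUN %d" % current
            problems.append("%s: state %s" % (where, m.group(1)))
        if current is not None:
            m = OWNING.search(line)
            if m:
                luns[current]["owner"] = m.group(1)
            m = PREFERRED.search(line)
            if m:
                luns[current]["preferred"] = m.group(1)
    for num, info in sorted(luns.items()):
        if info.get("owner") != info.get("preferred"):
            problems.append("LUN %d: owned by %s, preferred %s"
                            % (num, info.get("owner"), info.get("preferred")))
    return luns, problems


# Shortened sample; LUN 2 was edited by hand to illustrate a mismatch
SAMPLE = """\
Lun #1 - WWN: 60080e50001f98f000000652523271b2
LunObject: present CurrentOwningPath: A
NotConfigured: N PreferredPath: A
DevState: OPTIMAL ReportedPresent: Y
Lun #2 - WWN: 60080e50001f9920000001014f28bdc4
LunObject: present CurrentOwningPath: A
NotConfigured: N PreferredPath: B
DevState: OPTIMAL ReportedPresent: Y
"""

if __name__ == "__main__":
    for problem in check_mpputil(SAMPLE)[1]:
        print(problem)
```

An empty problem list on the real report means every path is OPTIMAL and every LUN sits on its preferred controller, as in the output above.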
What happens during a single FC cable/port failure?
Because of the 2 redundant paths to the SGI IS5500 and NetApp E5400, the XFS filesystems will stay mounted; dmesg will first report:
qla2xxx [0000:0b:00.0]-500b:1: LOOP DOWN detected (2 3 0 0).
rport-1:0-0: blocked FC remote port time out: removing target and saving binding
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:1: rejecting I/O to offline device
mpp 1:0:0:2: rejecting I/O to offline device
mpp 1:0:0:3: rejecting I/O to offline device
mpp 1:0:0:3: rejecting I/O to offline device
mpp 1:0:0:3: rejecting I/O to offline device
mpp 1:0:0:3: rejecting I/O to offline device
mpp 1:0:0:4: rejecting I/O to offline device
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:5: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:7: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:9: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
mpp 1:0:0:11: rejecting I/O to offline device
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Selection Retry count exhausted
7 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0 Path Failed
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 No new path: fall to failover controller case. vcmnd SN 154546623 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Failed controller to 1. retry. vcmnd SN 154546623 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 No new path: fall to failover controller case. vcmnd SN 154546609 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Failed controller to 1. retry. vcmnd SN 154546609 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 No new path: fall to failover controller case. vcmnd SN 154546608 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:1 Failed controller to 1. retry. vcmnd SN 154546608 pdev H1:C0:T0:L1 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 No new path: fall to failover controller case. vcmnd SN 154546622 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Failed controller to 1. retry. vcmnd SN 154546622 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 No new path: fall to failover controller case. vcmnd SN 154546599 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Failed controller to 1. retry. vcmnd SN 154546599 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 No new path: fall to failover controller case. vcmnd SN 154546598 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:3 Failed controller to 1. retry. vcmnd SN 154546598 pdev H1:C0:T0:L3 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 No new path: fall to failover controller case. vcmnd SN 154546621 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Failed controller to 1. retry. vcmnd SN 154546621 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 No new path: fall to failover controller case. vcmnd SN 154546602 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Failed controller to 1. retry. vcmnd SN 154546602 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 No new path: fall to failover controller case. vcmnd SN 154546601 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:5 Failed controller to 1. retry. vcmnd SN 154546601 pdev H1:C0:T0:L5 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 No new path: fall to failover controller case. vcmnd SN 154546550 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Failed controller to 1. retry. vcmnd SN 154546550 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 No new path: fall to failover controller case. vcmnd SN 154546549 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Failed controller to 1. retry. vcmnd SN 154546549 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 No new path: fall to failover controller case. vcmnd SN 154546548 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Failed controller to 1. retry. vcmnd SN 154546548 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 No new path: fall to failover controller case. vcmnd SN 154546547 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:7 Failed controller to 1. retry. vcmnd SN 154546547 pdev H1:C0:T0:L7 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 No new path: fall to failover controller case. vcmnd SN 154546558 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Failed controller to 1. retry. vcmnd SN 154546558 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 No new path: fall to failover controller case. vcmnd SN 154546557 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Failed controller to 1. retry. vcmnd SN 154546557 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 No new path: fall to failover controller case. vcmnd SN 154546556 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Failed controller to 1. retry. vcmnd SN 154546556 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 No new path: fall to failover controller case. vcmnd SN 154546555 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:9 Failed controller to 1. retry. vcmnd SN 154546555 pdev H1:C0:T0:L9 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 No new path: fall to failover controller case. vcmnd SN 154546566 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Failed controller to 1. retry. vcmnd SN 154546566 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 No new path: fall to failover controller case. vcmnd SN 154546565 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Failed controller to 1. retry. vcmnd SN 154546565 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 No new path: fall to failover controller case. vcmnd SN 154546564 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Failed controller to 1. retry. vcmnd SN 154546564 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
94 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Selection Retry count exhausted
496 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 No new path: fall to failover controller case. vcmnd SN 154546563 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
497 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0:11 Failed controller to 1. retry. vcmnd SN 154546563 pdev H1:C0:T0:L11 0x00/0x00/0x00 0x00010000 mpp_status:6
10 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:1 Failover command issued
746 [RAIDarray.mpp]Device(0x12e66000) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x11791800) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x11791000) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x11790800) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x11790000) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x117d9800) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x117d9000) is already removed, cannot send Synchronous IO request
746 [RAIDarray.mpp]Device(0x1271a800) is already removed, cannot send Synchronous IO request
801 [RAIDarray.mpp]Failover succeeded to T3_CMS_SGI_STORAGE:1
Then, when you reconnect the FC cable, dmesg reports:
qla2xxx [0000:0b:00.0]-500a:1: LOOP UP detected (8 Gbps).
scsi 1:0:0:0: Direct-Access LSI INF-01-00 0786 PQ: 1 ANSI: 5
736 [RAIDarray.mpp]Host 1 Target 0 Lun 0 Is a physical device but is an Unconfigured Device.
scsi 1:0:0:1: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:2: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:3: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:4: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:5: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:6: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:7: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:8: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:9: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:10: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:11: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
scsi 1:0:0:12: Direct-Access LSI INF-01-00 0786 PQ: 0 ANSI: 5
dCache will not notice anything!
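Because the failover is transparent to dCache, the only trace is in the kernel log. The following sketch pulls the key events (loop down, path failed, failover succeeded, loop up) out of a dmesg dump so that they are not lost among the repeated retry messages. The regular expressions are written against the message formats shown above and may need adjusting for other driver versions; `summarize_fc_events` is a hypothetical helper.

```python
import re

LOOP_EVENT = re.compile(
    r"qla2xxx \[([0-9a-f:.]+)\]-\w+:\d+: LOOP (UP|DOWN) detected")
PATH_FAILED = re.compile(r"\[RAIDarray\.mpp\](\S+?):(\d+):(\d+) Path Failed")
FAILOVER_OK = re.compile(r"\[RAIDarray\.mpp\]Failover succeeded to (\S+):(\d+)")


def summarize_fc_events(dmesg_text):
    """Return the FC link and RDAC failover events in order of appearance."""
    events = []
    for line in dmesg_text.splitlines():
        m = LOOP_EVENT.search(line)
        if m:
            # Second item is the PCI address of the affected HBA port
            events.append(("loop_" + m.group(2).lower(), m.group(1)))
            continue
        m = PATH_FAILED.search(line)
        if m:
            events.append(("path_failed", m.group(1)))
            continue
        m = FAILOVER_OK.search(line)
        if m:
            events.append(("failover_ok", m.group(1) + ":" + m.group(2)))
    return events


# Four lines copied from the dmesg excerpts above
SAMPLE = """\
qla2xxx [0000:0b:00.0]-500b:1: LOOP DOWN detected (2 3 0 0).
7 [RAIDarray.mpp]T3_CMS_SGI_STORAGE:0:0 Path Failed
801 [RAIDarray.mpp]Failover succeeded to T3_CMS_SGI_STORAGE:1
qla2xxx [0000:0b:00.0]-500a:1: LOOP UP detected (8 Gbps).
"""

if __name__ == "__main__":
    for kind, detail in summarize_fc_events(SAMPLE):
        print(kind, detail)
```

A loop_down event without a later failover_ok for the same array would be the case worth investigating.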
iLO3 internal configuration
See HPProLiantDL380G7ILO3.
iLO3 external configuration ( OS involved )
To query the HW status (fans, temperatures, ...) some RPMs must be installed; they are available in /afs/psi.ch/software/linux/dist/scientific/6/others/all/ .
[root@t3fs13 ~]# rpm -qa | grep hp
hpsmh-6.3.1-23.x86_64
hpacucli-9.30-15.0.x86_64
hp-health-9.25-1551.9.rhel6.x86_64
Services:
[root@t3fs13 ~]# chkconfig --list | grep hp
hp-asrd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
hp-health 0:off 1:off 2:on 3:on 4:on 5:on 6:off
hpsmhd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Daemons ON:
[root@t3fs13 ~]# ps fax | grep -i hp
568 ? S 0:02 \_ [hpsa]
27596 pts/0 S+ 0:00 \_ grep -i hp
3465 ? Ssl 10:55 hpasmlited -f /dev/hpilo
3501 ? Ss 0:00 /opt/hp/hp-health/bin/hp-asrd -p 1 -t 600
3502 ? S 1:30 \_ /opt/hp/hp-health/bin/hp-asrd -p 1 -t 600
When running, these HP daemons are connected to the iLO3:
[root@t3fs13 ~]# ll /dev/hpilo/
total 0
crw-rw---- 1 root root 246, 0 Sep 13 10:25 d0ccb0
crw-rw---- 1 root root 246, 1 Sep 13 10:25 d0ccb1
crw-rw---- 1 root root 246, 2 Sep 13 10:25 d0ccb2
crw-rw---- 1 root root 246, 3 Sep 13 10:25 d0ccb3
crw-rw---- 1 root root 246, 4 Sep 13 10:25 d0ccb4
crw-rw---- 1 root root 246, 5 Sep 13 10:25 d0ccb5
crw-rw---- 1 root root 246, 6 Sep 13 10:25 d0ccb6
crw-rw---- 1 root root 246, 7 Sep 13 10:25 d0ccb7
HW status by Nagios
/usr/local/nagios/libexec/check_hpasm --perfdata=short --timeout=20 --servertype proliant -vvv
calling /sbin/hpasmcli
skipping temperature #16 I/O_ZONE - 70C/158F
skipping temperature #17 I/O_ZONE - 70C/158F
skipping temperature #18 I/O_ZONE - 70C/158F
skipping temperature #28 I/O_ZONE - 70C/158F
HP::Proliant::Component::DiskSubsystem::Da::CLI controllers und platten zusammenführen
has 0 controllers
has 0 accelerators
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Sas::CLI controllers und platten zusammenführen
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Scsi::CLI controllers und platten zusammenführen
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Ide::CLI controllers und platten zusammenführen
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
HP::Proliant::Component::DiskSubsystem::Fca::CLI controllers und platten zusammenführen
has 0 host controllers
has 0 controllers
has 0 physical_drives
has 0 logical_drives
has 0 spare_drives
[CPU_0]
cpqSeCpuSlot: 1
cpqSeCpuUnitIndex: 0
cpqSeCpuName: Intel Xeon
cpqSeCpuStatus: ok
info: cpu 0 is ok
[CPU_1]
cpqSeCpuSlot: 2
cpqSeCpuUnitIndex: 1
cpqSeCpuName: Intel Xeon
cpqSeCpuStatus: ok
info: cpu 1 is ok
[PS_1]
cpqHeFltTolPowerSupplyBay: 1
cpqHeFltTolPowerSupplyChassis: 1
cpqHeFltTolPowerSupplyPresent: present
cpqHeFltTolPowerSupplyCondition: ok
cpqHeFltTolPowerSupplyRedundant: redundant
info: powersupply 1 is ok
[PS_2]
cpqHeFltTolPowerSupplyBay: 2
cpqHeFltTolPowerSupplyChassis: 1
cpqHeFltTolPowerSupplyPresent: present
cpqHeFltTolPowerSupplyCondition: ok
cpqHeFltTolPowerSupplyRedundant: redundant
info: powersupply 2 is ok
[FAN_1]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 1
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 1 is present, speed is normal, pctmax is 29%, location is system, redundance is notRedundant, partner is 0
[FAN_2]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 2
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 2 is present, speed is normal, pctmax is 29%, location is system, redundance is notRedundant, partner is 0
[FAN_3]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 3
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 3 is present, speed is normal, pctmax is 45%, location is system, redundance is notRedundant, partner is 0
[FAN_4]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 4
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 4 is present, speed is normal, pctmax is 53%, location is system, redundance is notRedundant, partner is 0
[FAN_5]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 5
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 5 is present, speed is normal, pctmax is 53%, location is system, redundance is notRedundant, partner is 0
[FAN_6]
cpqHeFltTolFanChassis: 1
cpqHeFltTolFanIndex: 6
cpqHeFltTolFanLocale: system
cpqHeFltTolFanPresent: present
cpqHeFltTolFanType: other
cpqHeFltTolFanSpeed: normal
cpqHeFltTolFanRedundant: notRedundant
cpqHeFltTolFanRedundantPartner: 0
cpqHeFltTolFanCondition: ok
cpqHeFltTolFanHotPlug: hotPluggable
info: fan 6 is present, speed is normal, pctmax is 13%, location is system, redundance is notRedundant, partner is 0
[TEMP_1]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 1
cpqHeTemperatureLocale: ambient
cpqHeTemperatureCelsius: 18
cpqHeTemperatureThreshold: 41
cpqHeTemperatureCondition: unknown
info: 1 ambient temperature is 18C (41 max)
[TEMP_2]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 2
cpqHeTemperatureLocale: cpu#1
cpqHeTemperatureCelsius: 40
cpqHeTemperatureThreshold: 82
cpqHeTemperatureCondition: unknown
info: 2 cpu#1 temperature is 40C (82 max)
[TEMP_3]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 3
cpqHeTemperatureLocale: cpu#2
cpqHeTemperatureCelsius: 40
cpqHeTemperatureThreshold: 82
cpqHeTemperatureCondition: unknown
info: 3 cpu#2 temperature is 40C (82 max)
[TEMP_4]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 4
cpqHeTemperatureLocale: memory_bd
cpqHeTemperatureCelsius: 28
cpqHeTemperatureThreshold: 87
cpqHeTemperatureCondition: unknown
info: 4 memory_bd temperature is 28C (87 max)
[TEMP_5]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 5
cpqHeTemperatureLocale: memory_bd
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 87
cpqHeTemperatureCondition: unknown
info: 5 memory_bd temperature is 30C (87 max)
[TEMP_6]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 6
cpqHeTemperatureLocale: memory_bd
cpqHeTemperatureCelsius: 28
cpqHeTemperatureThreshold: 87
cpqHeTemperatureCondition: unknown
info: 6 memory_bd temperature is 28C (87 max)
[TEMP_7]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 7
cpqHeTemperatureLocale: memory_bd
cpqHeTemperatureCelsius: 32
cpqHeTemperatureThreshold: 87
cpqHeTemperatureCondition: unknown
info: 7 memory_bd temperature is 32C (87 max)
[TEMP_8]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 8
cpqHeTemperatureLocale: power_supply_bay
cpqHeTemperatureCelsius: 34
cpqHeTemperatureThreshold: 90
cpqHeTemperatureCondition: unknown
info: 8 power_supply_bay temperature is 34C (90 max)
[TEMP_9]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 9
cpqHeTemperatureLocale: power_supply_bay
cpqHeTemperatureCelsius: 29
cpqHeTemperatureThreshold: 65
cpqHeTemperatureCondition: unknown
info: 9 power_supply_bay temperature is 29C (65 max)
[TEMP_10]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 10
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 39
cpqHeTemperatureThreshold: 90
cpqHeTemperatureCondition: unknown
info: 10 system_bd temperature is 39C (90 max)
[TEMP_11]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 11
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 11 system_bd temperature is 30C (70 max)
[TEMP_12]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 12
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 38
cpqHeTemperatureThreshold: 90
cpqHeTemperatureCondition: unknown
info: 12 system_bd temperature is 38C (90 max)
[TEMP_13]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 13
cpqHeTemperatureLocale: i/o_zone
cpqHeTemperatureCelsius: 26
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 13 i/o_zone temperature is 26C (70 max)
[TEMP_14]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 14
cpqHeTemperatureLocale: i/o_zone
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 14 i/o_zone temperature is 30C (70 max)
[TEMP_15]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 15
cpqHeTemperatureLocale: i/o_zone
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 15 i/o_zone temperature is 30C (70 max)
[TEMP_19]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 19
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 24
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 19 system_bd temperature is 24C (70 max)
[TEMP_20]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 20
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 25
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 20 system_bd temperature is 25C (70 max)
[TEMP_21]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 21
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 80
cpqHeTemperatureCondition: unknown
info: 21 system_bd temperature is 27C (80 max)
[TEMP_22]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 22
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 80
cpqHeTemperatureCondition: unknown
info: 22 system_bd temperature is 27C (80 max)
[TEMP_23]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 23
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 32
cpqHeTemperatureThreshold: 77
cpqHeTemperatureCondition: unknown
info: 23 system_bd temperature is 32C (77 max)
[TEMP_24]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 24
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 24 system_bd temperature is 30C (70 max)
[TEMP_25]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 25
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 26
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 25 system_bd temperature is 26C (70 max)
[TEMP_26]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 26
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 27
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 26 system_bd temperature is 27C (70 max)
[TEMP_27]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 27
cpqHeTemperatureLocale: i/o_zone
cpqHeTemperatureCelsius: 30
cpqHeTemperatureThreshold: 70
cpqHeTemperatureCondition: unknown
info: 27 i/o_zone temperature is 30C (70 max)
[TEMP_29]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 29
cpqHeTemperatureLocale: scsi_backplane_zone
cpqHeTemperatureCelsius: 35
cpqHeTemperatureThreshold: 60
cpqHeTemperatureCondition: unknown
info: 29 scsi_backplane_zone temperature is 35C (60 max)
[TEMP_30]
cpqHeTemperatureChassis: 1
cpqHeTemperatureIndex: 30
cpqHeTemperatureLocale: system_bd
cpqHeTemperatureCelsius: 58
cpqHeTemperatureThreshold: 110
cpqHeTemperatureCondition: unknown
info: 30 system_bd temperature is 58C (110 max)
dimm module 1:3 (module 3 @ cartridge 1) is ok
dimm module 1:6 (module 6 @ cartridge 1) is ok
dimm module 1:9 (module 9 @ cartridge 1) is ok
dimm module 2:3 (module 3 @ cartridge 2) is ok
dimm module 2:6 (module 6 @ cartridge 2) is ok
dimm module 2:9 (module 9 @ cartridge 2) is ok
i dump the memory
car 01 mod 03 siz 8589934592 sta present con ok typ
car 01 mod 06 siz 8589934592 sta present con ok typ
car 01 mod 09 siz 8589934592 sta present con ok typ
car 02 mod 03 siz 8589934592 sta present con ok typ
car 02 mod 06 siz 8589934592 sta present con ok typ
car 02 mod 09 siz 8589934592 sta present con ok typ
[EVENT_5]
cpqHeEventLogEntryNumber: 5
cpqHeEventLogEntrySeverity: info
cpqHeEventLogEntryCount: 1
cpqHeEventLogInitialTime: Tue Mar 20 15:44:00 2012
cpqHeEventLogUpdateTime: Tue Mar 20 15:44:00 2012
cpqHeEventLogErrorDesc: IML Cleared (iLO 3 user:root).
info: Event: 5 Added: 1332254640 Class: (Maintenance Note) info IML Cleared (iLO 3 user:root).
OK - System: 'proliant dl380 g7', S/N: 'CZ31513SBR', ROM: 'P67 05/05/2011', hardware working fine, cpu_0=ok cpu_1=ok ps_1=ok ps_2=ok fan_1=29% fan_2=29% fan_3=45% fan_4=53% fan_5=53% fan_6=13% temp_1=18 temp_2=40 temp_3=40 temp_4=28 temp_5=30 temp_6=28 temp_7=32 temp_8=34 temp_9=29 temp_10=39 temp_11=30 temp_12=38 temp_13=26 temp_14=30 temp_15=30 temp_19=24 temp_20=25 temp_21=27 temp_22=27 temp_23=32 temp_24=30 temp_25=26 temp_26=27 temp_27=30 temp_29=35 temp_30=58
checking cpus
cpu 0 is ok
cpu 1 is ok
checking power supplies
powersupply 1 is ok
powersupply 2 is ok
checking fans
fan 1 is present, speed is normal, pctmax is 29%, location is system, redundance is notRedundant, partner is 0
fan 2 is present, speed is normal, pctmax is 29%, location is system, redundance is notRedundant, partner is 0
fan 3 is present, speed is normal, pctmax is 45%, location is system, redundance is notRedundant, partner is 0
fan 4 is present, speed is normal, pctmax is 53%, location is system, redundance is notRedundant, partner is 0
fan 5 is present, speed is normal, pctmax is 53%, location is system, redundance is notRedundant, partner is 0
fan 6 is present, speed is normal, pctmax is 13%, location is system, redundance is notRedundant, partner is 0
checking temperatures
1 ambient temperature is 18C (41 max)
2 cpu#1 temperature is 40C (82 max)
3 cpu#2 temperature is 40C (82 max)
4 memory_bd temperature is 28C (87 max)
5 memory_bd temperature is 30C (87 max)
6 memory_bd temperature is 28C (87 max)
7 memory_bd temperature is 32C (87 max)
8 power_supply_bay temperature is 34C (90 max)
9 power_supply_bay temperature is 29C (65 max)
10 system_bd temperature is 39C (90 max)
11 system_bd temperature is 30C (70 max)
12 system_bd temperature is 38C (90 max)
13 i/o_zone temperature is 26C (70 max)
14 i/o_zone temperature is 30C (70 max)
15 i/o_zone temperature is 30C (70 max)
19 system_bd temperature is 24C (70 max)
20 system_bd temperature is 25C (70 max)
21 system_bd temperature is 27C (80 max)
22 system_bd temperature is 27C (80 max)
23 system_bd temperature is 32C (77 max)
24 system_bd temperature is 30C (70 max)
25 system_bd temperature is 26C (70 max)
26 system_bd temperature is 27C (70 max)
27 i/o_zone temperature is 30C (70 max)
29 scsi_backplane_zone temperature is 35C (60 max)
30 system_bd temperature is 58C (110 max)
checking memory
dimm module 1:3 (module 3 @ cartridge 1) is ok
dimm module 1:6 (module 6 @ cartridge 1) is ok
dimm module 1:9 (module 9 @ cartridge 1) is ok
dimm module 2:3 (module 3 @ cartridge 2) is ok
dimm module 2:6 (module 6 @ cartridge 2) is ok
dimm module 2:9 (module 9 @ cartridge 2) is ok
checking disk subsystem
checking ASR
checking events
Event: 5 Added: 1332254640 Class: (Maintenance Note) info IML Cleared (iLO 3 user:root). | fan_1=29% fan_2=29% fan_3=45% fan_4=53% fan_5=53% fan_6=13% temp_1=18;41;41 temp_2=40;82;82 temp_3=40;82;82 temp_4=28;87;87 temp_5=30;87;87 temp_6=28;87;87 temp_7=32;87;87 temp_8=34;90;90 temp_9=29;65;65 temp_10=39;90;90 temp_11=30;70;70 temp_12=38;90;90 temp_13=26;70;70 temp_14=30;70;70 temp_15=30;70;70 temp_19=24;70;70 temp_20=25;70;70 temp_21=27;80;80 temp_22=27;80;80 temp_23=32;77;77 temp_24=30;70;70 temp_25=26;70;70 temp_26=27;70;70 temp_27=30;70;70 temp_29=35;60;60 temp_30=58;110;110
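The performance data after the `|` in the plugin output follows the usual Nagios `label=value;warn;crit` layout, so the margin between each temperature and its threshold can be computed from it. This is a minimal sketch under that assumption; `parse_perfdata`, `temperature_headroom` and the shortened `SAMPLE` line are illustrative names and data, not part of check_hpasm.

```python
def parse_perfdata(plugin_output):
    """Parse the part after '|' into {label: (value, warn, crit)}."""
    _, _, perf = plugin_output.partition("|")
    metrics = {}
    for item in perf.split():
        if "=" not in item:
            continue  # not a label=value item
        label, _, rest = item.partition("=")
        parts = rest.split(";")
        value = float(parts[0].rstrip("%"))  # fans are reported as percentages
        warn = float(parts[1]) if len(parts) > 1 and parts[1] else None
        crit = float(parts[2]) if len(parts) > 2 and parts[2] else None
        metrics[label] = (value, warn, crit)
    return metrics


def temperature_headroom(metrics):
    """Return {label: crit - value} for temp_* metrics, smallest margin first."""
    rooms = {
        label: crit - value
        for label, (value, _warn, crit) in metrics.items()
        if label.startswith("temp_") and crit is not None
    }
    return dict(sorted(rooms.items(), key=lambda kv: kv[1]))


# Shortened sample built from values in the output above.
SAMPLE = ("OK - hardware working fine | fan_1=29% fan_6=13% "
          "temp_1=18;41;41 temp_9=29;65;65 temp_29=35;60;60 temp_30=58;110;110")

for label, margin in temperature_headroom(parse_perfdata(SAMPLE)).items():
    print(label, margin)
```

Sorting by margin puts the sensors closest to their limit first, which is usually what one wants to look at when the check turns to WARNING.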
RAID 1 status by Nagios
/usr/lib64/nagios/plugins/check_cciss-1.9 -s -v -p -d
### Check if "HP Smart Array" (/proc/driver/cciss/cciss) is present >>>\ncat: /proc/driver/cciss/cciss*: No such file or directory\n
### Check if "HP Smart Array" (/proc/scsi/scsi) is present >>>\nAttached devices: Host: scsi0 Channel: 03 Id: 00 Lun: 00 Vendor: HP Model: P410i Rev: 5.70 Type: RAID ANSI SCSI revision: 05 Host: scsi0 Channel: 00 Id: 00 Lun: 00 Vendor: HP Model: LOGICAL VOLUME Rev: 5.70 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 00 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 01 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 02 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 03 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 04 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 05 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 06 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 07 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 08 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 09 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 10 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 11 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi1 Channel: 00 Id: 00 Lun: 12 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 00 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI 
revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 01 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 02 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 03 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 04 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 05 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 06 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 07 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 08 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 09 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 10 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 11 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi2 Channel: 00 Id: 00 Lun: 12 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 00 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 01 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 02 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 03 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 04 Vendor: LSI 
Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 05 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 06 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 07 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 08 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 09 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 10 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 11 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi3 Channel: 00 Id: 00 Lun: 12 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 00 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 01 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 02 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 03 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 04 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 05 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 06 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 07 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI 
revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 08 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 09 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 10 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 11 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi4 Channel: 00 Id: 00 Lun: 12 Vendor: LSI Model: INF-01-00 Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 01 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 02 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 03 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 04 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 05 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 06 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 07 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 08 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 09 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 10 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 00 Lun: 11 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 
00 Lun: 12 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 01 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 02 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 03 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 04 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 05 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 06 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 07 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 08 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 09 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 10 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 11 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05 Host: scsi5 Channel: 00 Id: 01 Lun: 12 Vendor: LSI Model: VirtualDisk Rev: 0786 Type: Direct-Access ANSI SCSI revision: 05\n
### Check if "HP Array Utility CLI" is present >>>\n/usr/sbin/hpacucli\n
### Check if "HP Controller" work correctly >>>\n Smart Array P410i in Slot 0 (Embedded) Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK\n
### Get "Slot" & exclude slot not needed >>>\n0\n
### Get "logicaldrive" for slot >>>\n Smart Array P410i in Slot 0 (Embedded) array A logicaldrive 1 (279.4 GB, RAID 1, OK)\n
### Get "physicaldrive" for slot >>>\n physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)\n
### Get "Chassis" & exclude chassis not needed >>>\n\n
### Check STATUS >>>
RAID OK: Smart Array P410i in Slot 0 (Embedded) array A logicaldrive 1 (279.4 GB, RAID 1, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK) physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK) [Controller Status: OK Cache Status: OK Battery/Capacitor Status: OK]
Each server has a dual-channel 10Gbit/s Emulex card (HP model number 614203-B21), which uses a PCI Express x8 link, as highlighted by this
lspci -vvv
output:
08:00.0 Ethernet controller: ServerEngines Corp. Emulex OneConnect 10Gb NIC (be3) (rev 01)
...
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- <-----------------------
...
Kernel driver in use: be2net
Kernel modules: be2net
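As a rough plausibility check of the `LnkSta` line, the usable link bandwidth can be estimated from the transfer rate and the lane count. The sketch below assumes the 8b/10b line encoding used by 5GT/s PCI Express links and ignores packet and protocol overhead, so the real figure is somewhat lower; `pcie_usable_gbit` is an illustrative helper name.

```python
def pcie_usable_gbit(gt_per_s, lanes, encoding=(8, 10)):
    """Estimate usable Gbit/s per direction for a PCIe link.

    gt_per_s : transfer rate per lane in GT/s (5 in the lspci output)
    lanes    : negotiated link width (8 in the lspci output)
    encoding : (payload bits, line bits); 8b/10b is assumed here
    """
    payload_bits, line_bits = encoding
    return gt_per_s * lanes * payload_bits / line_bits


link_gbit = pcie_usable_gbit(5, 8)  # values from "Speed 5GT/s, Width x8"
nic_gbit = 2 * 10                   # two 10Gbit/s ports on the card

print("PCIe link :", link_gbit, "Gbit/s per direction")
print("NIC ports :", nic_gbit, "Gbit/s")
print("link is sufficient:", link_gbit >= nic_gbit)
```

Under these assumptions the x8 link provides 32 Gbit/s per direction, which leaves some margin above the 20 Gbit/s that both ports could carry together.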
Following the
IBM paper about 10Gbit/s links and Linux, we stopped the
irqbalance daemon and configured the following in
/etc/sysctl.conf
:
# cat /etc/sysctl.conf
# Puppet Managed File
#
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
# by martinelli according to IBM Tuning 10Gb network cards on Linux
# http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cts=1331221888369&ved=0CCcQFjAA&url=http%3A%2F%2Fkernel.org%2Fdoc%2Fols%2F2009%2Fols2009-pages-169-184.pdf&ei=edVYT9bRNPPc4QSugZHWDw&usg=AFQjCNGl95lznOgwBSFzpuZ3QlXohXM1Xw&sig2=ZlZTvy_XY4eTGHam_JflIw
net.core.rmem_max = 16777216
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_rmem = 4096 87380 3526656
net.ipv4.tcp_wmem = 4096 87380 3526656
net.ipv4.tcp_sack = 1 <------ DON'T SWITCH THIS TO 0 !!
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_moderate_rcvbuf = 1
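The `tcp_rmem` and `tcp_wmem` maxima above bound the window of a single TCP stream, and therefore its throughput at a given round-trip time (throughput is roughly window / RTT). The following is a back-of-the-envelope sketch, assuming the whole buffer is usable as TCP window (the kernel actually reserves part of it for bookkeeping); the function names and the example RTT values are illustrative.

```python
TCP_RMEM_MAX = 3526656  # bytes, third value of net.ipv4.tcp_rmem above


def max_stream_gbit(window_bytes, rtt_ms):
    """Upper bound in Gbit/s for one TCP stream: window / RTT."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e9


def rtt_limit_ms(window_bytes, link_gbit):
    """Largest RTT in ms at which one stream can still fill the link."""
    return window_bytes * 8 / (link_gbit * 1e9) * 1000.0


for rtt in (0.5, 2.0, 10.0):  # example RTTs in ms, chosen for illustration
    print("RTT %5.1f ms -> at most %.2f Gbit/s per stream"
          % (rtt, max_stream_gbit(TCP_RMEM_MAX, rtt)))

print("10 Gbit/s is reachable by one stream up to %.2f ms RTT"
      % rtt_limit_ms(TCP_RMEM_MAX, 10))
```

With these settings a single stream can fill a 10Gbit/s link only up to an RTT of roughly 2.8 ms; transfers over longer paths rely on several parallel streams.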
dCache 2.6
Please look at the Puppet recipes:
- SL6_fs13fs14.pp
- SL6_grid
- SL6.pp
- tier3-baseclasses.pp
CSCS dCache 2.6 page
LCGTier2/ServiceDcache
Important files in a nutshell
find /etc/dcache/
/etc/dcache/dcache.conf <-- main dCache conf, it should be the same on each node
/etc/dcache/logback.xml <-- to tune the logging verbosity
/etc/logrotate.d/dcache
/etc/dcache/layouts
/etc/dcache/layouts/t3fs13.conf <-- specific node conf
# dCache Logs
/var/log/dcache/
/var/log/dcache/t3fs13-Domain-dcap.log
/var/log/dcache/t3fs13-Domain-gridftp.log
/var/log/dcache/t3fs13-Domain-gsidcap.log
/var/log/dcache/t3fs13-Domain-pool.log
/etc/dcache/dcache.conf
The same as NodeTypeStorageElement#etc_dcache_dcache_conf
/etc/dcache/layouts/t3fs13.conf
# Puppet Managed File
[${host.name}-Domain-pool]
# t3fs13_cms
[${host.name}-Domain-pool/pool]
name=t3fs13_cms
path=/mnt/data06/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_1
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_1
path=/mnt/data01/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_2
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_2
path=/mnt/data02/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_3
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_3
path=/mnt/data03/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_4
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_4
path=/mnt/data04/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_5
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_5
path=/mnt/data05/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_6
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_6
path=/mnt/data07/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_7
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_7
path=/mnt/data08/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_8
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_8
path=/mnt/data09/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_9
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_9
path=/mnt/data10/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_10
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_10
path=/mnt/data11/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_11
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_11
path=/mnt/data12/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_cms_0
[${host.name}-Domain-pool/pool]
name=t3fs13_cms_0
path=/mnt/data00/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_ops
[${host.name}-Domain-pool/pool]
name=t3fs13_ops
path=/mnt/data06/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_1
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_1
path=/mnt/data01/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_2
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_2
path=/mnt/data02/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_3
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_3
path=/mnt/data03/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_4
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_4
path=/mnt/data04/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_5
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_5
path=/mnt/data05/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_6
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_6
path=/mnt/data07/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_7
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_7
path=/mnt/data08/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_8
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_8
path=/mnt/data09/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_9
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_9
path=/mnt/data10/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_10
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_10
path=/mnt/data11/t3fs13_ops/pool
waitForFiles=${path}/data
# t3fs13_ops_11
[${host.name}-Domain-pool/pool]
name=t3fs13_ops_11
path=/mnt/data12/t3fs13_ops/pool
waitForFiles=${path}/data
[${host.name}-Domain-dcap]
[${host.name}-Domain-dcap/dcap]
[${host.name}-Domain-gridftp]
[${host.name}-Domain-gridftp/gridftp]
[${host.name}-Domain-gsidcap]
[${host.name}-Domain-gsidcap/gsidcap]
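Since the layout file repeats the same three-line stanza for every pool, a small script can list the pools and catch copy-and-paste mistakes such as a duplicated name or path. This is a minimal sketch that only understands the `[domain/service]` headers and `key=value` lines shown above; `parse_pools`, `duplicates` and the shortened `SAMPLE` are illustrative and not part of dCache.

```python
import re

# Matches "[domain]" and "[domain/service]" section headers.
SECTION = re.compile(r"^\[(?P<domain>[^/\]]+)(?:/(?P<service>[^\]]+))?\]$")


def parse_pools(layout_text):
    """Return one dict of settings per [.../pool] section, in file order."""
    pools, current = [], None
    for raw in layout_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        match = SECTION.match(line)
        if match:
            current = {} if match.group("service") == "pool" else None
            if current is not None:
                pools.append(current)
            continue
        if current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return pools


def duplicates(pools, key):
    """Return the sorted values of `key` that occur in more than one pool."""
    seen, dups = set(), set()
    for pool in pools:
        value = pool.get(key)
        if value in seen:
            dups.add(value)
        seen.add(value)
    return sorted(dups)


# Shortened sample in the same shape as the layout file above.
SAMPLE = """\
[${host.name}-Domain-pool]
# t3fs13_cms
[${host.name}-Domain-pool/pool]
name=t3fs13_cms
path=/mnt/data06/t3fs13_cms/pool
waitForFiles=${path}/data
# t3fs13_ops
[${host.name}-Domain-pool/pool]
name=t3fs13_ops
path=/mnt/data06/t3fs13_ops/pool
waitForFiles=${path}/data
[${host.name}-Domain-dcap]
[${host.name}-Domain-dcap/dcap]
"""

pools = parse_pools(SAMPLE)
for pool in pools:
    print(pool["name"], "->", pool["path"])
print("duplicate names:", duplicates(pools, "name"))
print("duplicate paths:", duplicates(pools, "path"))
```

Running it against the real `/etc/dcache/layouts/t3fs13.conf` would be a matter of reading that file into a string first; since the file is Puppet managed, any fix belongs in the Puppet recipe rather than on the node.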
dCache gap tuning according to the type of pool
[root@t3fs13 ~]# grep gap /mnt/data01/t3fs13_cms/pool/setup
set gap 4g
[root@t3fs13 ~]# grep gap /mnt/data01/t3fs13_ops/pool/setup
set gap 10485760
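The two gap values are written in different notations, which makes them hard to compare at a glance. The sketch below converts both to bytes, assuming that the `k`/`m`/`g`/`t` suffixes in the pool `setup` file denote binary multiples (1024-based); that interpretation is an assumption and should be checked against the dCache documentation. The helper names are illustrative.

```python
# Assumption: suffixes are binary multiples (k = 1024, m = 1024**2, ...).
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def gap_to_bytes(value):
    """Convert a gap value such as '4g' or '10485760' to bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def gap_from_setup_line(line):
    """Return the gap in bytes for a 'set gap <value>' line, else None."""
    parts = line.split()
    if len(parts) == 3 and parts[:2] == ["set", "gap"]:
        return gap_to_bytes(parts[2])
    return None


cms_gap = gap_from_setup_line("set gap 4g")        # cms pools
ops_gap = gap_from_setup_line("set gap 10485760")  # ops pools

print("cms gap:", cms_gap, "bytes =", cms_gap / 1024 ** 3, "GiB")
print("ops gap:", ops_gap, "bytes =", ops_gap / 1024 ** 2, "MiB")
```

Under that assumption the cms pools keep a 4 GiB gap while the ops pools keep only 10 MiB.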
T3 Site Logs about these servers
Services
Listening
[root@t3fs13 ~]# netstat -tpln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:33131 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33133 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:22125 0.0.0.0:* LISTEN 7515/java
tcp 0 0 0.0.0.0:33134 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33135 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 2614/rpcbind
tcp 0 0 0.0.0.0:33136 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:22128 0.0.0.0:* LISTEN 7653/java
tcp 0 0 0.0.0.0:33137 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33138 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33139 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33140 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33141 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2892/sshd
tcp 0 0 0.0.0.0:33143 0.0.0.0:* LISTEN 3247/java
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 2780/cupsd
tcp 0 0 0.0.0.0:33144 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33145 0.0.0.0:* LISTEN 3247/java
tcp 0 0 127.0.0.1:6010 0.0.0.0:* LISTEN 27103/0
tcp 0 0 0.0.0.0:33115 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:2811 0.0.0.0:* LISTEN 7582/java
tcp 0 0 0.0.0.0:33116 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:924 0.0.0.0:* LISTEN 2865/qlremote
tcp 0 0 0.0.0.0:33117 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33120 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:7937 0.0.0.0:* LISTEN 27712/nsrexecd
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 15784/nrpe
tcp 0 0 0.0.0.0:33122 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:7938 0.0.0.0:* LISTEN 27712/nsrexecd
tcp 0 0 0.0.0.0:33123 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:7939 0.0.0.0:* LISTEN 27712/nsrexecd
tcp 0 0 0.0.0.0:33124 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:7940 0.0.0.0:* LISTEN 27712/nsrexecd
tcp 0 0 0.0.0.0:33125 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33126 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33127 0.0.0.0:* LISTEN 3247/java
tcp 0 0 127.0.0.1:199 0.0.0.0:* LISTEN 2878/snmpd
tcp 0 0 0.0.0.0:33128 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33129 0.0.0.0:* LISTEN 3247/java
tcp 0 0 0.0.0.0:33130 0.0.0.0:* LISTEN 3247/java
[root@t3fs13 ~]# netstat -upln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:111 0.0.0.0:* 2614/rpcbind
udp 0 0 0.0.0.0:631 0.0.0.0:* 2780/cupsd
udp 0 0 192.33.123.53:123 0.0.0.0:* 2902/ntpd
udp 0 0 127.0.0.1:123 0.0.0.0:* 2902/ntpd
udp 0 0 0.0.0.0:123 0.0.0.0:* 2902/ntpd
udp 0 0 0.0.0.0:7938 0.0.0.0:* 27712/nsrexecd
udp 0 0 127.0.0.1:514 0.0.0.0:* 2527/syslog-ng
udp 0 0 0.0.0.0:47879 0.0.0.0:* 7582/java
udp 0 0 0.0.0.0:40729 0.0.0.0:* 7653/java
udp 0 0 0.0.0.0:665 0.0.0.0:* 2614/rpcbind
udp 0 0 0.0.0.0:922 0.0.0.0:* 2865/qlremote
udp 0 0 0.0.0.0:161 0.0.0.0:* 2878/snmpd
udp 0 0 0.0.0.0:34484 0.0.0.0:* 7515/java
udp 0 0 0.0.0.0:34114 0.0.0.0:* 2769/avahi-daemon:
udp 0 0 0.0.0.0:68 0.0.0.0:* 2440/dhclient
udp 0 0 0.0.0.0:43351 0.0.0.0:* 3247/java
udp 0 0 0.0.0.0:5353 0.0.0.0:* 2769/avahi-daemon:
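The listening table can be cross-checked against the fixed ports named in the firewall requirements (2811 for the gridftp control connection, 22125 for dcap, 22128 for gsidcap). The following is a rough sketch that parses the text of `netstat -tpln` and reports which of those ports have no listener. The function names are hypothetical, and the parser assumes the seven-column output format shown above.

```python
# Sketch: check that the door ports from the firewall table are listening.
# Input is the plain-text output of `netstat -tpln`, as shown above.

EXPECTED_TCP_PORTS = {
    2811: "gridftp control connection",
    22125: "unauthenticated dcap",
    22128: "gsidcap",
}


def listening_tcp_ports(netstat_output):
    """Map each listening TCP port to its 'PID/Program name' column."""
    ports = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the shell prompt, the banner and the column header.
        if len(fields) < 7 or fields[0] != "tcp" or fields[5] != "LISTEN":
            continue
        port = int(fields[3].rsplit(":", 1)[1])
        ports[port] = fields[6]
    return ports


def missing_ports(netstat_output, expected=EXPECTED_TCP_PORTS):
    """Return the expected ports that have no listener, sorted ascending."""
    listening = listening_tcp_ports(netstat_output)
    return sorted(port for port in expected if port not in listening)


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(["netstat", "-tpln"], capture_output=True,
                            text=True, check=True).stdout
    for port in missing_ports(output):
        print(f"not listening: {port} ({EXPECTED_TCP_PORTS[port]})")
```

An empty result means all three doors are up; it says nothing about whether the firewall actually lets traffic through.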
dCache
[root@t3fs13 ~]# dcache services
DOMAIN SERVICE CELL LOG
t3fs13-Domain-pool pool t3fs13_cms /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_1 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_2 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_3 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_4 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_5 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_6 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_7 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_8 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_9 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_10 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_11 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_cms_0 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_1 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_2 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_3 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_4 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_5 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_6 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_7 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_8 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_9 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_10 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-pool pool t3fs13_ops_11 /var/log/dcache/t3fs13-Domain-pool.log
t3fs13-Domain-dcap dcap DCap-t3fs13 /var/log/dcache/t3fs13-Domain-dcap.log
t3fs13-Domain-gridftp gridftp GFTP-t3fs13 /var/log/dcache/t3fs13-Domain-gridftp.log
t3fs13-Domain-gsidcap gsidcap DCap-gsi-t3fs13 /var/log/dcache/t3fs13-Domain-gsidcap.log
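When comparing the two servers, or checking a node after a layout change, it can help to summarise this listing rather than read it line by line. The sketch below parses the four-column output of `dcache services` and counts cells per service type. The function names are hypothetical, and the parser assumes that every data row ends with an absolute log path, as in the output above.

```python
# Sketch: summarise `dcache services` output by service type.
from collections import Counter


def parse_services(output):
    """Return (domain, service, cell, log) tuples from `dcache services` text."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four columns and an absolute log path in the last one;
        # this also filters out the shell prompt and the column header.
        if len(fields) != 4 or not fields[3].startswith("/"):
            continue
        rows.append(tuple(fields))
    return rows


def cells_per_service(output):
    """Count how many cells each service type (pool, dcap, ...) provides."""
    return Counter(service for _, service, _, _ in parse_services(output))


if __name__ == "__main__":
    import subprocess

    text = subprocess.run(["dcache", "services"], capture_output=True,
                          text=True, check=True).stdout
    for service, count in sorted(cells_per_service(text).items()):
        print(f"{service}: {count}")
```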
Backups
Both t3fs13 and t3fs14 are protected by the PSI Legato backup infrastructure for partitions:
- 13201_div.pdf: Manual for the Smart Array Controller FBWC 1GB Flash Backed Cache (P410i only)
- c03479393.pdf: Linux best practices using HP Service Pack for ProLiant (SPP) and Software Delivery Repository (SDR)