Arrow left Go to previous page / next page of Tier3 site log MOVED TO...

10. 03. 2014 Lost Sensor n.60 on each SUN Thor fileserver

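Today the same error appeared in parallel on every server t3fs[07-11]: sensor n. 60 is reported as Critical. After 1h of investigation I can reproduce the error, but I can't explain its cause.
Note that sensor n. 60 (PCIE0/F20C/PRSNT) is an Entity Presence sensor, not a temperature sensor; it reports 'Device Removed/Device Absent'. For now I'll disable the Nagios check of sensor n. 60 on the servers t3fs[07-11].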

/opt/nagios/check_ipmi_sensor Nagios invocation

[root@t3admin01 ~]# /opt/nagios/check_ipmi_sensor -vvv  -f /opt/nagios/check_ipmi_sensor.user.pwd.privilege -H rmfs07 -O '--interpret-oem-data --ignore-not-available-sensors  --non-abbreviated-units  --record-ids=34,35,38,45,48,49,60'
------------- begin of debug output (-vvv is set): ------------
 script was executed with the following parameters:
   /opt/nagios/check_ipmi_sensor -vvv -f /opt/nagios/check_ipmi_sensor.user.pwd.privilege -H rmfs07 -O --interpret-oem-data --ignore-not-available-sensors  --non-abbreviated-units  --record-ids=34,35,38,45,48,49,60
 check_ipmi_sensor version:
   3.1 2012-05-24
 FreeIPMI version:
   ipmi-sensors - 1.3.4
 FreeIPMI was executed with the following parameters:
   /usr/sbin/ipmi-sensors -h rmfs07 --config-file /opt/nagios/check_ipmi_sensor.user.pwd.privilege --interpret-oem-data --ignore-not-available-sensors --non-abbreviated-units --record-ids=34,35,38,45,48,49,60 --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors
 FreeIPMI return code: 0
 output of FreeIPMI:
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 31.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 17.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 19.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 54.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 45.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'

--------------------- end of debug output ---------------------
IPMI Status: Critical [PCIE0/F20C/PRSNT = Critical ('Device Removed/Device Absent')] | 'PROC/FRONT_T_AMB'=33.00 'PROC/REAR_T_AMB'=31.00 'P0/T_CORE'=17.00 'P1/T_CORE'=19.00 'IO/REAR_T_AMB'=54.00 'IO/FRONT_T_AMB'=45.00
PROC/FRONT_T_AMB = 33.00 (Status: Nominal)
PROC/REAR_T_AMB = 31.00 (Status: Nominal)
P0/T_CORE = 17.00 (Status: Nominal)
P1/T_CORE = 19.00 (Status: Nominal)
IO/REAR_T_AMB = 54.00 (Status: Nominal)
IO/FRONT_T_AMB = 45.00 (Status: Nominal)
PCIE0/F20C/PRSNT = 'Device Removed/Device Absent' (Status: Critical)
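
One possible way to disable the check of sensor n. 60 is to drop that ID from the --record-ids list passed to the plugin. The sketch below only prints the adjusted invocation for each host (a dry run) and leaves out -vvv. The helper name drop_record_id is hypothetical, and how the option string is actually stored in the Nagios command definition is not shown in this log, so treat this as an illustration rather than the change that was applied.

```shell
#!/bin/sh
# Sketch: remove one record ID from a comma-separated --record-ids list.
# drop_record_id is a hypothetical helper, not part of check_ipmi_sensor.
drop_record_id() {
    # $1 = comma-separated ID list, $2 = ID to remove (exact match only)
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx -- "$2" | paste -sd, -
}

# Record IDs taken from the invocation above, minus sensor 60.
IDS=$(drop_record_id "34,35,38,45,48,49,60" 60)

# Dry run: print the adjusted check for each host instead of executing it.
for i in 07 08 09 10 11; do
    echo "/opt/nagios/check_ipmi_sensor -f /opt/nagios/check_ipmi_sensor.user.pwd.privilege -H rmfs$i -O '--interpret-oem-data --ignore-not-available-sensors --non-abbreviated-units --record-ids=$IDS'"
done
```

Matching whole list entries (grep -x) keeps an ID such as 160 or 6 from being removed together with 60.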

/usr/sbin/ipmi-sensors direct invocation

[root@t3admin01 ~]# for i in 07 08 09 10 11 ; do echo rmfs$i ; /usr/sbin/ipmi-sensors -h rmfs$i  --config-file /opt/nagios/check_ipmi_sensor.user.pwd.privilege --interpret-oem-data --ignore-not-available-sensors --non-abbreviated-units --record-ids=34,35,38,45,48,49,60 --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors ; done 
rmfs07
Caching SDR repository information: /root/.freeipmi/sdr-cache/sdr-cache-t3admin01.rmfs07
Caching SDR record 396 of 396 (current record ID 396) 
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 31.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 17.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 19.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 54.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 45.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'

rmfs08
Caching SDR repository information: /root/.freeipmi/sdr-cache/sdr-cache-t3admin01.rmfs08
Caching SDR record 396 of 396 (current record ID 396) 
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 32.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 17.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 19.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 54.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 46.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'

rmfs09
Caching SDR repository information: /root/.freeipmi/sdr-cache/sdr-cache-t3admin01.rmfs09
Caching SDR record 396 of 396 (current record ID 396) 
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 34.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 17.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 23.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 54.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 47.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'

rmfs10
Caching SDR repository information: /root/.freeipmi/sdr-cache/sdr-cache-t3admin01.rmfs10
Caching SDR record 396 of 396 (current record ID 396) 
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 36.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 37.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 21.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 21.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 58.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 49.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'

rmfs11
Caching SDR repository information: /root/.freeipmi/sdr-cache/sdr-cache-t3admin01.rmfs11
Caching SDR record 396 of 396 (current record ID 396) 
ID | Name             | Type            | State    | Reading    | Units     | Event
34 | PROC/FRONT_T_AMB | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
35 | PROC/REAR_T_AMB  | Temperature     | Nominal  | 33.00      | degrees C | 'OK'
38 | P0/T_CORE        | Temperature     | Nominal  | 18.00      | degrees C | 'OK'
45 | P1/T_CORE        | Temperature     | Nominal  | 19.00      | degrees C | 'OK'
48 | IO/REAR_T_AMB    | Temperature     | Nominal  | 55.00      | degrees C | 'OK'
49 | IO/FRONT_T_AMB   | Temperature     | Nominal  | 48.00      | degrees C | 'OK'
60 | PCIE0/F20C/PRSNT | Entity Presence | Critical | N/A        | N/A       | 'Device Removed/Device Absent'
[root@t3admin01 ~]# 
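
The five tables above differ only in a few readings, so it can help to filter the ipmi-sensors output down to the rows whose State is not Nominal. The following is a minimal sketch, assuming the pipe-separated column layout shown above (ID, Name, Type, State, Reading, Units, Event); the function name non_nominal is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the sensors whose State column is not "Nominal".
# Reads ipmi-sensors table output on stdin; non_nominal is a hypothetical name.
non_nominal() {
    awk -F'|' '
        # Data rows have 7 pipe-separated fields; skip the "ID | Name ..." header
        # and the "Caching SDR ..." progress lines.
        NF >= 7 && $1 !~ /ID/ {
            id = $1; name = $2; state = $4; event = $7
            # Strip the column padding.
            gsub(/^ +| +$/, "", id)
            gsub(/^ +| +$/, "", name)
            gsub(/^ +| +$/, "", state)
            gsub(/^ +| +$/, "", event)
            if (state != "Nominal")
                print id ": " name " is " state " " event
        }'
}
```

It would be used by piping each host's output through it inside the loop above, for example `/usr/sbin/ipmi-sensors -h rmfs$i ... | non_nominal`, with the same options as in the recorded command.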




This topic: CmsTier3 > WebHome > CMSTier3Log > CMSTier3Log62
Topic revision: r2 - 2014-03-10 - FabioMartinelli
 