(Derek) Note that this machine still reboots about once a month for no known reason. It comes back up cleanly, but the dCache pools are not started automatically and have to be restarted by hand.
Firewall requirements
| 2811/tcp | * | gridftp control connection |
| 20000-25000/tcp | * | Globus port range for gridftp/xrootd data streams |
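A minimal sketch of how these two entries could be opened on the host itself, assuming it runs firewalld (the RHEL 7 default) and that the * column means "any source". Neither assumption is confirmed on this page, so the script only prints the commands for review.

```shell
#!/bin/sh
# Ports taken from the table above; the use of firewalld is an assumption.
PORTS="2811/tcp 20000-25000/tcp"

# Print the firewall-cmd calls instead of running them, so the list can be
# reviewed first. Pipe the output to "sh" as root to apply it.
firewall_commands() {
    for port in $PORTS; do
        echo "firewall-cmd --permanent --add-port=$port"
    done
    # Permanent rules only become active after a reload.
    echo "firewall-cmd --reload"
}

firewall_commands
```

If a site firewall in front of the host filters these ports as well, it needs the same two entries.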
t3nfs02 configuration (status 16.08.17)
t3nfs02 is the host that holds the ZFS backup of the home directories and, at the same time, is the server node for 248 TB of storage (NetApp E2760).
The original backup script is located at
t3nfs01:/opt/zfssnap/zfssnap .
The same directory contains the file
zfssnap-day, whose content looks like
zfssnap-day-20170803-012839
The label stored in zfssnap-day is the reference point from which the next backup starts
(the backup script uses it as
oldlabel).
It is overwritten when the script completes.
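The script itself is not reproduced on this page. As a rough sketch of how an incremental run keyed on a stored label usually looks, the following prints the commands one run would issue. The source dataset name tank/shome is hypothetical. The receive target, the zfs user and the recv flags are taken from the t3nfs02 setup described further down. Treat this as an illustration, not as the content of /opt/zfssnap/zfssnap.

```shell
#!/bin/sh
# Hypothetical value: the real dataset name on t3nfs01 is not given on this page.
SRC_FS="tank/shome"
DEST="data01/t3nfs01_data01"      # receive target named in the sudoers rules
BACKUPSERVER="t3nfs02.psi.ch"
STATE_FILE="/opt/zfssnap/zfssnap-day"

# Build a label shaped like the example (zfssnap-day-20170803-012839).
new_label() {
    echo "zfssnap-$1-$(date +%Y%m%d-%H%M%S)"
}

# Print the commands for one incremental run: take a snapshot, send the delta
# since oldlabel, then record the new label as the next reference point.
plan_backup() {
    oldlabel="$1"
    newlabel="$2"
    echo "zfs snapshot $SRC_FS@$newlabel"
    echo "zfs send -i $SRC_FS@$oldlabel $SRC_FS@$newlabel | ssh zfs@$BACKUPSERVER sudo /usr/sbin/zfs recv -dv $DEST"
    echo "echo $newlabel > $STATE_FILE"
}

plan_backup "zfssnap-day-20170803-012839" "$(new_label day)"
```

Writing the new label last matches the note above that the file is overwritten only when the script completes: a failed transfer leaves the previous reference point in place.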
The backup is started by a daily cron job:
t3nfs01:/etc/cron.daily/zfssnap
PERIOD="day"
MAXSNAPS=2
BACKUPSERVER="t3nfs02.psi.ch"
LOG="/var/log/zfssnap.$PERIOD.log"
echo "" >> $LOG
echo "- new run -" >> $LOG
/opt/zfssnap/zfssnap $PERIOD $MAXSNAPS $BACKUPSERVER >>$LOG 2>&1
echo "- end run -" >> $LOG
echo "" >> $LOG
The following steps were carried out on t3nfs02 after the RHEL 7.3 installation so that it can receive the backup from t3nfs01.
The ZFS pool on t3nfs02 is dedicated to collecting ZFS snapshots of
t3nfs01:/zfs/shome.
1. The zfs pool data01 was created as a stripe of four RAIDZ1 vdevs (three disks each) from 12 x 3 TB disks attached to the HP Smart Array Controller P440 (running in HBA mode):
# zpool create data01 raidz1 /dev/sda /dev/sdb /dev/sdc raidz1 /dev/sdd /dev/sde /dev/sdf raidz1 /dev/sdg /dev/sdh /dev/sdi raidz1 /dev/sdj /dev/sdk /dev/sdl
2. create a dataset t3nfs01_data01:
# zfs create -o mountpoint=/zfs data01/t3nfs01_data01
3. add zfs user with home directory /opt/zfssnap
# groupadd --gid 337 zfs
# useradd zfs -u 337 -g 337 -s /bin/bash -d /opt/zfssnap
# chown -R :zfs /opt/zfssnap/
# chown -R zfs:zfs .ssh
4. create
/opt/zfssnap/.ssh/authorized_keys with a key from
t3nfs01:/root/.ssh/id_rsa.pub
5. in
/etc/security/access.conf add the following line:
+ : zfs : t3nfs01.psi.ch
6. in
/etc/hosts.allow add
sshd: t3admin01.psi.ch t3admin02.psi.ch wmgt01.psi.ch wmgt02.psi.ch localhost t3nfs01.psi.ch
7. SUDOERS for user zfs:
in
/etc/sudoers add
#includedir /etc/sudoers.d
Defaults:zfs !requiretty
and in
/etc/sudoers.d/zfs add rules for the commands used by the script /opt/zfssnap/zfssnap
zfs ALL=NOPASSWD: C_ZFS
## Cmnd alias specification
Cmnd_Alias C_ZFS = /usr/sbin/zfs list, /usr/sbin/zfs list *, /usr/sbin/zfs list -H -t snapshot *, \
/usr/sbin/zfs recv -dv data01/t3nfs01_data01, \
/usr/sbin/zfs recv -dvF data01/t3nfs01_data01
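The steps above can be checked with a few read-only commands. This is only a sketch: by default it prints what it would run, and the choice of checks is an assumption about what is worth verifying, not part of the original procedure.

```shell
#!/bin/sh
# Read-only checks for steps 1-7. By default the commands are only printed;
# set DRY_RUN=0 and run as root on t3nfs02 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check_backup_target() {
    # step 1: pool layout, four raidz1 vdevs expected
    run zpool status data01
    # step 2: dataset exists and is mounted on /zfs
    run zfs list -H -o name,mountpoint data01/t3nfs01_data01
    # step 3: user and group zfs with id 337
    run id zfs
    # step 7: syntax check of the sudoers fragment
    run visudo -c -f /etc/sudoers.d/zfs
    # step 7: rules that sudo actually applies to user zfs
    run sudo -l -U zfs
}

check_backup_target
```

The visudo check is the most useful one after editing /etc/sudoers.d/zfs, since a syntax error in a sudoers fragment can make sudo refuse to run at all.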
ZFS Slides
Warranty
http://h20565.www2.hpe.com/hpsc/wc/public/
Note: Hash characters [#], where present, may mask contract and warranty numbers or other sensitive data.
The information on this page corresponds to the expanded details of:
Product: HP DL380 Gen9 12LFF CTO Server
Serial number: CZJ5390FSB
Product number: 719061-B21
1. Support agreement: #####
Warranty type: Contract
Service type: HP Foundation Care NBD Service
Service type: HP Hardware Maintenance Onsite Support*
Status: Active
Start date: 7 Oct 2015
End date: 31 Oct 2020
Service type: HP Software Technical Unlimited Support
Status: Active
Start date: 7 Oct 2015
End date: 31 Oct 2020
Service type: HP Collaborative Remote Support
Status: Active
Start date: 7 Oct 2015
End date: 31 Oct 2020
2. HP warranty: CZJ5390FSB
Base warranties with active components can be linked to your profile by visiting the Link Warranties page. If your warranty has expired, you can buy a post-warranty HP Care Pack at HP Care Pack Services.
Warranty type: Base warranty
Service type: Wty: HP HW Maintenance Onsite Support*
Status: Active
Start date: 29 Sep 2015
End date: 28 Oct 2018
Service level: Standard Material Handling
Global Coverage
NextAvail TechResource Remote
Std Office Hrs Std Office Days
NextAvail TechResource Onsite
No Usage Limitation
Next Cov Day Onsite Response
Standard Parts Logistics
Deliverables: Onsite Support
Parts and Material provided
Hardware Problem Diagnosis
Service type: Wty: HP Support for Initial Setup
Status: Active
Start date: 29 Sep 2015
End date: 26 Jan 2016
Service level: NextAvail TechResource Remote
Std Office Hrs Std Office Days
2 Hr Remote Response
Unlimited Named Callers
Deliverables: Initial Setup Assistance
*Note: Under the terms of the HP off-site hardware maintenance service, HP may decide at its sole discretion whether a defect can be repaired:
Remotely
With a replacement part fitted by the customer
By a service call at the location of the defective unit
For more details, see the document "Worldwide Limited Warranty and Technical Support" that was delivered with the product.
---+ Regular Maintenance work
Emergency Measures
Installation
HW
See
NFSServerZFS about HW installation
HP P441 and P841 controllers conf
- hpssacli.slot.3: hpssacli slot 3
- hpssacli.slot.4: hpssacli slot 4, be aware of SAS Address info
- hpssacli.slot.5: hpssacli slot 5, be aware of SAS Address info
- shows.netapp.luns.sh: Linux disks to NetApp E2760 LUNs mapping
[root@t3nfs02 ~]# fdisk -l 2>/dev/null | grep sd | grep -o "/dev/sd[o-z]" | uniq | xargs -iI echo /usr/lib/udev/scsi_id -g -v I | bash -x
+ /usr/lib/udev/scsi_id -g -v /dev/sdp
3600a098000a87a7b000001005805ae91
+ /usr/lib/udev/scsi_id -g -v /dev/sdo
3600a098000a87a7b000001005805ae91
+ /usr/lib/udev/scsi_id -g -v /dev/sds
3600a098000a87a7b000001005805ae91
+ /usr/lib/udev/scsi_id -g -v /dev/sdu
3600a098000a87a7b000000fe5805ae3a
+ /usr/lib/udev/scsi_id -g -v /dev/sdq
3600a098000a87a7b000000fe5805ae3a
+ /usr/lib/udev/scsi_id -g -v /dev/sdr
3600a098000a87a7b000000fe5805ae3a
+ /usr/lib/udev/scsi_id -g -v /dev/sdt
3600a098000a87a7b000001005805ae91
+ /usr/lib/udev/scsi_id -g -v /dev/sdv
3600a098000a87a7b000000fe5805ae3a
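The output above lists eight device nodes but only two distinct IDs. A small sketch that groups the nodes by ID makes this easier to read. The function name is made up for this page and the input pairs are copied from the transcript. That each ID corresponds to one E2760 LUN seen over several paths is an assumption.

```shell
#!/bin/sh
# Read "device wwid" pairs on stdin and print one line per WWID followed by
# all device nodes that reported it.
group_by_wwid() {
    awk '{ paths[$2] = paths[$2] " " $1 }
         END { for (w in paths) print w ":" paths[w] }' | sort
}

# Pairs copied from the scsi_id output above.
group_by_wwid <<'EOF'
/dev/sdp 3600a098000a87a7b000001005805ae91
/dev/sdo 3600a098000a87a7b000001005805ae91
/dev/sds 3600a098000a87a7b000001005805ae91
/dev/sdu 3600a098000a87a7b000000fe5805ae3a
/dev/sdq 3600a098000a87a7b000000fe5805ae3a
/dev/sdr 3600a098000a87a7b000000fe5805ae3a
/dev/sdt 3600a098000a87a7b000001005805ae91
/dev/sdv 3600a098000a87a7b000000fe5805ae3a
EOF
```

For this input the result is two lines, one per ID, each with four device nodes: sdu, sdq, sdr, sdv for the ID ending in ae3a and sdp, sdo, sds, sdt for the one ending in ae91.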
10Gb/s LACP network setup
Portchannel 201 Cable 13-18908 and 13-18909
Portchannel 202 Cable 13-18910 and 13-18911
Portchannel 203 Cable 13-18912 and 13-18913
Portchannel 204 Cable 13-18914 and 13-18915
Portchannel 205 Cable 13-18916 and 13-18917
VLAN 410
RHEL7 Doc
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/
parallel hdparm -t --direct /dev/sd*
Total: ~2 GB/s
[root@t3nfs02 iozone]# lsscsi | grep MB3000FCWDH | awk '{print $6}' | parallel -iI hdparm -t --direct I
/dev/sdg:
Timing O_DIRECT disk reads: 556 MB in 3.01 seconds = 184.87 MB/sec
/dev/sdb:
Timing O_DIRECT disk reads: 548 MB in 3.00 seconds = 182.38 MB/sec
/dev/sdd:
Timing O_DIRECT disk reads: 558 MB in 3.01 seconds = 185.60 MB/sec
/dev/sdf:
Timing O_DIRECT disk reads: 582 MB in 3.00 seconds = 193.71 MB/sec
/dev/sdh:
Timing O_DIRECT disk reads: 568 MB in 3.01 seconds = 188.79 MB/sec
/dev/sdi:
Timing O_DIRECT disk reads: 550 MB in 3.00 seconds = 183.03 MB/sec
/dev/sdj:
Timing O_DIRECT disk reads: 538 MB in 3.00 seconds = 179.11 MB/sec
/dev/sda:
Timing O_DIRECT disk reads: 542 MB in 3.01 seconds = 180.31 MB/sec
/dev/sdc:
Timing O_DIRECT disk reads: 520 MB in 3.00 seconds = 173.29 MB/sec
/dev/sde:
Timing O_DIRECT disk reads: 546 MB in 3.00 seconds = 181.77 MB/sec
/dev/sdk:
Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.37 MB/sec
/dev/sdl:
Timing O_DIRECT disk reads: 538 MB in 3.01 seconds = 178.93 MB/sec
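The total can be recomputed from the per-disk lines rather than added up by hand. A minimal sketch, with a made-up function name, that sums the MB/sec figure of every hdparm timing line on stdin:

```shell
#!/bin/sh
# Sum the throughput reported on each "Timing O_DIRECT disk reads" line.
# The MB/sec value is the second-to-last field of such a line.
sum_hdparm() {
    awk '/Timing O_DIRECT disk reads/ { total += $(NF-1); n++ }
         END { printf "%d disks, %.2f MB/sec total\n", n, total }'
}

# Timing lines copied from the run above.
sum_hdparm <<'EOF'
 Timing O_DIRECT disk reads: 556 MB in 3.01 seconds = 184.87 MB/sec
 Timing O_DIRECT disk reads: 548 MB in 3.00 seconds = 182.38 MB/sec
 Timing O_DIRECT disk reads: 558 MB in 3.01 seconds = 185.60 MB/sec
 Timing O_DIRECT disk reads: 582 MB in 3.00 seconds = 193.71 MB/sec
 Timing O_DIRECT disk reads: 568 MB in 3.01 seconds = 188.79 MB/sec
 Timing O_DIRECT disk reads: 550 MB in 3.00 seconds = 183.03 MB/sec
 Timing O_DIRECT disk reads: 538 MB in 3.00 seconds = 179.11 MB/sec
 Timing O_DIRECT disk reads: 542 MB in 3.01 seconds = 180.31 MB/sec
 Timing O_DIRECT disk reads: 520 MB in 3.00 seconds = 173.29 MB/sec
 Timing O_DIRECT disk reads: 546 MB in 3.00 seconds = 181.77 MB/sec
 Timing O_DIRECT disk reads: 554 MB in 3.00 seconds = 184.37 MB/sec
 Timing O_DIRECT disk reads: 538 MB in 3.01 seconds = 178.93 MB/sec
EOF
```

For the twelve readings above this prints 12 disks, 2196.16 MB/sec total, which is in line with the ~2 GB/s quoted. The same function can be fed directly from the parallel hdparm pipeline.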
Services
Backups
10Gbs Dual Copper Cards
t3nfs01-2-10GbsCard-SASController.pdf: t3nfs01-2-10GbsCard-SASController.pdf
ZFS update 0.6.5.7 > 0.6.5.8
[root@t3nfs02 ~]# yum update --disableplugin=*
EGI-trustanchors | 2.5 kB 00:00:00
Tier3 | 2.9 kB 00:00:00
base | 3.6 kB 00:00:00
cern | 4.1 kB 00:00:00
extras | 3.4 kB 00:00:00
updates | 3.8 kB 00:00:00
zfs | 2.9 kB 00:00:00
Resolving Dependencies
--> Running transaction check
---> Package libnvpair1.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package libnvpair1.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package libuutil1.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package libuutil1.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package libzfs2.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package libzfs2.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package libzpool2.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package libzpool2.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package spl.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package spl.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package spl-dkms.noarch 0:0.6.5.7-1.el7.centos will be updated
---> Package spl-dkms.noarch 0:0.6.5.8-1.el7.centos will be an update
---> Package zfs.x86_64 0:0.6.5.7-1.el7.centos will be updated
---> Package zfs.x86_64 0:0.6.5.8-1.el7.centos will be an update
---> Package zfs-dkms.noarch 0:0.6.5.7-1.el7.centos will be updated
---> Package zfs-dkms.noarch 0:0.6.5.8-1.el7.centos will be an update
--> Finished Dependency Resolution
Dependencies Resolved
==================================================================================================================================================================================================================
Package Arch Version Repository Size
==================================================================================================================================================================================================================
Updating:
libnvpair1 x86_64 0.6.5.8-1.el7.centos zfs 35 k
libuutil1 x86_64 0.6.5.8-1.el7.centos zfs 41 k
libzfs2 x86_64 0.6.5.8-1.el7.centos zfs 123 k
libzpool2 x86_64 0.6.5.8-1.el7.centos zfs 423 k
spl x86_64 0.6.5.8-1.el7.centos zfs 29 k
spl-dkms noarch 0.6.5.8-1.el7.centos zfs 443 k
zfs x86_64 0.6.5.8-1.el7.centos zfs 334 k
zfs-dkms noarch 0.6.5.8-1.el7.centos zfs 1.9 M
Transaction Summary
==================================================================================================================================================================================================================
Upgrade 8 Packages
Total download size: 3.3 M
Is this ok [y/d/N]: y
Downloading packages:
No Presto metadata available for zfs
(1/8): libuutil1-0.6.5.8-1.el7.centos.x86_64.rpm | 41 kB 00:00:01
(2/8): libnvpair1-0.6.5.8-1.el7.centos.x86_64.rpm | 35 kB 00:00:01
(3/8): libzfs2-0.6.5.8-1.el7.centos.x86_64.rpm | 123 kB 00:00:00
(4/8): spl-0.6.5.8-1.el7.centos.x86_64.rpm | 29 kB 00:00:00
(5/8): libzpool2-0.6.5.8-1.el7.centos.x86_64.rpm | 423 kB 00:00:01
(6/8): zfs-0.6.5.8-1.el7.centos.x86_64.rpm | 334 kB 00:00:00
(7/8): spl-dkms-0.6.5.8-1.el7.centos.noarch.rpm | 443 kB 00:00:01
(8/8): zfs-dkms-0.6.5.8-1.el7.centos.noarch.rpm | 1.9 MB 00:00:01
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 693 kB/s | 3.3 MB 00:00:04
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : libuutil1-0.6.5.8-1.el7.centos.x86_64 1/16
Updating : libnvpair1-0.6.5.8-1.el7.centos.x86_64 2/16
Updating : libzpool2-0.6.5.8-1.el7.centos.x86_64 3/16
Updating : spl-dkms-0.6.5.8-1.el7.centos.noarch 4/16
Loading new spl-0.6.5.8 DKMS files...
Building for 3.10.0-327.22.2.el7.x86_64
Building initial module for 3.10.0-327.22.2.el7.x86_64
Done.
spl:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/3.10.0-327.22.2.el7.x86_64/extra/
splat.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/3.10.0-327.22.2.el7.x86_64/extra/
Adding any weak-modules
depmod....
DKMS: install completed.
Updating : spl-0.6.5.8-1.el7.centos.x86_64 5/16
Updating : zfs-dkms-0.6.5.8-1.el7.centos.noarch 6/16
Loading new zfs-0.6.5.8 DKMS files...
Building for 3.10.0-327.22.2.el7.x86_64
Building initial module for 3.10.0-327.22.2.el7.x86_64
Done.
zavl:
Running module version sanity check.
Good news! Module version 0.6.5.8-1 for zavl.ko
exactly matches what is already found in kernel 3.10.0-327.22.2.el7.x86_64.
DKMS will not replace this module.
You may override by specifying --force.
znvpair.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/3.10.0-327.22.2.el7.x86_64/extra/
zunicode.ko:
Running module version sanity check.
Good news! Module version 0.6.5.8-1 for zunicode.ko
exactly matches what is already found in kernel 3.10.0-327.22.2.el7.x86_64.
DKMS will not replace this module.
You may override by specifying --force.
zcommon.ko:
Running module version sanity check.
Good news! Module version 0.6.5.8-1 for zcommon.ko
exactly matches what is already found in kernel 3.10.0-327.22.2.el7.x86_64.
DKMS will not replace this module.
You may override by specifying --force.
zfs.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/3.10.0-327.22.2.el7.x86_64/extra/
zpios.ko:
Running module version sanity check.
Good news! Module version 0.6.5.8-1 for zpios.ko
exactly matches what is already found in kernel 3.10.0-327.22.2.el7.x86_64.
DKMS will not replace this module.
You may override by specifying --force.
Adding any weak-modules
modinfo: ERROR: Module /lib/modules/3.10.0-327.10.1.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
Warning: Module zavl.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.10.1.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.10.1.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
Warning: Module zunicode.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.10.1.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.10.1.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
Warning: Module zcommon.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.10.1.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.10.1.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
Warning: Module zpios.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.10.1.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
Warning: Module zavl.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.18.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
Warning: Module zunicode.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.18.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
Warning: Module zcommon.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.18.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
Warning: Module zpios.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.18.2.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zavl.ko not found.
Warning: Module zavl.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.28.3.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zunicode.ko not found.
Warning: Module zunicode.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.28.3.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zcommon.ko not found.
Warning: Module zcommon.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.28.3.el7.x86_64
modinfo: ERROR: Module /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/ not found.
modinfo: ERROR: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
modprobe: FATAL: Module /lib/modules/3.10.0-327.22.2.el7.x86_64/zpios.ko not found.
Warning: Module zpios.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.28.3.el7.x86_64
depmod....
DKMS: install completed.
Updating : libzfs2-0.6.5.8-1.el7.centos.x86_64 7/16
Updating : zfs-0.6.5.8-1.el7.centos.x86_64 8/16
Cleanup : zfs-0.6.5.7-1.el7.centos.x86_64 9/16
Uninstall of zfs module (version 0.6.5.7) beginning:
-------- Uninstall Beginning --------
Module: zfs
Version: 0.6.5.7
Kernel: 3.10.0-327.10.1.el7.x86_64 (x86_64)
-------------------------------------
Status: Before uninstall, this module version was ACTIVE on this kernel.
Removing any linked weak-modules
rmdir: failed to remove '.': Invalid argument
rmdir: failed to remove '.': Invalid argument
rmdir: failed to remove '.': Invalid argument
zavl.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
znvpair.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
zunicode.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
zcommon.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
zfs.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
zpios.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod....
DKMS: uninstall completed.
------------------------------
Deleting module version: 0.6.5.7
completely from the DKMS tree.
------------------------------
Done.
Cleanup : zfs-dkms-0.6.5.7-1.el7.centos.noarch 10/16
Cleanup : libzfs2-0.6.5.7-1.el7.centos.x86_64 11/16
Cleanup : libzpool2-0.6.5.7-1.el7.centos.x86_64 12/16
Cleanup : libnvpair1-0.6.5.7-1.el7.centos.x86_64 13/16
Cleanup : spl-0.6.5.7-1.el7.centos.x86_64 14/16
Uninstall of spl module (version 0.6.5.7) beginning:
-------- Uninstall Beginning --------
Module: spl
Version: 0.6.5.7
Kernel: 3.10.0-327.10.1.el7.x86_64 (x86_64)
-------------------------------------
Status: Before uninstall, this module version was ACTIVE on this kernel.
Removing any linked weak-modules
rmdir: failed to remove '.': Invalid argument
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_hold_write
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_read
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_assign
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_create
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_object_alloc
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_object_free
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_objset_own
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dsl_destroy_head
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_write
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_objset_disown
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_commit
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_wait
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_abort
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_object_set_blocksize
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_objset_create
depmod: WARNING: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/zpios.ko needs unknown symbol dmu_tx_hold_free
depmod: ERROR: fstatat(4, zfs.ko): No such file or directory
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_hold_write
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_read
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_assign
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_create
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_alloc
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_free
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_own
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dsl_destroy_head
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_write
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_disown
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_commit
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_wait
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_abort
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_set_blocksize
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_create
depmod: WARNING: /lib/modules/3.10.0-327.18.2.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_hold_free
depmod: ERROR: fstatat(4, zfs.ko): No such file or directory
depmod: ERROR: fstatat(4, zfs.ko): No such file or directory
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_hold_write
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_read
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_assign
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_create
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_alloc
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_free
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_own
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dsl_destroy_head
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_write
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_disown
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_commit
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_wait
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_abort
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_object_set_blocksize
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_objset_create
depmod: WARNING: /lib/modules/3.10.0-327.28.3.el7.x86_64/weak-updates/zpios.ko needs unknown symbol dmu_tx_hold_free
spl.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/
rmdir: failed to remove ‘’: No such file or directory
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
splat.ko:
- Uninstallation
- Deleting from: /lib/modules/3.10.0-327.10.1.el7.x86_64/extra/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.
depmod....
DKMS: uninstall completed.
------------------------------
Deleting module version: 0.6.5.7
completely from the DKMS tree.
------------------------------
Done.
Cleanup : spl-dkms-0.6.5.7-1.el7.centos.noarch 15/16
Cleanup : libuutil1-0.6.5.7-1.el7.centos.x86_64 16/16
Verifying : libnvpair1-0.6.5.8-1.el7.centos.x86_64 1/16
Verifying : libzfs2-0.6.5.8-1.el7.centos.x86_64 2/16
Verifying : zfs-0.6.5.8-1.el7.centos.x86_64 3/16
Verifying : spl-0.6.5.8-1.el7.centos.x86_64 4/16
Verifying : libuutil1-0.6.5.8-1.el7.centos.x86_64 5/16
Verifying : zfs-dkms-0.6.5.8-1.el7.centos.noarch 6/16
Verifying : libzpool2-0.6.5.8-1.el7.centos.x86_64 7/16
Verifying : spl-dkms-0.6.5.8-1.el7.centos.noarch 8/16
Verifying : spl-0.6.5.7-1.el7.centos.x86_64 9/16
Verifying : zfs-0.6.5.7-1.el7.centos.x86_64 10/16
Verifying : libzfs2-0.6.5.7-1.el7.centos.x86_64 11/16
Verifying : libnvpair1-0.6.5.7-1.el7.centos.x86_64 12/16
Verifying : libuutil1-0.6.5.7-1.el7.centos.x86_64 13/16
Verifying : spl-dkms-0.6.5.7-1.el7.centos.noarch 14/16
Verifying : zfs-dkms-0.6.5.7-1.el7.centos.noarch 15/16
Verifying : libzpool2-0.6.5.7-1.el7.centos.x86_64 16/16
Updated:
libnvpair1.x86_64 0:0.6.5.8-1.el7.centos libuutil1.x86_64 0:0.6.5.8-1.el7.centos libzfs2.x86_64 0:0.6.5.8-1.el7.centos libzpool2.x86_64 0:0.6.5.8-1.el7.centos spl.x86_64 0:0.6.5.8-1.el7.centos
spl-dkms.noarch 0:0.6.5.8-1.el7.centos zfs.x86_64 0:0.6.5.8-1.el7.centos zfs-dkms.noarch 0:0.6.5.8-1.el7.centos
Complete!
[root@t3nfs02 ~]#
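The log above is long, and the part that matters for later reboots is which installed kernels did not get the new modules. A sketch, with a made-up function name, that pulls the kernel versions out of the "cannot be reused" warnings in a saved copy of the yum output:

```shell
#!/bin/sh
# List each kernel named in a weak-modules "cannot be reused" warning once.
kernels_without_weak_modules() {
    sed -n 's/^Warning: Module .* cannot be reused for kernel \(.*\)$/\1/p' | sort -u
}

# Two of the warning lines from the log above, as sample input.
kernels_without_weak_modules <<'EOF'
Warning: Module zavl.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.10.1.el7.x86_64
Warning: Module zavl.ko from kernel has no modversions, so it cannot be reused for kernel 3.10.0-327.18.2.el7.x86_64
EOF
```

Run over the full log, this would list 3.10.0-327.10.1, 3.10.0-327.18.2 and 3.10.0-327.28.3, while the DKMS build itself was done for 3.10.0-327.22.2 only. Whether the modules are present for a given kernel should still be confirmed on the machine before booting into it.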
Crashes
Crash 2015-11-12
Symptom
Uncorrectable Machine Check Exception
EVENT (11 Nov 23:19): Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000011, Status 0xFE200000'000C110A, Address 0x00000000'80102000, Misc 0xA4FFE016'06100086)
Integrated Management Log Severity: CRITICAL
iLO IP: https://192.168.2.82
iLO Name: ILOCZJ5390FSB
| ProLiant Gen9, P89 07/20/2015
Server UUID: 30393137-3136-5A43-4A35-333930465342
[root@t3nfs02 ~]# hplog -v
ID Severity Initial Time Update Time Count
-------------------------------------------------------------
...
0003 Critical 22:19 11/11/2015 22:19 11/11/2015 0001
LOG: Uncorrectable Machine Check Exception (Board 0, Processor 1, APIC ID 0x00000000, Bank 0x00000011, Status 0xFE200000'000C110A, Address 0x00000000'80102000, Misc 0xA4FFE016'06100086)
Reaction - case number
4652815076
Crash 2016-05-02
[235196.883815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235196.883815] CR2: 0000000000875e28 CR3: 0000000f46f1a000 CR4: 00000000001407f0
[235196.883816] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235196.883817] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235196.883817] Stack:
[235196.883821] 0000000100000000 ffffffff81a68260 ffffffff819c3680 ffffffff81065d30
[235196.883824] 0000000000000000 ffff8804c99a3a00 000000000000ce1c ffff8804c99a3958
[235196.883828] ffffffff810e6f9d ffffffff819c3680 ffff8804c99a39a8 ffff8804c99a39f8
[235196.883828] Call Trace:
[235196.883831] [] ? flush_tlb_func+0xb0/0xb0
[235196.883833] [] on_each_cpu+0x2d/0x60
[235196.883835] [] flush_tlb_kernel_range+0x59/0xa0
[235196.883838] [] __purge_vmap_area_lazy+0x1a0/0x210
[235196.883840] [] free_vmap_area_noflush+0x7c/0x90
[235196.883842] [] remove_vm_area+0x5e/0x70
[235196.883844] [] __vunmap+0x2a/0x100
[235196.883847] [] vfree+0x36/0x70
[235196.883852] [] spl_kmem_free_impl+0x35/0x40 [spl]
[235196.883856] [] spl_vmem_free+0xe/0x10 [spl]
[235196.883874] [] dmu_recv_stream+0x145/0xb90 [zfs]
[235196.883880] [] ? nvlist_common.part.102+0x10a/0x210 [znvpair]
[235197.191537] BUG: soft lockup - CPU#20 stuck for 22s! [migration/20:189]
[235203.227005] md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
[235203.227006] md: delaying data-check of md0 until md3 has finished (they share one or more physical units)
[235203.227008] md: delaying data-check of md2 until md3 has finished (they share one or more physical units)
[235203.227027] md: delaying data-check of md8 until md3 has finished (they share one or more physical units)
[235203.227034] md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
[235203.438837] md: delaying data-check of md0 until md3 has finished (they share one or more physical units)
[235203.438846] md: delaying data-check of md2 until md3 has finished (they share one or more physical units)
[235203.438847] md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
[235203.438869] md: delaying data-check of md1 until md3 has finished (they share one or more physical units)
[235203.438876] md: delaying data-check of md8 until md3 has finished (they share one or more physical units)
[235212.117881] INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (detected by 8, t=121563377 jiffies, g=520797, c=520796, q=0)
[235212.117882] sending NMI to all CPUs:
[235212.117884] NMI backtrace for cpu 1
[235212.117886] CPU: 1 PID: 93 Comm: migration/1 Tainted: P W OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
[235212.117886] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[235212.117887] task: ffff880853de8b80 ti: ffff880853df0000 task.ti: ffff880853df0000
[235212.117890] RIP: 0010:[] [] multi_cpu_stop+0x83/0xf0
[235212.117890] RSP: 0000:ffff880853df3d90 EFLAGS: 00000293
[235212.117891] RAX: ffffffff81661260 RBX: ffff88083da97b90 RCX: dead000000200200
[235212.117892] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff88083da97b90
[235212.117892] RBP: ffff880853df3db0 R08: 0000000000000000 R09: 0000000000000001
[235212.117893] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001
[235212.117894] R13: ffff88083da97b00 R14: 0000000000000286 R15: ffff880853df3fd8
[235212.117895] FS: 0000000000000000(0000) GS:ffff88085f840000(0000) knlGS:0000000000000000
[235212.117896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235212.117897] CR2: 00007ffd4062a108 CR3: 000000104656a000 CR4: 00000000001407e0
[235212.117904] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235212.117905] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235212.117905] Stack:
[235212.117910] ffff88083da97bb8 ffff88085f84dd00 ffff88083da97b90 ffffffff81103270
[235212.117914] ffff880853df3e78 ffffffff811034f8 ffff88085f84dd08 0000000000000000
[235212.117917] 0000000000000000 ffff88085f854780 ffff8810511ee400 0000000000000000
[235212.117917] Call Trace:
[235212.117920] [] ? cpu_stop_should_run+0x50/0x50
[235212.117922] [] cpu_stopper_thread+0x88/0x160
[235212.117925] [] ? __schedule+0x2d8/0x900
[235212.117928] [] smpboot_thread_fn+0xff/0x1a0
[235212.117930] [] ? schedule+0x29/0x70
[235212.117932] [] ? lg_double_unlock+0x90/0x90
[235212.117935] [] kthread+0xcf/0xe0
[235212.117938] [] ? kthread_create_on_node+0x140/0x140
[235212.117940] [] ret_from_fork+0x58/0x90
[235212.117943] [] ? kthread_create_on_node+0x140/0x140
[235212.117960] Code: ed 75 65 f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 0f 8b 43 20 8b 73 10 8d 48 01 89 73 24 89 4b 20 83 fa 04 74 23 f3 90 44 8b 63 20 <41> 39 d4 74 f0 41 83 fc 02 75 c2 fa 66 0f 1f 44 00 00 eb c4 66
[235212.117961] NMI backtrace for cpu 3
[235212.117962] CPU: 3 PID: 103 Comm: migration/3 Tainted: P W OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
[235212.117963] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[235212.117964] task: ffff880853fd0000 ti: ffff880853fc4000 task.ti: ffff880853fc4000
[235212.117966] RIP: 0010:[] [] multi_cpu_stop+0x7f/0xf0
[235212.117967] RSP: 0000:ffff880853fc7d90 EFLAGS: 00000293
[235212.117968] RAX: ffffffff81661260 RBX: ffff880499d9fb90 RCX: dead000000200200
[235212.117968] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff880499d9fb90
[235212.117969] RBP: ffff880853fc7db0 R08: 0000000000000000 R09: 0000000000000001
[235212.117969] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001
[235212.117970] R13: ffff880499d9fb00 R14: 0000000000000286 R15: ffff880853fc7fd8
[235212.117971] FS: 0000000000000000(0000) GS:ffff88085f8c0000(0000) knlGS:0000000000000000
[235212.117972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235212.117972] CR2: 00007ffe2555ea58 CR3: 0000000eeab13000 CR4: 00000000001407e0
[235212.117973] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235212.117974] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235212.117974] Stack:
[235212.117978] ffff880499d9fbb8 ffff88085f8cdd00 ffff880499d9fb90 ffffffff81103270
[235212.117981] ffff880853fc7e78 ffffffff811034f8 ffff88085f8cdd08 0000000000000000
[235212.117984] 0000000000000000 ffff88085f8d4780 ffff881051305780 0000000000000000
[235212.117985] Call Trace:
[235212.117987] [] ? cpu_stop_should_run+0x50/0x50
[235212.117990] [] cpu_stopper_thread+0x88/0x160
[235212.117992] [] ? __schedule+0x2d8/0x900
[235212.117995] [] smpboot_thread_fn+0xff/0x1a0
[235212.117997] [] ? schedule+0x29/0x70
[235212.117999] [] ? lg_double_unlock+0x90/0x90
[235212.118002] [] kthread+0xcf/0xe0
[235212.118005] [] ? kthread_create_on_node+0x140/0x140
[235212.118007] [] ret_from_fork+0x58/0x90
[235212.118010] [] ? kthread_create_on_node+0x140/0x140
[235212.118027] Code: 75 05 45 84 ed 75 65 f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 0f 8b 43 20 8b 73 10 8d 48 01 89 73 24 89 4b 20 83 fa 04 74 23 f3 90 <44> 8b 63 20 41 39 d4 74 f0 41 83 fc 02 75 c2 fa 66 0f 1f 44 00
[235212.118027] NMI backtrace for cpu 23
[235212.118028] CPU: 23 PID: 204 Comm: migration/23 Tainted: P W OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
[235212.118029] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[235212.118889] R13: ffff8804923f3b00 R14: 0000000000000286 R15: ffff8808538b7fd8
[235212.119223] RSP: 0018:ffff880853dbbe10 EFLAGS: 00000046
[235212.119224] RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
[235212.119225] RDX: 0000000000000000 RSI: ffff880853dbbfd8 RDI: 000000000000001d
[235212.119225] RBP: ffff880853dbbe40 R08: 0000000000000ab6 R09: 0000000000000018
[235212.119226] R10: 0000000000000b44 R11: 0000000000002e00 R12: ffff880853dbbfd8
[235212.119227] R13: 0000000000000004 R14: 0000000000000020 R15: ffffffff819fdeb8
[235212.119228] FS: 0000000000000000(0000) GS:ffff88085fcc0000(0000) knlGS:0000000000000000
[235212.119229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235212.119229] CR2: 00000000021b2e78 CR3: 000000000194a000 CR4: 00000000001407e0
[235212.119230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235212.119231] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235212.119231] Stack:
[235212.119235] 0000001d53dbbe40 df3373102ff725f9 ffffe8f8004c0200 ffffffff819fdd40
[235212.119239] 0000d617405ac97e 0000000000000004 ffff880853dbbe78 ffffffff814d4600
[235212.119242] ffffe8f8004c0200 0000000000000004 0000000000000004 ffffffff819fdd40
[235212.119243] Call Trace:
[235212.119246] [] cpuidle_enter_state+0x40/0xc0
[235212.119249] [] cpuidle_idle_call+0xd9/0x210
[235212.119252] [] arch_cpu_idle+0xe/0x30
[235212.119254] [] cpu_startup_entry+0x245/0x290
[235212.119256] [] start_secondary+0x1ba/0x230
[235212.119274] Code: 31 d2 65 48 8b 34 25 b8 b7 00 00 48 89 d1 48 8d 86 38 c0 ff ff 0f 01 c8 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9 <65> 48 8b 04 25 b8 b7 00 00 f0 80 a0 3a c0 ff ff 7f 85 1d 7a fe
[235212.119275] NMI backtrace for cpu 9
[235212.119276] CPU: 9 PID: 133 Comm: migration/9 Tainted: P W OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
[235212.119277] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[235212.119278] task: ffff8808538cdc00 ti: ffff88085392c000 task.ti: ffff88085392c000
[235212.119281] RIP: 0010:[] [] multi_cpu_stop+0x83/0xf0
[235212.119281] RSP: 0000:ffff88085392fd90 EFLAGS: 00000293
[235212.119282] RAX: ffffffff81661260 RBX: ffff88048801fb90 RCX: dead000000200200
[235212.119283] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff88048801fb90
[235212.119283] RBP: ffff88085392fdb0 R08: 0000000000000000 R09: 0000000000000001
[235212.119284] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001
[235212.119284] R13: ffff88048801fb00 R14: 0000000000000286 R15: ffff88085392ffd8
[235212.119285] FS: 0000000000000000(0000) GS:ffff88085fa40000(0000) knlGS:0000000000000000
[235212.119286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235212.119287] CR2: 00007ffe93741308 CR3: 0000000e6d5aa000 CR4: 00000000001407e0
[235212.119287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235212.119288] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235212.119288] Stack:
[235212.119292] ffff88048801fbb8 ffff88085fa4dd00 ffff88048801fb90 ffffffff81103270
[235212.119295] ffff88085392fe78 ffffffff811034f8 ffff88085fa4dd08 0000000000000000
[235212.119299] 0000000000000000 ffff88085fa54780 ffff881051306a40 0000000000000000
[235212.119299] Call Trace:
[235212.119302] [] ? cpu_stop_should_run+0x50/0x50
[235212.119304] [] cpu_stopper_thread+0x88/0x160
[235212.119307] [] ? __schedule+0x2d8/0x900
[235212.119310] [] smpboot_thread_fn+0xff/0x1a0
[235212.119312] [] ? schedule+0x29/0x70
[235212.119314] [] ? lg_double_unlock+0x90/0x90
[235212.119317] [] kthread+0xcf/0xe0
[235212.119319] [] ? kthread_create_on_node+0x140/0x140
[235212.119322] [] ret_from_fork+0x58/0x90
[235212.119324] [] ? kthread_create_on_node+0x140/0x140
[235212.119339] Code: ed 75 65 f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 0f 8b 43 20 8b 73 10 8d 48 01 89 73 24 89 4b 20 83 fa 04 74 23 f3 90 44 8b 63 20 <41> 39 d4 74 f0 41 83 fc 02 75 c2 fa 66 0f 1f 44 00 00 eb c4 66
[235212.119340] NMI backtrace for cpu 17
[235212.119342] CPU: 17 PID: 27871 Comm: md6_resync Tainted: P W OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
[235212.119343] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 07/20/2015
[235212.119344] task: ffff88056bf12280 ti: ffff880f4ad84000 task.ti: ffff880f4ad84000
[235212.119347] RIP: 0010:[] [] native_read_tsc+0x6/0x20
[235212.119348] RSP: 0018:ffff880f4ad87a78 EFLAGS: 00000046
[235212.119349] RAX: 00000000961f1d88 RBX: 00000000961f1919 RCX: 0000000000000000
[235212.119349] RDX: 00000000000231b4 RSI: 00000000000002fd RDI: 0000000000000a2a
[235212.119350] RBP: ffff880f4ad87a78 R08: ffffffff81a67fe0 R09: 0000000000000000
[235212.119351] R10: 0000000000000000 R11: ffff880f4ad879c6 R12: 0000000000000a2a
[235212.119351] R13: 0000000000000011 R14: ffffffff81ca99e4 R15: 0000000000000044
[235212.119353] FS: 0000000000000000(0000) GS:ffff88105f1c0000(0000) knlGS:0000000000000000
[235212.119353] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[235212.119354] CR2: 00007fd88bdc4000 CR3: 000000000194a000 CR4: 00000000001407e0
[235212.119355] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[235212.119355] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[235212.119356] Stack:
[235212.119360] ffff880f4ad87aa0 ffffffff8130053a ffffffff81f17328 0000000000002335
[235212.119364] 0000000000000020 ffff880f4ad87ab0 ffffffff81300488 ffff880f4ad87ad8
[235212.119367] ffffffff813d0a20 ffffffff81f17328 0000000000000070 ffffffff81f17328
[235212.119368] Call Trace:
[235212.119370] [] delay_tsc+0x4a/0x80
[235212.119373] [] __const_udelay+0x28/0x30
[235212.119375] [] wait_for_xmitr+0x30/0xa0
[235212.119378] [] serial8250_console_putchar+0x1c/0x30
[235212.119380] [] ? serial8250_co
Crash 2016-10-08
[422740.330753] BUG: soft lockup - CPU#13 stuck for 22s! [khugepaged:305]
[422740.330777] Modules linked in: binfmt_misc bonding zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) intel_powerclamp spl(OE) coretemp vfat fat zlib_deflate intel_rapl kvm_intel kvm ipmi_ssif crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw ses gf128mul enclosure glue_helper ipmi_si sb_edac ablk_helper hpwdt pcspkr lpc_ich hpilo sg cryptd pcc_cpufreq i2c_i801 ioatdma ipmi_msghandler edac_core mfd_core shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace openafs(POE) sunrpc ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit ixgbe drm_kms_helper mdio ttm tg3 crct10dif_pclmul dca crct10dif_common drm crc32c_intel ptp i2c_core hpsa pps_core
[422740.330779] CPU: 13 PID: 305 Comm: khugepaged Tainted: P W OEL ------------ 3.10.0-327.36.1.el7.x86_64 #1
[422740.330779] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 06/02/2016
[422740.330780] task: ffff8808515a3980 ti: ffff8808515a8000 task.ti: ffff8808515a8000
[422740.330782] RIP: 0010:[] [] smp_call_function_many+0x202/0x260
[422740.330782] RSP: 0018:ffff8808515abbb8 EFLAGS: 00000202
[422740.330783] RAX: 000000000000000a RBX: 000000280000000d RCX: ffff88105f01a9d8
[422740.330783] RDX: 000000000000000a RSI: 0000000000000028 RDI: 0000000000000000
[422740.330784] RBP: ffff8808515abbf0 R08: ffff881053d15000 R09: ffff88105f0d9620
[422740.330938] R10: ffffea0006feb600 R11: ffffffff812f2a59 R12: 000000fc812f29af
[422740.330939] R13: 0000000000000296 R14: 0000000000000296 R15: ffff8808515abb68
[422740.330940] FS: 0000000000000000(0000) GS:ffff88105f0c0000(0000) knlGS:0000000000000000
[422740.330941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[422740.330941] CR2: 00007efd14a97000 CR3: 000000000194a000 CR4: 00000000001407e0
[422740.330941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[422740.330942] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[422740.330942] Stack:
[422740.330944] 00000001000001fe ffffffff81e86580 ffffffff81e86580 ffffffff81171d80
[422740.330946] 0000000000000000 000000000000000d ffff88087ffda000 ffff8808515abc20
[422740.330948] ffffffff810e71ca 0000000000000027 0000000000000028 0000000000000028
[422740.330949] Call Trace:
[422740.330951] [] ? drain_pages+0xb0/0xb0
[422740.330953] [] on_each_cpu_mask+0x2a/0x60
[422740.330954] [] drain_all_pages+0xb5/0xc0
[422740.330956] [] __alloc_pages_nodemask+0x8a2/0xba0
[422740.330959] [] khugepaged_scan_mm_slot+0x419/0xc60
[422740.330960] [] ? schedule_timeout+0x17d/0x2d0
[422740.330962] [] khugepaged+0x257/0x480
[422740.330966] [] ? khugepaged_scan_mm_slot+0xc60/0xc60
[422768.158422] task: ffff880d0f2a0b80 ti: ffff880d9e458000 task.ti: ffff880d9e458000
[422768.158791] [] system_call_fastpath+0x16/0x1b
[422768.158808] Code: 48 63 35 96 37 98 00 89 c2 39 f0 0f 8d 86 fe ff ff 48 98 49 8b 0f 48 03 0c c5 20 c8 a5 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 41 20 01 75 f8 48 63 35 65 37 98 00 eb b7 0f b6 4d cc 4c 89
[422768.184374] BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:128]
[422768.184390] Modules linked in: binfmt_misc bonding zfs(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) intel_powerclamp spl(OE) coretemp vfat fat zlib_deflate intel_rapl kvm_intel kvm ipmi_ssif crc32_pclmul iTCO_wdt ghash_clmulni_intel iTCO_vendor_support aesni_intel lrw ses gf128mul enclosure glue_helper ipmi_si sb_edac ablk_helper hpwdt pcspkr lpc_ich hpilo sg cryptd pcc_cpufreq i2c_i801 ioatdma ipmi_msghandler edac_core mfd_core shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace openafs(POE) sunrpc ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit ixgbe drm_kms_helper mdio ttm tg3 crct10dif_pclmul dca crct10dif_common drm crc32c_intel ptp i2c_core hpsa pps_core
[422768.184391] CPU: 8 PID: 128 Comm: migration/8 Tainted: P W OEL ------------ 3.10.0-327.36.1.el7.x86_64 #1
[422768.184392] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 06/02/2016
[422768.184393] task: ffff8808538d2280 ti: ffff8808538f4000 task.ti: ffff8808538f4000
[422768.184396] RIP: 0010:[] [] multi_cpu_stop+0x83/0xf0
[422768.184397] RSP: 0000:ffff8808538f7d88 EFLAGS: 00000293
[422768.184397] RAX: ffffffff81661260 RBX: 00000000000167c0 RCX: dead000000200200
[422768.184398] RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffff881052ee3b80
[422768.184398] RBP: ffff8808538f7da8 R08: 0000000000000000 R09: 0000000000000001
[422768.184399] R10: 000000000000beec R11: 0000000000000002 R12: 00000000000167c0
[422768.184399] R13: ffff88085345a000 R14: ffff88085f41a000 R15: 0000000000000000
[422768.184400] FS: 0000000000000000(0000) GS:ffff88085fa00000(0000) knlGS:0000000000000000
[422768.184400] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[422768.184401] CR2: 00007ffcaddf1f18 CR3: 000000000194a000 CR4: 00000000001407e0
[422768.184402] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[422768.184402] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[422768.184402] Stack:
[422768.184405] ffff88085fa0fce8 ffff88085fa0fce0 ffff881052ee3b80 ffff881052ee3ba8
[422768.184407] ffff8808538f7e78 ffffffff811036c6 ffff8808538f7fd8 ffff88085fa0fcf0
[422768.184409] 0000000000000000 0000000000000000 ffff88085fa167c0 ffff88105085be80
[422768.184409] Call Trace:
[422768.184412] [] cpu_stopper_thread+0x96/0x170
[422768.184414] [] ? __schedule+0x2d8/0x900
[422768.184416] [] smpboot_thread_fn+0xff/0x1a0
[422768.184418] [] ? schedule+0x29/0x70
[422768.184420] [] ? lg_double_unlock+0x90/0x90
[422768.184422] [] kthread+0xcf/0xe0
[422768.184424] [] ? kthread_create_on_node+0x140/0x140
[422768.184426] [] ret_from_fork+0x58/0x90
[422768.184428] [] ? kthread_create_on_node+0x140/0x140
[422768.184438] Code: ed 75 65 f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 0f 8b 43 20 8b 73 10 8d 48 01 89 73 24 89 4b 20 83 fa 04 74 23 f3 90 44 8b 63 20 <41> 39 d4 74 f0 41 83 fc 02 75 c2 fa 66 0f 1f 44 00 00 eb c4 66
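Traces like the ones above are long, so it can help to first list just the soft-lockup reports (CPU, stuck time, task) before reading the individual backtraces. The following is a minimal sketch, assuming the log has been saved as plain text with the `[seconds.micros]` timestamp prefix shown above; the function name `soft_lockups` and the regular expression are illustrative and not part of any existing tool on this host.

```python
import re

# Matches report lines such as:
# [422740.330753] BUG: soft lockup - CPU#13 stuck for 22s! [khugepaged:305]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)


def soft_lockups(lines):
    """Return one dict per soft-lockup report found in the given log lines."""
    events = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "task": match.group("task"),
                "pid": int(match.group("pid")),
            })
    return events


# Two report lines copied from the 2016-10-08 crash log, plus one unrelated line.
sample = [
    "[422740.330753] BUG: soft lockup - CPU#13 stuck for 22s! [khugepaged:305]",
    "[422740.330942] Stack:",
    "[422768.184374] BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:128]",
]

for event in soft_lockups(sample):
    print(event["timestamp"], "CPU", event["cpu"], event["task"], event["pid"])
```

To run it against a saved log, pass an open file object instead of `sample`, since the function only iterates over lines.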
Crash 2017-01-17
Intentionally causing a multipath crash on this server
Here we take one E2760 RAID controller offline and observe how multipath reacts. Regrettably, even after the same RAID controller is brought back online, the multipath driver never recovers.
[87389.683943] scsi 2:0:8:0: alua: port group 00 state N non-preferred supports TolUsNA
[87389.683944] scsi 2:0:8:0: alua: Attached
[87389.684099] sd 2:0:8:0: Attached scsi generic sg25 type 0
[87389.684100] sd 2:0:8:0: Embedded Enclosure Device
[87391.104496] sd 2:0:8:0: [sdv] 223620156621 512-byte logical blocks: (114 TB/104 TiB)
[87391.104787] sd 1:0:4:0: alua: rtpg failed with 8000002
[87391.104983] sd 1:0:4:0: alua: port group 01 state A non-preferred supports TolUsNA
[87392.091326] sd 2:0:8:0: [sdv] 4096-byte physical blocks
[87392.345337] sd 2:0:8:0: [sdv] Write Protect is off
[87392.577500] sd 2:0:8:0: [sdv] Mode Sense: 83 00 10 08
[87392.577956] sd 2:0:8:0: [sdv] Write cache: enabled, read cache: enabled, supports DPO and FUA
[87392.580431] sds: unknown partition table
[87392.583077] sd 1:0:3:0: alua: rtpg failed with 8000002
[87392.583426] sd 1:0:3:0: alua: port group 01 state A non-preferred supports TolUsNA
[87392.583983] sd 1:0:4:0: alua: rtpg failed with 8000002
[87392.584294] sd 1:0:4:0: alua: port group 01 state A non-preferred supports TolUsNA
[87392.584403] sdt: unknown partition table
[87392.587752] sdu: unknown partition table
[87392.590777] sd 1:0:1:0: alua: rtpg failed with 8000002
[87392.591164] sd 1:0:1:0: alua: port group 00 state A non-preferred supports TolUsNA
[87392.591827] sd 1:0:2:0: alua: rtpg failed with 8000002
[87392.591981] sds: unknown partition table
[87392.592133] sd 1:0:2:0: alua: port group 00 state A non-preferred supports TolUsNA
[87392.600123] sdt: unknown partition table
[87392.605096] sdu: unknown partition table
[87396.619717] sdv: unknown partition table
[87396.815220] sd 2:0:8:0: [sdv] Attached SCSI disk
[87397.047678] sd 1:0:1:0: alua: rtpg failed with 8000002
[87397.296956] sd 1:0:1:0: alua: port group 00 state A non-preferred supports TolUsNA
[87397.660170] sd 1:0:2:0: alua: rtpg failed with 8000002
[87397.909081] sd 1:0:2:0: alua: port group 00 state A non-preferred supports TolUsNA
[87945.133985] hpsa 0000:84:00.0: CDB 880000000001985e14b0000002000000 was aborted with status 0x0
[87945.555234] hpsa 0000:84:00.0: CDB 880000000001985e16b0000002000000 was aborted with status 0x0
[87947.499349] device-mapper: multipath: Failing path 65:80.
[87947.777697] sd 1:0:1:0: [sdo] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[87948.101420] sd 1:0:3:0: Parameters changed
[87948.101454] sd 1:0:4:0: Parameters changed
[87948.549230] sd 1:0:1:0: [sdo] Sense Key : Illegal Request [current]
[87948.856690] sd 1:0:1:0: [sdo] Add. Sense: Logical unit not supported
[87949.164304] sd 1:0:1:0: [sdo] CDB: Read(16) 88 00 00 00 00 34 10 cc cc 00 00 00 00 08 00 00
[87949.568990] blk_update_request: I/O error, dev sdo, sector 223620156416
[87949.889148] device-mapper: multipath: Failing path 8:224.
[87950.150840] sd 1:0:2:0: [sdp] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[87950.525757] sd 1:0:2:0: [sdp] Sense Key : Illegal Request [current]
[87950.833541] sd 1:0:2:0: [sdp] Add. Sense: Logical unit not supported
[87951.141320] sd 1:0:2:0: [sdp] CDB: Read(16) 88 00 00 00 00 34 10 cc cc 00 00 00 00 08 00 00
[87951.547279] blk_update_request: I/O error, dev sdp, sector 223620156416
[87951.869052] device-mapper: multipath: Failing path 8:240.
[87952.131972] sd 2:0:7:0: alua: rtpg failed with 8000002
[87952.381715] sd 2:0:7:0: alua: rtpg sense code 05/25/00
[87952.631811] device-mapper: multipath: Failing path 65:64.
[87954.282018] hpsa 0000:84:00.0: Acknowledging event: 0x80000012 (HP SSD Smart Path configuration change)
[87954.739264] hpsa 0000:88:00.0: Acknowledging event: 0x80000012 (HP SSD Smart Path configuration change)
[87955.204886] hpsa 0000:84:00.0: removed scsi 1:0:1:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87955.852274] hpsa 0000:84:00.0: removed scsi 1:0:2:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87956.498097] hpsa 0000:84:00.0: replaced scsi 1:0:3:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87957.143959] hpsa 0000:84:00.0: replaced scsi 1:0:4:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87957.797809] hpsa 0000:88:00.0: removed scsi 2:0:7:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87958.442633] hpsa 0000:88:00.0: removed scsi 2:0:8:0: Direct-Access NETAPP INF-01-00 PHYS DRV SSDSmartPathCap- En- Exp=1 qd=58
[87959.088123] sd 1:0:1:0: [sdo] Synchronizing SCSI cache
[87959.336990] sd 1:0:1:0: [sdo] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[87960.087922] sd 2:0:7:0: [sdu] Synchronizing SCSI cache
[87960.338126] sd 2:0:7:0: [sdu] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[87961.832018] sd 2:0:8:0: [sdv] Synchronizing SCSI cache
[87962.080709] sd 2:0:8:0: [sdv] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
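The `Failing path` messages in the log above identify devices only by `major:minor` number. The sketch below pulls those messages out of a saved log and translates the numbers into `sdX` names. It is a sketch under stated assumptions: it relies on the standard Linux static numbering for SCSI disks (major 8 for disks 0-15, major 65 for disks 16-31, 16 minors per disk) and handles only those two majors. The helper names `failing_paths` and `sd_name` are hypothetical. On a live system, `lsblk` or `/sys/dev/block` gives the authoritative mapping.

```python
import re

# Matches lines such as:
# [87947.499349] device-mapper: multipath: Failing path 65:80.
FAIL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+device-mapper: multipath: Failing path (\d+):(\d+)\."
)

# First disk index covered by each SCSI disk major (assumption: only these two).
FIRST_DISK_INDEX = {8: 0, 65: 16}


def failing_paths(lines):
    """Return (timestamp, major, minor) for each path failed by dm-multipath."""
    found = []
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            found.append(
                (float(match.group(1)), int(match.group(2)), int(match.group(3)))
            )
    return found


def sd_name(major, minor):
    """Translate a whole-disk major:minor pair into its sdX name."""
    index = FIRST_DISK_INDEX[major] + minor // 16
    # Disk letters count a..z, then aa, ab, ... (bijective base 26).
    suffix = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        suffix = chr(ord("a") + rem) + suffix
    return "sd" + suffix


# Lines copied from the log above.
sample = [
    "[87947.499349] device-mapper: multipath: Failing path 65:80.",
    "[87947.777697] sd 1:0:1:0: [sdo] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE",
    "[87949.889148] device-mapper: multipath: Failing path 8:224.",
]

for timestamp, major, minor in failing_paths(sample):
    print(timestamp, f"{major}:{minor}", sd_name(major, minor))
```

For the four paths failed in this log, the mapping gives `sdv`, `sdo`, `sdp` and `sdu`, which agrees with the device names appearing in the surrounding `sd` messages.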
HP Smart Array Firmware update
[root@t3nfs02 hp-firmware-smartarray-ea3138d8e8-3.52-1.1]# ./hpsetup
Supplemental Update / Online ROM Flash Component for Linux (x64) - Smart Array H240ar, H240nr, H240, H241, H244br, P240nr, P244br, P246br, P440ar, P440, P441, P542D, P741m, P840, P840ar, and P841 (3.52), searching...
1) Smart Array P440 Smart Array P440 in Slot 3 (3.00)
2) Smart Array P841 Smart Array P841 in Slot 4 (4.52)
3) Smart Array P841 Smart Array P841 in Slot 5 (4.52)
Select which devices to flash [#,#-#,(A)ll,(N)one]> 1
Flashing Smart Array P440 in Slot 3 [ 3.00 -> 3.52 ]
Deferred flashes will be performed on next system reboot
============ Summary ============
Smart Component Finished
Summary Messages
================
User opted to not flash 2 devices
Reboot needed to activate 1 new FW image
Exit Status: 1
Deferred flashes will be performed on next system reboot
A reboot is required to complete update.
[root@t3nfs02 hp-firmware-smartarray-ea3138d8e8-3.52-1.1]#