If you find that a disk has died on an OSS, here is the procedure:
First, find out which disk has gone bad:
[root@oss31 ~]# grep -C2 _ /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md17 : active raid6 sdcf[0] sdco[9] sdcn[8] sdcm[7] sdcl[6] sdck[5] sdcj[10](F) sdci[3] sdch[2] sdcg[1]
3907091456 blocks level 6, 128k chunk, algorithm 2 [10/9] [UUUU_UUUUU]
in: 24771488 reads, 38246047 writes; out: 3561687537 reads, 1769535592 writes
2236207063 in raid5d, 131583 out of stripes, 3545221835 handle called
[root@oss31 ~]# fdisk -l /dev/sdcj
[root@oss31 ~]#
Here we see that sdcj has failed: it is flagged (F) in /proc/mdstat, and fdisk prints nothing because the system no longer knows anything about the device.
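On a server with many arrays, the failed member can also be picked out of /proc/mdstat with a short script. This is a minimal bash sketch, not an established tool. The function name failed_members is our own, and it assumes the mdstat layout shown above, where a failed member appears as name[slot](F).

```shell
#!/bin/bash
# failed_members: print "<array> <device>" for every md member flagged (F).
# Reads mdstat-format text from the file given as $1 (default: /proc/mdstat).
# Hypothetical helper; assumes the "md17 : active raid6 sdcf[0] ..." layout.
failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {        # member token such as sdcj[10](F)
                dev = $i
                sub(/\[.*/, "", dev)    # strip "[10](F)", keeping "sdcj"
                print $1, dev
            }
        }
    }' "${1:-/proc/mdstat}"
}
```

Run without arguments, it reads /proc/mdstat. For the mdstat output above it prints md17 sdcj.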
The kernel log (/var/log/messages) shows what happened:
May 14 13:27:44 oss31 kernel: scsi 2:0:39:0: rejecting I/O to dead device
May 14 13:27:44 oss31 last message repeated 2 times
May 14 13:27:44 oss31 kernel: raid5: Disk failure on sdcj, disabling device. Operation continuing on 9 devices
May 14 13:27:44 oss31 kernel: scsi 2:0:39:0: rejecting I/O to dead device
May 14 13:27:44 oss31 kernel: raid5:md17: read error not correctable (sector 821658128 on sdcj)
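The "rejecting I/O" lines carry the SCSI address that is needed later, but they do not name the sd device, so match them to the raid5 "Disk failure on sdcj" line by timestamp, as in the excerpt above. Collecting the addresses can be sketched in bash as follows. dead_scsi_addrs is a hypothetical helper, and it assumes the kernel message wording shown in this log.

```shell
#!/bin/bash
# dead_scsi_addrs: print each distinct SCSI address (host:channel:id:lun) that
# the kernel reported as "rejecting I/O to dead device".
# Reads syslog-format text from $1 (default: /var/log/messages).
# Hypothetical helper; the message text is taken from the log excerpt above.
dead_scsi_addrs() {
    grep 'rejecting I/O to dead device' "${1:-/var/log/messages}" |
        grep -oE 'scsi [0-9]+:[0-9]+:[0-9]+:[0-9]+' |   # e.g. "scsi 2:0:39:0"
        awk '{print $2}' |                              # keep "2:0:39:0"
        sort -u                                         # one line per address
}
```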
Next, look at the array with mdadm --detail. Note that mdadm needs the full device path:
[root@oss31 ~]# mdadm --detail md17
mdadm: cannot open md17: No such file or directory
[root@oss31 ~]# mdadm --detail /dev/md17
/dev/md17:
Version : 0.90
Creation Time : Tue Mar 2 18:03:16 2010
Raid Level : raid6
Array Size : 3907091456 (3726.09 GiB 4000.86 GB)
Used Dev Size : 488386432 (465.76 GiB 500.11 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 17
Persistence : Superblock is persistent
Intent Bitmap : /lustre/scratch/bmp17/bitmap
Update Time : Tue May 18 10:19:07 2010
State : clean, degraded
Active Devices : 9
Working Devices : 9
Failed Devices : 1
Spare Devices : 0
Chunk Size : 128K
UUID : 22affd18:9e22b048:7d19f379:8ed3dc17
Events : 0.419992
Number Major Minor RaidDevice State
0 69 48 0 active sync /dev/sdcf
1 69 64 1 active sync /dev/sdcg
2 69 80 2 active sync /dev/sdch
3 69 96 3 active sync /dev/sdci
4 0 0 4 removed
5 69 128 5 active sync /dev/sdck
6 69 144 6 active sync /dev/sdcl
7 69 160 7 active sync /dev/sdcm
8 69 176 8 active sync /dev/sdcn
9 69 192 9 active sync /dev/sdco
10 69 112 - faulty spare
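In the member table the dead disk shows up twice: its RaidDevice slot 4 is "removed", and the device itself (major 69, minor 112) is still attached as a "faulty spare". Here is a small bash sketch for pulling those entries out of mdadm --detail output. detail_problems is a hypothetical name, and the column positions are assumed from the output above.

```shell
#!/bin/bash
# detail_problems: summarise non-active members from "mdadm --detail" output.
# Reads the output from the file given as $1, or from stdin if $1 is omitted.
# Hypothetical helper; assumes the column layout shown above:
#   Number  Major  Minor  RaidDevice  State...
detail_problems() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 {
        if ($5 == "removed")
            print "removed raid-slot " $4
        else if ($5 == "faulty")
            print "faulty " $2 ":" $3
    }' "${1:--}"
}
```

For example, mdadm --detail /dev/md17 | detail_problems would print "removed raid-slot 4" and "faulty 69:112" for the output above.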
To make sure the system has truly disabled the disk, run a few more commands. First, try to mark it as faulty:
[root@oss31 ~]# mdadm -f /dev/md17 /dev/sdcj
mdadm: cannot find /dev/sdcj: No such file or directory
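Because the kernel has already dropped the device, /dev/sdcj no longer exists, so mdadm cannot find it either.
Next, remove the failed device from the array, since md has not let go of it yet (it is still listed as "faulty spare" above). The keyword failed tells mdadm to remove every failed member: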
[root@oss31 ~]# mdadm /dev/md17 --remove failed
mdadm: hot removed 69:112
Next, make sure the SCSI layer has also let go of the disk by writing a remove-single-device command to /proc/scsi/scsi. For this you need the disk's SCSI address (host, channel, id, lun), which appears in the messages log above:
oss31 kernel: scsi 2:0:39:0: rejecting I/O to dead device
This means our command will look like this:
[root@oss31 ~]# echo "scsi remove-single-device" 2 0 39 0 > /proc/scsi/scsi
-bash: echo: write error: No such device or address
The "No such device or address" error means the SCSI layer had already removed the device, so we can be confident that pulling the disk will not cause any trouble.
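The string written to /proc/scsi/scsi is just the four fields of the log address separated by spaces. The hypothetical helper below builds that string from a host:channel:id:lun address and rejects malformed input. It is a sketch that only prints, so nothing is written unless you redirect it yourself.

```shell
#!/bin/bash
# scsi_remove_cmd: turn a "host:channel:id:lun" address, as printed in the
# kernel log, into the string written to /proc/scsi/scsi.
# Hypothetical helper; it only prints the string and writes nothing.
scsi_remove_cmd() {
    local addr="$1" host channel id lun f
    # Split on ':' into the four address fields.
    IFS=: read -r host channel id lun <<< "$addr"
    # Every field must be present and purely numeric.
    for f in "$host" "$channel" "$id" "$lun"; do
        case "$f" in
            ''|*[!0-9]*) echo "bad SCSI address: $addr" >&2; return 1 ;;
        esac
    done
    echo "scsi remove-single-device $host $channel $id $lun"
}
```

With it, scsi_remove_cmd 2:0:39:0 > /proc/scsi/scsi does the same as the echo command above. The validation is there because a mistyped address would detach a healthy disk instead.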
Now physically replace the disk with a new one.
Back at the terminal, check dmesg to see which device name the system gave the new disk:
[root@oss31 ~]# dmesg|tail -14
mptsas: ioc2: attaching sata device, channel 0, id 14, phy 14
Vendor: ATA Model: HITACHI HUA7250S Rev: AC4A
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdct: 976773168 512-byte hdwr sectors (500108 MB)
sdct: Write Protect is off
sdct: Mode Sense: 73 00 00 08
SCSI device sdct: drive cache: write through
SCSI device sdct: 976773168 512-byte hdwr sectors (500108 MB)
sdct: Write Protect is off
sdct: Mode Sense: 73 00 00 08
SCSI device sdct: drive cache: write through
sdct: unknown partition table
sd 2:0:50:0: Attached scsi disk sdct
sd 2:0:50:0: Attached scsi generic sg93 type 0
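The device name is in the "Attached scsi disk" line. If you prefer not to read through dmesg by eye, this bash sketch prints the name from the last such line. newest_attached_disk is a hypothetical helper, and it assumes no other disk was attached after the replacement.

```shell
#!/bin/bash
# newest_attached_disk: print the sd name from the last "Attached scsi disk"
# line in dmesg-format text read from $1 (default: the live dmesg output).
# Hypothetical helper based on the kernel messages shown above.
newest_attached_disk() {
    if [ -n "$1" ]; then cat "$1"; else dmesg; fi |
        grep -oE 'Attached scsi disk sd[a-z]+' |   # skips "scsi generic sgNN"
        tail -n 1 |
        awk '{print $NF}'
}
```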
So now we know that the new disk is /dev/sdct. However, we don't reference disks by their sd names, since those can change (for example after a reboot). Instead we use their unique /dev/disk/by-id names, which are built from the manufacturer's model and serial number. Besides adding the new disk to the original RAID, we therefore need to update /etc/mdadm.conf.oss and /etc/mdadm.conf.local. First, find out which by-id entry belonged to the dead disk. This is done best in this manner:
[root@oss31 ~]# for i in /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJYZF \
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6G6M6XF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK9MF \
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJEGF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F \
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK5LF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJ6BF \
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJZPF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK0WF \
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJX2F; do echo "$i"; ls -l "$i"; done
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJYZF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJYZF -> ../../sdcf
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6G6M6XF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6G6M6XF -> ../../sdcg
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK9MF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK9MF -> ../../sdch
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJEGF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJEGF -> ../../sdci
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F
ls: /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F: No such file or directory
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK5LF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK5LF -> ../../sdck
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJ6BF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJ6BF -> ../../sdcl
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJZPF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJZPF -> ../../sdcm
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK0WF
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK0WF -> ../../sdcn
/dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJX2F
lrwxrwxrwx 1 root root 10 May 10 15:59 /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJX2F -> ../../sdco
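The same check can be written as a function that prints only the entries that no longer resolve. This is a sketch with a hypothetical name (missing_ids). It treats both absent paths and dangling symlinks as missing, since test -e follows the link.

```shell
#!/bin/bash
# missing_ids: print every path given as an argument that does not resolve
# to an existing device (absent or dangling by-id symlinks).
# Hypothetical helper; feed it the md17 DEVICE list from mdadm.conf.oss.
missing_ids() {
    local p
    for p in "$@"; do
        [ -e "$p" ] || echo "$p"
    done
}
```

Called with the ten paths from the loop above, it would print only the XJU7F entry.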
The list in the for loop is taken from the md17 DEVICE definition in mdadm.conf.oss. The old, bad disk is the one that can no longer be seen, where it says 'No such file or directory' above: the ID ending in XJU7F. We will replace that entry with the new disk's ID, which can be determined this way:
[root@oss31 ~]# ls -l /dev/disk/by-id/|grep sdct
lrwxrwxrwx 1 root root 10 May 18 10:56 scsi-SATA_HITACHI_HUA7250GTF402P6GS4Z4F -> ../../sdct
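Looking up the by-id name for a kernel device name can be sketched the same way. ids_for_dev is a hypothetical helper. On systems where udev creates several links per disk (for example ata-, scsi- and wwn- names) it prints all of them, and you should pick the style already used in mdadm.conf.oss.

```shell
#!/bin/bash
# ids_for_dev: print the by-id names whose symlink points at kernel name $1
# (for example sdct). $2 is the directory to search (default /dev/disk/by-id).
# Hypothetical helper; partition links such as ...-part1 -> sdct1 are skipped
# because their target is not exactly $1.
ids_for_dev() {
    local dev="$1" dir="${2:-/dev/disk/by-id}" link target
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")          # e.g. ../../sdct
        if [ "${target##*/}" = "$dev" ]; then
            echo "${link##*/}"
        fi
    done
}
```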
Now replace the old ID with the new one in the mdadm conf files.
DO THIS ON BOTH OSS SERVERS: both mdadm.conf.local and mdadm.conf.oss on the server the disk failed on, AND mdadm.conf.oss on the OTHER OSS. DO NOT FAIL TO DO THIS!!!
OLD:
#/dev/md17
DEVICE /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJYZF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6G6M6XF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK9MF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJEGF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK5LF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJ6BF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJZPF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK0WF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJX2F
NEW:
#/dev/md17
DEVICE /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJYZF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6G6M6XF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK9MF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJEGF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GS4Z4F /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK5LF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJ6BF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJZPF
    /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXK0WF /dev/disk/by-id/scsi-SATA_HITACHI_HUA7250GTF402P6GXJX2F
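The edit itself can be done with sed, keeping a backup of each file. This is a minimal bash sketch. replace_id is a hypothetical helper, and the usage below assumes both conf files live in /etc, as mdadm.conf.oss does. It still has to be run for every file listed above, including mdadm.conf.oss on the other OSS.

```shell
#!/bin/bash
# replace_id: swap one by-id name for another in an mdadm config file,
# keeping a copy of the original as FILE.bak.
# Usage: replace_id FILE OLD_ID NEW_ID   (hypothetical helper)
replace_id() {
    local file="$1" old="$2" new="$3" id
    # Both IDs must be non-empty and contain only letters, digits, '_' or '-',
    # which also makes them safe to use unescaped in the sed expression below.
    for id in "$old" "$new"; do
        case "$id" in
            ''|*[!A-Za-z0-9_-]*) echo "replace_id: bad ID '$id'" >&2; return 1 ;;
        esac
    done
    if ! grep -q -- "$old" "$file"; then
        echo "replace_id: $old not found in $file" >&2
        return 1
    fi
    # -i.bak edits in place and keeps the original as FILE.bak.
    sed -i.bak "s/$old/$new/g" "$file"
}
```

For this failure, on oss31 that would be:
replace_id /etc/mdadm.conf.oss scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F scsi-SATA_HITACHI_HUA7250GTF402P6GS4Z4F
replace_id /etc/mdadm.conf.local scsi-SATA_HITACHI_HUA7250GTF402P6GXJU7F scsi-SATA_HITACHI_HUA7250GTF402P6GS4Z4F
Afterwards, grep GS4Z4F /etc/mdadm.conf.* should show the new ID in each file.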
Now we will add the device to the array:
[root@oss31 ~]# mdadm /dev/md17 -a /dev/sdct
mdadm: added /dev/sdct
Next, check /proc/mdstat to confirm that the array is rebuilding onto the new disk:
[root@oss31 ~]# grep recovery /proc/mdstat
[>....................] recovery = 2.2% (10960204/488386432) finish=126.8min speed=62742K/sec
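There you go: md17 is rebuilding onto the new disk, and at this rate recovery should finish in roughly two hours.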
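To follow the rebuild without re-running grep by hand, the percentage can be extracted per array. Here is a bash sketch with a hypothetical name (recovery_progress). It assumes the mdstat layout above, in which the progress line follows the array's own mdX line.

```shell
#!/bin/bash
# recovery_progress: print "<array> <percent>" for each md array that is
# currently recovering, from mdstat-format text in $1 (default /proc/mdstat).
# Hypothetical helper; assumes the "recovery = 2.2% (...)" line format above.
recovery_progress() {
    awk '/^md[0-9]+ :/ { md = $1 }              # remember the current array
         match($0, /recovery = *[0-9.]+%/) {
             pct = substr($0, RSTART, RLENGTH)  # "recovery =  2.2%"
             sub(/recovery = */, "", pct)
             sub(/%/, "", pct)
             print md, pct
         }' "${1:-/proc/mdstat}"
}
```

For example, while recovery_progress | grep -q .; do recovery_progress; sleep 600; done prints the progress every ten minutes until no array is recovering any more.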
As an alternative to the manual steps above, Fotis & Peter suggest running their script on the broken OSS:
# ./RunmeWhenDiskHasBeenReplaced.sh
You should now run on both OSSs: ./RunmeWhenDiskHasBeenReplaced.sh scsi-SATA_HITACHI_HUA7250GTF402P6GXK0HF scsi-SATA_HITACHI_HUA7250GTF402P6GXJT9F
Then just do what it tells you.
--
JasonTemple - 2010-05-17