IssueSolarisPatchFail1

Symptoms

Summary: Machine unbootable due to patch 137138-09

Occurrences

At what times did this problem occur (used to estimate frequency):

2008-12-19

Observations

On two x4500 thumpers, applying patch 137138-09 after reboot (or in single-user mode) resulted in a corrupt kernel and a corrupted filesystem.

The last lines from t3fs04 for the update process were

Installing updates
Installing update 127128-11 Succeeded
Installing update 137138-09

As recommended by the update process, I had initiated an "init 0" on the system console. In both cases a remote ssh session remained alive, but the redirected system console stopped working. The machine could not be shut down cleanly and needed a forced shutdown. Before that, I had made sure that no smpatch-related processes were running any more (and waited 2 hours in the second case). The machine was then unable to boot, and the kernel messages pointed to unresolved symbols. What an utter mess.
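
To spot which update a transcript like the one above stopped on, a small filter can list every "Installing update" line that lacks a "Succeeded" marker. This is only a sketch: the function name stalled_updates is made up for this page, and it assumes the one-update-per-line format seen in the output above.

```shell
#!/bin/sh
# Print the ID of every update that was started but never reported "Succeeded".
# Assumption: each update appears on its own line as
#   Installing update <id> [Succeeded]
# Reads the named files, or standard input when no file is given.
stalled_updates() {
    awk '$1 == "Installing" && $2 == "update" && $NF != "Succeeded" { print $3 }' "$@"
}

# Example (the file name is a placeholder for wherever the output was saved):
#   stalled_updates saved-update-output.txt
```

Fed the lines quoted above, it prints 137138-09, the patch that never finished.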

Solution or Workaround

Sun itself has issued some warnings about this patch. They document problems for two system configurations, but we, as well as others on the net, have experienced problems with other setups. The issue has been known since November, but, unbelievably, Sun has neither withdrawn the patch nor fixed it.

I tried some tips from the patch 137138-09 discussion:

Boot in failsafe mode (from the similarly configured machine t3fs01 I saw that the boot device is c5t0d0p0):

# mkdir /var/tmp/mnt
# mount -F ufs /dev/dsk/c5t0d0s0 /var/tmp/mnt
# bootadm update-archive -R /var/tmp/mnt
   Creating boot_archive for /tmp/root/var/tmp/mnt
   updating /tmp/root/var/tmp/mnt/platform/i86pc/boot_archive

# sync
# umount /var/tmp/mnt
# reboot              # not into failsafe, but into normal kernel
  NOTICE: /: unexpected free inode 99933, run fsck(1M) -o f
  WARNING: /: unexpected allocated inode 104911, run fsck(1M) -o f

I shut the system down and left. The next day I discovered that it had been hanging at the boot prompt the whole time. Booting the system now failed:
root (hd0,0,a)
 Filesystem type is ufs, partition type 0xbf
kernel /platform/i86pc/multiboot
   [Multiboot-elf, <0x1000000:0x141eb:0x128f5>, shtab=0x1027258, entry=0x1000000]
module /platform/i86pc/boot_archive
  Error 28: Selected item cannot fit into memory
   Booting 'Solaris 10 11/06 s10x_u3wos_10 X86'

I booted into failsafe mode and ran fsck, which took quite some time and produced lots of messages:

# fsck -y -F ufs /dev/dsk/c5t0d0s0
# reboot

Trying to boot into the normal system from the boot prompt froze the console screen. The boot image seemed to have been destroyed again, so I rebuilt it once more:

# mount -F ufs /dev/dsk/c5t0d0s0 /mnt
# bootadm update-archive -R /mnt
  Creating boot_archive for /mnt
  updating /mnt/platform/i86pc/boot_archive
# sync
# reboot     # .... the system came up again with some errors
  SunOS Release 5.10 Version Generic_137138-09 64-bit
  Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  e1000g0: DL_BIND_REQ failed: DL_SYSERR (errno 16)
  e1000g0: DL_UNBIND_REQ failed: DL_OUTSTATE
  Failed to plumb IPv4 interface(s): e1000g0

Again I booted into failsafe mode and ran the same fsck multiple times until no more errors showed up:

# fsck -y -F ufs /dev/dsk/c5t0d0s0
..
# mount -F ufs /dev/dsk/c5t0d0s0 /mnt
# bootadm update-archive -R /mnt

panic[cpu3]/thread=bcda5800: alloccgblk: can't find blk in cyl, pos:0, i:380, fs:/mnt bno: 300


ae5b7b54 genunix:vcmn_err+13 (3, feca0900, ae5b7b)
ae5b7b74 ufs:real_panic_v+47 (0, feca0900, ae5b7b)
ae5b7b9c ufs:ufs_fault_v+19f (bc652f00, feca0900,)
ae5b7bb0 ufs:ufs_fault+12 (bc652f00, feca0900,)
ae5b7c08 ufs:alloccgblk+28f (b3fd4500, bc6f7000,)
ae5b7c50 ufs:alloccg+3f3 (bc75ab18, 11, ce320)
ae5b7c7c ufs:hashalloc+2b (bc75ab18, 11, ce320)
ae5b7cbc ufs:alloc+120 (bc75ab18, ce320, 20)
ae5b7dac ufs:bmap_write+a9a (bc75ab18, 8000, 0, )
ae5b7e68 ufs:wrip+397 (bc75ab18, ae5b7f3c,)
ae5b7ecc ufs:ufs_write+492 (ae2290c0, ae5b7f3c,)
ae5b7f04 genunix:fop_write+2a (ae2290c0, ae5b7f3c,)
ae5b7f84 genunix:write+29a (4, 80840f8, 148cc, )

syncing file systems... [1] 1 [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] done (not all i/o completed)
skipping system dump - no dump device configured
rebooting...
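
The repeated fsck passes described above can be wrapped in a loop instead of being retyped. The following is a rough sketch, not what was actually run here: FSCK_CMD, MAX_PASSES and fsck_until_clean are names invented for this example, and it assumes the checker exits with status 0 only when nothing is left to fix, which should be verified against fsck(1M) on the release in use.

```shell
#!/bin/sh
# Re-run a file system check until it exits 0 or a pass limit is reached.
# Assumption: exit status 0 means the checker found nothing left to repair.
# FSCK_CMD and MAX_PASSES are hypothetical knobs added for this sketch.
FSCK_CMD=${FSCK_CMD:-"fsck -y -F ufs"}
MAX_PASSES=${MAX_PASSES:-5}

fsck_until_clean() {
    dev=$1
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        echo "fsck pass $pass on $dev"
        # $FSCK_CMD is left unquoted on purpose so its options split into words.
        if $FSCK_CMD "$dev"; then
            echo "clean after $pass pass(es)"
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "still reporting errors after $MAX_PASSES passes" >&2
    return 1
}

# Example, from failsafe mode with the root slice unmounted:
#   fsck_until_clean /dev/dsk/c5t0d0s0
```

The pass limit is there so that a file system that never comes out clean does not keep the loop running forever.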

Once more I booted into failsafe mode and repeated the fsck, mount, boot archive sequence. This time writing the boot archive worked. The system came up halfway with the same kind of network-related errors as before, but again I never got a login prompt.
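
Since the mount and boot archive steps had to be typed several times, they could be collected in a small helper. This is a hedged sketch only: DRY_RUN, run and rebuild_boot_archive are names made up for this example, the commands are the ones shown in the transcripts above, and by default the script just prints them rather than executing anything.

```shell
#!/bin/sh
# Sketch of the mount / update-archive / umount sequence used in failsafe mode.
# DRY_RUN, run and rebuild_boot_archive are names invented for this example.
DRY_RUN=${DRY_RUN:-1}   # default: only print the commands

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_boot_archive() {
    dev=$1
    mnt=${2:-/mnt}
    run mount -F ufs "$dev" "$mnt" || return 1
    # Remember the bootadm result, but always try to unmount afterwards.
    run bootadm update-archive -R "$mnt"
    rc=$?
    run sync
    run umount "$mnt" || rc=1
    return "$rc"
}

# With the default DRY_RUN=1 this only lists the four commands.
rebuild_boot_archive /dev/dsk/c5t0d0s0 /mnt
```

Setting DRY_RUN=0 would execute the commands for real. A helper like this does nothing against a kernel panic during bootadm such as the one shown earlier; it only saves retyping.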

I was not able to save the installation.

-- DerekFeichtinger - 20 Dec 2008

IssueForm
Affected Service: Solaris System
Symptom summary: Machine unbootable due to patch 137138-09
Reason Understood: yes
Solution Exists: no
Obsolete: yes
Topic revision: r3 - 2010-01-07 - DerekFeichtinger
 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.