
Server crashing due to inconsistency in opt (managed by VxVM)

Socrates
Level 2

We have a Sun server running Solaris 10 and Veritas Cluster Server. The RAID volumes on the server (/, swap, /opt, /var, /usr) are managed by VxVM, and UFS is grown on all these volumes.

Lately the system has been crashing due to an inconsistency in the /opt file system. After rebooting, we ran fsck on /opt multiple times and booted the system to multiuser mode, but the system crashes again once the cluster comes up. The following is the panic message:

 

panic[cpu1]/thread=3000d19a6c0: alloccgblk: can't find blk in cyl, pos:0, i:377, fs:/opt bno: 300
 
 
000002a102c50fb0 ufs:real_panic_v+60 (0, 19017f8, 2a102c51250, 30003bea000, 0, 600080e7d40)
  %l0-3: 000006000832a000 0000000000090000 000006000e5d64c0 0000000000000300
  %l4-7: 0000000000000180 0000000000000000 0000000000000064 0000000001826c00
000002a102c51060 ufs:ufs_fault_v+c8 (600085c7180, 19017f8, 2a102c51250, 6000d957648, 60006b2a2a8, 0)
  %l0-3: 000006000832a000 0000000000090000 000006000e5d64c0 0000000000000300
  %l4-7: 0000000000000180 0000000000000000 0000060006b2a200 0000000000000000
000002a102c51110 ufs:ufs_fault+1c (600085c7180, 19017f8, 0, 179, 6000832a0d4, 300)
  %l0-3: 000006000832a000 0000000000090000 000006000e5d64c0 0000000000000300
  %l4-7: 0000000000000180 0000000000000479 0000000000000179 000006000832a560
000002a102c511c0 ufs:alloccgblk+4c8 (1901400, 6000e5d6000, 0, 6000d957648, 2188, 0)
  %l0-3: 000006000832a000 0000000000090000 000006000e5d64c0 0000000000000300
  %l4-7: 0000000000000180 0000000000000479 0000000000000179 000006000832a560
000002a102c51270 ufs:alloccg+144 (90000, 60006b2a2a8, 662188, 2000, 90255, 6000e5d64c0)
  %l0-3: 000006000e5d6000 000006000d957648 000006000832a2d8 0000000000000880
  %l4-7: 0000000000000088 0000060006b2a200 000006000832a000 0000000000090255
000002a102c51320 ufs:hashalloc+24 (6000e8db878, 88, 662188, 2000, 122e8d0, 2a102c51480)
  %l0-3: 0000060006b2a200 000006000e8db878 000006000832a000 0000000000000003
  %l4-7: 0000060006b2a200 0000000000002000 0000000000000088 0000000000000088
000002a102c513d0 ufs:alloc+128 (0, 662188, 34d4f00, 2a102c51690, 600004040b8, 6000832a000)
  %l0-3: 0000060006b2a200 000006000e8db878 0000000001e6a130 0000000000000003
  %l4-7: 0000000000000000 0000000000002000 0000000000002000 0000000000000010
000002a102c51490 ufs:bmap_write+c40 (0, 2000, 2a102c515e8, 10, 0, 6000e8db878)
  %l0-3: 0000060006b2a200 0000000000000000 0000000000662188 000002a102c515e8
  %l4-7: 000000000000001c 0000000000661f48 0000000000000007 000006000d237d10
000002a102c516a0 ufs:wrip+448 (0, 2a102c51a98, ffffffffff, 2000, 6000e8db878, 8000)
  %l0-3: 0000000000026000 0000000000000001 0000000000000000 0000000000000000
  %l4-7: 0000060006b2a2a8 0000000000028000 0000000000000000 0000000000002000
000002a102c51810 ufs:ufs_write+580 (6000e8cdb80, 2a102c51a98, 8, 60006b2a248, 1, 6000e8db878)
  %l0-3: 000006000e8db898 000006000e8db958 000006000e8db960 0000000000000001
  %l4-7: 00000000019004f4 000006000e8db9b8 0000060006b2a200 0000000000000000
000002a102c51930 genunix:fop_write+20 (6000e8cdb80, 2a102c51a98, 8, 600004040b8, 0, 123ed74)
  %l0-3: 0000000000002000 000006000e8cdb80 0000000000000000 000000000104db10
  %l4-7: 0000000000002000 0000000000026000 0000000000000008 000000000000210a
000002a102c519e0 genunix:write+268 (1, 8058, 600155cb008, 2000, 210a, 1)
  %l0-3: 0000000000000000 000006000e8cdb80 0000000000000000 000000000104db10
  %l4-7: 0000000000002000 0000000000026000 0000000000000008 000000000000210a
 
syncing file systems... [1] 34 [1] 28 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 [1] 5 done (not all i/o completed)
 
From the vxprint -th output, I see that some parameters of the two plexes of opt have different values, as shown below:
 
dm rootdisk     c0t0d0s2     auto     20351    143328960 -
dm rootmirror   c0t1d0s2     auto     9919     143328960 -
 
v  opt          -            ENABLED  SYNC     110796288 ROUND    -        fsgen
pl opt-01       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootdisk-03  opt-01       rootdisk 32532672 110796288 0        c0t0d0   ENA
pl opt-02       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootmirror-05 opt-02      rootmirror 32512320 110796288 0      c0t1d0   ENA
 
 
Please help me with a fix. 
 
 

arangari
Level 5

What error is VCS reporting?

Socrates
Level 2

Hi Amit

I don't see any error coming from VCS. After we ran fsck on /opt and repaired it, a few resources did not come online when VCS was started. But the real problem is that the system hangs. Every time, /opt goes into the "needs sync" state.

If you need any specific output, I can produce it immediately. As you can see below, it fails to allocate a block in /opt during a write and crashes. Again, /opt is managed by VxVM with UFS grown over it.

 

panic[cpu1]/thread=3000d19a6c0: alloccgblk: can't find blk in cyl, pos:0, i:377, fs:/opt bno: 300

Socrates
Level 2

Also, I forgot to mention that there are no SCSI errors or anything else from the internal disks; please see the iostat output:

 

 

sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: FUJITSU  Product: MAY2073RCSUN72G  Revision: 0501 Serial No: 0727S0C038 
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
sd2       Soft Errors: 1 Hard Errors: 0 Transport Errors: 22 
Vendor: MATSHITA Product: CD-RW  CW-8124   Revision: DZ15 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 1 Predictive Failure Analysis: 0 
sd3       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: FUJITSU  Product: MAY2073RCSUN72G  Revision: 0501 Serial No: 0727S0C0FD 
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 

g_lee
Level 6

It sounds like this is a server crash issue (i.e. the root issue is with the /opt UFS file system, which is on a VxVM volume, not with VCS itself), so I'm moving this thread from the Cluster Server forum to the Storage Foundation forum. From the messages provided so far, though, the error appears to be related to the UFS file system rather than to the volume.

For the server crash issue: do you have crash dumps enabled? If so, please provide the following mdb output for more information about the crash:

# mdb -k unix.0 vmcore.0
> ::stack
> ::msgbuf
> ::panicinfo

Refer to the mdb man page for additional options.
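If the dump files are on the server, you can run the same dcmds non-interactively and capture the output to post here. The sketch below is hypothetical and rests on several assumptions:
- Python 3.7+ is available on the host.
- mdb accepts dcmds on standard input (as in echo ::stack | mdb -k unix.0 vmcore.0).
- unix.0 and vmcore.0 are the files savecore wrote. On Solaris 10 these land under /var/crash/<hostname> by default.

```python
import subprocess

# The dcmds requested above; more can be appended if needed.
DEFAULT_DCMDS = ("::stack", "::msgbuf", "::panicinfo")


def build_mdb_session(unix_file, vmcore_file, dcmds=DEFAULT_DCMDS):
    """Return (argv, stdin_text) for a non-interactive mdb run on a crash dump."""
    argv = ["mdb", "-k", unix_file, vmcore_file]
    stdin_text = "".join(cmd + "\n" for cmd in dcmds)  # one dcmd per line
    return argv, stdin_text


def run_mdb(unix_file, vmcore_file, dcmds=DEFAULT_DCMDS):
    """Run mdb on the dump and return stdout followed by stderr (Solaris only)."""
    argv, stdin_text = build_mdb_session(unix_file, vmcore_file, dcmds)
    result = subprocess.run(argv, input=stdin_text, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr
```

Example usage: print(run_mdb("/var/crash/myhost/unix.0", "/var/crash/myhost/vmcore.0")). Here "myhost" is a placeholder for the server's hostname.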

Regarding the supposed opt size discrepancy: the sizes are consistent. With the -t option, the length is the 6th field (as shown in the header key):

V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE

So, based on your output:

dm rootdisk     c0t0d0s2     auto     20351    143328960 -
dm rootmirror   c0t1d0s2     auto     9919     143328960 -
 
v  opt          -            ENABLED  SYNC     110796288 ROUND    -        fsgen
pl opt-01       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootdisk-03  opt-01       rootdisk 32532672 110796288 0        c0t0d0   ENA
pl opt-02       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootmirror-05 opt-02      rootmirror 32512320 110796288 0      c0t1d0   ENA
 
The lengths of the volume, plexes and subdisks are consistent (all are 110796288 sectors). The difference you noticed is in the subdisk DISKOFFS field. That is the offset in sectors on the device (i.e. where the subdisk starts on the disk). It can differ for various reasons, such as different disk geometry, different options used when the disks were set up, or volumes created in a different order. It would not cause a panic or any other issue with the volume.
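The same check can be done mechanically. This hypothetical Python sketch is not a VxVM tool. It does three things:
- splits the v, pl and sd lines of vxprint -ht output on whitespace;
- maps the fields onto the header-key columns above;
- compares lengths for one volume.

It assumes CONCAT plexes, where the subdisk lengths of a plex add up to the plex length.

```python
# Column layouts from the vxprint -ht header key (the record type comes first on each line).
COLUMNS = {
    "v": ["NAME", "RVG/VSET/CO", "KSTATE", "STATE", "LENGTH", "READPOL", "PREFPLEX", "UTYPE"],
    "pl": ["NAME", "VOLUME", "KSTATE", "STATE", "LENGTH", "LAYOUT", "NCOL/WID", "MODE"],
    "sd": ["NAME", "PLEX", "DISK", "DISKOFFS", "LENGTH", "[COL/]OFF", "DEVICE", "MODE"],
}


def parse_vxprint(text):
    """Turn v/pl/sd lines of vxprint -ht output into dicts keyed by column name."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in COLUMNS:
            continue  # skip dm lines, upper-case header lines and blanks
        record = dict(zip(COLUMNS[fields[0]], fields[1:]))
        record["TYPE"] = fields[0]
        records.append(record)
    return records


def check_volume(records, volume):
    """Return (consistent, diskoffs_by_subdisk) for one volume, assuming CONCAT plexes.

    consistent is True when every plex is as long as the volume and each plex's
    subdisk lengths add up to the plex length.
    """
    vol = next(r for r in records if r["TYPE"] == "v" and r["NAME"] == volume)
    vol_len = int(vol["LENGTH"])
    plexes = [r for r in records if r["TYPE"] == "pl" and r["VOLUME"] == volume]
    plex_names = {p["NAME"] for p in plexes}
    subdisks = [r for r in records if r["TYPE"] == "sd" and r["PLEX"] in plex_names]
    consistent = bool(plexes) and all(
        int(p["LENGTH"]) == vol_len
        and sum(int(s["LENGTH"]) for s in subdisks if s["PLEX"] == p["NAME"]) == int(p["LENGTH"])
        for p in plexes
    )
    diskoffs = {s["NAME"]: int(s["DISKOFFS"]) for s in subdisks}
    return consistent, diskoffs


# The opt records from the output above.
SAMPLE = """\
v  opt          -            ENABLED  SYNC     110796288 ROUND    -        fsgen
pl opt-01       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootdisk-03  opt-01       rootdisk 32532672 110796288 0        c0t0d0   ENA
pl opt-02       opt          ENABLED  ACTIVE   110796288 CONCAT   -        RW
sd rootmirror-05 opt-02      rootmirror 32512320 110796288 0      c0t1d0   ENA
"""

if __name__ == "__main__":
    ok, offsets = check_volume(parse_vxprint(SAMPLE), "opt")
    print(ok)       # True: volume, plex and subdisk lengths all match
    print(offsets)  # {'rootdisk-03': 32532672, 'rootmirror-05': 32512320}
```

Only DISKOFFS differs between the two plexes, which matches the explanation above.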

Marianne
Level 6
Partner    VIP    Accredited Certified

"UFS is grown on all these volumes."

Not a good idea and not supported.

https://sort.symantec.com/public/documents/sf/5.0/solaris/html/vxvm_admin/ag_ch_disks_vm23.html

You cannot grow or shrink any volume (rootvol, usrvol, varvol, optvol, swapvol, and so on) that is associated with an encapsulated root disk. This is because these volumes map to physical partitions on the disk, and these partitions must be contiguous.
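To see what "contiguous" means in terms of the vxprint output quoted earlier in the thread, you can check whether a plex's subdisks form one unbroken extent on a single disk. That is what a single physical partition would require. This is a hypothetical Python sketch, not a Veritas utility. It assumes the sd column layout from the vxprint -ht header key (NAME PLEX DISK DISKOFFS LENGTH ...) and treats any gap, or a second disk, as non-contiguous.

```python
from collections import defaultdict


def plex_extents(vxprint_text):
    """Group sd lines of vxprint -ht output by plex: {plex: [(disk, diskoffs, length), ...]}."""
    extents = defaultdict(list)
    for line in vxprint_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "sd":
            # sd NAME PLEX DISK DISKOFFS LENGTH ...
            extents[fields[2]].append((fields[3], int(fields[4]), int(fields[5])))
    return dict(extents)


def is_single_contiguous_extent(extents):
    """True if all subdisks lie on one disk and each starts where the previous one ends."""
    if not extents:
        return False
    if len({disk for disk, _, _ in extents}) != 1:
        return False
    ordered = sorted(extents, key=lambda e: e[1])  # sort by DISKOFFS
    return all(prev[1] + prev[2] == cur[1] for prev, cur in zip(ordered, ordered[1:]))
```

Each opt plex above has a single subdisk, so both plexes pass. A root volume grown by adding a second, non-adjacent subdisk would fail the check, because it could no longer be described by one partition.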