Solved: all vx* commands hung

raunaz · 02-21-2012

Hi all,
I'm running VCS and VxFS on Solaris 10 with DMP. Recently we had an issue in which all vx* commands hung.
As a result, several service groups were unable to start. There is not much I can check, because none of the vx* commands respond.

The following errors repeat in /var/adm/messages, and they are reported for all disks:

Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):

Feb 22 03:05:15 xxxxxxxxxxx i/o to invalid geometry

Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@3,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e8015320c08,a (ssd75):

Feb 22 03:05:15 xxxxxxxxxxx reservation conflict

Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):

Feb 22 03:05:15 xxxxxxxxxxx reservation conflict
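To check whether these warnings really come from every disk rather than a few, a small sketch like the one below could tally the messages per ssd instance. It is only an illustration: it assumes the two-line layout shown in the excerpt (a WARNING line naming the device, then a line carrying the message text), and `tally_scsi_warnings` is a hypothetical helper, not a Solaris tool.

```python
import re
from collections import Counter

# Matches a WARNING line and captures the ssd instance, e.g. "ssd146".
WARNING_RE = re.compile(r"WARNING: \S+ \((ssd\d+)\):\s*$")


def tally_scsi_warnings(lines):
    """Count (ssd instance, message) pairs in /var/adm/messages-style lines.

    Assumes each WARNING line is followed by one line of the form
    "<Mon> <day> <hh:mm:ss> <host> <message>", as in the excerpt above.
    """
    counts = Counter()
    pending = None  # ssd instance named by the most recent WARNING line
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        match = WARNING_RE.search(line)
        if match:
            pending = match.group(1)
        elif pending is not None:
            fields = line.split(None, 4)  # month, day, time, host, message
            if len(fields) == 5:
                counts[(pending, fields[4])] += 1
            pending = None
    return counts


# Sample copied from the post; on a live system read /var/adm/messages instead.
sample = """\
Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):
Feb 22 03:05:15 xxxxxxxxxxx i/o to invalid geometry
Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@3,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e8015320c08,a (ssd75):
Feb 22 03:05:15 xxxxxxxxxxx reservation conflict
Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):
Feb 22 03:05:15 xxxxxxxxxxx reservation conflict
"""

for (instance, message), n in sorted(tally_scsi_warnings(sample.splitlines()).items()):
    print(f"{instance}: {message} x{n}")
```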

/var/adm/vx/dmpevents.log shows a continuous stream of errors:

Wed Feb 22 03:05:21.350: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.709: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.710: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.710: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.949: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.951: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2

Wed Feb 22 03:05:21.951: I/O retry(3) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
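In the excerpt above, each I/O error is followed by an analysis result of DMP_PATH_OKAY and another retry, so DMP apparently does not treat the path itself as failed. A sketch like the following could show how often that cycle repeats for each path across the whole log. The line format is inferred from these seven lines only, and `summarize_dmp_events` is a hypothetical helper, not part of VxVM.

```python
import re
from collections import Counter, defaultdict

# "<Dow> <Mon> <day> <hh:mm:ss.mmm>: <event> on Path <path> belonging to Dmpnode <node>"
EVENT_RE = re.compile(
    r"^\w{3} \w{3} +\d+ [\d:.]+: (?P<event>.+?) on Path (?P<path>\S+) "
    r"belonging to Dmpnode (?P<node>\S+)$"
)
ANALYSIS_RE = re.compile(r"I/O analysis done as (\S+)$")


def summarize_dmp_events(lines):
    """Return {path: Counter of event kinds} for dmpevents.log-style lines."""
    summary = defaultdict(Counter)
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # blank or unrecognized line
        event, path = match.group("event"), match.group("path")
        if event.startswith("I/O retry"):
            summary[path]["retry"] += 1
        elif event.startswith("I/O error"):
            summary[path]["error"] += 1
        else:
            analysis = ANALYSIS_RE.match(event)
            if analysis:
                summary[path][analysis.group(1)] += 1  # e.g. DMP_PATH_OKAY
    return summary


# Sample copied from the post; on a live system read /var/adm/vx/dmpevents.log.
sample = """\
Wed Feb 22 03:05:21.350: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.709: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.710: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.710: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.949: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.951: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.951: I/O retry(3) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
"""

for path, counts in summarize_dmp_events(sample.splitlines()).items():
    print(path, dict(counts))
```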

The version details are as follows:

# modinfo | grep vx

36 1354270 4db00 324 1 vxdmp (VxVM 5.1RP2 DMP Driver)

38 7be00000 1fabe8 325 1 vxio (VxVM 5.1RP2 I/O driver)

40 7bfdc660 d40 326 1 vxspec (VxVM 5.1RP2 control/status driv)

231 7aff3180 be8 327 1 vxportal (VxFS 5.1_RP2.f portal driver)

232 7a600000 1cba60 21 1 vxfs (VxFS 5.1_RP2.f SunOS 5.10)

235 7aace000 64ce8 331 1 vxfen (VRTS Fence 5.1RP2)
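One quick sanity check on this output is whether every loaded Veritas module reports the same release level. The sketch below extracts that level per module, assuming the modinfo column layout shown above and treating both `5.1RP2` and `5.1_RP2.f` as 5.1 RP2 (an assumption about the naming). `veritas_release_levels` is a hypothetical name, and the check itself is an addition rather than something asked for in this thread.

```python
import re

# modinfo columns: Id Loadaddr Size Info Rev Module-Name (Module-Info)
MODINFO_RE = re.compile(r"^\s*\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\S+)\s+\((.*)\)\s*$")
# Assumption: "5.1RP2" and "5.1_RP2.f" both denote 5.1 RP2 (the ".f" suffix is ignored).
RELEASE_RE = re.compile(r"(\d+\.\d+)_?RP(\d+)")


def veritas_release_levels(lines):
    """Map module name -> (base release, RP level) for modinfo lines that carry one."""
    levels = {}
    for line in lines:
        match = MODINFO_RE.match(line)
        if not match:
            continue  # header, blank, or unrecognized line
        name, description = match.group(1), match.group(2)
        release = RELEASE_RE.search(description)
        if release:
            levels[name] = (release.group(1), int(release.group(2)))
    return levels


# Sample copied from the post; on a live system feed in "modinfo | grep vx" output.
sample = """\
36 1354270 4db00 324 1 vxdmp (VxVM 5.1RP2 DMP Driver)
38 7be00000 1fabe8 325 1 vxio (VxVM 5.1RP2 I/O driver)
40 7bfdc660 d40 326 1 vxspec (VxVM 5.1RP2 control/status driv)
231 7aff3180 be8 327 1 vxportal (VxFS 5.1_RP2.f portal driver)
232 7a600000 1cba60 21 1 vxfs (VxFS 5.1_RP2.f SunOS 5.10)
235 7aace000 64ce8 331 1 vxfen (VRTS Fence 5.1RP2)
"""

levels = veritas_release_levels(sample.splitlines())
distinct = set(levels.values())
print("consistent" if len(distinct) == 1 else f"mixed levels: {sorted(distinct)}")
```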

# cat /etc/vxfenmode|grep -v "#"

vxfen_mode=scsi3

scsi3_disk_policy=dmp
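With vxfen_mode=scsi3, fencing relies on SCSI-3 persistent reservations, which may make this file relevant to the reservation conflict warnings above. If the settings need to be read from a script, a minimal parser might look like this. It is a sketch that assumes simple key=value lines with '#' comments; unlike `grep -v "#"`, it also keeps the part of a line that precedes a trailing comment. `parse_vxfenmode` is a hypothetical name.

```python
def parse_vxfenmode(text):
    """Return {key: value} from key=value lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop everything after a '#'
        if "=" not in line:
            continue  # blank, comment-only, or malformed line
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents: the two settings come from the post,
# the comment lines are placeholders for the ones grep -v "#" filtered out.
sample = """\
# (comment lines omitted in the post)
vxfen_mode=scsi3

# (comment lines omitted in the post)
scsi3_disk_policy=dmp
"""

settings = parse_vxfenmode(sample)
print(settings)
```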

How can I solve this issue?

Thanks

Gaurav_S · 02-22-2012

Hello,

1. Which storage is connected to the server?

2. Are you sure that storage connectivity is all OK and that no errors are seen on the storage or the fabric switch?

3. When you restarted the node, did you face the problem immediately, or were you able to run commands for some time?

4. Is this a new configuration, or was it working before?

Also, paste the output of the following commands (if commands work for some time):

# ps -ef | grep -i vxconfigd

# vxdmpadm listenclosure all

# vxdmpadm listctlr all


In my experience, these issues are usually caused by underlying SAN connectivity problems. If you still cannot diagnose the problem, you may need to enable additional logging by putting vxconfigd in debug mode (vxdctl debug 9), provided the commands work. If the issue occurs right at server startup, you will need to edit the rc file for vxconfigd startup to enable extra logging.

The output may help support analyze the issue.
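Because the vx* commands hang here, collecting this output by hand can leave a terminal stuck. A sketch like the following runs each command with a timeout instead. It assumes Python 3.7 or later is installed (which may not be true on a stock Solaris 10 host), `run_with_timeout` is a hypothetical helper, and the 60-second default is arbitrary. A process blocked inside the kernel may not exit on SIGKILL, in which case `subprocess.run` can still block while waiting for it.

```python
import subprocess

# Commands requested above; the "| grep -i vxconfigd" filter is applied in Python.
DIAGNOSTICS = [
    ["ps", "-ef"],
    ["vxdmpadm", "listenclosure", "all"],
    ["vxdmpadm", "listctlr", "all"],
]


def run_with_timeout(cmd, timeout_s=60):
    """Run cmd (a list of arguments) and return its output, or a short note on failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return f"TIMED OUT after {timeout_s}s: {' '.join(cmd)}"
    except FileNotFoundError:
        return f"NOT FOUND: {cmd[0]}"
    return result.stdout + result.stderr


if __name__ == "__main__":
    for cmd in DIAGNOSTICS:
        print("#", " ".join(cmd))
        output = run_with_timeout(cmd)
        if cmd[0] == "ps":
            # Equivalent of: ps -ef | grep -i vxconfigd
            output = "\n".join(
                line for line in output.splitlines() if "vxconfigd" in line.lower()
            )
        print(output)
```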

Gaurav


raunaz · 02-21-2012

I have checked the FC connectivity, and it looks fine. I tried to restart vxconfigd using vxconfigd -k, but after that the services didn't start. I rebooted the server, and after a while the same problem came back.

bash-3.00# fcinfo hba-port

HBA Port WWN: 2100001b329c91b3

OS Device Name: /dev/cfg/c1

Manufacturer: QLogic Corp.

Model: 375-3355-02

Firmware Version: 05.01.02

FCode/BIOS Version: BIOS: 2.02; fcode: 2.01; EFI: 2.00;

Serial Number: 0402R00-1003805198

Driver Name: qlc

Driver Version: 20090929-2.32

Type: N-port

State: online

Supported Speeds: 1Gb 2Gb 4Gb

Current Speed: 4Gb

Node WWN: 2000001b329c91b3

HBA Port WWN: 2100001b329c57b5

OS Device Name: /dev/cfg/c3