All vx* commands hang

raunaz
Level 4
Certified

Hi all,
I'm running VCS and VxFS on Solaris 10 with DMP. Recently we had an issue in which all vx* commands hung.
As a result, several service groups were unable to start. There is not much I can check, because none of the vx* commands respond.

The following errors repeat in /var/adm/messages. They are reported for all disks.


Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):
Feb 22 03:05:15 xxxxxxxxxxx    i/o to invalid geometry
Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@3,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e8015320c08,a (ssd75):
Feb 22 03:05:15 xxxxxxxxxxx     reservation conflict
Feb 22 03:05:15 xxxxxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@13,700000/SUNW,qlc@0/fp@0,0/ssd@w50060e80056f1178,28 (ssd146):
Feb 22 03:05:15 xxxxxxxxxxx     reservation conflict
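
To see which devices report which warning, and how often, the paired lines above can be tallied. This is a minimal sketch, assuming the two-line layout shown in the excerpt (a WARNING line ending in "(ssdN):" followed by a line with the message text). The function name summarize_scsi_warnings is hypothetical, and on Solaris 10 you would likely need to set AWK=nawk because the default awk lacks gsub.

```shell
#!/bin/sh
# Tally SCSI warnings per device from a Solaris messages file.
# Assumption: each warning spans two lines, as in the excerpt above.
# Reads the files given as arguments, or stdin when none are given.
summarize_scsi_warnings() {
    ${AWK:-awk} '
        /WARNING:/ && /\(ssd[0-9]+\):/ {
            # Last field looks like "(ssd146):"; keep only "ssd146".
            dev = $NF
            gsub(/[():]/, "", dev)
            next
        }
        dev != "" {
            # Fields 1-4 are month, day, time, hostname; the rest is the message.
            msg = ""
            for (i = 5; i <= NF; i++) msg = msg (msg == "" ? "" : " ") $i
            count[dev " | " msg]++
            dev = ""
        }
        END {
            for (k in count) printf "%d %s\n", count[k], k
        }
    ' "$@" | sort -k2
}

# Example (not run here): summarize_scsi_warnings /var/adm/messages
```

A device that shows only "reservation conflict" points toward SCSI-3 reservations or fencing keys, while "i/o to invalid geometry" points toward the disk label, so separating the two per device may help narrow the search.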
 
In /var/adm/vx/dmpevents.log, the following errors are logged continuously:
Wed Feb 22 03:05:21.350: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.709: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.710: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.710: I/O retry(4) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.949: I/O error occured on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.951: I/O analysis done as DMP_PATH_OKAY on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
Wed Feb 22 03:05:21.951: I/O retry(3) on Path c1t50060E80056F1168d40s2 belonging to Dmpnode c1t50060E80056F1168d40s2
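
To check whether these errors are confined to a few paths or spread across all of them, the events can be counted per path. This is a minimal sketch, assuming every line of interest contains the word "Path" followed by the path name, as in the excerpt above. The function name count_dmp_events is hypothetical, and on Solaris 10 setting AWK=nawk is the safer choice.

```shell
#!/bin/sh
# Count I/O errors and retries per DMP path in dmpevents.log.
# Assumption: the path name is the word right after "Path" on each line.
# Reads the files given as arguments, or stdin when none are given.
count_dmp_events() {
    ${AWK:-awk} '
        {
            path = ""
            for (i = 1; i < NF; i++) if ($i == "Path") path = $(i + 1)
            if (path == "") next
            seen[path] = 1
        }
        /I\/O error/ { err[path]++ }
        /I\/O retry/ { retry[path]++ }
        END {
            for (p in seen)
                printf "%s errors=%d retries=%d\n", p, err[p], retry[p]
        }
    ' "$@" | sort
}

# Example (not run here): count_dmp_events /var/adm/vx/dmpevents.log
```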
 

The version details are below:

 

# modinfo | grep vx
 36  1354270  4db00 324   1  vxdmp (VxVM 5.1RP2 DMP Driver)
 38 7be00000 1fabe8 325   1  vxio (VxVM 5.1RP2 I/O driver)
 40 7bfdc660    d40 326   1  vxspec (VxVM 5.1RP2 control/status driv)
231 7aff3180    be8 327   1  vxportal (VxFS 5.1_RP2.f portal driver)
232 7a600000 1cba60  21   1  vxfs (VxFS 5.1_RP2.f SunOS 5.10)
235 7aace000  64ce8 331   1  vxfen (VRTS Fence 5.1RP2)
# cat /etc/vxfenmode|grep -v "#"
vxfen_mode=scsi3
scsi3_disk_policy=dmp
 
Please help me resolve this issue.
Thanks
Accepted Solutions

Gaurav_S
Moderator
   VIP    Certified

Hello,

1. Which storage array is connected to the server?

2. Are you sure that storage connectivity is OK and that no errors are seen on the storage array or the fabric switch?

3. When you restarted the node, did you hit the problem immediately, or were you able to run commands for some time?

4. Is this a new configuration, or was it working before?

Also paste the following outputs (if the commands work for some time):

# ps -ef | grep -i vxconfigd

# vxdmpadm listenclosure all

# vxdmpadm listctlr all


In my experience, these issues are usually caused by SAN connectivity problems in the background. If you are still unable to diagnose it, you might need to enable additional logging by putting vxconfigd in debug mode (vxdctl debug 9), if the command works. If the issue occurs as the server starts, you will need to edit the rc file for vxconfigd startup to enable extra logging.

The output may help support analyze the problem.
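
Because the vx* commands may hang partway through, it can help to run each one under a time limit so that one stuck command does not block the rest of the collection. This is a minimal sketch, assuming a POSIX shell with arithmetic expansion (the bash shown in this thread, not the legacy Solaris /bin/sh). The helper name run_with_timeout and the exit code 124 are my own choices, not part of any Veritas tool. Note that a process blocked inside the kernel on I/O may not die even on SIGKILL; the helper simply stops waiting for it.

```shell
#!/bin/sh
# Run a command with a time limit in seconds.
# Returns the command's own exit status, or 124 if the limit was reached.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    elapsed=0
    # Poll instead of calling wait directly, so an unkillable process
    # cannot block this shell indefinitely.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -9 "$pid" 2>/dev/null
            echo "TIMEOUT after ${limit}s: $*" >&2
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}

# Example (not run here), using the commands requested above:
#   run_with_timeout 30 vxdmpadm listenclosure all
#   run_with_timeout 30 vxdmpadm listctlr all
```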

 

Gaurav


raunaz
Level 4
Certified

I have checked the FC, and it looks fine. I tried to restart vxconfigd using vxconfigd -k, but after that the services didn't start. I then rebooted the server. After a while, the same problem came back.

 

bash-3.00# fcinfo hba-port
HBA Port WWN: 2100001b329c91b3
        OS Device Name: /dev/cfg/c1
        Manufacturer: QLogic Corp.
        Model: 375-3355-02
        Firmware Version: 05.01.02
        FCode/BIOS Version:  BIOS: 2.02; fcode: 2.01; EFI: 2.00;
        Serial Number: 0402R00-1003805198
        Driver Name: qlc
        Driver Version: 20090929-2.32
        Type: N-port
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
        Node WWN: 2000001b329c91b3
HBA Port WWN: 2100001b329c57b5
        OS Device Name: /dev/cfg/c3
        Manufacturer: QLogic Corp.
        Model: 375-3355-02
        Firmware Version: 05.01.02
        FCode/BIOS Version:  BIOS: 2.02; fcode: 2.01; EFI: 2.00;
        Serial Number: 0402R00-1003805318
        Driver Name: qlc
        Driver Version: 20090929-2.32
        Type: N-port
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
        Node WWN: 2000001b329c57b5
