Forum Discussion

kishored
Level 2
13 years ago

Existing disk is showing an error in VxVM

Hi Team,

One of our disks is showing an error in the vxdisk list output. The disk is already a member of a disk group and contains volumes.

How can I clear the error flag in the STATUS column?

13 Replies

  • Hi Kishore,

    Please check the following:

    1. # prtvtoc <error_disk_path>   Run this on both the /dev/rdsk path and the DMP path (/dev/vx/rdmp/...) of the error disk.

    2. # dd if=<error_disk_path> of=/dev/null bs=1024 count=1024

         > While this command runs, do a tail -f /var/adm/messages from another terminal to see whether any errors are logged.

    3. # vxdisk list <error_disk>   Check that all paths are enabled.
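    The three checks above can be run together with a small POSIX sh sketch. The script name check-disk.sh and the helper all_paths_enabled are hypothetical; the sketch assumes OS-native device naming (so the DMP raw path is /dev/vx/rdmp/<name>) and that each path line in the vxdisk list output carries a state= field. Every command it runs only reads from the disk.

```shell
#!/bin/sh
# Sketch of the disk checks suggested above, for one VxVM disk on Solaris.
# Usage: sh check-disk.sh c2t50060E80056F1178d27s2
# Assumptions (hypothetical helper, not a Veritas tool): the OS raw path is
# /dev/rdsk/<name> and the DMP raw path is /dev/vx/rdmp/<name> (OS-native
# naming; enclosure-based names differ).

# Reads `vxdisk list <disk>` output on stdin. Succeeds only if at least one
# path line is present and every path line says "state=enabled".
# Assumes path lines look like "c2t...s2  state=enabled  type=primary".
all_paths_enabled() {
    list_out=$(cat)
    printf '%s\n' "$list_out" | grep -q 'state=' || return 1
    ! printf '%s\n' "$list_out" | grep 'state=' | grep -qv 'state=enabled'
}

check_disk() {
    disk=$1
    for path in "/dev/rdsk/$disk" "/dev/vx/rdmp/$disk"; do
        echo "== prtvtoc $path"
        prtvtoc "$path" || echo "prtvtoc failed on $path" >&2
        # Keep `tail -f /var/adm/messages` running in another terminal
        # to catch any I/O errors logged during this 1 MB read.
        echo "== dd read test on $path"
        dd if="$path" of=/dev/null bs=1024 count=1024 || echo "read failed on $path" >&2
    done
    echo "== vxdisk list $disk"
    if vxdisk list "$disk" | all_paths_enabled; then
        echo "all paths enabled"
    else
        echo "one or more paths are not enabled (or none were listed)" >&2
    fi
}

# Only run the checks when a disk name is given.
if [ $# -gt 0 ]; then
    check_disk "$1"
fi
```

    For example: sh check-disk.sh c2t50060E80056F1178d27s2, with tail -f /var/adm/messages running in a second terminal.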

  • Hi Har-D,

    I haven't tried relabeling by erasing the VTOC; we will only do that as a last resort.

    Yes, we use SCSI-3 fencing. I'm not sure how to check for a stale fence, but from what I found, the keys look OK to me:

    Reading SCSI Registration Keys...

    Device Name: /dev/rdsk/c2t50060E80056F1178d27s2
    Total Number Of Keys: 2
    key[0]:
            [Numeric Format]:  67,86,67,83,0,0,0,0
            [Character Format]: CVCS
            [Node Format]: Cluster ID: unknown  Node ID: 2   Node Name: esesbuh0003
    key[1]:
            [Numeric Format]:  67,86,67,83,0,0,0,0
            [Character Format]: CVCS
            [Node Format]: Cluster ID: unknown  Node ID: 2   Node Name: esesbuh0003

    Some disks have no keys at all, for example:

    Reading SCSI Registration Keys...

    Device Name: /dev/rdsk/c2t50060E80056F1178d20s2
    Total Number Of Keys: 0
    No keys...

    FYI, we have rebooted the machine a few times, but the error still occurs.
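    To spot inconsistent registrations like these across many disks, the key listing can be condensed to one line per device. This is a hypothetical helper (summarize-keys.sh is an assumed name) that expects input in exactly the format pasted above (assumed to come from vxfenadm), saved to a file. It counts the keys per device, lists the distinct node names that registered them, and flags devices with no keys. Whether a given pattern actually means stale keys still depends on how many nodes and paths the cluster has.

```shell
#!/bin/sh
# Sketch: summarize a fencing key listing in the format pasted above,
# one line per device, so disks with missing or one-sided registrations
# stand out. Assumption: only the "Device Name:", "[Character Format]:"
# and "Node Name:" lines of that listing are used.
# Usage: sh summarize-keys.sh keys.txt

summarize_keys() {
    awk '
        /Device Name:/ { dev = $NF; order[++n] = dev; count[dev] = 0; nodes[dev] = "" }
        /\[Character Format\]:/ { count[dev]++ }
        /Node Name:/ {
            name = $NF
            # record each registered node name once per device
            if (!((dev, name) in seen)) { seen[dev, name] = 1; nodes[dev] = nodes[dev] " " name }
        }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                printf "%s keys=%d nodes:%s%s\n", d, count[d],
                    (nodes[d] == "" ? " none" : nodes[d]),
                    (count[d] == 0 ? " WARNING: no registrations" : "")
            }
        }
    '
}

# Only summarize when a file is given.
if [ $# -gt 0 ]; then
    summarize_keys < "$1"
fi
```

    For the output above, it should report two keys on ...d27s2, both registered by esesbuh0003 only, and flag ...d20s2 as having no registrations.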
  • I believe you have an I/O fencing issue here. That is why you are unable to write to the disk, and the fencing keys are also inconsistent.

    You can register the keys manually, but if possible, do a clean reboot of the systems after clearing the keys manually, or at least restart VCS completely after clearing them.

    So here is what I would suggest:

    1. Clear the I/O fencing keys using the "vxfenclearpre" command:

    https://sort.symantec.com/public/documents/sf/5.0/solaris/manpages/vcs/vxfenclearpre_1m.html

    2. Shut down all applications in the cluster.

    3. Shut down VCS (hastop -all).

    4. Shut down fencing (/etc/init.d/vxfen stop). Run this on both nodes.

    5. Make sure the fencing module is unloaded (modinfo | grep -i vxfen).

    6. Make sure gabconfig shows only port a on all nodes (gabconfig -a).

    7. Restart I/O fencing on both nodes (/etc/init.d/vxfen start).

    8. Start VCS on both nodes, one node at a time (hastart).

    The above procedure should clear the keys, and you should get access to the disk back.
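    The eight steps above can be sketched as a small POSIX sh script. This is a hypothetical helper, not a Veritas tool: the file name fence-reset.sh, the phase names and the DRY_RUN switch are assumptions added for illustration, and it assumes the VCS binaries (vxfenclearpre, hastop, hastart, gabconfig) are on PATH. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the fencing reset described above (Solaris, VCS with SCSI-3 fencing).
# Assumptions: VCS binaries are on PATH; nothing is executed unless DRY_RUN=0,
# otherwise every command is only printed so the sequence can be reviewed.
# Usage: sh fence-reset.sh <phase>   (phases are listed in the case at the bottom)

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Reads `gabconfig -a` output on stdin. Succeeds only if port a is listed
# and no other port (b, h, ...) is. Assumes membership lines start with
# "Port <letter> ", e.g. "Port a gen a36e0003 membership 01".
only_port_a() {
    gab_out=$(cat)
    printf '%s\n' "$gab_out" | grep -q '^Port a ' || return 1
    ! printf '%s\n' "$gab_out" | grep '^Port ' | grep -qv '^Port a '
}

# Steps 1-3: run once, from one node.
cluster_stop() {
    run vxfenclearpre || return 1     # 1. clear the SCSI-3 keys
    echo "2. Stop all applications in the cluster before continuing."
    run hastop -all || return 1       # 3. stop VCS on every node
}

# Steps 4-5: run on EACH node.
node_stop_fencing() {
    run /etc/init.d/vxfen stop        # 4. stop the fencing driver
    echo "+ modinfo | grep -i vxfen"  # 5. the module must no longer be loaded
    if [ "${DRY_RUN:-1}" = "0" ] && modinfo | grep -i vxfen; then
        echo "vxfen module is still loaded" >&2
        return 1
    fi
}

# Step 6: run on EACH node once fencing is stopped everywhere.
node_check_gab() {
    echo "+ gabconfig -a | only_port_a"
    if [ "${DRY_RUN:-1}" = "0" ] && ! gabconfig -a | only_port_a; then
        echo "ports other than a are still open" >&2
        return 1
    fi
}

# Step 7: run on EACH node.
node_start_fencing() { run /etc/init.d/vxfen start; }

# Step 8: run on each node in turn, one node at a time.
node_start_vcs() { run hastart; }

case "${1:-}" in
    cluster_stop|node_stop_fencing|node_check_gab|node_start_fencing|node_start_vcs)
        "$1" ;;
    "") ;;  # no phase given (e.g. the file was sourced): only define functions
    *) echo "unknown phase: $1" >&2 ;;
esac
```

    For example, run sh fence-reset.sh cluster_stop on one node; then node_stop_fencing on each node; then node_check_gab on each node; then node_start_fencing on each node; and finally node_start_vcs on one node at a time. Review the printed commands before re-running with DRY_RUN=0. Splitting the script into phases keeps the per-node steps apart from the cluster-wide ones, which a single linear script on one node could not do.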

    Gaurav