Existing disk is showing error in vxvm

kishored
Level 2

Hi Team,

 

One of the disks is showing "error" in the vxdisk list output. The disk is already a member of a disk group and also contains volumes.

How can I clear the error flag in the STATUS column?

Accepted Solutions

Gaurav_S
Moderator
   VIP    Certified

I believe you have an I/O fencing issue here. That is the reason you are unable to write, and the fencing keys are also inconsistent.

You can register the keys manually, but if possible do a clean reboot of the system after clearing the keys manually, or at least restart VCS completely after clearing them.

So here is what I would suggest:

1. Clear the I/O fencing keys using the "vxfenclearpre" command:

https://sort.symantec.com/public/documents/sf/5.0/solaris/manpages/vcs/vxfenclearpre_1m.html

2. Shut down all the applications in the cluster

3. Shut down VCS (hastop -all)

4. Shut down fencing (/etc/init.d/vxfen stop); this needs to be run on both nodes.

5. Ensure the fencing module is unloaded (modinfo | grep -i vxfen should return nothing)

6. Ensure from gabconfig that only port a exists on all the nodes (gabconfig -a)

7. Restart I/O fencing on both nodes (/etc/init.d/vxfen start)

8. Start the cluster on both nodes, one at a time (hastart)

The above procedure should help clear the keys, and you should get access back to the disk.
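
The steps above can be sketched as a dry-run script, so the sequence can be reviewed before anything is executed. This is only a sketch under assumptions: the names `run`, `gab_ports` and `fencing_reset_steps` are illustrative, `vxfenclearpre` is assumed to be on the PATH and started without arguments, and the order simply follows the post. Check the man page linked in step 1 for its prerequisites before running anything for real, and remember that steps 4 to 8 have to be repeated on each node.

```shell
#!/bin/sh
# Dry-run sketch of the fencing reset sequence listed in the post.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $1"
    else
        sh -c "$1"
    fi
}

# Read `gabconfig -a` output on stdin and print the distinct port letters,
# space separated. After step 4 the post expects only "a" to remain.
gab_ports() {
    awk '$1 == "Port" && !seen[$2]++ { out = out (out ? " " : "") $2 }
         END { print out }'
}

fencing_reset_steps() {
    run "vxfenclearpre"                     # step 1: clear the fencing keys
    echo "reminder: stop all applications"  # step 2: site specific, not scripted
    run "hastop -all"                       # step 3: stop VCS
    run "/etc/init.d/vxfen stop"            # step 4: repeat on every node
    run "modinfo | grep -i vxfen"           # step 5: expect no output
    run "gabconfig -a"                      # step 6: expect only port a
    run "/etc/init.d/vxfen start"           # step 7: repeat on every node
    run "hastart"                           # step 8: one node at a time
}
```

Calling `fencing_reset_steps` prints the plan, and `gabconfig -a | gab_ports` gives a one-line answer for the check in step 6.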

Gaurav
 


13 REPLIES

CraigV
Moderator
Partner    VIP    Accredited

...you've logged this in the wrong forum. Which product is this for, so that I can move it?

kishored
Level 2

Hi

 

This is a Veritas Volume Manager issue.

CraigV
Moderator
Partner    VIP    Accredited

...should be in the correct forum now!

Yasuhisa_Ishika
Level 6
Partner Accredited Certified

What platform and which version of Storage Foundation are you using?
Please also provide details of your OS and the output of the following commands.

# vxdisk list
# vxprint -ht

Gaurav_S
Moderator
   VIP    Certified

Need more details to answer further:

1. How did the issue happen? Do you see anything in the logs?

2. Is it an internal disk or a disk from a storage array?

3. OS version and VxVM version?

4. Paste the following outputs:

# vxdisk list

# vxdisk -o alldgs -e list

# vxprint -qthg <diskgroup>

# modinfo | grep -i vx
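
A small sketch for gathering the requested outputs into one file that can be pasted into the thread. The function name `collect_vx_info`, the default output path and the section markers are made up for illustration; the commands themselves are the ones asked for above. Commands that are not installed are noted as skipped rather than treated as fatal.

```shell
#!/bin/sh
# collect_vx_info DISKGROUP [OUTFILE]
# Runs the diagnostic commands requested in the thread and saves their output.
collect_vx_info() {
    dg=$1
    out=${2:-/tmp/vx_info.txt}      # hypothetical default location
    : > "$out"
    for cmd in "vxdisk list" "vxdisk -o alldgs -e list" \
               "vxprint -ht" "vxprint -qthg $dg"; do
        echo "### $cmd" >> "$out"
        bin=${cmd%% *}              # first word is the binary name
        if command -v "$bin" >/dev/null 2>&1; then
            $cmd >> "$out" 2>&1 || true
        else
            echo "(skipped: $bin not found in PATH)" >> "$out"
        fi
    done
    echo "### modinfo | grep -i vx" >> "$out"
    if command -v modinfo >/dev/null 2>&1; then
        modinfo 2>&1 | grep -i vx >> "$out" || true
    else
        echo "(skipped: modinfo not found in PATH)" >> "$out"
    fi
    echo "$out"                     # print the file name for the caller
}
```

For example, `collect_vx_info mydg` (with `mydg` replaced by the real disk group name) writes everything to /tmp/vx_info.txt.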

 

Gaurav

Gaurav_S
Moderator
   VIP    Certified

Hi Kishore,

Did you get a chance to look into this?

 

The error state appears on a disk when Volume Manager is unable to read the label/disk geometry from the disk, so the reason could be:

1. The disk label has been lost

2. The disk was replaced and a new disk is in place

If the disk was in use and something happened to the private region so that VxVM is unable to read it, you should instead see the disk going into the "failed was" state in vxdisk list.

In most cases you need to do a bottom-up analysis and find out what happened to the disk underneath.
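
To tell the two cases above apart quickly, the STATUS column of vxdisk list can be filtered. The following is a rough sketch: `problem_disks` is an illustrative name, and it assumes the usual five-column layout (DEVICE, TYPE, DISK, GROUP, STATUS) where the status may contain spaces, as in "failed was:...".

```shell
#!/bin/sh
# Read `vxdisk list` output on stdin and print DEVICE, DISK and STATUS
# (tab separated) for every entry whose status mentions error or failed.
problem_disks() {
    awk '$1 == "DEVICE" { next }      # skip the header line
    NF >= 5 {
        status = $5
        for (i = 6; i <= NF; i++) status = status " " $i
        if (status ~ /error/ || status ~ /failed/)
            printf "%s\t%s\t%s\n", $1, $3, status
    }'
}
```

Typical use would be `vxdisk list | problem_disks`. An entry with a device name and status "error" points at the label/geometry case, while a "-" device with "failed was:" points at the private-region case.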

 

Gaurav

raunaz
Level 4
Certified

Hi Gaurav,

We are also experiencing the same problem: the disk label has been corrupted. Before this it was running fine.
Right now VxVM has a performance issue and responds very slowly when the disk is imported into VxVM. The bigger the load, the slower it becomes. Please help us resolve the issue. We tried to relabel the disk, but it fails with these errors:


Disk not labeled.  Label it now? y
Warning: error setting drive geometry.
Warning: error writing VTOC.
Warning: no backup labels
Write label failed

TonyGriffiths
Level 6
Employee Accredited Certified

Hi Raunaz

Your symptoms above indicate that the device may have issues with writes. Are there any entries in syslog from the disk driver for that disk (at the time of the re-label) ?

cheers

raunaz
Level 4
Certified

These are the errors.

 

Apr  4 08:13:51 xxxxxx vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x0) on dmpnode 324/0x2ba
 
BTW, we are seeing reservation conflict errors continuously.
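
To get a feel for how often the two kinds of error occur, the system log can be counted rather than read line by line. This is only a sketch: `summarize_disk_errors` is an illustrative name, /var/adm/messages is assumed as the default log, and the "reservation conflict" pattern is a plain case-insensitive text match, since the exact driver message is not shown in this thread.

```shell
#!/bin/sh
# summarize_disk_errors [LOGFILE]
# Counts vxdmp I/O error lines and reservation conflict lines in a log file.
summarize_disk_errors() {
    logfile=${1:-/var/adm/messages}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    dmp=$(grep -i -c 'vxdmp.*i/o error' "$logfile" || true)
    rsv=$(grep -i -c 'reservation conflict' "$logfile" || true)
    echo "vxdmp_io_errors=$dmp reservation_conflicts=$rsv"
}
```

Running it before and after a relabel attempt shows whether the attempt itself adds new entries.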

Har-D
Level 4
Employee Certified

Hi Raunaz,

 

Are you talking about the same issue that we discussed in the following thread?

https://www-secure.symantec.com/connect/forums/host-mode-option-xp24000-veritas-cluster-server

If yes, it seems you'll need to erase the VTOC first, as it will not allow you to relabel unless you have a good table of contents.

 

"BTW, we are seeing reservation conflict error continously."

>> Are you using fencing? If yes, could you check for any stale fence keys on the disk using vxfenadm (vxfenadm -s <disk_path>)?

>> Do you also see any OS SCSI errors?

>> Do these errors (reservation conflict + vxdmp I/O error) occur when you try to label the disk?
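
When many disks have to be checked, the key listing can be condensed to one line per device. The sketch below assumes the text layout that vxfenadm prints when reading keys ("Device Name:", "Total Number Of Keys:", and "Node ID:" inside each key's node format line); `fence_key_summary` is an illustrative name. On Solaris, nawk may be needed in place of awk.

```shell
#!/bin/sh
# Read vxfenadm key listings on stdin and print, per device,
# the number of keys and the distinct node IDs that registered them.
fence_key_summary() {
    awk '
    /Device Name:/          { dev = $NF; order[++n] = dev; total[dev] = 0 }
    /Total Number Of Keys:/ { total[dev] = $NF }
    /Node ID:/ {
        id = ""
        for (i = 1; i < NF; i++)
            if ($i == "Node" && $(i + 1) == "ID:") id = $(i + 2)
        if (id != "" && !((dev, id) in seen)) {
            seen[dev, id] = 1
            nodes[dev] = nodes[dev] (nodes[dev] == "" ? "" : ",") id
        }
    }
    END {
        for (k = 1; k <= n; k++) {
            d = order[k]
            printf "%s keys=%s nodes=%s\n", d, total[d], (nodes[d] == "" ? "-" : nodes[d])
        }
    }'
}
```

Usage would be along the lines of `vxfenadm -s <disk_path> | fence_key_summary`. Devices whose keys all come from a single node ID, or that have no keys while other disks in the same disk group do, are the ones worth a closer look.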

Har-D
Level 4
Employee Certified

Hi Kishore,

Please check the following:

1. # prtvtoc <error_disk_path> ; run this on both the /dev/rdsk and the /dev/vx/rdmp path of the error disk

2. # dd if=<error_disk_path> of=/dev/null bs=1024 count=1024

     > While running the above command, run tail -f /var/adm/messages from another terminal to see whether any errors occur.

3. # vxdisk list <error_disk>, and check that all the paths are enabled.
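
The read test in step 2 can be wrapped so that it runs against both device paths in one go. This is a sketch under assumptions: `read_test` and `check_disk_paths` are illustrative names, and /dev/rdsk and /dev/vx/rdmp are assumed as the OS and DMP raw device directories (both can be overridden through RDSK_DIR and RDMP_DIR). It only reads from the disk and never writes to it.

```shell
#!/bin/sh
# Directories holding the raw OS path and the DMP path of a disk (assumed defaults).
RDSK_DIR=${RDSK_DIR:-/dev/rdsk}
RDMP_DIR=${RDMP_DIR:-/dev/vx/rdmp}

# Read up to 1 MB from a device and report whether the read succeeded.
read_test() {
    if dd if="$1" of=/dev/null bs=1024 count=1024 2>/dev/null; then
        echo "read ok: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}

# check_disk_paths DISK, e.g. check_disk_paths c2t50060E80056F1178d27s2
# Returns non-zero if the read fails on either path.
check_disk_paths() {
    rc=0
    for p in "$RDSK_DIR/$1" "$RDMP_DIR/$1"; do
        read_test "$p" || rc=1
    done
    return $rc
}
```

If the read works on the /dev/rdsk path but fails on the DMP path (or the other way round), that narrows the problem to one layer; a failure on both points further down the stack.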

raunaz
Level 4
Certified

Hi Har-D, 

I haven't tried to relabel by erasing the VTOC; we will only do that as a last resort.

Yes, we use SCSI-3 fencing. I'm not sure how to check for stale fence keys,

but what I found looks OK to me:

 

Reading SCSI Registration Keys...
 
Device Name: /dev/rdsk/c2t50060E80056F1178d27s2
Total Number Of Keys: 2
key[0]:
        [Numeric Format]:  67,86,67,83,0,0,0,0
        [Character Format]: CVCS
        [Node Format]: Cluster ID: unknown  Node ID: 2   Node Name: esesbuh0003
key[1]:
        [Numeric Format]:  67,86,67,83,0,0,0,0
        [Character Format]: CVCS
        [Node Format]: Cluster ID: unknown  Node ID: 2   Node Name: esesbuh0003
 
And some disks have no keys:
Reading SCSI Registration Keys...
 
Device Name: /dev/rdsk/c2t50060E80056F1178d20s2
Total Number Of Keys: 0
No keys...
 
FYI, we have rebooted the machine a few times, but the error keeps coming back.
 
 
 
