Solved: Existing disk is showing error in vxvm

kishored · ‎03-05-2012

Hi Team,

One of the disks is showing an error in the vxdisk list output. The disk is already a member of a disk group and it also contains volumes.

How can I clear the error flag in the state column?

CraigV · ‎03-05-2012

...you've logged this in the wrong forum. What product is this and I will move it?

kishored · ‎03-05-2012

Hi

This is related to a Veritas Volume Manager issue.

CraigV · ‎03-05-2012

...should be in the correct forum now!

Yasuhisa_Ishika · ‎03-06-2012

What platform and what version of Storage Foundation are you using?
Also provide info about your OS, and the output of the following commands.

# vxdisk list
# vxprint -ht
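
As a rough aid for reading the first of these outputs, the sketch below filters vxdisk list output for devices whose STATUS column contains "error". It assumes the usual five-column layout (DEVICE, TYPE, DISK, GROUP, STATUS); the function name disks_in_error is made up for illustration.

```shell
# Hypothetical helper: read "vxdisk list" output on stdin and print
# every device whose STATUS (column 5 onward) contains "error".
# Assumes the header line starts with DEVICE and the status is the last column.
disks_in_error() {
    awk '$1 != "DEVICE" {
        status = ""
        for (i = 5; i <= NF; i++) {
            status = status (status == "" ? "" : " ") $i
        }
        if (status ~ /error/) print $1 ": " status
    }'
}
```

Usage would be along the lines of: vxdisk list | disks_in_error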

Gaurav_S · ‎03-06-2012

We need more details to answer further.

1. How did the issue happen? Is there anything in the logs?

2. Is it an internal disk or a storage disk?

3. OS version and VxVM version?

4. Paste the following outputs:

# vxdisk list

# vxdisk -o alldgs -e list

# vxprint -qthg <diskgroup>

# modinfo | grep -i vx

Gaurav

Gaurav_S · ‎03-30-2012

Hi Kishore,

Have you had a chance to look into this?

The error state appears on a disk when Volume Manager is unable to read the label or disk geometry from the disk, so the reason could be:

1. the disk label has been lost

2. the disk was replaced and a new disk is in place

If the disk was in use and something happened to the private region so that VxVM could no longer read it, you should instead see the disk go into the "failed was" state in vxdisk list.

In most cases, you need to do a bottom-up analysis and find out what happened to the disk underneath.

Gaurav

raunaz · ‎04-02-2012

Hi Gaurav,

We are experiencing the same problem. The disk label has been corrupted. Before this, it was running fine.
Right now VxVM has performance problems and responds very slowly when the disk is imported into VxVM. The bigger the load, the slower it becomes. Please help us resolve the issue. We tried to relabel the disk, but it fails with errors:

Disk not labeled. Label it now? y

Warning: error setting drive geometry.

Warning: error writing VTOC.

Warning: no backup labels

Write label failed

TonyGriffiths · ‎04-03-2012

Hi Raunaz

Your symptoms above indicate that the device may have issues with writes. Are there any entries in syslog from the disk driver for that disk (at the time of the re-label) ?

cheers

raunaz · ‎04-03-2012

These are the errors.

Apr 4 08:13:51 xxxxxx vxdmp: [ID 443116 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 i/o error occured (errno=0x0) on dmpnode 324/0x2ba

BTW, we are seeing reservation conflict errors continuously.

Har-D · ‎04-03-2012

Hi Raunaz,

Are you talking about the same issue that we discussed in the following thread?

https://www-secure.symantec.com/connect/forums/host-mode-option-xp24000-veritas-cluster-server

If yes, it seems you will need to erase the VTOC first, as the disk will not let you relabel it unless it has a good table of contents.

"BTW, we are seeing reservation conflict error continously."

>> Are you using fencing? If yes, could you check for any stale fence keys on the disk using vxfenadm (vxfenadm -s <disk_path>)?

>> Do you also see any OS-level SCSI errors?

>> Do these errors (reservation conflict + vxdmp I/O error) occur when you try to label the disk?

Har-D · ‎04-03-2012

Hi Kishore,

Please check the following :

1. # prtvtoc <error_disk_path> ; Run this on both the /dev/rdsk path and the DMP path (/dev/vx/rdmp) of the error disk

2. # dd if=<error_disk_path> of=/dev/null bs=1024 count=1024

> While running the above command, do a tail -f /var/adm/messages from another terminal to see if any errors occur.

3. # vxdisk list <error_disk>, check if all the paths are enabled.
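
Checks 1 and 2 above can be wrapped in a small helper, sketched here under some assumptions: the function names run and check_disk_paths and the DRY_RUN switch are hypothetical, and the device paths are passed in explicitly rather than derived from the disk name.

```shell
# Print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# For each raw device path given, read the VTOC and then read
# 1 MB from the start of the device (read-only, nothing is written).
# Returns non-zero if any command failed.
check_disk_paths() {
    rc=0
    for path in "$@"; do
        run prtvtoc "$path" || rc=1
        run dd if="$path" of=/dev/null bs=1024 count=1024 || rc=1
    done
    return "$rc"
}
```

With DRY_RUN=1 the helper only prints the commands, which makes it easy to review them before running against the real raw and DMP paths.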

raunaz · ‎04-04-2012

Hi Har-D,

I haven't tried to relabel by erasing the VTOC; we will only do that as a last resort.

Yes, we use SCSI-3 fencing. I'm not sure how to check for stale fence keys,

but from what I found, the keys look OK to me:

Reading SCSI Registration Keys...

Device Name: /dev/rdsk/c2t50060E80056F1178d27s2

Total Number Of Keys: 2

key[0]:

[Numeric Format]: 67,86,67,83,0,0,0,0

[Character Format]: CVCS

[Node Format]: Cluster ID: unknown Node ID: 2 Node Name: esesbuh0003

key[1]:

[Numeric Format]: 67,86,67,83,0,0,0,0

[Character Format]: CVCS

[Node Format]: Cluster ID: unknown Node ID: 2 Node Name: esesbuh0003

And some disks have no keys:

Reading SCSI Registration Keys...

Device Name: /dev/rdsk/c2t50060E80056F1178d20s2

Total Number Of Keys: 0

No keys...

FYI, we have rebooted the machine a few times, but the error keeps coming back.
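
When many disks are involved, key listings like the ones above are easier to compare in condensed form. The following is only a sketch: summarize_fence_keys is a hypothetical name, and the parsing assumes the "Device Name:", "Total Number Of Keys:" and "Node Name:" lines look exactly as in the output pasted in this post.

```shell
# Hypothetical helper: read vxfenadm key listings on stdin and print
# one line per device with its key count and the distinct node names
# that registered keys on it.
summarize_fence_keys() {
    awk '
        /Device Name:/          { dev = $NF }
        /Total Number Of Keys:/ { total[dev] = $NF; order[++n] = dev }
        /Node Name:/ {
            node = $NF
            if (!((dev, node) in seen)) {
                seen[dev, node] = 1
                nodes[dev] = nodes[dev] (nodes[dev] != "" ? "," : "") node
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                printf "%s keys=%s nodes=%s\n", d, total[d], (nodes[d] != "" ? nodes[d] : "-")
            }
        }
    '
}
```

Devices that report different key counts, or keys from only one node, then stand out at a glance. Whether a given pattern is actually wrong depends on the number of nodes and paths in the cluster.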

Gaurav_S · ‎04-05-2012

I believe you have an I/O fencing issue here. That is the reason you are unable to write, and the fencing keys are also inconsistent.

You can register the keys manually, but if possible, do a clean reboot of the systems after clearing the keys manually, or at least restart VCS completely after clearing them.

So here is what I would suggest:

1. Clear the I/O fencing keys using the "vxfenclearpre" command:

https://sort.symantec.com/public/documents/sf/5.0/solaris/manpages/vcs/vxfenclearpre_1m.html

2. Shut down all the applications in the cluster

3. Shut down VCS (hastop -all)

4. Shut down fencing (/etc/init.d/vxfen stop); this needs to be run on both nodes.

5. Ensure the fencing module is unloaded ( modinfo | grep -i vxfen )

6. Ensure from gabconfig that only port a exists on all the nodes ( gabconfig -a )

7. Restart I/O fencing on both nodes (/etc/init.d/vxfen start)

8. Start the cluster on the nodes one by one (hastart)

The above procedure should help clear the keys, and you should get access back to the disk.

Gaurav
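
The two verification steps (5 and 6) lend themselves to a scripted check. The sketch below is illustrative only: extra_gab_ports and fencing_is_down are hypothetical names, and it assumes that gabconfig -a prints one line per port in the form "Port a gen ... membership ...".

```shell
# Hypothetical helper: read "gabconfig -a" output on stdin and print
# every GAB port other than port a, one per line.
extra_gab_ports() {
    awk '$1 == "Port" && $2 != "a" { print $2 }' | sort -u
}

# Hypothetical check for steps 5 and 6, to be run on each node:
# succeeds only if no vxfen module is loaded and only port a is open.
fencing_is_down() {
    if modinfo | grep -i vxfen >/dev/null; then
        echo "vxfen module is still loaded" >&2
        return 1
    fi
    ports=$(gabconfig -a | extra_gab_ports)
    if [ -n "$ports" ]; then
        echo "GAB ports still open besides a: $ports" >&2
        return 1
    fi
    return 0
}
```

Running fencing_is_down on each node before step 7 gives a yes/no answer instead of output that has to be read by eye. The vxfenclearpre man page linked in step 1 remains the reference for when that command may safely be run.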


Existing disk is showing error in vxvm