
Odd problem with Sun Cluster and Veritas Volume Manager

Bill_Ranck
Level 4

As folks who have been helping with my questions in the past week or two will know, I am using Veritas mirroring to move Oracle RAC databases off of old SAN storage onto new SAN storage.  The systems are 2 node Sun Clusters with Oracle RAC mounting raw VxVM disk groups. 

I have been using "vxmirror -g <diskgroup> -a <newmedianame> &", which worked well on 2 of the clusters with no problems.  Now, on the final cluster, it causes a very strange problem.  The /dev/vx/rdsk/ocsrawdg/* and /dev/vx/dsk/ocsrawdg/* volume files disappear from one node or both when I run the mirror command, or even a vxplex command.  Oracle stops working, but "vxdg list" and "vxprint -vg ocsrawdg" both show the disk group and volumes as enabled and active.  The Veritas side shows nothing wrong.  I can provide more strange and contradictory symptoms if needed.

Going into scsetup and using the "Synchronize volume information for a VxVM device group" option, which issues the command "scconf -c -D name=ocsrawdg,sync", fixes the problem.  All the volume names then show up again in the two /dev/vx directories.

I have not seen this on the other clusters.  The only real differences are that the disk group I'm working on now is larger, with 5 LUNs in the new SAN storage (all the others had single LUNs), and it holds many more volumes: 254, compared to about 64 in the next largest.  I thought these mirror processes ran in the background and did not interfere with active and enabled volumes/groups; that is certainly how it worked on the other clusters.  Oh, and I tried mirroring just a single volume with vxassist, but that caused the same problem.

Does anyone have any idea why this is happening, and more importantly, can I do a background mirroring process that won't kill the application?

1 ACCEPTED SOLUTION

Accepted Solutions

Bill_Ranck
Level 4

I wrote a short script to accomplish this task without taking down time. 

 

#!/bin/sh
# Mirror each named volume onto the new SAN disks in the background,
# resyncing the device group across the cluster before and after each build.
for volname in "$@"; do
  vxassist -g ocsrawdg -b mirror "$volname" ocsrawdisk1 ocsrawdisk2 ocsrawdisk3 ocsrawdisk4 ocsrawdisk5
  scconf -c -D name=ocsrawdg,sync
  # vxtask monitor runs until the background mirror task finishes, then exits
  vxtask -w 10 monitor
  scconf -c -D name=ocsrawdg,sync
done

Basically, it just takes a list of all the volumes in the disk group and mirrors each one onto the new storage, one at a time.  As soon as the mirror process goes into the background, the scconf command syncs the disk group across the cluster.  The real saving grace in this situation is the vxtask monitor function, which continues to run as long as the mirror is being built and then exits at the end.  That gives me the perfect trigger to resync the disk group again, go back for the next volume in the disk group, rinse and repeat until they are all done.  It's time consuming, and the vxtask monitor is noisy, but the end users don't see any down time.

I will probably have to use a similar script when removing the plexes from the old storage, but that won't require waiting for the mirror to build.  So, no need for the vxtask monitor.
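For what it's worth, that removal pass could be sketched along the same lines.  This is an untested sketch, not a script I have actually run: it assumes the same disk group and new-disk media names as the mirroring script above, and it uses the vxassist "remove mirror" form with the '!' alloc syntax to exclude the new disks, so only the old-storage plex should be removed.  Verify the plex layout with vxprint before running anything like this.

```shell
#!/bin/sh
# Untested sketch: for each volume, remove the mirror that is NOT on the
# new SAN disks (the '!' alloc syntax excludes those disks from removal),
# then resync the device group so both nodes keep seeing the /dev/vx
# entries.  Disk media names are the same assumed names as above.
remove_old_plexes() {
  for volname in "$@"; do
    vxassist -g ocsrawdg remove mirror "$volname" \
      '!ocsrawdisk1' '!ocsrawdisk2' '!ocsrawdisk3' '!ocsrawdisk4' '!ocsrawdisk5'
    scconf -c -D name=ocsrawdg,sync
  done
}

# Example: remove_old_plexes vol01 vol02
```

Since plex removal completes immediately, there is no background task to wait on, which is why no vxtask monitor step appears between the two commands.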

3 REPLIES 3

Marianne
Level 6
Partner    VIP    Accredited Certified

Are you watching system resources while doing changes?

What is your SF version? There could be patches/hotfixes that we can look for if we know the current version.

Bill_Ranck
Level 4

What sort of system resources?   Specifically, what should I be watching?  Memory and swap seem fine.

SunOS nona 5.10 Generic_142900-15 sun4u sparc SUNW,Sun-Fire

vxlicrep says version 4.1 for Volume Manager.  I'm not sure how to extract more specific version info.

 

I did a little experiment this morning and found that issuing the following command:

vxassist -g ocsrawdg -b mirror content_ifs_ctx_k_01_2049m.dbf ocsrawdisk1 ocsrawdisk2 ocsrawdisk3 ocsrawdisk4 ocsrawdisk5

caused all entries in /dev/vx/dsk/ocsrawdg on the other cluster node to disappear.  I then issued the cluster command "scconf -c -D name=ocsrawdg,sync", which let the other node see everything again . . . until the mirror task finished.  All the files disappeared again until another scconf sync command.  It appears to me that anything that makes a plex change causes the problem.  Things were fine while the mirroring task was running; just the start and the end seemed to cause trouble.

I was considering using a script to create the mirroring tasks and then issue an scconf immediately after each one, but that's a pretty kludgy workaround.
