Solved: Disk groups stopping

solom · 03-27-2014

Hi

I installed Veritas 6.1 on Red Hat 6.4 (64-bit) with VOM 6.0. I created disk groups and volumes, then created failover service groups and mounted the volumes in those service groups.

When I restarted node1, the service groups failed over correctly to node2, and when I restarted node2, failover also worked correctly. But if I try a manual switch, some service groups fail over and others do not, and the disk groups and volumes on the host are left in a "stopping" state.

Please help

Regards

Gaurav_S · 03-27-2014

Hi,

Please paste a snippet of engine_A.log so we can see what is happening.

Also, when the volumes are stopping, do you see any errors in the messages file?

Are you able to run normal vx commands such as "vxdisk list" or "vxtask list" when this issue happens?
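For pulling the relevant part out of engine_A.log (usually /var/VRTSvcs/log/engine_A.log) before pasting it, a short filter can help. This is a minimal Python sketch, not a Veritas tool: it assumes every engine log line follows the layout seen in this thread ("YYYY/MM/DD HH:MM:SS VCS <SEVERITY> V-<id> <text>"), and filter_engine_log is a hypothetical helper name. Syslog lines from the messages file ("Mar 30 12:05:45 host Had[...]: ...") use a different layout and are simply skipped.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed engine_A.log layout, taken from the excerpts in this thread:
# "2014/03/27 13:06:12 VCS ERROR V-16-10031-1522 (TCPRI-CLU2) DiskGroup:..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS "
    r"(?P<sev>[A-Z]+) (?P<msgid>V-\d+(?:-\d+)+) (?P<text>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    severity: str
    message_id: str
    text: str


def filter_engine_log(lines: Iterable[str],
                      severities=("WARNING", "ERROR", "CRITICAL")) -> List[LogEntry]:
    """Keep only entries with one of the given severities; unmatched lines are skipped."""
    wanted = set(severities)
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("sev") in wanted:
            entries.append(LogEntry(m.group("ts"), m.group("sev"),
                                    m.group("msgid"), m.group("text")))
    return entries


sample = [
    "2014/03/27 13:06:09 VCS INFO V-16-1-10305 Resource ip3 (Owner: Unspecified, "
    "Group: TRAKPRI-LIVEINT) is offline on TCPRI-CLU2 (VCS initiated)",
    "2014/03/27 13:06:12 VCS ERROR V-16-10031-1522 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:"
    "offline:Could not deport the disk group TRAKPRI-LIVEINT.",
    "==============================================",
]
for e in filter_engine_log(sample):
    print(e.timestamp, e.severity, e.message_id, e.text)
```

Passing an open file object (for example the result of open("engine_A.log", errors="replace")) instead of the sample list works the same way, since the function only iterates over lines.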

G

solom · 03-27-2014

I attached engine_A.log.

I only see the error messages in VOM. I have not run any commands; I am only working in VOM.

Regards

solom · 03-30-2014

These errors appear in the messages log:

code:

Mar 30 12:05:45 TCSEC-CLU1 Had[23496]: VCS ERROR V-16-10031-1522 (TCSEC-CLU2) DiskGroup:TRAKSEC-LIVEANL:clean:Could not deport the disk group TRAKSEC-LIVEANL.
Mar 30 12:05:46 TCSEC-CLU1 Had[23496]: VCS ERROR V-16-2-13069 (TCSEC-CLU2) Resource(TRAKSEC-LIVEANL) - clean failed.
Mar 30 12:06:46 TCSEC-CLU1 Had[23496]: VCS ERROR V-16-2-13077 (TCSEC-CLU2) Agent is unable to offline resource(TRAKSEC-LIVEANL). Administrative intervention may be required.
Mar 30 12:06:47 TCSEC-CLU1 Had[23496]: VCS ERROR V-16-10031-1522 (TCSEC-CLU2) DiskGroup:TRAKSEC-LIVEANL:clean:Could not

mokkan · 03-30-2014

Is this a production or development configuration? Is it possible for you to stop VCS and move the filesystem manually without using VCS?

Gaurav_S · 03-31-2014

Hi,

I am a little confused by the timestamps: the messages you pasted above are from Mar 30, but the engine log you attached only covers up to Mar 27.

However, let's see what happened on Mar 27.

Manually initiated switch:

2014/03/27 13:06:08 VCS INFO V-16-1-50135 User admin fired command: hagrp -switch TRAKPRI-LIVEINT TCPRI-CLU1 localclus from ::ffff:10.100.208.76

2014/03/27 13:06:08 VCS NOTICE V-16-1-10208 Initiating switch of group TRAKPRI-LIVEINT from system TCPRI-CLU2 to system TCPRI-CLU1
2014/03/27 13:06:08 VCS NOTICE V-16-1-10300 Initiating Offline of Resource ip3 (Owner: Unspecified, Group: TRAKPRI-LIVEINT) on System TCPRI-CLU2
2014/03/27 13:06:09 VCS INFO V-16-1-10305 Resource ip3 (Owner: Unspecified, Group: TRAKPRI-LIVEINT) is offline on TCPRI-CLU2 (VCS initiated)
2014/03/27 13:06:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: TRAKPRI-LIVEINT) on System TCPRI-CLU2
2014/03/27 13:06:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INT (Owner: Unspecified, Group: TRAKPRI-LIVEINT) on System TCPRI-CLU2
2014/03/27 13:06:09 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: TRAKPRI-LIVEINT) on System TCPRI-CLU2
2014/03/27 13:06:10 VCS INFO V-16-2-13716 (TCPRI-CLU2) Resource(TRAKPRIVOL-INTJRNPRI): Output of the completed operation (offline)

Two resources reported an error saying the filesystem was not mounted:

2014/03/27 13:06:10 VCS INFO V-16-2-13716 (TCPRI-CLU2) Resource(TRAKPRIVOL-INTJRNPRI): Output of the completed operation (offline)
==============================================
umount: /trakpri/meam/live/int/jrn/pri: not mounted
==============================================

2014/03/27 13:06:10 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNPRI (Owner: Unspecified, Group: TRAKPRI-LIVEINT) is offline on TCPRI-CLU2 (VCS initiated)
2014/03/27 13:06:11 VCS INFO V-16-2-13716 (TCPRI-CLU2) Resource(TRAKPRIVOL-INTJRNALT): Output of the completed operation (offline)
==============================================
umount: /trakpri/meam/live/int/jrn/alt: not mounted
==============================================

vxvol then reported that several volumes could not be stopped:

2014/03/27 13:06:11 VCS INFO V-16-1-10305 Resource TRAKPRIVOL-INTJRNALT (Owner: Unspecified, Group: TRAKPRI-LIVEINT) is offline on TCPRI-CLU2 (VCS initiated)
2014/03/27 13:06:11 VCS NOTICE V-16-1-10300 Initiating Offline of Resource TRAKPRI-LIVEINT (Owner: Unspecified, Group: TRAKPRI-LIVEINT) on System TCPRI-CLU2
2014/03/27 13:06:11 VCS WARNING V-16-10031-1521 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:offline:The command *vxvol -g TRAKPRI-LIVEINT stopall* failed. Doing a forced stop.
2014/03/27 13:06:12 VCS ERROR V-16-10031-1522 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:offline:Could not deport the disk group TRAKPRI-LIVEINT.
2014/03/27 13:06:12 VCS INFO V-16-2-13716 (TCPRI-CLU2) Resource(TRAKPRI-LIVEINT): Output of the completed operation (offline)
==============================================
VxVM vxvol ERROR V-5-1-1220 Volume TRAKPRIVOL-INTJRNPRI is currently open or mounted
VxVM vxvol ERROR V-5-1-1220 Volume TRAKPRIVOL-INTJRNALT is currently open or mounted
VxVM vxvol WARNING V-5-1-1220 Volume TRAKPRIVOL-INTJRNPRI is currently open or mounted
VxVM vxvol WARNING V-5-1-1220 Volume TRAKPRIVOL-INTJRNALT is currently open or mounted
VxVM vxdg ERROR V-5-1-584 Disk group TRAKPRI-LIVEINT: Some volumes in the disk group are in use
==============================================

Also, the disk group went into a disabled state:

2014/03/27 15:11:27 VCS INFO V-16-2-13717 (TCPRI-CLU2) Output of the completed operation (imf_getnotification)
==============================================
Cannot continue monitoring event
Got notification for group: TRAKPRI-LIVETC

==============================================

2014/03/27 15:16:24 VCS CRITICAL V-16-10031-1533 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:monitor:**ADMINISTRATIVE HELP** required, disk group (TRAKPRI-LIVEINT) is *DISABLED* on the system .
2014/03/27 15:16:24 VCS WARNING V-16-10031-1521 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:clean:The command *vxvol -g TRAKPRI-LIVEINT stopall* failed. Doing a forced stop.
2014/03/27 15:16:24 VCS ERROR V-16-10031-1522 (TCPRI-CLU2) DiskGroup:TRAKPRI-LIVEINT:clean:Could not deport the disk group TRAKPRI-LIVEINT.
2014/03/27 15:16:25 VCS INFO V-16-2-13716 (TCPRI-CLU2) Resource(TRAKPRI-LIVEINT): Output of the completed operation (clean)

So, based on the above, my understanding is:

1. Check the system messages for the same time period. Were you having any storage-related issues at that time? A disk group going into a disabled state indicates that Volume Manager was unable to do I/O to the private region of the disks, so the configuration was marked disabled, which may be preventing further operations.

2. Verify the configuration of the first volume that gives an error. I noticed the same error in previous attempts as well; the errors always start from this volume:

2014/03/25 12:25:02 VCS INFO V-16-2-13716 (TCPRI-CLU1) Resource(TRAKPRIVOL-INTJRNPRI): Output of the completed operation (offline)
==============================================
umount: /trakpri/meam/live/int/jrn/pri: not mounted
==============================================

Please attach main.cf here so we can review it.

G

solom · 04-01-2014

I attached main.cf

Thanks

Marianne · 04-01-2014

You seem to have 'nested' mounts:

MountPoint = "/traksec/meam/live/tcanl"

MountPoint = "/traksec/meam/live/tcanl/jrn/alt"

MountPoint = "/traksec/meam/live/tcanl/jrn/pri"

MountPoint = "/traksec/meam/live/tcanl/wij"

MountPoint = "/traksec/meam/live/tcanl/app"

This means that /traksec/meam/live/tcanl must be mounted before all the other filesystems can be mounted.

The same applies when the service group goes offline: all the other filesystems must be unmounted before /traksec/meam/live/tcanl can be unmounted/offlined.
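The ordering rule can be checked mechanically: a mount point must come after every mount point that is a path prefix of it, and unmounting runs in reverse. The Python sketch below sorts the mount points from this main.cf by path depth, which is one simple way to get a valid order; it only illustrates the constraint. VCS orders resources by their requires links, not by their paths, which is why those links have to be defined.

```python
from pathlib import PurePosixPath
from typing import List


def mount_order(mount_points: List[str]) -> List[str]:
    """Parents first: a parent path always has fewer components than its children."""
    return sorted(mount_points, key=lambda mp: (len(PurePosixPath(mp).parts), mp))


def unmount_order(mount_points: List[str]) -> List[str]:
    """Children first: simply the mount order reversed."""
    return mount_order(mount_points)[::-1]


mounts = [
    "/traksec/meam/live/tcanl/jrn/alt",
    "/traksec/meam/live/tcanl",
    "/traksec/meam/live/tcanl/wij",
    "/traksec/meam/live/tcanl/jrn/pri",
    "/traksec/meam/live/tcanl/app",
]
print("mount:  ", mount_order(mounts))
print("unmount:", unmount_order(mounts))
```

In both lists /traksec/meam/live/tcanl sits at the outer end: first to mount, last to unmount.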

You need more dependencies:

TRAKSECVOL-TCANL-JRNALT requires TRAKSECVOL-TCANL
TRAKSECVOL-TCANL-JRNPRI requires TRAKSECVOL-TCANL
TRAKSECVOL-TCANLAPP requires TRAKSECVOL-TCANL

Your main.cf only has TRAKSECVOL-TCANL-WIJ requires TRAKSECVOL-TCANL; none of the others are defined.
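One way to derive the needed links is to make each Mount resource require the resource whose mount point is its closest enclosing parent, then compare that with what main.cf already contains. A Python sketch under stated assumptions: the pairing of resource names to mount points below is inferred from the names (the main.cf itself is not quoted in this thread), and nearest_parent and missing_links are hypothetical helpers, not part of VCS.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Set, Tuple


def nearest_parent(mp: str, mounts: Dict[str, str]) -> Optional[str]:
    """Return the resource whose mount point is the deepest proper ancestor of mp."""
    path = PurePosixPath(mp)
    best, best_depth = None, -1
    for res, other in mounts.items():
        o = PurePosixPath(other)
        if o in path.parents and len(o.parts) > best_depth:
            best, best_depth = res, len(o.parts)
    return best


def missing_links(mounts: Dict[str, str],
                  existing: Set[Tuple[str, str]]) -> List[str]:
    """List "child requires parent" lines (main.cf syntax) not already in existing."""
    out = []
    for child, mp in sorted(mounts.items()):
        parent = nearest_parent(mp, mounts)
        if parent and (child, parent) not in existing:
            out.append(f"{child} requires {parent}")
    return out


# Assumed resource-to-mount-point pairing, inferred from the resource names.
mounts = {
    "TRAKSECVOL-TCANL": "/traksec/meam/live/tcanl",
    "TRAKSECVOL-TCANL-JRNALT": "/traksec/meam/live/tcanl/jrn/alt",
    "TRAKSECVOL-TCANL-JRNPRI": "/traksec/meam/live/tcanl/jrn/pri",
    "TRAKSECVOL-TCANL-WIJ": "/traksec/meam/live/tcanl/wij",
    "TRAKSECVOL-TCANLAPP": "/traksec/meam/live/tcanl/app",
}
# The only Mount-to-Mount link the posted main.cf contained.
existing = {("TRAKSECVOL-TCANL-WIJ", "TRAKSECVOL-TCANL")}

for line in missing_links(mounts, existing):
    print(line)
```

With these inputs it prints the same three requires lines listed above. The disk-group link is not a Mount-to-Mount relationship, so this check does not cover it.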

You are also missing the dependency of TRAKSECVOL-TCANL on the disk group:

TRAKSECVOL-TCANL requires TRAKSEC-INT

Please fix these dependencies, then use the dependency tree view in the Java GUI to check that they are correct.

When you take the SG offline and bring it back online, you will be able to see the resources going down and up in the correct order.
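The order that the dependency tree enforces can be modelled with a topological sort: a resource comes online only after everything it requires, and goes offline only after everything that requires it. This Python sketch uses the standard graphlib module (Python 3.9+) and the links proposed in this thread, including TRAKSECVOL-TCANL requires TRAKSEC-INT. It models the ordering only; it is not how the VCS engine works internally, and resources not named in the thread (IP, application) are left out.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# main.cf "A requires B" means B must be online before A.
# Links as proposed in this thread; other resources in the group are omitted.
requires: Dict[str, Set[str]] = {
    "TRAKSECVOL-TCANL": {"TRAKSEC-INT"},
    "TRAKSECVOL-TCANL-JRNALT": {"TRAKSECVOL-TCANL"},
    "TRAKSECVOL-TCANL-JRNPRI": {"TRAKSECVOL-TCANL"},
    "TRAKSECVOL-TCANL-WIJ": {"TRAKSECVOL-TCANL"},
    "TRAKSECVOL-TCANLAPP": {"TRAKSECVOL-TCANL"},
}


def online_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group resources into waves; each wave may start once the previous one is online."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


def offline_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Offline is the reverse: dependents go down before what they require."""
    return online_waves(deps)[::-1]


for wave in online_waves(requires):
    print("online: ", wave)
for wave in offline_waves(requires):
    print("offline:", wave)
```

If the three new links are dropped (leaving JRNALT, JRNPRI and APP with no requirements), those mounts land in the first wave next to the disk group, so nothing in the model stops them from being mounted before their parent or unmounted after it.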

Another great utility to test your config is the VCS Simulator.

Gaurav_S · 04-01-2014

Spot on, Marianne. I completely agree: you are facing these issues because of the nested mounts.

You need to set the right dependency order, as suggested above, so that the nested mounts go online/offline in the correct order.

Download the simulator from the link below and see how to use it:

https://www-secure.symantec.com/connect/forums/sfha-solutions-601-using-veritas-cluster-server-simulator

Modify the configuration in the simulator and test its behavior. Once it has been tested successfully, you can go ahead in production.

G

solom · 04-03-2014

TRAKSECVOL-TCANL-JRNALT requires TRAKSECVOL-TCANL
TRAKSECVOL-TCANL-JRNPRI requires TRAKSECVOL-TCANL
TRAKSECVOL-TCANLAPP requires TRAKSECVOL-TCANL

This was the problem.

Thank you all very much, and thanks to Mr. Gaurav Sangamnerkar.

Marianne · 04-04-2014

You also need this dependency:

TRAKSECVOL-TCANL requires TRAKSEC-INT

Please verify the dependencies in the rest of the service groups as well, as you seem to have nested mounts in all of them.
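To find which other service groups have the same pattern, main.cf can be scanned for nested mount points. A rough Python sketch under loud assumptions: it is a simple line scan, not a real main.cf parser; it expects each Mount resource written as a 'Mount <name> (' line followed by a 'MountPoint = "..."' line, treats each 'group <name> (' line as the start of a new group, and ignores per-system (@sys) attribute values. The sample group name EXAMPLE-SG is hypothetical. It only reports nesting; whether the matching requires links exist still has to be checked in main.cf or in the dependency view.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Simplified patterns (assumption); real main.cf files can be more varied.
GROUP_RE = re.compile(r"^\s*group\s+(\S+)\s*\(")
MOUNT_RE = re.compile(r"^\s*Mount\s+(\S+)\s*\(")
MOUNTPOINT_RE = re.compile(r'^\s*MountPoint\s*=\s*"([^"]+)"')


def mount_points_by_group(main_cf: str) -> Dict[str, Dict[str, str]]:
    """Map each service group to {Mount resource name: MountPoint}."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    group = resource = None
    for line in main_cf.splitlines():
        if m := GROUP_RE.match(line):
            group, resource = m.group(1), None
        elif m := MOUNT_RE.match(line):
            resource = m.group(1)
        elif (m := MOUNTPOINT_RE.match(line)) and group and resource:
            groups[group][resource] = m.group(1)
            resource = None
    return dict(groups)


def nested_pairs(mounts: Dict[str, str]) -> List[Tuple[str, str]]:
    """(child, parent) pairs where the child's mount point lies below the parent's."""
    return sorted(
        (child, parent)
        for child, child_mp in mounts.items()
        for parent, parent_mp in mounts.items()
        if PurePosixPath(parent_mp) in PurePosixPath(child_mp).parents
    )


# Trimmed, hypothetical main.cf excerpt; EXAMPLE-SG is a made-up group name.
sample = """
group EXAMPLE-SG (
    SystemList = { TCSEC-CLU1 = 0, TCSEC-CLU2 = 1 }
    )

    Mount TRAKSECVOL-TCANL (
        MountPoint = "/traksec/meam/live/tcanl"
        )

    Mount TRAKSECVOL-TCANL-WIJ (
        MountPoint = "/traksec/meam/live/tcanl/wij"
        )
"""

for grp, mounts in mount_points_by_group(sample).items():
    for child, parent in nested_pairs(mounts):
        print(f"{grp}: {child} is nested under {parent}")
```

Every pair it prints is a candidate for a "child requires parent" link in that group.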

PS:

I see that you had the same issue in July last year:
https://www-secure.symantec.com/connect/forums/faulted-node2