
VCS is trying to bring online SG on node B even though it is online on node A

mcapler
Level 3

I have a two-node VCS cluster on Solaris. Two service groups (SGs) are configured with a dependency between them:

nssitdb01-zone_sg is a parallel group; AutoStart is on for both nodes. The failover service group dbhost-app_sg depends on nssitdb01-zone_sg; AutoStart is on for both nodes as well.

After a clean reboot (init 6) of node B (sirius), VCS started nssitdb01-zone_sg and then tried to bring dbhost-app_sg online too, even though it is already online on node A (arcturus)!

Here is some info from engine_A.log:

2013/07/05 10:16:13 VCS NOTICE V-16-1-10438 Group nssitdb01-zone_sg has been probed on system sirius
2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius
2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system sirius
2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node

2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-10162 Group dbhost-app_sg has not been fully probed on system sirius
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS NOTICE V-16-1-10438 Group dbhost-app_sg has been probed on system sirius
2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group dbhost-app_sg on all nodes
2013/07/05 10:20:16 VCS ERROR V-16-1-10205 Group dbhost-app_sg is faulted on system sirius
2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
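
To follow one group through an excerpt like this, the lines can be filtered by group name. The following is only a minimal sketch: the line layout is assumed from the excerpt above, and the names `LogEntry`, `parse_engine_log` and `group_timeline` are made up for illustration, not part of VCS.

```python
import re
from collections import namedtuple

# Hypothetical record type for one engine log line.
LogEntry = namedtuple("LogEntry", "timestamp severity msg_id text")

# Layout assumed from the excerpt:
# "YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <V-id> <message text>"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (V-\d+-\d+-\d+) (.*)$"
)

def parse_engine_log(lines):
    """Yield a LogEntry for every line that matches the assumed layout."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield LogEntry(*match.groups())

def group_timeline(lines, group):
    """Return (timestamp, msg_id, text) for entries whose text mentions the group."""
    return [
        (entry.timestamp, entry.msg_id, entry.text)
        for entry in parse_engine_log(lines)
        if group in entry.text
    ]

# Three lines copied from the excerpt above.
sample = [
    "2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system sirius",
    "2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node",
    "2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg",
]

for timestamp, msg_id, text in group_timeline(sample, "dbhost-app_sg"):
    print(timestamp, msg_id, text)
```

Lines that do not match the layout (blank lines, truncated timestamps) are simply skipped.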
 

 

Only the ZFS pools and the IP resource prevent dbhost-app_sg from going online on node B:

..

2013/07/05 10:17:44 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-admin:online:zpool import limsdb-admin failed. Try again using the force import -f option
2013/07/05 10:17:45 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-archivedata:online:zpool import limsdb-archivedata failed. Try again using the force import -f option
2013/07/05 10:17:48 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-datafiles:online:zpool import limsdb-datafiles failed. Try again using the force import -f option
2013/07/05 10:17:54 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-indexfiles:online:zpool import limsdb-indexfiles failed. Try again using the force import -f option
..

2013/07/05 10:16:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online
2013/07/05 10:17:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online

..
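
To list which resources refused to go online on which system, the agent part of these messages can be picked apart. This is a rough sketch only: the `(system) Type:resource:entry_point:message` layout is inferred from the lines above, and `AGENT_RE` and `blocked_resources` are hypothetical names.

```python
import re
from collections import defaultdict

# Agent message layout inferred from the excerpt:
# "(system) ResourceType:resource_name:entry_point:message"
AGENT_RE = re.compile(
    r"\((?P<system>[^)]+)\) (?P<type>\w+):(?P<resource>[^:]+):(?P<entry>\w+):(?P<message>.*)$"
)

def blocked_resources(lines, entry_point="online"):
    """Map system -> sorted list of (type, resource) that logged a message
    from the given entry point. Non-matching lines are ignored."""
    blocked = defaultdict(set)
    for line in lines:
        match = AGENT_RE.search(line)
        if match and match.group("entry") == entry_point:
            blocked[match.group("system")].add(
                (match.group("type"), match.group("resource"))
            )
    return {system: sorted(items) for system, items in blocked.items()}

# Two lines copied from the excerpt above.
sample = [
    "2013/07/05 10:17:44 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-admin:online:zpool import limsdb-admin failed. Try again using the force import -f option",
    "2013/07/05 10:17:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online",
]

print(blocked_resources(sample))
```

The pattern deliberately ignores the timestamp, so it still works on lines where the date was cut off during copy and paste.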

 

main.cf:

-------------------------------------------------------

group dbhost-app_sg (
    SystemList = { sirius = 1, arcturus = 0 }
    ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    AutoStartList = { arcturus, sirius }
    Administrators = { z_nssitdb01-zone_arcturus, z_nssitdb01-zone_sirius }
    )
...

requires group nssitdb01-zone_sg online local firm
--------------------------------------------------------

group nssitdb01-zone_sg (
        SystemList = { arcturus = 0, sirius = 1 }
    ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
    Parallel = 1
    AutoStartList = { arcturus, sirius }
    Administrators = { z_nssitdb01-zone_sirius, z_nssitdb01-zone_arcturus }
    )

        FileNone nssitdb01-zone-root_FileNone (
        PathName = "/export/home/nssitdb01-zone/root/.vcs-FileNone-agent"
        )

        Zone nssitdb01-zone (
        Critical = 0
        DetachZonePath = 0
        )

        nssitdb01-zone requires nssitdb01-zone-root_FileNone
 

---------------------------------------------------------------------------------
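
For reference, the group-level dependency in this main.cf can be extracted mechanically. The snippet below is a simplified sketch, not a full main.cf parser: it only recognises `group NAME (` and `requires group ...` lines, and `group_dependencies` is a hypothetical helper name.

```python
import re

# Only the two statement forms needed here; everything else is skipped.
GROUP_RE = re.compile(r"^\s*group\s+(\S+)\s*\(")
REQUIRES_RE = re.compile(
    r"^\s*requires\s+group\s+(\S+)\s+(\w+)\s+(\w+)\s+(\w+)\s*$"
)

def group_dependencies(lines):
    """Return (parent, child, category, location, rigidity) tuples.
    A 'requires group' line is attributed to the most recent 'group' line."""
    current = None
    deps = []
    for line in lines:
        group_match = GROUP_RE.match(line)
        if group_match:
            current = group_match.group(1)
            continue
        requires_match = REQUIRES_RE.match(line)
        if requires_match and current is not None:
            deps.append((current, *requires_match.groups()))
    return deps

# Lines taken from the main.cf excerpt above (attributes shortened).
main_cf = [
    "group dbhost-app_sg (",
    "    SystemList = { sirius = 1, arcturus = 0 }",
    "    )",
    "requires group nssitdb01-zone_sg online local firm",
    "group nssitdb01-zone_sg (",
    "    Parallel = 1",
    "    )",
    "        nssitdb01-zone requires nssitdb01-zone-root_FileNone",
]

print(group_dependencies(main_cf))
```

Resource-level `requires` lines, such as the Zone resource depending on the FileNone resource, are intentionally not matched.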

1 ACCEPTED SOLUTION


arangari
Level 5

I agree with you that onlining the failover group should not even be evaluated. You may want to try this with the latest version of VCS, if you have it.

I think this issue must already be fixed in later versions; you may want to check that at https://sort.symantec.com/documents

 


8 REPLIES

arangari
Level 5

What version of VCS are you running? Also, can you provide the logs where the failover group is declared as ONLINE?

 

mcapler
Level 3

It is a "relatively" complex VCS configuration. I've attached some screenshots, if that helps :)

I've cut engine_A.log down so that it contains only the data from the last reboot of both nodes (not at the same time) on 2013/07/03 until today.

IPs are changed from aaa.bbb.*.* to XXX.YYY.*.*

Version of VCS:

SPARC64, Solaris 10 8/11 (Update 10)

# pkginfo -l VRTSvcs                    
   PKGINST:  VRTSvcs
      NAME:  Veritas Cluster Server by Symantec
  CATEGORY:  system
      ARCH:  sparc
   VERSION:  5.1
   BASEDIR:  /
    VENDOR:  Symantec Corporation
      DESC:  Veritas Cluster Server by Symantec
    PSTAMP:  5.1.103.000-5.1SP1RP3-2012-09-13_16.00.00
  INSTDATE:  Oct 02 2012 16:17
    STATUS:  completely installed
     FILES:      284 installed pathnames
                  26 shared pathnames
                  61 directories
                 105 executables
              237190 blocks used (approx)

 

 

 

mcapler
Level 3

I do not agree with this being marked as Solved! It is not solved at all. "Go to https://sort.symantec.com/documents" or "go to google.com" is not a solution.

 

Thank you.

Daniel_Matheus
Level 4
Employee Accredited Certified

Hi mcapler,

 

I agree google is not a solution.

 

The message:

2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus

is completely normal during failover, as VCS checks all cluster nodes as possible failover targets.

 

Regarding the zpool import error.

Can you please check whether the zpool mountpoint automount is set to legacy?

Please see the bundled agents guide for details (page 72)

http://sfdoccentral.symantec.com/sf/5.1SP1/solaris/pdf/vcs_bundled_agents_51sp1_sol.pdf

Can you import the zpool manually on the command line?

Are there any zpool related errors logged in the system log?

If you try to import manually, do you get any error?

 

Regarding the IP resource error.

This is quite straightforward: you are trying to online an IP address that is already in use on another node.

You need to make sure to use unique IP addresses.

 

Cheers,
Daniel

mcapler
Level 3
Hello Daniel,

Yes, all mount points are set to legacy. The zpools are not a problem but a benefit: Zpool and MultiNICB prevent the SG from going online twice.

nssitdb01-zone_sg is a parallel SG, online on both nodes.
dbhost-app_sg is a failover SG that depends on nssitdb01-zone_sg.
Autostart for dbhost-app_sg is NodeA, NodeB.
dbhost-app_sg is online on NodeA.

NodeB is rebooted and comes up:

2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system NodeB
2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating NodeB as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-10162 Group dbhost-app_sg has not been fully probed on system NodeB
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating NodeA as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system NodeA
2013/07/05 10:16:53 VCS NOTICE V-16-1-10438 Group dbhost-app_sg has been probed on system NodeB
2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating NodeA as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group dbhost-app_sg on all nodes
.....
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource listener_32 (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
...
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_dbhost-appdata (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_giedb-admin (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
2013/07/05 10:16:53 VCS NOTICE V-16-1-10301 Initiating Online of Resource zpool_giedb-archivedata (Owner: Unspecified, Group: dbhost-app_sg) on System NodeB
....
2013/07/05 10:17:13 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_giedb-admin:online:zpool import giedb-admin failed. Try again using the force import -f option
2013/07/05 10:17:14 VCS INFO V-16-1-10299 Resource nic_bge10002 (Owner: Unspecified, Group: dnshost-app_sg) is online on sirius (Not initiated by VCS)
2013/07/05 10:17:15 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_giedb-archivedata:online:zpool import giedb-archivedata failed. Try again using the force import -f option
2013/07/05 10:17:15 VCS INFO V-16-1-10299 Resource nic_bge21002 (Owner: Unspecified, Group: dnshost-app_sg) is online on sirius (Not initiated by VCS)
...
2013/07/05 10:17:23 VCS WARNING V-16-10001-20002 (NodeB) Zpool:zpool_dbhost-appdata:online:zpool import dbhost-appdata failed. Try again using the force import -f option
2013/07/05 10:17:27 VCS INFO V-16-2-13716 (NodeB) Resource(zpool_giedb-archivedata): Output of the completed operation (online)
==============================================
cannot import 'giedb-archivedata': pool may be in use from other system, it was last accessed by NodeA (hostid: 0x809947b2) on Fri Jul 5 09:12:36 2013
use '-f' to import anyway
==============================================

Yes, it is all online on NodeA!

Have a nice day.
mcapler

mikebounds
Level 6
Partner Accredited

My understanding of the issue is this: the failover service group dbhost-app_sg, which has an online local firm dependency on the parallel group nssitdb01-zone_sg, is online on arcturus. When VCS starts on sirius, dbhost-app_sg starts on sirius as well, but it shouldn't, because dbhost-app_sg is a failover group and is already online on arcturus. I think I have seen something similar before, but in that case I had PreOnline scripts, so a second node would try to online a group while that group was still onlining on the first node and sitting in its PreOnline script.

Could you provide more logs? Some entries seem to be missing: I don't see the entry "Initiating online of group nssitdb01-zone_sg". Also, the logs say "Group dbhost-app_sg is online or faulted on system arcturus"; if the group is faulted on arcturus, then onlining dbhost-app_sg on sirius is OK. So do the logs show that dbhost-app_sg is online on arcturus?

Another oddity is that I see these entries:
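
The expectation described here can be written down as a small decision function. To be clear, this is only a sketch of the behaviour the thread expects, not how the VCS policy module is actually implemented; `autostart_candidates` and the state strings are made-up names, and group dependencies are ignored.

```python
def autostart_candidates(parallel, auto_start_list, state_by_system):
    """Return the systems on which an auto-start online would be expected.

    parallel         -- True for a parallel group, False for a failover group
    auto_start_list  -- systems in AutoStartList order
    state_by_system  -- hypothetical map: system -> "ONLINE", "OFFLINE" or "FAULTED"
    """
    offline = [s for s in auto_start_list if state_by_system.get(s) == "OFFLINE"]
    if parallel:
        # A parallel group may run on every system at the same time.
        return offline
    # A failover group must run on at most one system, so an instance
    # that is already online anywhere rules out any further online.
    if any(state == "ONLINE" for state in state_by_system.values()):
        return []
    return offline[:1]

# Situation from this thread: node sirius has just rebooted.
states = {"arcturus": "ONLINE", "sirius": "OFFLINE"}
order = ["arcturus", "sirius"]

print(autostart_candidates(False, order, states))  # failover group dbhost-app_sg
print(autostart_candidates(True, order, states))   # parallel group nssitdb01-zone_sg
```

With these inputs the failover group gets no candidate, while the parallel group is still expected to start on sirius.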

2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius

So first VCS says group dbhost-app_sg is "offline", and then in the same second it says it is "online or faulted". For the group to transition from offline to online or faulted, a resource must have gone online, but this is not reported in the logs.

If your issue is a bug, then it MAY be fixed in 6.0. Before upgrading to 6.0, though, you would want to know that this incident is actually fixed, and I couldn't find anything in the VCS 6.0 release notes saying that this issue had been identified as an incident.

Mike

mcapler
Level 3

The logs are attached below.

There are no messages like "Initiating online of group nssitdb01-zone_sg", only "auto-start online":

..

2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius

...

 

mcapler.

PS. I see now that this is not a question for the community; it is a question for Support. But I thought it would be better to ask here, because it is very stressful to get through first-level support.

BTW, I have a couple of cases open at the moment, and I tell you, I'm tired. I'm tired of answering beginner questions from people who try to walk you through the basics. Of course that can work, but not for my setup with ZFS, 36 zpools over iSCSI from two S7320s, VCS, parallel and failover Solaris zones, clustered NetBackup, 5 Oracle DBs, one Samba file server, BIND DNS, and DHCP on two cluster nodes.

Okay, forget it. Close the thread, but do not mark it as resolved.

In three weeks, when I come back from holiday and then go through the first/second-level support hell because of this case, I will post the "Solution". :)