07-05-2013 02:58 AM
I have a two-node VCS cluster on Solaris. Two service groups are configured with a dependency:
nssitdb01-zone_sg is parallel, with AutoStart on for both nodes. The failover service group dbhost-app_sg depends on nssitdb01-zone_sg, also with AutoStart on for both nodes.
After a clean reboot (init 6) of node B (sirius), VCS started nssitdb01-zone_sg and then tried to bring dbhost-app_sg online too, even though it is already online on node A (arcturus)!
Here is some info from engine_A.log:
2013/07/05 10:16:13 VCS NOTICE V-16-1-10438 Group nssitdb01-zone_sg has been probed on system sirius
2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius
2013/07/05 10:16:33 VCS NOTICE V-16-1-10447 Group nssitdb01-zone_sg is online on system sirius
2013/07/05 10:16:33 VCS WARNING V-16-1-50045 Initiating online of parent group dbhost-app_sg, PM will select the best node
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-10162 Group dbhost-app_sg has not been fully probed on system sirius
2013/07/05 10:16:33 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS NOTICE V-16-1-10438 Group dbhost-app_sg has been probed on system sirius
2013/07/05 10:16:53 VCS INFO V-16-1-50007 Initiating auto-start online of group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
2013/07/05 10:16:53 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:16:53 VCS NOTICE V-16-1-10233 Clearing Restart attribute for group dbhost-app_sg on all nodes
2013/07/05 10:20:16 VCS ERROR V-16-1-10205 Group dbhost-app_sg is faulted on system sirius
2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating arcturus as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
Only the ZFS pools and the IP resource prevent dbhost-app_sg from going online on node B:
..
2013/07/05 10:17:44 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-admin:online:zpool import limsdb-admin failed. Try again using the force import -f option
2013/07/05 10:17:45 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-archivedata:online:zpool import limsdb-archivedata failed. Try again using the force import -f option
2013/07/05 10:17:48 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-datafiles:online:zpool import limsdb-datafiles failed. Try again using the force import -f option
2013/07/05 10:17:54 VCS WARNING V-16-10001-20002 (sirius) Zpool:zpool_limsdb-indexfiles:online:zpool import limsdb-indexfiles failed. Try again using the force import -f option
..
2013/07/05 10:16:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online
2013/07/05 10:17:53 VCS ERROR V-16-10001-5013 (sirius) IPMultiNICB:dbhost_ipmultinicb_VLAN10:online:This IP address is configured elsewhere. Will not online
..
main.cf:
-------------------------------------------------------
group dbhost-app_sg (
SystemList = { sirius = 1, arcturus = 0 }
ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
AutoStartList = { arcturus, sirius }
Administrators = { z_nssitdb01-zone_arcturus, z_nssitdb01-zone_sirius }
)
...
requires group nssitdb01-zone_sg online local firm
--------------------------------------------------------
group nssitdb01-zone_sg (
SystemList = { arcturus = 0, sirius = 1 }
ContainerInfo @arcturus = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
ContainerInfo @sirius = { Name = nssitdb01-zone, Type = Zone, Enabled = 1 }
Parallel = 1
AutoStartList = { arcturus, sirius }
Administrators = { z_nssitdb01-zone_sirius, z_nssitdb01-zone_arcturus }
)
FileNone nssitdb01-zone-root_FileNone (
PathName = "/export/home/nssitdb01-zone/root/.vcs-FileNone-agent"
)
Zone nssitdb01-zone (
Critical = 0
DetachZonePath = 0
)
nssitdb01-zone requires nssitdb01-zone-root_FileNone
---------------------------------------------------------------------------------
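For anyone reproducing this, the live state and dependency of the failover group can be compared against the main.cf excerpt above. A minimal sketch, assuming `hagrp` is in PATH (typically /opt/VRTSvcs/bin); the function name `show_group` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper, not part of VCS: wraps two read-only hagrp queries.

# Print the group's state on every system, then its parent/child dependencies.
show_group() {
    hagrp -state "$1"
    hagrp -dep "$1"
}
```

Calling `show_group dbhost-app_sg` on either node should show whether the group is reported online on arcturus and list the online local firm link to nssitdb01-zone_sg.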
07-05-2013 03:04 AM
What version of VCS are you running? Also, can you provide the logs where the failover group is declared as ONLINE?
07-05-2013 04:47 AM
It is a "relatively" complex VCS configuration. I've attached some screenshots, if that helps :)
I've cut engine_A.log down to the data from the last reboot of both nodes (not at the same time) on 2013/07/03 until today.
IPs are changed from aaa.bbb.*.* to XXX.YYY.*.*
Version of VCS:
SPARC64, Solaris 10 8/11 (Update 10)
# pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.1
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: 5.1.103.000-5.1SP1RP3-2012-09-13_16.00.00
INSTDATE: Oct 02 2012 16:17
STATUS: completely installed
FILES: 284 installed pathnames
26 shared pathnames
61 directories
105 executables
237190 blocks used (approx)
07-05-2013 05:08 AM
I agree with you that onlining the failover group should not be evaluated. You may want to try this with the latest version of VCS if you have it.
I think this issue must already be fixed in later versions; you may want to check at https://sort.symantec.com/documents
08-07-2013 06:38 AM
I do not agree with this being marked as Solved! It is not solved at all. "Go to https://sort.symantec.com/documents" or "go to google.com" is not a solution.
Thank you.
08-08-2013 01:23 AM
Hi mcapler,
I agree google is not a solution.
The message:
2013/07/05 10:16:33 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system arcturus
is completely normal during failover, as VCS checks all cluster nodes as possible failover targets.
Regarding the zpool import error:
Can you please check whether the zpool's mountpoint is set to legacy?
Please see the bundled agents guide for details (page 72):
http://sfdoccentral.symantec.com/sf/5.1SP1/solaris/pdf/vcs_bundled_agents_51sp1_sol.pdf
Can you import the zpool manually on the command line?
Are there any zpool related errors logged in the system log?
If you try to import manually, do you get any error?
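The manual checks asked about above could look roughly like this. This is a sketch only: the helper names are hypothetical, the pool names would come from the log in the first post, and /var/adm/messages is assumed to be the system log (the Solaris default).

```shell
#!/bin/sh
# Hypothetical helpers for the manual checks. Run as root.

# Print just the mountpoint value ("legacy" or a path) of a pool's top-level dataset.
# The pool must be imported on the node where this runs.
pool_mountpoint() {
    zfs get -H -o value mountpoint "$1"
}

# Show system-log lines that mention the pool.
# The default log path is an assumption; pass another file as the second argument.
pool_log_lines() {
    grep "$1" "${2:-/var/adm/messages}"
}

# List pools visible for import. Without a pool name this only lists; it imports nothing.
list_importable() {
    zpool import
}
```

For example, `pool_mountpoint limsdb-admin` on the node that currently holds the pool, and `list_importable` on the node where the import failed. Forcing an import with `-f` while the pool is still imported on the other node is best avoided.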
Regarding the IP resource error:
This is quite straightforward: you are trying to online an IP address that is already in use on another node.
You need to make sure to use unique IP addresses.
Cheers,
Daniel
08-08-2013 02:09 AM
08-08-2013 02:43 AM
My understanding of the issue is this: the failover service group dbhost-app_sg, which has an online local firm dependency on the parallel group nssitdb01-zone_sg, is online on arcturus. When VCS starts on sirius, dbhost-app_sg starts on sirius as well, but it shouldn't, because dbhost-app_sg is a failover group and is already online on arcturus. I think I have seen something similar before, but I had PreOnline scripts, so a second node would try to online a group while that group was already onlining but was still in its PreOnline script.
Could you provide more logs? There seem to be entries missing, as I don't see the entry "Initiating online of group nssitdb01-zone_sg". Also, the logs say "Group dbhost-app_sg is online or faulted on system arcturus", and if the group is faulted on arcturus, then onlining dbhost-app_sg on sirius is OK. So do the logs show that dbhost-app_sg is online on arcturus?
Another oddity is that I see these entries:
2013/07/05 10:20:16 VCS NOTICE V-16-1-10446 Group dbhost-app_sg is offline on system sirius
2013/07/05 10:20:16 VCS INFO V-16-1-10493 Evaluating sirius as potential target node for group dbhost-app_sg
2013/07/05 10:20:16 VCS INFO V-16-1-50010 Group dbhost-app_sg is online or faulted on system sirius
So first VCS says group dbhost-app_sg is "offline", and then in the same second it says it is "online or faulted". But for the group to transition from offline to online or faulted, a resource must have gone online, and this is not reported in the logs.
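One way to check this ordering is to filter the engine log down to the lines that mention one group and one system. A minimal sketch, assuming the log is at /var/VRTSvcs/log/engine_A.log (the usual default; adjust if yours differs). The function name `group_events` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print engine log lines that mention both a group and a system.
# Usage: group_events GROUP SYSTEM [LOGFILE]
group_events() {
    grp=$1
    sys=$2
    # Default log path is an assumption; pass the real file as the third argument.
    log=${3:-/var/VRTSvcs/log/engine_A.log}
    # Two plain grep passes keep this portable to the stock Solaris grep.
    grep "$grp" "$log" | grep "$sys"
}
```

For example, `group_events dbhost-app_sg arcturus` lists every message about the failover group on arcturus in log order, which makes it easier to see whether an explicit "is online" entry precedes the "online or faulted" evaluation.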
If your issue is a bug, then it MAY be fixed in 6.0, but before upgrading to 6.0 you would want to know that this incident is fixed, and I couldn't find anything in the VCS 6.0 release notes to say this issue has been identified as an incident.
Mike
08-08-2013 06:55 AM
The logs are attached below.
There are no messages like "Initiating online of group nssitdb01-zone_sg", only "auto-start online":
..
2013/07/05 10:16:13 VCS NOTICE V-16-1-10442 Initiating auto-start online of group nssitdb01-zone_sg on system sirius
...
mcapler.
PS. I see now that this is not a question for the community, it is a question for Support. But I thought it would be better to ask here, because it is very stressful to get through first-level support.
BTW, I have a couple of cases open at the moment and, I tell you, I'm tired. I'm tired of answering beginner questions from people who try to walk you through the basics. Of course that can work, but not for my setup with ZFS, 36 zpools over iSCSI from two S7320s, VCS, parallel and failover Solaris zones, clustered NetBackup, 5 Oracle DBs, one Samba file server, BIND DNS, and DHCP on two cluster nodes.
Okay, forget it. Close the thread, but do not mark it as resolved.
In three weeks, when I come back from holiday and then go through the first- and second-level support hell because of this case, I will post the "Solution". :)