09-02-2011 12:04 AM
Software: Veritas Storage Foundation Cluster File System 5.0MP3
Hardware: Sun Blade X6270
System: Sun Solaris 10
Description
1. The cluster has two service groups, with an Oracle agent and a Tomcat agent; every group has a local VIP and an external VIP.
After hastart, all groups, disk groups, agents and resources come up without problems the first time.
NodeA can switch Group1 to NodeB (hagrp -switch Group1 -to NodeB).
But NodeB cannot switch Group1 back to NodeA (hagrp -switch Group1 -to NodeA); the operation gets stuck while taking the external VIP offline.
hastatus:
NodeA:/# hastatus -sum
-- SYSTEM STATE
-- System  State     Frozen
A  NodeA   RUNNING   0
A  NodeB   RUNNING   0
-- GROUP STATE
-- Group            System  Probed  AutoDisabled  State
B  Group1           NodeA   Y       N             OFFLINE
B  Group1           NodeB   Y       N             STOPPING|PARTIAL
B  Network          NodeA   Y       N             ONLINE
B  Network          NodeB   Y       N             ONLINE
B  Network_OM       NodeA   Y       N             ONLINE
B  Network_OM       NodeB   Y       N             ONLINE
B  Group2           NodeA   Y       N             ONLINE
B  Group2           NodeB   Y       N             OFFLINE
B  SNMPMasterAgent  NodeA   Y       N             ONLINE
B  SNMPMasterAgent  NodeB   Y       N             ONLINE
B  cvm              NodeA   Y       N             ONLINE
B  cvm              NodeB   Y       N             ONLINE
B  ora_DG           NodeA   Y       N             ONLINE
B  ora_DG           NodeB   Y       N             ONLINE
-- RESOURCES OFFLINING
-- Group   Type        Resource      System  IState
F  Group1  IPMultiNIC  Group1_OM_IP  NodeB   W_OFFLINE_PROPAGATE
Log file:
NodeB:/var/VRTSvcs/log# cat engine_A.log |grep Group1_OM_IP
2011/08/29 18:16:21 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Group1_OM_IP (Owner: unknown, Group: FMMgrp) on System NodeB
2011/08/29 18:16:24 VCS ERROR V-16-2-13064 (NodeB) Agent is calling clean for resource(Group1_OM_IP) because the resource is up even after offline completed.
2011/08/29 18:16:26 VCS INFO V-16-2-13001 (NodeB) Resource(Group1_OM_IP): Output of the completed operation (clean)
2011/08/29 18:16:26 VCS INFO V-16-2-13068 (NodeB) Resource(Group1_OM_IP) -clean completed successfully.
2011/08/29 18:16:27 VCS ERROR V-16-2-13077 (NodeB) Agent is unable to offline resource(Group1_OM_IP). Administrative intervention may be required.
NodeB:/var/VRTSvcs/log# cat IPMultiNIC_A.log
2011/08/29 18:16:24 VCS ERROR V-16-2-13064 Thread(5) Agent is calling clean for resource(Group1_OM_IP) because the resource is up even after offline completed.
2011/08/29 18:16:26 VCS ERROR V-16-2-13068 Thread(5) Resource(Group1_OM_IP) -clean completed successfully.
2011/08/29 18:16:27 VCS ERROR V-16-2-13077 Thread(4) Agent is unable to offline resource(Group1_OM_IP). Administrative intervention may be required.
2011/08/29 18:16:28 VCS ERROR V-16-2-13067 Thread(4) Agent is calling clean for resource(Group1_IP) because the resource became OFFLINE unexpectedly, on its own.
2011/08/29 18:16:29 VCS ERROR
Group1_OM_IP is the external VIP.
Group1_IP is the local VIP.
Configuration of service group Group1 and the Network groups in main.cf:
group Group1 (
    SystemList = { NodeA = 2, NodeB = 1 }
    AutoStartList = { NodeB }
    )

    IPMultiNIC Group1_IP (
        Address = "192.168.100.71"
        NetMask = "255.255.255.192"
        MultiNICResName = MultiNICA
        IfconfigTwice = 1
        )

    IPMultiNIC Group1_OM_IP (
        Address = "10.10.10.1"
        NetMask = "255.255.255.0"
        MultiNICResName = MultiNICA_OM
        IfconfigTwice = 1
        )

    ORACLE G1 (
        )

    Proxy Group1_NIC_PROXY (
        TargetResName = MultiNICA
        )

    Proxy Group1_OM_NIC_PROXY (
        TargetResName = MultiNICA_OM
        )

    Tomcat G1web (
        )

    requires group Group4 online global soft
    Group1_IP requires Group1_NIC_PROXY
    Group1_OM_IP requires Group1_OM_NIC_PROXY
    G1 requires Group1_IP
    G1web requires G1

    // resource dependency tree
    //
    //  group Group1
    //  {
    //  IPMultiNIC Group1_OM_IP
    //      {
    //      Proxy Group1_OM_NIC_PROXY
    //      }
    //  Tomcat G1web
    //      {
    //      ORACLE G1
    //          {
    //          IPMultiNIC Group1_IP
    //              {
    //              Proxy Group1_NIC_PROXY
    //              }
    //          }
    //      }
    //  }

group Network (
    SystemList = { NodeA = 1, NodeB = 2 }
    Parallel = 1
    AutoStartList = { NodeA, NodeB }
    )

    MultiNICA MultiNICA (
        Device @NodeA = { e1000g2 = "192.168.100.66" }
        Device @NodeB = { e1000g2 = "192.168.100.67" }
        NetMask = "255.255.255.192"
        ArpDelay = 5
        RouteOptions = "192.168.100.65"
        IfconfigTwice = 1
        NetworkHosts = { "192.168.100.65", "192.168.100.126" }
        )

    Phantom Phantom (
        )

    // resource dependency tree
    //
    //  group Network
    //  {
    //  MultiNICA MultiNICA
    //  Phantom Phantom
    //  }

group Network_OM (
    SystemList = { NodeA = 1, NodeB = 2 }
    Parallel = 1
    AutoStartList = { NodeA, NodeB }
    )

    MultiNICA MultiNICA_OM (
        Device @NodeA = { e1000g4 = "10.10.10.2" }
        Device @NodeB = { e1000g4 = "10.10.10.3" }
        NetMask = "255.255.255.0"
        ArpDelay = 5
        RouteOptions = "10.10.10.10"
        IfconfigTwice = 1
        NetworkHosts = { "10.10.10.10", "10.10.10.4" }
        )

    Phantom Phantom_OM (
        )

    // resource dependency tree
    //
    //  group Network_OM
    //  {
    //  MultiNICA MultiNICA_OM
    //  Phantom Phantom_OM
    //  }

If I unplumb the e1000g4:1 interface manually, Group1 can switch from NodeB to NodeA!
e1000g4:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 10.10.10.1 netmask fffffff0 broadcast 10.10.10.255
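As a sanity check on ifconfig output like the lines above, the broadcast address implied by an address and netmask can be computed and compared with the one displayed. This is only a minimal Python sketch; expected_broadcast is a hypothetical helper introduced here for illustration, not a Solaris or VCS tool.

```python
import ipaddress


def expected_broadcast(address: str, hex_mask: str) -> str:
    """Return the broadcast address implied by an IPv4 address and a hex netmask.

    hex_mask is the ifconfig-style form, e.g. 'ffffff00'.
    """
    dotted_mask = str(ipaddress.IPv4Address(int(hex_mask, 16)))
    # strict=False allows a host address to be given instead of the network address.
    network = ipaddress.IPv4Network(f"{address}/{dotted_mask}", strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    # Address and netmask as displayed for e1000g4:1, then the /24 mask from main.cf.
    print(expected_broadcast("10.10.10.1", "fffffff0"))
    print(expected_broadcast("10.10.10.1", "ffffff00"))
```

For the values displayed, a netmask of fffffff0 implies a broadcast of 10.10.10.15, while the displayed broadcast 10.10.10.255 is what a ffffff00 mask would give, so the mask and broadcast shown do not agree with each other.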
hastatus:
NodeA:/# hastatus -sum
-- SYSTEM STATE
-- System  State     Frozen
A  NodeA   RUNNING   0
A  NodeB   RUNNING   0
-- GROUP STATE
-- Group            System  Probed  AutoDisabled  State
B  Group1           NodeA   Y       N             ONLINE
B  Group1           NodeB   Y       N             OFFLINE
B  Network          NodeA   Y       N             ONLINE
B  Network          NodeB   Y       N             ONLINE
B  Network_OM       NodeA   Y       N             ONLINE
B  Network_OM       NodeB   Y       N             ONLINE
B  Group2           NodeA   Y       N             ONLINE
B  Group2           NodeB   Y       N             OFFLINE
B  SNMPMasterAgent  NodeA   Y       N             ONLINE
B  SNMPMasterAgent  NodeB   Y       N             ONLINE
B  cvm              NodeA   Y       N             ONLINE
B  cvm              NodeB   Y       N             ONLINE
B  ora_DG           NodeA   Y       N             ONLINE
B  ora_DG           NodeB   Y       N             ONLINE
I don't know why this happens. Please give me a solution to fix it, thanks.
09-02-2011 02:23 AM
The problem could be that the netmask shown in ifconfig is ffffffe0 (255.255.255.224), but in VCS you have it defined as ffffff00 (NetMask = "255.255.255.0" for resource Group1_OM_IP), and you also have it defined as something different again, ffffffc0, on the base MultiNICA resource (NetMask = "255.255.255.192" for the MultiNICA resource).
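To compare the hex mask that ifconfig reports with the dotted-quad NetMask values in main.cf, the hex form can be converted first. The following is a minimal Python sketch under that assumption; hex_mask_to_dotted and masks_match are hypothetical helper names introduced here, not part of VCS.

```python
import ipaddress


def hex_mask_to_dotted(hex_mask: str) -> str:
    """Convert an ifconfig-style hex netmask (e.g. 'ffffff00') to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_mask, 16)))


def masks_match(ifconfig_hex_mask: str, configured_netmask: str) -> bool:
    """Return True when the plumbed mask equals the configured NetMask attribute."""
    return hex_mask_to_dotted(ifconfig_hex_mask) == configured_netmask


if __name__ == "__main__":
    # Masks quoted in this thread, checked against the NetMask attribute
    # of Group1_OM_IP ("255.255.255.0") from main.cf.
    for plumbed in ("ffffffe0", "ffffffc0", "ffffff00"):
        print(plumbed, hex_mask_to_dotted(plumbed), masks_match(plumbed, "255.255.255.0"))
```

Only ffffff00 corresponds to 255.255.255.0; any other mask on the plumbed virtual interface would differ from what the resource definition asks for.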
A few other points:
Mike
09-02-2011 05:18 AM
I made a mistake when I edited the log output (I changed it because this system is not public). I have just fixed it.
Thank you.
I will think about your points.
09-02-2011 06:12 AM
The other question I would have is whether these interfaces are started when the system boots.
There are several things that could be going wrong. If you have a support contract, I would work with Symantec Technical Support, as they have the tools to determine where the error is. The reason the IP cannot be taken offline, even after the clean process has run, is not evident, though MikeBounds does bring up a good point.
Regards,
Anthony
09-02-2011 07:52 AM
This same configuration works fine on other nodes at a different site, with the same software, hardware and operating system.
When the system boots, only e1000g2 (for the local IP) is started; e1000g4 (for the external IP) is started when HA starts.