isaac_tsang
14 years ago
Level 3
MultiNICA failure when switching a service group with hagrp
Software: Veritas Storage Foundation Cluster File System 5.0MP3
Hardware: Sun Blade X6270
System: Solaris 10
Description:
hastatus in the normal state:
NODE1:/# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A NODE1 RUNNING 0
A NODE2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B Tomcat NODE1 Y N ONLINE
B Tomcat NODE2 Y N OFFLINE
B Network NODE1 Y N ONLINE
B Network NODE2 Y N ONLINE
B Network_OM NODE1 Y N ONLINE
B Network_OM NODE2 Y N ONLINE
B Oracle1 NODE1 Y N ONLINE
B Oracle1 NODE2 Y N OFFLINE
B cvm NODE1 Y N ONLINE
B cvm NODE2 Y N ONLINE
B ora_DG NODE1 Y N ONLINE
B ora_DG NODE2 Y N ONLINE
Switching Tomcat from NODE1 to NODE2:
NODE1:/# hagrp -switch Tomcat -to NODE2
engine_A.log on NODE1 while switching Tomcat from NODE1 to NODE2:
NODE1:/# tail -f /var/VRTSvcs/log/engine_A.log
2011/09/05 15:23:11 VCS WARNING V-16-0 (NODE1) POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:23:12 VCS ERROR V-16-2-13067 (NODE1) Agent is calling clean for resource(Server1) because the resource became OFFLINE unexpectedly, on its own.
2011/09/05 15:23:13 VCS INFO V-16-2-13068 (NODE1) Resource(Server1) - clean completed successfully.
2011/09/05 15:23:13 VCS ERROR V-16-2-13073 (NODE1) Resource(Server1) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 1) the resource.
2011/09/05 15:23:13 VCS WARNING V-16-0 (NODE1) Mediator:Server1:online:POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:23:16 VCS NOTICE V-16-2-13076 (NODE1) Agent has successfully restarted resource(Server1).
2011/09/05 15:23:18 VCS WARNING V-16-0 (NODE1) POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:23:20 VCS WARNING V-16-0 (NODE1) POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:23:21 VCS WARNING V-16-0 (NODE1) POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:23:22 VCS WARNING V-16-0 (NODE1) POSTOFFLINE: NODE1 Mediator1 PREONLINE
2011/09/05 15:28:16 VCS INFO V-16-1-50135 User root fired command: hagrp -switch Tomcat NODE2 from localhost
2011/09/05 15:28:16 VCS NOTICE V-16-1-10208 Initiating switch of group Tomcat from system NODE1 to system NODE2
2011/09/05 15:28:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) on System NODE1
2011/09/05 15:28:16 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmmweb (Owner: unknown, Group: Tomcat) on System NODE1
2011/09/05 15:28:18 VCS ERROR V-16-10001-0 (NODE1) Tomcat:fmmweb:offline:Could not stop FMM web server. May be still running.
2011/09/05 15:28:20 VCS INFO V-16-1-10305 Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) is offline on NODE1 (VCS initiated)
2011/09/05 15:28:21 VCS ERROR V-16-2-13064 (NODE1) Agent is calling clean for resource(fmmweb) because the resource is up even after offline completed.
2011/09/05 15:28:21 VCS INFO V-16-10001-0 (NODE1) Tomcat:fmmweb:clean:Killing FMM webserver forcefully using /usr/bin/kill -9 command.
2011/09/05 15:28:21 VCS INFO V-16-10001-0 (NODE1) Tomcat:fmmweb:clean:FMM web server successfully killed with /usr/bin/kill -9 command.
2011/09/05 15:28:22 VCS INFO V-16-2-13068 (NODE1) Resource(fmmweb) - clean completed successfully.
2011/09/05 15:28:23 VCS INFO V-16-1-10305 Resource fmmweb (Owner: unknown, Group: Tomcat) is offline on NODE1 (VCS initiated)
2011/09/05 15:28:23 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmm (Owner: unknown, Group: Tomcat) on System NODE1
2011/09/05 15:28:45 VCS INFO V-16-1-10305 Resource fmm (Owner: unknown, Group: Tomcat) is offline on NODE1 (VCS initiated)
2011/09/05 15:28:45 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Tomcat_IP (Owner: unknown, Group: Tomcat) on System NODE1
2011/09/05 15:28:48 VCS INFO V-16-1-10305 Resource Tomcat_IP (Owner: unknown, Group: Tomcat) is offline on NODE1 (VCS initiated)
2011/09/05 15:28:48 VCS NOTICE V-16-1-10446 Group Tomcat is offline on system NODE1
2011/09/05 15:28:48 VCS NOTICE V-16-1-10301 Initiating Online of Resource Tomcat_IP (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:28:48 VCS NOTICE V-16-1-10301 Initiating Online of Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:28:49 VCS INFO V-16-6-15002 (NODE1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/nfs_postoffline NODE1 Tomcat successfully
2011/09/05 15:28:49 VCS WARNING V-16-0 (NODE1) hatrigger:POSTOFFLINE: NODE1 Tomcat
2011/09/05 15:28:49 VCS INFO V-16-6-15002 (NODE1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline NODE1 Tomcat successfully
2011/09/05 15:28:56 VCS INFO V-16-1-10298 Resource Tomcat_IP (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:28:56 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmm (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:28:56 VCS INFO V-16-1-10298 Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:29:29 VCS INFO V-16-1-10298 Resource fmm (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:29:29 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmmweb (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:29:29 VCS ERROR V-16-10001-0 (NODE2) Tomcat:fmmweb:online:Could not start FMM web server. Kindly check the log file /opt/mediation/Tomcat6.0/logs/catalina.out
2011/09/05 15:29:33 VCS INFO V-16-1-10298 Resource fmmweb (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:29:33 VCS NOTICE V-16-1-10447 Group Tomcat is online on system NODE2
After the switch, hastatus looks fine:
NODE1:/# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A NODE1 RUNNING 0
A NODE2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B Tomcat NODE1 Y N OFFLINE
B Tomcat NODE2 Y N ONLINE
B Network NODE1 Y N ONLINE
B Network NODE2 Y N ONLINE
B Network_OM NODE1 Y N ONLINE
B Network_OM NODE2 Y N ONLINE
B Oracle1 NODE1 Y N ONLINE
B Oracle1 NODE2 Y N OFFLINE
B cvm NODE1 Y N ONLINE
B cvm NODE2 Y N ONLINE
B ora_DG NODE1 Y N ONLINE
B ora_DG NODE2 Y N ONLINE
The same test in the other direction, switching Tomcat from NODE2 to NODE1:
The switch stalls at the Tomcat_OM_IP resource.
NODE1:/# hastatus -sum
-- SYSTEM STATE
-- System State Frozen
A NODE1 RUNNING 0
A NODE2 RUNNING 0
-- GROUP STATE
-- Group System Probed AutoDisabled State
B Tomcat NODE1 Y N OFFLINE
B Tomcat NODE2 Y N STOPPING|PARTIAL
B Network NODE1 Y N ONLINE
B Network NODE2 Y N ONLINE
B Network_OM NODE1 Y N ONLINE
B Network_OM NODE2 Y N ONLINE
B Oracle1 NODE1 Y N ONLINE
B Oracle1 NODE2 Y N OFFLINE
B cvm NODE1 Y N ONLINE
B cvm NODE2 Y N ONLINE
B ora_DG NODE1 Y N ONLINE
B ora_DG NODE2 Y N ONLINE
-- RESOURCES OFFLINING
-- Group Type Resource System IState
F Tomcat IPMultiNIC Tomcat_OM_IP NODE2 W_OFFLINE_PROPAGATE
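A summary like the one above can also be checked mechanically. The following is a minimal sketch, assuming the `hastatus -sum` output has been captured as text and uses the column layout shown in this post. `parse_group_states` and `unsettled` are hypothetical helper names, not part of VCS.

```python
def parse_group_states(summary: str):
    """Return (group, system, state) tuples from the 'B' rows of hastatus -sum."""
    rows = []
    for line in summary.splitlines():
        fields = line.split()
        # Group rows look like: B <group> <system> <probed> <autodisabled> <state>
        if len(fields) == 6 and fields[0] == "B":
            _, group, system, _probed, _autodisabled, state = fields
            rows.append((group, system, state))
    return rows


def unsettled(rows):
    """Keep rows whose state is anything other than plain ONLINE or OFFLINE."""
    return [r for r in rows if r[2] not in ("ONLINE", "OFFLINE")]


# Short excerpt of the output shown in this post
sample = """\
-- GROUP STATE
-- Group System Probed AutoDisabled State
B Tomcat NODE1 Y N OFFLINE
B Tomcat NODE2 Y N STOPPING|PARTIAL
B Network NODE1 Y N ONLINE
"""

for group, system, state in unsettled(parse_group_states(sample)):
    print(f"{group} on {system}: {state}")
```

On the excerpt this flags only the Tomcat group on NODE2, which is the row stuck in STOPPING|PARTIAL.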
engine_A.log on NODE2 while switching Tomcat from NODE2 to NODE1:
NODE2:/# tail -f /var/VRTSvcs/log/engine_A.log
2011/09/05 15:29:44 VCS WARNING V-16-0 (NODE1) hatrigger:POSTOFFLINE: NODE1 Tomcat
2011/09/05 15:29:44 VCS INFO V-16-6-15002 (NODE1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline NODE1 Tomcat successfully
2011/09/05 15:29:52 VCS INFO V-16-1-10298 Resource Tomcat_IP (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:29:52 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmm (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:29:52 VCS INFO V-16-1-10298 Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:30:24 VCS INFO V-16-1-10298 Resource fmm (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:30:24 VCS NOTICE V-16-1-10301 Initiating Online of Resource fmmweb (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:30:24 VCS ERROR V-16-10001-0 (NODE2) Tomcat:fmmweb:online:Could not start FMM web server. Kindly check the log file /opt/mediation/Tomcat6.0/logs/catalina.out
2011/09/05 15:30:28 VCS INFO V-16-1-10298 Resource fmmweb (Owner: unknown, Group: Tomcat) is online on NODE2 (VCS initiated)
2011/09/05 15:30:28 VCS NOTICE V-16-1-10447 Group Tomcat is online on system NODE2
2011/09/05 15:34:59 VCS NOTICE V-16-1-10208 Initiating switch of group Tomcat from system NODE2 to system NODE1
2011/09/05 15:34:59 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:34:59 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmmweb (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:35:01 VCS ERROR V-16-10001-0 (NODE2) Tomcat:fmmweb:offline:Could not stop FMM web server. May be still running.
2011/09/05 15:35:02 VCS ERROR V-16-2-13064 (NODE2) Agent is calling clean for resource(Tomcat_OM_IP) because the resource is up even after offline completed.
2011/09/05 15:35:04 VCS ERROR V-16-2-13067 (NODE2) Agent is calling clean for resource(SENTINELgrp_IP) because the resource became OFFLINE unexpectedly, on its own.
2011/09/05 15:35:04 VCS INFO V-16-2-13001 (NODE2) Resource(Tomcat_OM_IP): Output of the completed operation (clean)
ifconfig: setifflags: SIOCGLIFFLAGS: e1000g2: no such interface
ifconfig: SIOCGLIFNETMASK: e1000g2: no such interface
ifconfig: SIOCGLIFADDR: e1000g2: no such interface
ifconfig: unplumb: SIOCGLIFFLAGS: e1000g2: no such interface
2011/09/05 15:35:04 VCS INFO V-16-2-13068 (NODE2) Resource(Tomcat_OM_IP) - clean completed successfully.
2011/09/05 15:35:05 VCS INFO V-16-2-13068 (NODE2) Resource(SENTINELgrp_IP) - clean completed successfully.
2011/09/05 15:35:05 VCS ERROR V-16-2-13073 (NODE2) Resource(SENTINELgrp_IP) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 10) the resource.
2011/09/05 15:35:05 VCS ERROR V-16-10001-0 (NODE2) Tomcat:fmmweb:monitor:The service IP address for fmm service group is unreachable. Please check the network adaptore (NIC's).
2011/09/05 15:35:05 VCS ERROR V-16-2-13077 (NODE2) Agent is unable to offline resource(Tomcat_OM_IP). Administrative intervention may be required.
2011/09/05 15:35:06 VCS ERROR V-16-2-13064 (NODE2) Agent is calling clean for resource(fmmweb) because the resource is up even after offline completed.
2011/09/05 15:35:06 VCS INFO V-16-10001-0 (NODE2) Tomcat:fmmweb:clean:Killing FMM webserver forcefully using /usr/bin/kill -9 command.
2011/09/05 15:35:06 VCS INFO V-16-10001-0 (NODE2) Tomcat:fmmweb:clean:FMM web server successfully killed with /usr/bin/kill -9 command.
2011/09/05 15:35:07 VCS INFO V-16-2-13068 (NODE2) Resource(fmmweb) - clean completed successfully.
2011/09/05 15:35:08 VCS INFO V-16-1-10305 Resource fmmweb (Owner: unknown, Group: Tomcat) is offline on NODE2 (VCS initiated)
2011/09/05 15:35:08 VCS NOTICE V-16-1-10300 Initiating Offline of Resource fmm (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:35:18 VCS WARNING V-16-10001-6004 (NODE2) MultiNICA:MultiNICA:monitor:Device FAILED
2011/09/05 15:35:18 VCS WARNING V-16-10001-6005 (NODE2) MultiNICA:MultiNICA:monitor:Acquired a WRITE Lock
2011/09/05 15:35:18 VCS WARNING V-16-10001-6006 (NODE2) MultiNICA:MultiNICA:monitor:Bringing down IP addresses
2011/09/05 15:35:18 VCS WARNING V-16-10001-6007 (NODE2) MultiNICA:MultiNICA:monitor:Trying to online Device e1000g2
2011/09/05 15:35:23 VCS INFO V-16-10001-6008 (NODE2) MultiNICA:MultiNICA:monitor:Sleeping 5 seconds
2011/09/05 15:35:28 VCS WARNING V-16-10001-6010 (NODE2) MultiNICA:MultiNICA:monitor:Pinging 192.168.102.125 with Device e1000g2 configured: iteration 1
2011/09/05 15:35:29 VCS WARNING V-16-10001-6016 (NODE2) MultiNICA:MultiNICA:monitor:Migrated to Device e1000g2
2011/09/05 15:35:29 VCS WARNING V-16-10001-6017 (NODE2) MultiNICA:MultiNICA:monitor:Releasing Lock
2011/09/05 15:35:30 VCS ERROR V-16-2-13067 (NODE2) Agent is calling clean for resource(Tomcat_IP) because the resource became OFFLINE unexpectedly, on its own.
2011/09/05 15:35:30 VCS INFO V-16-2-13001 (NODE2) Resource(MultiNICA): Output of the completed operation (monitor)
route: gateway required for add or delete command
2011/09/05 15:35:31 VCS INFO V-16-2-13068 (NODE2) Resource(Tomcat_IP) - clean completed successfully.
2011/09/05 15:35:31 VCS ERROR V-16-2-13073 (NODE2) Resource(Tomcat_IP) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 10) the resource.
2011/09/05 15:35:32 VCS INFO V-16-1-10305 Resource fmm (Owner: unknown, Group: Tomcat) is offline on NODE2 (VCS initiated)
2011/09/05 15:35:32 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Tomcat_IP (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:35:39 VCS NOTICE V-16-2-13076 (NODE2) Agent has successfully restarted resource(Tomcat_IP).
2011/09/05 15:35:42 VCS INFO V-16-1-10305 Resource Tomcat_IP (Owner: unknown, Group: Tomcat) is offline on NODE2 (VCS initiated)
2011/09/05 15:36:07 VCS ERROR V-16-2-13066 (NODE2) Agent is calling clean for resource(SENTINELgrp_IP) because the resource is not up even after online completed.
2011/09/05 15:36:08 VCS INFO V-16-2-13068 (NODE2) Resource(SENTINELgrp_IP) - clean completed successfully.
2011/09/05 15:36:08 VCS INFO V-16-2-13072 (NODE2) Resource(SENTINELgrp_IP): Agent is retrying online (attempt number 1 of 4).
2011/09/05 15:36:16 VCS NOTICE V-16-2-13076 (NODE2) Agent has successfully restarted resource(SENTINELgrp_IP).
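With this much log output it can help to pull out only the ERROR entries and group them by resource. The following is a minimal sketch, assuming engine_A.log lines follow the `date time VCS SEVERITY message-id text` layout seen above and that agent messages name the resource as `resource(NAME)` or `Resource(NAME)`. The function name `errors_by_resource` is hypothetical.

```python
import re
from collections import defaultdict

# date, time, literal "VCS", severity, message id, remaining text
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) VCS (\w+) (V-[\d-]+) (.*)$"
)
RESOURCE_RE = re.compile(r"resource\((\w+)\)", re.IGNORECASE)


def errors_by_resource(log_text: str):
    """Map resource name -> list of (time, message id) for ERROR lines."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != "ERROR":
            continue
        res = RESOURCE_RE.search(m.group(5))
        if res:
            found[res.group(1)].append((m.group(2), m.group(4)))
    return dict(found)


# Four lines copied from the NODE2 log above
sample = """\
2011/09/05 15:35:02 VCS ERROR V-16-2-13064 (NODE2) Agent is calling clean for resource(Tomcat_OM_IP) because the resource is up even after offline completed.
2011/09/05 15:35:04 VCS INFO V-16-2-13068 (NODE2) Resource(Tomcat_OM_IP) - clean completed successfully.
2011/09/05 15:35:05 VCS ERROR V-16-2-13077 (NODE2) Agent is unable to offline resource(Tomcat_OM_IP). Administrative intervention may be required.
2011/09/05 15:35:05 VCS ERROR V-16-2-13073 (NODE2) Resource(SENTINELgrp_IP) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 10) the resource.
"""

for name, events in errors_by_resource(sample).items():
    print(name, events)
```

Custom agent messages that do not use the `resource(NAME)` form, such as the fmmweb lines, are skipped by this pattern and would need their own rule.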
Cluster main.cf:
include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
cluster MedCluster (
UserNames = { vcsguest = cD2a90jzh1hgg, vcsop = j5rBONYy1OtL6,
vcsadm = "sEFEKH1CaHW5.",
root = ajjOjeJdkCjpIejQjh }
Administrators = { vcsadm, root }
UseFence = SCSI3
)
system NODE1 (
)
system NODE2 (
)
group FMMgrp (
SystemList = { NODE1 = 2, NODE2 = 1 }
AutoStartList = { NODE1 }
)
IPMultiNIC Tomcat_IP (
Address = "192.168.0.71"
NetMask = "255.255.255.192"
MultiNICResName = MultiNICA
IfconfigTwice = 1
)
IPMultiNIC Tomcat_OM_IP (
Address = "10.10.10.164"
NetMask = "255.255.255.224"
MultiNICResName = MultiNICA_OM
IfconfigTwice = 1
)
ORACLE fmm (
)
Proxy Tomcat_NIC_PROXY (
TargetResName = MultiNICA
)
Proxy Tomcat_OM_NIC_PROXY (
TargetResName = MultiNICA_OM
)
Tomcat fmmweb (
)
requires group Oracle1 online global soft
Tomcat_IP requires Tomcat_NIC_PROXY
Tomcat_OM_IP requires Tomcat_OM_NIC_PROXY
fmm requires Tomcat_IP
fmmweb requires fmm
// resource dependency tree
//
// group FMMgrp
// {
// IPMultiNIC Tomcat_OM_IP
// {
// Proxy Tomcat_OM_NIC_PROXY
// }
// Tomcat fmmweb
// {
// ORACLE fmm
// {
// IPMultiNIC Tomcat_IP
// {
// Proxy Tomcat_NIC_PROXY
// }
// }
// }
// }
group Network (
SystemList = { NODE1 = 1, NODE2 = 2 }
Parallel = 1
AutoStartList = { NODE1, NODE2 }
)
MultiNICA MultiNICA (
Device @NODE1 = { e1000g2 = "192.168.0.66" }
Device @NODE2 = { e1000g2 = "192.168.0.67" }
NetMask = "255.255.255.192"
ArpDelay = 5
RouteOptions = "192.168.0.65"
IfconfigTwice = 1
NetworkHosts = { "192.168.0.125", "192.168.0.126" }
)
Phantom Phantom (
)
// resource dependency tree
//
// group Network
// {
// MultiNICA MultiNICA
// Phantom Phantom
// }
group Network_OM (
SystemList = { NODE1 = 1, NODE2 = 2 }
Parallel = 1
AutoStartList = { NODE1, NODE2 }
)
MultiNICA MultiNICA_OM (
Device @NODE1 = { e1000g4 = "10.10.10.167" }
Device @NODE2 = { e1000g4 = "10.10.10.168" }
NetMask = "255.255.255.224"
ArpDelay = 5
RouteOptions = "10.10.10.190"
IfconfigTwice = 1
NetworkHosts = { "10.10.10.169", "10.10.10.170" }
)
Phantom Phantom_OM (
)
// resource dependency tree
//
// group Network_OM
// {
// MultiNICA MultiNICA_OM
// Phantom Phantom_OM
// }
group Oracle1 (
SystemList = { NODE1 = 1, NODE2 = 2 }
AutoStartList = { NODE1 }
)
IPMultiNIC Oracle1_IP (
Address = "192.168.0.72"
NetMask = "255.255.255.192"
MultiNICResName = MultiNICA
IfconfigTwice = 1
)
ORACLE bgw (
)
Proxy Oracle1_NIC_PROXY (
TargetResName = MultiNICA
)
requires group ora_DG online local firm
Oracle1_IP requires Oracle1_NIC_PROXY
bgw requires Oracle1_IP
// resource dependency tree
//
// group Oracle1
// {
// ORACLE bgw
// {
// IPMultiNIC Oracle1_IP
// {
// Proxy Oracle1_NIC_PROXY
// }
// }
// }
group cvm (
SystemList = { NODE1 = 0, NODE2 = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { NODE1, NODE2 }
)
CFSfsckd vxfsckd (
ActivationMode @NODE1 = { ora1dg = sw, tom1dg = sw }
ActivationMode @NODE2 = { ora1dg = sw, tom1dg = sw }
)
CVMCluster cvm_clus (
CVMClustName = MedCluster
CVMNodeId = { NODE1 = 0, NODE2 = 1 }
CVMTransport = gab
CVMTimeout = 200
)
CVMVxconfigd cvm_vxconfigd (
Critical = 0
CVMVxconfigdArgs = { syslog }
)
cvm_clus requires cvm_vxconfigd
vxfsckd requires cvm_clus
// resource dependency tree
//
// group cvm
// {
// CFSfsckd vxfsckd
// {
// CVMCluster cvm_clus
// {
// CVMVxconfigd cvm_vxconfigd
// }
// }
// }
group ora_DG (
SystemList = { NODE1 = 0, NODE2 = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { NODE1, NODE2 }
)
CFSMount cfsmount4 (
Critical = 0
MountPoint = "/var/opt/mediation/ora"
BlockDevice = "/dev/vx/dsk/ora1dg/vol01"
NodeList = { NODE1, NODE2 }
)
CVMVolDg cvmvoldg4 (
Critical = 0
CVMDiskGroup = ora1dg
CVMActivation @NODE1 = sw
CVMActivation @NODE2 = sw
)
requires group cvm online local firm
cfsmount4 requires cvmvoldg4
// resource dependency tree
//
// group ora_DG
// {
// CFSMount cfsmount4
// {
// CVMVolDg cvmvoldg4
// }
// }
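A quick sanity check on a configuration like this is whether every IPMultiNIC address sits in the same subnet as the base addresses of the MultiNICA resource it points to, and whether any address is used twice. The sketch below copies the addresses by hand from the main.cf above into plain dictionaries (it does not parse main.cf). `check_addresses` is a hypothetical helper name.

```python
import ipaddress
from collections import Counter

# MultiNICA resource -> (netmask, base addresses per node), copied from main.cf
MULTINICA = {
    "MultiNICA": ("255.255.255.192", ["192.168.0.66", "192.168.0.67"]),
    "MultiNICA_OM": ("255.255.255.224", ["10.10.10.167", "10.10.10.168"]),
}
# IPMultiNIC resource -> (virtual address, MultiNICResName)
IPMULTINIC = {
    "Tomcat_IP": ("192.168.0.71", "MultiNICA"),
    "Tomcat_OM_IP": ("10.10.10.164", "MultiNICA_OM"),
    "Oracle1_IP": ("192.168.0.72", "MultiNICA"),
}


def check_addresses(multinica, ipmultinic):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    # Count every address once per place it is configured
    used = Counter(addr for addr, _ in ipmultinic.values())
    for _mask, bases in multinica.values():
        used.update(bases)
    for addr, count in used.items():
        if count > 1:
            problems.append(f"{addr} is configured {count} times")
    for name, (addr, parent) in ipmultinic.items():
        mask, bases = multinica[parent]
        # strict=False derives the network from a host address plus netmask
        net = ipaddress.ip_network(f"{bases[0]}/{mask}", strict=False)
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{name} ({addr}) is outside {net} used by {parent}")
    return problems


print(check_addresses(MULTINICA, IPMULTINIC) or "no conflicts in the posted addresses")
```

With the values posted here the check finds nothing. That only covers the resources shown in this main.cf; it says nothing about resources defined elsewhere or addresses plumbed on the interfaces outside VCS.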
I don't know why this happens. Please suggest a fix or a way to track down the issue. Thank you very much!
You have resources in your log messages which you have not shown in main.cf - at least:
Resource: Server1
Resource: SENTINELgrp_IP
You probably have conflicting resources, for example the same IP or interface defined for different resources.
You need to supply the complete main.cf showing the Server1 and SENTINELgrp_IP resources, and also the full output of "ifconfig -a".
Mike
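The cross-check described in the reply above (resource names that appear in the log but not in the posted main.cf) can be sketched as follows. This assumes resources are declared in main.cf as `Type name (` on a single line, and that the log names resources either as `Resource(NAME)` or as `Resource NAME (Owner: ...`. `defined_resources` and `logged_resources` are hypothetical helpers, and the sample strings are short excerpts from this thread.

```python
import re

# Keywords that open non-resource blocks in main.cf
NON_RESOURCE = {"cluster", "system", "group"}
DEF_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(\s*$")
LOG_RE = re.compile(r"[Rr]esource(?:\((\w+)\)| (\w+) \(Owner:)")


def defined_resources(main_cf: str):
    """Names of resources declared as 'Type name (' in main.cf text."""
    names = set()
    for line in main_cf.splitlines():
        m = DEF_RE.match(line)
        if m and m.group(1) not in NON_RESOURCE:
            names.add(m.group(2))
    return names


def logged_resources(log_text: str):
    """Resource names mentioned in engine log text."""
    return {a or b for a, b in LOG_RE.findall(log_text)}


main_cf = """\
group FMMgrp (
IPMultiNIC Tomcat_IP (
IPMultiNIC Tomcat_OM_IP (
Tomcat fmmweb (
"""
log = """\
2011/09/05 15:23:12 VCS ERROR V-16-2-13067 (NODE1) Agent is calling clean for resource(Server1) because the resource became OFFLINE unexpectedly, on its own.
2011/09/05 15:34:59 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Tomcat_OM_IP (Owner: unknown, Group: Tomcat) on System NODE2
2011/09/05 15:35:05 VCS ERROR V-16-2-13073 (NODE2) Resource(SENTINELgrp_IP) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 10) the resource.
2011/09/05 15:35:06 VCS ERROR V-16-2-13064 (NODE2) Agent is calling clean for resource(fmmweb) because the resource is up even after offline completed.
"""

# Names seen in the log that have no definition in the main.cf excerpt
print(sorted(logged_resources(log) - defined_resources(main_cf)))
```

Running the same comparison against the full files on each node would show whether Server1 and SENTINELgrp_IP are the only resources missing from the posted configuration.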