02-08-2012 07:32 AM
Hi,
I am using VCS 5.1 SP1 on two Solaris 10 servers with a Solaris 9 container as a service. I have three networks on the physical hosts using link-based IPMP, and VCS is configured to bring up the IP addresses for the three networks in the Solaris 9 container.
I am receiving the following errors every 30 seconds in the IPMultiNICB logs:
2012/02/08 15:09:03 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_backup:monitor:Unknown Protocol () type. Set Protocol to default (IPv4).
2012/02/08 15:09:04 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_ukcsn:monitor:Unknown Protocol () type. Set Protocol to default (IPv4).
2012/02/08 15:09:04 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_prod:monitor:Unknown Protocol () type. Set Protocol to default (IPv4).
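A quick way to confirm which resources raise the warning, and how often, is to tally the agent log by resource name. The following is a minimal Python sketch, assuming the line layout of the three entries above; the `count_by_resource` helper and the regular expression are illustrative and not part of VCS.

```python
import re
from collections import Counter

# Layout assumed from the log excerpt:
# <date> <time> VCS <SEVERITY> <message id> <Type>:<resource>:<entry point>:<text>
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>\w+) (?P<msgid>V-\d+-\d+-\d+) "
    r"(?P<type>\w+):(?P<resource>\w+):(?P<entry>\w+):(?P<text>.*)$"
)


def count_by_resource(lines, msgid):
    """Count log lines carrying the given message id, keyed by resource name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("msgid") == msgid:
            counts[match.group("resource")] += 1
    return counts


sample = [
    "2012/02/08 15:09:03 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_backup:monitor:"
    "Unknown Protocol () type. Set Protocol to default (IPv4).",
    "2012/02/08 15:09:04 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_ukcsn:monitor:"
    "Unknown Protocol () type. Set Protocol to default (IPv4).",
    "2012/02/08 15:09:04 VCS WARNING V-16-10001-6559 IPMultiNICB:ipb_prod:monitor:"
    "Unknown Protocol () type. Set Protocol to default (IPv4).",
]

counts = count_by_resource(sample, "V-16-10001-6559")
for resource, hits in counts.items():
    print(resource, hits)
```

In practice the lines would be read from the IPMultiNICB agent log rather than from an inline list.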
Here are the main.cf entries for each MultiNICB and IPMultiNICB resource, followed by the types file entries:
IPMultiNICB ipb_backup (
Critical = 0
BaseResName = mnb_backup
Address = "10.6.241.83"
NetMask = "255.255.255.0"
)
IPMultiNICB ipb_prod (
Critical = 0
BaseResName = mnb_prod
Address = "10.6.8.66"
NetMask = "255.255.255.0"
)
IPMultiNICB ipb_ukcsn (
Critical = 0
BaseResName = mnb_ukcsn
Address @ecsclapmanu001 = "10.101.14.65"
Address @ecscscomanu001 = "10.101.142.128"
NetMask = "255.255.255.0"
)
MultiNICB mnb_backup (
Critical = 0
UseMpathd = 1
ConfigCheck = 0
Device = { nxge2 = 0, nxge6 = 0 }
GroupName = backup
)
MultiNICB mnb_prod (
Critical = 0
UseMpathd = 1
ConfigCheck = 0
Device = { nxge0 = 0, nxge4 = 0 }
GroupName = prod
)
MultiNICB mnb_ukcsn (
Critical = 0
UseMpathd = 1
ConfigCheck = 0
Device = { nxge1 = 0, nxge5 = 0 }
GroupName = ukcsn
)
type IPMultiNICB (
static int MonitorInterval = 30
static int OnlineRetryLimit = 1
static int ToleranceLimit = 1
static str ArgList[] = { BaseResName, Address, NetMask, DeviceChoice, RouteOptions, PrefixLen, IgnoreMultiNICBFailure, "BaseResName:Protocol", Options }
static int ContainerOpts{} = { RunInContainer=0, PassCInfo=1 }
str BaseResName
str Address
str NetMask
str DeviceChoice = 0
str RouteOptions
int PrefixLen
int IgnoreMultiNICBFailure
str Options
)
type MultiNICB (
static int MonitorInterval = 10
static int OfflineMonitorInterval = 60
static str ArgList[] = { UseMpathd, MpathdCommand, ConfigCheck, MpathdRestart, Device, NetworkHosts, LinkTestRatio, IgnoreLinkStatus, NetworkTimeout,
OnlineTestRepeatCount, OfflineTestRepeatCount, NoBroadcast, DefaultRouter, Failback, GroupName, Protocol }
static str Operations = None
int UseMpathd
str MpathdCommand = "/usr/lib/inet/in.mpathd"
int ConfigCheck = 1
int MpathdRestart = 1
str Device{}
str NetworkHosts[]
int LinkTestRatio = 1
int IgnoreLinkStatus = 1
int NetworkTimeout = 100
int OnlineTestRepeatCount = 3
int OfflineTestRepeatCount = 3
int NoBroadcast
str DefaultRouter = "0.0.0.0"
int Failback
str GroupName
str Protocol = IPv4
)
Can anyone recommend a solution? The IP addresses are up.
Regards,
Mark Davies
02-08-2012 08:22 AM
Hi Mark,
Hope you are well.
You have not upgraded your types files when you installed SP1. Your types file should contain a "Protocol" attribute.
So you need to extract any tuned settings from the types.cf file (custom Restarts, Retries, Timeouts, etc.). Then copy the SP1 types files from /etc/VRTSvcs/conf to /etc/VRTSvcs/conf/config and apply any tuned settings you extracted.
You will then need to bounce VCS.
Mike
02-08-2012 08:28 AM
Hi Mark,
I trust the types.cf content above is from the /etc/VRTSvcs/conf/config directory. This seems a little weird, as types.cf clearly sets the Protocol string to IPv4, so a protocol error doesn't make sense.
I checked the resource definitions against the guide bundled with 5.1SP1, and everything looks to be in order.
Have any changes been made manually to the monitor script of the IPMultiNICB agent? Also, are you seeing any error messages in engine_A.log or the system messages?
Gaurav
02-08-2012 08:36 AM
Hi Gaurav,
I have not made any changes to the monitoring scripts from what was installed.
There are no errors in either the engine_A.log or the system messages; the only errors are in the IPMultiNICB_A.log.
Regards,
Mark
02-08-2012 08:43 AM
Here is what is in the zone's types.cf file:
type IPMultiNICB (
static int ToleranceLimit = 1
static int MonitorInterval = 30
static int OnlineRetryLimit=1
static str ArgList[] = { BaseResName, Address, NetMask, DeviceChoice, RouteOptions, PrefixLen, IgnoreMultiNICBFailure, "BaseResName:Protocol" }
static int ContainerOpts{} = { RunInContainer=0, PassCInfo=1 }
str BaseResName
str Address
str NetMask
str DeviceChoice = 0
str RouteOptions
int PrefixLen
int IgnoreMultiNICBFailure = 0
)
type MultiNICB (
static int MonitorInterval = 10
static int OfflineMonitorInterval = 60
static str Operations = None
static str ArgList[] = { UseMpathd, MpathdCommand, ConfigCheck, MpathdRestart, Device, NetworkHosts, LinkTestRatio, IgnoreLinkStatus, NetworkTimeout,
OnlineTestRepeatCount, OfflineTestRepeatCount, NoBroadcast, DefaultRouter, Failback, GroupName, Protocol }
int UseMpathd
str MpathdCommand = "/usr/lib/inet/in.mpathd"
int ConfigCheck = 1
int MpathdRestart = 1
str Device{}
str NetworkHosts[]
int LinkTestRatio = 1
int IgnoreLinkStatus = 1
int NetworkTimeout = 100
int OnlineTestRepeatCount = 3
int OfflineTestRepeatCount = 3
int NoBroadcast
str DefaultRouter = "0.0.0.0"
int Failback
str GroupName
str Protocol
)
Does this need to change in the Solaris 9 zone?
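A first step is to see how the zone's copy differs from the global zone's. The following is a minimal Python sketch, assuming abbreviated excerpts of the two MultiNICB definitions posted in this thread; `attr_defaults` is a hypothetical helper, not a VCS tool, and whether the agent actually reads the zone's copy is not established here.

```python
import re

# Matches non-static attribute declarations such as "str Protocol = IPv4",
# "int UseMpathd" or "str Device{}". Lines starting with "static" do not match.
ATTR = re.compile(r"^(?:int|str|boolean)\s+(\w+)(?:\[\]|\{\})?\s*(?:=\s*(.+))?$")


def attr_defaults(types_text, type_name):
    """Return {attribute: default or None} for the non-static attributes of one type."""
    defaults = {}
    inside = False
    for raw in types_text.splitlines():
        line = raw.strip()
        if line.startswith("type "):
            inside = line.split()[1] == type_name
            continue
        if line == ")":
            inside = False
            continue
        if inside:
            match = ATTR.match(line)
            if match:
                defaults[match.group(1)] = match.group(2)
    return defaults


# Abbreviated excerpts of the two definitions posted above.
global_types = """type MultiNICB (
static int MonitorInterval = 10
int UseMpathd
int ConfigCheck = 1
str Device{}
str GroupName
str Protocol = IPv4
)
"""
zone_types = """type MultiNICB (
static int MonitorInterval = 10
int UseMpathd
int ConfigCheck = 1
str Device{}
str GroupName
str Protocol
)
"""

in_global = attr_defaults(global_types, "MultiNICB")
in_zone = attr_defaults(zone_types, "MultiNICB")
diff = {
    name: (in_global.get(name), in_zone.get(name))
    for name in sorted(set(in_global) | set(in_zone))
    if in_global.get(name) != in_zone.get(name)
}
print(diff)
```

On these excerpts the only difference reported is `Protocol`, which defaults to IPv4 in the global zone's file and has no default in the zone's copy.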
02-08-2012 08:44 AM
Sorry Mark, I was looking at the IPMultiNICB type in your post when looking for the Protocol attribute (as the error is shown against the IPMultiNICB resource, not the MultiNICB resource), but the Protocol attribute should be in the MultiNICB type, so I see you have upgraded the types file.
Can you check the output of "hares -display mnb_prod" to see whether the resource shows "IPv4" for the Protocol attribute? (This should have been brought through from the types file.)
Mike
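The same check can be scripted against captured output. The following is a minimal Python sketch, assuming the four-column layout that "hares -display" prints (resource, attribute, system, value) and a few sample rows in that layout; `parse_display` is a hypothetical helper, not a VCS command.

```python
def parse_display(output):
    """Map (attribute, system) -> value for rows in `hares -display` layout."""
    rows = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the column header
        fields = line.split(None, 3)  # the value itself may contain spaces
        if len(fields) < 3:
            continue
        rows[(fields[1], fields[2])] = fields[3] if len(fields) == 4 else ""
    return rows


sample = """\
#Resource Attribute System Value
mnb_prod Type global MultiNICB
mnb_prod State ecsclapmanu001 ONLINE
mnb_prod NetworkHosts global
mnb_prod Protocol global IPv4
"""

rows = parse_display(sample)
protocol = rows.get(("Protocol", "global"), "")
print("Protocol:", protocol or "<empty>")
```

An empty or missing value here would point at the types file or the resource definition rather than at the IPMultiNICB agent.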
02-08-2012 09:54 AM
I would create an IPMultiNICB resource that runs in the global zone to see if you get the same error. If you do, then you know it has nothing to do with zones. If you don't, then quite possibly it is a bug or an undocumented step you need to do, as there is very little info on branded zones: there is just a short section in the SFHA Virtualisation Guide, which I presume you have read to get as far as you have.
Mike
02-08-2012 10:46 AM
Hi Mike,
I am fine thanks and still working with EC !
I hope you are well.
Here is the output. As you can see, the MultiNICB resource is in the commonsg group, which is not part of the zone, and the IPMultiNICB resource is part of the zone, as it has to be:
bash-3.2# hares -display mnb_prod
#Resource Attribute System Value
mnb_prod Group global commonsg
mnb_prod Type global MultiNICB
mnb_prod AutoStart global 1
mnb_prod Critical global 0
mnb_prod Enabled global 1
mnb_prod LastOnline global ecsclapmanu001
mnb_prod MonitorOnly global 0
mnb_prod ResourceOwner global
mnb_prod TriggerEvent global 0
mnb_prod ArgListValues ecsclapmanu001 UseMpathd 1 1 MpathdCommand 1 /usr/lib/inet/in.mpathd ConfigCheck 1 0 MpathdRestart 1 1 Device 4 nxge0 0 nxge4 0 NetworkHosts 0 LinkTestRatio 1 1 IgnoreLinkStatus 1 1 NetworkTimeout 1 100 OnlineTestRepeatCount 1 3 OfflineTestRepeatCount 1 3 NoBroadcast 1 0 DefaultRouter 1 0.0.0.0 Failback 1 0 GroupName 1 prod Protocol 1 IPv4
mnb_prod ArgListValues ecscscomanu001 UseMpathd 1 1 MpathdCommand 1 /usr/lib/inet/in.mpathd ConfigCheck 1 0 MpathdRestart 1 1 Device 4 nxge0 0 nxge4 0 NetworkHosts 0 LinkTestRatio 1 1 IgnoreLinkStatus 1 1 NetworkTimeout 1 100 OnlineTestRepeatCount 1 3 OfflineTestRepeatCount 1 3 NoBroadcast 1 0 DefaultRouter 1 0.0.0.0 Failback 1 0 GroupName 1 prod Protocol 1 IPv4
mnb_prod ConfidenceLevel ecsclapmanu001 0
mnb_prod ConfidenceLevel ecscscomanu001 0
mnb_prod ConfidenceMsg ecsclapmanu001
mnb_prod ConfidenceMsg ecscscomanu001
mnb_prod Flags ecsclapmanu001
mnb_prod Flags ecscscomanu001
mnb_prod IState ecsclapmanu001 not waiting
mnb_prod IState ecscscomanu001 not waiting
mnb_prod MonitorMethod ecsclapmanu001 Traditional
mnb_prod MonitorMethod ecscscomanu001 Traditional
mnb_prod Probed ecsclapmanu001 1
mnb_prod Probed ecscscomanu001 1
mnb_prod Start ecsclapmanu001 0
mnb_prod Start ecscscomanu001 0
mnb_prod State ecsclapmanu001 ONLINE
mnb_prod State ecscscomanu001 ONLINE
mnb_prod ComputeStats global 0
mnb_prod ConfigCheck global 0
mnb_prod DefaultRouter global 0.0.0.0
mnb_prod Device global nxge0 0 nxge4 0
mnb_prod Failback global 0
mnb_prod GroupName global prod
mnb_prod IgnoreLinkStatus global 1
mnb_prod LinkTestRatio global 1
mnb_prod MpathdCommand global /usr/lib/inet/in.mpathd
mnb_prod MpathdRestart global 1
mnb_prod NetworkHosts global
mnb_prod NetworkTimeout global 100
mnb_prod NoBroadcast global 0
mnb_prod OfflineTestRepeatCount global 3
mnb_prod OnlineTestRepeatCount global 3
mnb_prod Protocol global IPv4
mnb_prod TriggerResStateChange global 0
mnb_prod UseMpathd global 1
mnb_prod ContainerInfo ecsclapmanu001 Type Name Enabled
mnb_prod ContainerInfo ecscscomanu001 Type Name Enabled
mnb_prod MonitorTimeStats ecsclapmanu001 Avg 0 TS
mnb_prod MonitorTimeStats ecscscomanu001 Avg 0 TS
mnb_prod ResourceInfo ecsclapmanu001 State Valid Msg TS
mnb_prod ResourceInfo ecscscomanu001 State Valid Msg TS
bash-3.2# hares -display ipb_prod
#Resource Attribute System Value
ipb_prod Group global MGX1P
ipb_prod Type global IPMultiNICB
ipb_prod AutoStart global 1
ipb_prod Critical global 0
ipb_prod Enabled global 1
ipb_prod LastOnline global ecsclapmanu001
ipb_prod MonitorOnly global 0
ipb_prod ResourceOwner global
ipb_prod TriggerEvent global 0
ipb_prod ArgListValues ecsclapmanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 "" Options 1 ""
ipb_prod ArgListValues ecscscomanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 "" Options 1 ""
ipb_prod ConfidenceLevel ecsclapmanu001 0
ipb_prod ConfidenceLevel ecscscomanu001 0
ipb_prod ConfidenceMsg ecsclapmanu001
ipb_prod ConfidenceMsg ecscscomanu001
ipb_prod Flags ecsclapmanu001
ipb_prod Flags ecscscomanu001
ipb_prod IState ecsclapmanu001 not waiting
ipb_prod IState ecscscomanu001 not waiting
ipb_prod MonitorMethod ecsclapmanu001 Traditional
ipb_prod MonitorMethod ecscscomanu001 Traditional
ipb_prod Probed ecsclapmanu001 1
ipb_prod Probed ecscscomanu001 1
ipb_prod Start ecsclapmanu001 1
ipb_prod Start ecscscomanu001 0
ipb_prod State ecsclapmanu001 ONLINE
ipb_prod State ecscscomanu001 OFFLINE
ipb_prod Address global 10.6.8.66
ipb_prod BaseResName global mnb_prod
ipb_prod ComputeStats global 0
ipb_prod DeviceChoice global 0
ipb_prod IgnoreMultiNICBFailure global 0
ipb_prod NetMask global 255.255.255.0
ipb_prod Options global
ipb_prod PrefixLen global 0
ipb_prod ResourceInfo global State Stale Msg TS
ipb_prod RouteOptions global
ipb_prod TriggerResStateChange global 0
ipb_prod ContainerInfo ecsclapmanu001 Type Zone Name ebs-manu51prod Enabled 1
ipb_prod ContainerInfo ecscscomanu001 Type Zone Name ebs-manu51prod Enabled 1
ipb_prod MonitorTimeStats ecsclapmanu001 Avg 0 TS
ipb_prod MonitorTimeStats ecscscomanu001 Avg 0 TS
bash-3.2#
02-08-2012 10:53 AM
Hi Mike,
I have created an IPMultiNICB resource in the global zone and it doesn't report any errors.
So it must be something to do with running the IPMultiNICB resource from within the zone.
Mark
02-09-2012 01:42 AM
Hi Mark,
Could you just check that when you zlogin into the local zone, there is a file called .vcspwd in root's home directory and that you can run ha commands from the local zone (this will only work if .vcspwd exists and is correct)?
I may have identified why the issue might be occurring. Normally the ArgListValues of a resource passes just the attributes of that resource. It doesn't have to, and I have seen agents that pass type attributes (OfflineTimeout etc.). Although this works, I have seen issues with it: for instance, you can't copy this type of resource properly in the 5.0 GUI, as the copy bombs out because it seems to use the ArgListValues to create the resource. Your IPMultiNICB resource has a value in the ArgListValues of:
BaseResName:Protocol 1 ""
So this presumably SHOULD take the resource named in BaseResName, which is the MultiNICB resource, and extract the Protocol attribute from it, but the value is "", where I think it should be IPv4. This might not be the case, so check your global IPMultiNICB resource to see if IPv4 is set there. If it is not set there either, then maybe the agent generates this on the fly, which would explain why it works in the global zone but not for the agent running in the local zone. That is why I asked you to check that ha commands work in the local zone.
Mike
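The name/count/value layout of ArgListValues can be unpacked mechanically, which makes an empty BaseResName:Protocol easy to spot. The following is a minimal Python sketch, assuming the layout visible in the "hares -display" output earlier in this thread (attribute name, element count, then that many elements); `parse_arglist_values` is a hypothetical helper, not a VCS command.

```python
import shlex


def parse_arglist_values(value):
    """Split an ArgListValues string into {name: [elements]}.

    Assumed layout: name, element count, then that many elements, repeated.
    shlex is used so that a quoted empty string ("") survives as an element.
    """
    tokens = shlex.split(value)
    parsed = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        count = int(tokens[i + 1])
        parsed[name] = tokens[i + 2 : i + 2 + count]
        i += 2 + count
    return parsed


# ArgListValues of ipb_prod as posted earlier in the thread.
ipb_prod = (
    'BaseResName 1 mnb_prod Address 1 10.6.8.66 NetMask 1 255.255.255.0 '
    'DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 '
    'IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 "" Options 1 ""'
)
# A fragment of mnb_prod's ArgListValues, showing multi-element and empty lists.
mnb_prod = 'Device 4 nxge0 0 nxge4 0 NetworkHosts 0 GroupName 1 prod Protocol 1 IPv4'

ipb_args = parse_arglist_values(ipb_prod)
mnb_args = parse_arglist_values(mnb_prod)
print("IPMultiNICB sees Protocol:", ipb_args["BaseResName:Protocol"])
print("MultiNICB has Protocol:", mnb_args["Protocol"])
```

On the posted values the MultiNICB resource carries IPv4 while the IPMultiNICB resource receives an empty string for the same attribute, which matches the "Unknown Protocol ()" warning.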
02-09-2012 01:50 AM
Hi Mike,
There is no .vcspwd file in the /root of the zone.
Regards,
Mark
02-09-2012 02:11 AM
Mike,
How strange! I have restarted VCS and I am no longer seeing the errors, and hares -display is now showing the correct IPv4!
bash-3.2# hares -display ipb_prod
#Resource Attribute System Value
ipb_prod Group global MGX1P
ipb_prod Type global IPMultiNICB
ipb_prod AutoStart global 1
ipb_prod Critical global 0
ipb_prod Enabled global 1
ipb_prod LastOnline global ecsclapmanu001
ipb_prod MonitorOnly global 0
ipb_prod ResourceOwner global
ipb_prod TriggerEvent global 0
ipb_prod ArgListValues ecsclapmanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 IPv4 Options 1 ""
ipb_prod ArgListValues ecscscomanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 IPv4 Options 1 ""
ipb_prod ConfidenceLevel ecsclapmanu001 0
ipb_prod ConfidenceLevel ecscscomanu001 0
ipb_prod ConfidenceMsg ecsclapmanu001
ipb_prod ConfidenceMsg ecscscomanu001
ipb_prod Flags ecsclapmanu001
ipb_prod Flags ecscscomanu001
ipb_prod IState ecsclapmanu001 not waiting
ipb_prod IState ecscscomanu001 not waiting
ipb_prod MonitorMethod ecsclapmanu001 Traditional
ipb_prod MonitorMethod ecscscomanu001 Traditional
ipb_prod Probed ecsclapmanu001 1
ipb_prod Probed ecscscomanu001 1
ipb_prod Start ecsclapmanu001 1
ipb_prod Start ecscscomanu001 0
ipb_prod State ecsclapmanu001 ONLINE
ipb_prod State ecscscomanu001 OFFLINE
ipb_prod Address global 10.6.8.66
ipb_prod BaseResName global mnb_prod
ipb_prod ComputeStats global 0
ipb_prod DeviceChoice global 0
ipb_prod IgnoreMultiNICBFailure global 0
ipb_prod NetMask global 255.255.255.0
ipb_prod Options global
ipb_prod PrefixLen global 0
ipb_prod ResourceInfo global State Valid Msg TS
ipb_prod RouteOptions global
ipb_prod TriggerResStateChange global 0
ipb_prod ContainerInfo ecsclapmanu001 Type Zone Name ebs-manu51prod Enabled 1
ipb_prod ContainerInfo ecscscomanu001 Type Zone Name ebs-manu51prod Enabled 1
ipb_prod MonitorTimeStats ecsclapmanu001 Avg 0 TS
ipb_prod MonitorTimeStats ecscscomanu001 Avg 0 TS
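To see exactly what the restart changed, the two "hares -display ipb_prod" captures can be diffed. The following is a minimal Python sketch using the standard difflib module on a few rows copied from the two outputs in this thread (one system only, for brevity).

```python
import difflib

# Rows copied from the capture taken before the VCS restart.
before = [
    'ipb_prod ArgListValues ecsclapmanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 '
    'NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 '
    'IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 "" Options 1 ""',
    'ipb_prod State ecsclapmanu001 ONLINE',
    'ipb_prod ResourceInfo global State Stale Msg TS',
]
# The same rows from the capture taken after the restart.
after = [
    'ipb_prod ArgListValues ecsclapmanu001 BaseResName 1 mnb_prod Address 1 10.6.8.66 '
    'NetMask 1 255.255.255.0 DeviceChoice 1 0 RouteOptions 1 "" PrefixLen 1 0 '
    'IgnoreMultiNICBFailure 1 0 BaseResName:Protocol 1 IPv4 Options 1 ""',
    'ipb_prod State ecsclapmanu001 ONLINE',
    'ipb_prod ResourceInfo global State Valid Msg TS',
]

# Keep only removed ("- ") and added ("+ ") lines; drop context and hint lines.
changed = [
    line for line in difflib.ndiff(before, after) if line.startswith(("- ", "+ "))
]
for line in changed:
    print(line)
```

On these rows the diff reports two changes: BaseResName:Protocol goes from "" to IPv4, and ResourceInfo goes from State Stale to State Valid.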
02-09-2012 02:12 AM
Mark,
So presumably this means ha commands don't work from the local zone.
To fix:
First make sure the local zone can resolve the global zone by name. If it can't at the moment, this is why you won't have a .vcspwd file.
Then bounce the zone resource in VCS. When the zone resource onlines, it should create the .vcspwd file and you should be able to run ha commands from the local zone.
Mike
02-09-2012 02:13 AM
Do you have a .vcspwd file now, and are ha commands working from the local zone?
Mike
02-09-2012 08:25 AM
Hi Mike,
I have put the servers' IP addresses into the local zone and proved I can ping the addresses. I have managed to bounce the zone, but I am still not getting that file.
Regards,
Mark
02-10-2012 01:23 AM
Are you able to run "ha" commands from the local zone without being prompted for a password? This file is normally needed so that you have permission to run VCS commands from an essentially unauthorised node.
Also, just making sure you noticed the file has a dot at the front, so you obviously need to use "ls -a" to list it, and that it should be in root's home directory (normally /root).
I believe the permissions work differently in VCS 6.0, so maybe Symantec brought those changes in early, so that they work differently from 5.1SP1 onwards.
Mike