Solved: restrict the number of VCS service groups starting up at the same time during hastart

IdaWong · ‎11-28-2012

Hi,

Is there any way to restrict the number of service groups that can be started during hastart? All of the service groups have AutoStartList set, and there are no group dependencies; the only link between them is that the network interface is in its own service group and the other service groups access it via proxy.

Our problem is that some of the applications seem to fault during cluster startup (hastart) because they do not start within the default 60s. There is no resource dependency problem: if I offline and online the service group afterwards, it comes up fine.

So I am thinking that if the number of service groups started during hastart could be restricted, this problem should go away. Or perhaps we could put a time delay in the config to do this?

Wally_Heim · ‎11-28-2012

Hi IdaWong,

There is no way to serialize service group onlining without service group dependencies. However, you can lower NumThreads for a given resource type to limit the number of operations performed at once for that type. This is handy if you have a large number of resources of a single type.

In your case, I would recommend increasing the OnlineRetryLimit for the resource that is having problems. This will tell VCS to attempt to online the resource a given number of times if it is not online after the online entry point completes. It might take some playing to get a value that works for you. I would start with either a value of 2 or 3 and see how that does. The default is 0.
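A minimal sketch of how that change might be applied to a single resource from the command line. OnlineRetryLimit is a type-level attribute, so the sketch overrides it on the resource first. It assumes the VCS commands are on PATH and the cluster is running; the helper name set_resource_retry_limit and the resource name in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: raise OnlineRetryLimit for one resource only.
# Assumes haconf and hares are on PATH (not verified against this cluster).
set_resource_retry_limit() {
    res=$1      # resource name, e.g. the faulting listener resource
    limit=$2    # number of extra online attempts, e.g. 2

    haconf -makerw                                   # open the configuration for writing
    hares -override "$res" OnlineRetryLimit          # allow a per-resource value
    hares -modify "$res" OnlineRetryLimit "$limit"   # set the per-resource value
    rc=$?
    haconf -dump -makero                             # save and return to read-only
    return $rc
}
```

Usage would be something like: set_resource_retry_limit my_listener_res 2. To change every resource of a type instead, hatype -modify with the type name would be the equivalent.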

Thank you,

Wally

Marianne · ‎11-28-2012

Only way I could think of is to use Service Group dependencies or pre-online triggers.

I have seen similar issues at a customer with more or less 30 service groups, mostly Oracle instances, spread across 3 HP-UX servers.

The customer had to power down all servers once a month during a maintenance slot.
I cannot remember the exact details, other than some sort of catastrophic results when all SGs tried to come online simultaneously.
The customer did not want to consider dependencies or triggers, and instead opted not to online automatically.
A sysadmin would be onsite to start the SGs one by one.
This way all SGs onlined successfully without any other issues.

I am not aware of any way to restrict number of SG's allowed to come online simultaneously.

mikebounds · ‎11-28-2012

You don't need to restrict the number of service groups starting at the same time, because most resources, like diskgroups, mounts and IPs, can cope with being started in parallel, but it sounds like your application cannot.

So you can use NumThreads (as Wally says) on the resource type for your application. For instance, if your application uses the "Application" resource type and you want no more than 2 applications starting at the same time on the same system, use:

hatype -modify Application NumThreads 2
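A slightly fuller sketch of the same change, assuming the cluster configuration is read-only beforehand and the VCS commands are on PATH. The helper name limit_type_threads is hypothetical; it just wraps the modify in the usual open and save steps and prints the value afterwards.

```shell
#!/bin/sh
# Hypothetical helper: cap the agent threads for one resource type.
# Assumes haconf and hatype are on PATH.
limit_type_threads() {
    type_name=$1   # e.g. Application
    threads=$2     # e.g. 2

    haconf -makerw                                      # open the configuration for writing
    hatype -modify "$type_name" NumThreads "$threads"   # type-level change
    rc=$?
    hatype -value "$type_name" NumThreads               # print the value now in effect
    haconf -dump -makero                                # save and return to read-only
    return $rc
}
```

Usage would be: limit_type_threads Application 2. Note that this affects every resource of that type on the cluster, not just the ones that fault at startup.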

An alternative is to increase OnlineTimeout to allow for the applications taking longer when several of them start at the same time. I think you may have to increase OnlineTimeout even if you use NumThreads, because I think that if VCS tries to online a resource and there isn't a thread available, the time spent waiting for a thread eats into the OnlineTimeout.

As Marianne says, you can use the Preonline trigger, so the logic is:

loop
    If other application service groups are onlining (they are in PARTIAL, ONLINING state), then wait 10 seconds
    continue looping
Now online the service group (execute hagrp -online -nopre <group> -sys <system>)
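The loop above could be sketched in shell roughly as follows. This is only an outline under several assumptions: the PreOnline attribute is set to 1 on the group so that VCS calls the trigger, the trigger receives the system name and the group name as its first two arguments, and a starting group reports STARTING or PARTIAL in the output of hagrp -state. The function names and the MAX_WAIT and POLL_SECS variables are hypothetical, not VCS settings.

```shell
#!/bin/sh
# Sketch of the wait-then-online logic for a preonline trigger.
# Function names, MAX_WAIT and POLL_SECS are hypothetical.

# Succeeds (returns 0) if any of the named groups is still coming up on the system.
any_group_starting() {
    sys=$1
    shift
    for g in "$@"; do
        state=$(hagrp -state "$g" -sys "$sys" 2>/dev/null)
        case $state in
            *STARTING*|*PARTIAL*) return 0 ;;
        esac
    done
    return 1
}

# Usage: wait_then_online <system> <group> <other_group>...
wait_then_online() {
    sys=$1
    grp=$2
    shift 2
    waited=0
    # Poll until the other groups settle, but give up waiting after MAX_WAIT seconds.
    while any_group_starting "$sys" "$@" && [ "$waited" -lt "${MAX_WAIT:-300}" ]; do
        sleep "${POLL_SECS:-10}"
        waited=$((waited + ${POLL_SECS:-10}))
    done
    # -nopre stops this online request from firing the preonline trigger again.
    hagrp -online -nopre "$grp" -sys "$sys"
}
```

A trigger script might end with something like: wait_then_online "$1" "$2" appsg1 appsg2 (group names made up). The upper bound on waiting is there so that a group stuck in PARTIAL cannot block the others forever. Triggers that fire at the same instant may all see each other as offline and proceed together, so some staggering may still be needed.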

Mike

IdaWong · ‎11-29-2012

Thank you for all the great replies. It turns out the problem was just the Oracle listener resource. On Linux, when the virtual IP comes up, it usually doesn't respond to ping immediately. The IP resource is reported online, and the Netlsnr resource then starts less than a second later. However, the listener failed to start with the following errors:

Error listening on: (ADDRESS=(PROTOCOL=TCP)(HOST=xxxxx)(PORT=1525))

TNS-12545: Connect failed because target host or object does not exist

TNS-12560: TNS:protocol adapter error

TNS-00515: Connect failed because target host or object does not exist

Linux Error: 99: Cannot assign requested address

Restarting the resource works. I have increased the OnlineRetryLimit to 2, but the real problem was that the IP resource reported the IP as ready when it was not really.
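One way to check that from the shell is to poll until the address is actually configured on a local interface before starting the listener by hand. This is a rough sketch for Linux with the iproute2 ip command and an IPv4 address; the function name wait_for_local_ip is hypothetical, and whether this fully covers the gap seen here is not confirmed.

```shell
#!/bin/sh
# Hypothetical check: wait until an IPv4 address is configured locally.
# Assumes Linux with the iproute2 "ip" command.
wait_for_local_ip() {
    addr=$1           # virtual IP to wait for
    tries=${2:-10}    # number of one-second attempts
    while [ "$tries" -gt 0 ]; do
        # -o prints one line per address; match "inet <addr>/" literally.
        if ip -o -4 addr show | grep -qF "inet $addr/"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Usage would be something like: wait_for_local_ip <virtual IP> 10 && lsnrctl start, with the real address substituted in.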

regards,

ida
