Restrict the number of VCS service groups starting up at the same time during hastart

IdaWong
Level 4

Hi,

Is there any way to restrict the number of service groups that can be started during hastart? The service groups all have an AutoStartList, and there are no group dependencies, except that the network interface is in its own service group and the other service groups access it via Proxy resources.

Our problem is that some of the applications seem to fault because they don't start within the default 60s during cluster startup (hastart). There is no resource dependency problem: if I offline and online the service group, it is fine again.

So I am thinking that if the number of service groups starting up during hastart can be restricted, this problem should go away. Or perhaps we can put some time delay in the config to do this?

1 ACCEPTED SOLUTION

Accepted Solutions

Wally_Heim
Level 6
Employee

Hi IdaWong,

There is no way to serialize service group onlining without service group dependencies.  However, you can lower NumThreads for a given resource type to limit the number of operations the agent runs at once for that type.  This is handy if you have a large number of resources of a single type.

In your case, I would recommend increasing the OnlineRetryLimit for the resource that is having problems.  This will tell VCS to attempt to online the resource a given number of times if it is not online after the online entry point completes.  It might take some playing to get a value that works for you.  I would start with either a value of 2 or 3 and see how that does.  The default is 0.
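As a rough illustration of the suggestion above, the sketch below raises OnlineRetryLimit for a single resource by overriding the type-level attribute on that resource. The function names and the DRY_RUN switch are made up for this example, and `listener_res` in the usage note is a hypothetical resource name; only the `haconf` and `hares` commands are VCS commands. Treat it as a starting point, not a tested procedure.

```shell
#!/bin/sh
# Sketch only: raise OnlineRetryLimit for one resource.
# set_online_retry_limit and DRY_RUN are illustrative names, not part of VCS.

# Run a command, or just print it when DRY_RUN=1 (useful on a host without VCS).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = resource name, $2 = number of online retries
set_online_retry_limit() {
    res="$1"
    limit="$2"
    run haconf -makerw                            # open the configuration for writing
    run hares -override "$res" OnlineRetryLimit   # make the type-level attribute resource-specific
    run hares -modify "$res" OnlineRetryLimit "$limit"
    run haconf -dump -makero                      # save and close the configuration
}
```

For example, `DRY_RUN=1 set_online_retry_limit listener_res 2` prints the four commands without touching the cluster. If the attribute has already been overridden on that resource, the `hares -override` step may report an error that can be ignored.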

 

Thank you,

Wally


4 REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

Only way I could think of is to use Service Group dependencies or pre-online triggers.

I have seen similar issues at a customer with more or less 30 service groups, mostly Oracle instances, spread across 3 HP-UX servers.

Customer had to power down all servers once a month during maintenance slot. 
I cannot remember the exact details, other than some sort of catastrophic results when all SGs tried to come online simultaneously.
Customer did not want to consider dependencies or triggers, rather opted to not online automatically.
A sysadmin would be onsite to start SGs one-by-one.
This way all SGs onlined successfully without any other issues.

I am not aware of any way to restrict number of SG's allowed to come online simultaneously.

mikebounds
Level 6
Partner Accredited

You don't need to restrict the number of service groups starting at the same time, because most resources, like disk groups, mounts and IPs, can cope with being started in parallel, but it sounds like your application cannot.

So you can use NumThreads (as Wally says) on the resource type for your application. For instance, if your application uses the "Application" resource type and you want no more than 2 applications starting at the same time on the same system, use:

hatype -modify Application NumThreads 2

An alternative is to increase OnlineTimeout, so that applications which take longer because they are all starting together are still given enough time. I think you may have to increase OnlineTimeout even if you use NumThreads, because if VCS tries to online a resource and no thread is available, I believe the time spent waiting for a thread eats into the OnlineTimeout.
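The two type-level changes discussed here (NumThreads and OnlineTimeout) could be applied together along these lines. This is a sketch under assumptions: the helper names and the DRY_RUN switch are invented for the example, the value 600 in the usage note is arbitrary, and the 1 to 30 range check reflects what the VCS documentation gives for NumThreads as far as I know, so verify it for your version.

```shell
#!/bin/sh
# Sketch only: limit agent threads and extend the online timeout for a resource type.
# tune_agent_type and DRY_RUN are illustrative names, not part of VCS.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = resource type, $2 = NumThreads, $3 = OnlineTimeout in seconds
tune_agent_type() {
    rtype="$1"
    threads="$2"
    timeout="$3"
    # Assumption: NumThreads accepts 1 to 30; check the documentation for your release.
    if [ "$threads" -lt 1 ] || [ "$threads" -gt 30 ]; then
        echo "NumThreads must be between 1 and 30" >&2
        return 1
    fi
    run haconf -makerw                                   # open the configuration for writing
    run hatype -modify "$rtype" NumThreads "$threads"
    run hatype -modify "$rtype" OnlineTimeout "$timeout"
    run haconf -dump -makero                             # save and close the configuration
}
```

`DRY_RUN=1 tune_agent_type Application 2 600` prints the commands for review. Both attributes apply to every resource of that type on the cluster, so a lower NumThreads also throttles monitor and offline operations for that type, not just onlines.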

As Marianne says, you can use the preonline trigger, with logic along these lines:

loop
  if other application service groups are onlining (for example, they are in PARTIAL or STARTING state), wait 10 seconds and continue looping
  otherwise leave the loop
now online the service group (execute hagrp -online -nopre <group> -sys <system>)
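The trigger logic above might look roughly like the following shell sketch. Several things here are assumptions rather than anything stated in the thread: that the trigger receives the system name as its first argument and the group name as its second, that `hagrp -state <group> -sys <system>` prints a state string containing STARTING or PARTIAL while a group is coming up, the APP_GROUPS list of groups to serialize, and the MAX_WAIT_SECONDS cap, which is added so that a group stuck in PARTIAL cannot block the others forever.

```shell
#!/bin/sh
# Sketch of a preonline trigger body; all function and variable names are illustrative.

POLL_SECONDS="${POLL_SECONDS:-10}"            # delay between checks
MAX_WAIT_SECONDS="${MAX_WAIT_SECONDS:-300}"   # give up waiting after this long
APP_GROUPS="${APP_GROUPS:-}"                  # space-separated application groups to serialize

# Succeeds when a state string looks like a group that is still coming up.
# The exact strings are an assumption; compare with hagrp -state output on your cluster.
is_starting() {
    case "$1" in
        *STARTING*|*PARTIAL*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the state of group $1 on system $2.
group_state() {
    hagrp -state "$1" -sys "$2"
}

# Succeeds if any group in APP_GROUPS other than $2 is starting on system $1.
others_starting() {
    sys="$1"
    me="$2"
    for g in $APP_GROUPS; do
        [ "$g" = "$me" ] && continue
        if is_starting "$(group_state "$g" "$sys")"; then
            return 0
        fi
    done
    return 1
}

# Block until no other listed group is starting, or until the cap is reached.
wait_for_turn() {
    sys="$1"
    me="$2"
    waited=0
    while others_starting "$sys" "$me" && [ "$waited" -lt "$MAX_WAIT_SECONDS" ]; do
        sleep "$POLL_SECONDS"
        waited=$((waited + POLL_SECONDS))
    done
}

# In the installed trigger the last two steps would be:
#   wait_for_turn "$1" "$2"
#   hagrp -online -nopre "$2" -sys "$1"
```

The trigger only runs for groups that have the PreOnline attribute enabled, for example with `hagrp -modify <group> PreOnline 1`. Note that two groups entering the trigger at the same instant can both see "nobody starting" and proceed together, so this reduces overlap rather than guaranteeing strict serialization.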

Mike

IdaWong
Level 4

Thank you for all the great replies. It turns out the problem was just the Oracle listener resource. On Linux, when the virtual IP comes up, it usually doesn't respond to ping immediately, yet the IP resource is already reported online. The Netlsnr resource then starts less than a second later, and the listener fails to start with the following errors:

 

 

Error listening on: (ADDRESS=(PROTOCOL=TCP)(HOST=xxxxx)(PORT=1525))
TNS-12545: Connect failed because target host or object does not exist
 TNS-12560: TNS:protocol adapter error
  TNS-00515: Connect failed because target host or object does not exist
   Linux Error: 99: Cannot assign requested address
 
Restarting the resource works, so I have increased OnlineRetryLimit to 2. But the real problem was that the IP resource reported the IP as ready when it was not actually usable yet.
 
regards,
ida