Forum Discussion

fabrizio_tivano
12 years ago

Active VCS node fails when rebooting passive node

Dear all,

I've been asked to manage a few asymmetric failover Windows 2003 systems with Veritas Cluster Server installed.

In one of these clusters I'm experiencing a failure (services restarting and sometimes freezing) on the active node when the passive one is rebooting.

Looking through the forums, I found that similar problems were resolved with the latest Service Pack, but my system already looks updated:

vxassist version = 5.1.20000.87

had.exe version = 5.1.20024.495

haclus -value EngineVersion = 5.1.00.0

hasys -values NODENAME EngineVersion = 5.1.00.0

 

I'm just a little confused to see all the above commands returning different versions. Is that correct? Do I need to install this SP2?

A colleague suggested disabling detail monitoring in the cluster config and setting the clustered service's startup type to Manual in Windows services.msc.

Any suggestions are welcome.

Thanks in advance and Kind Regards,

 

/fabrizio

8 Replies

  • A Proxy is used to avoid having 2 resources controlling the same object. If you have 2 application service groups using the same NIC and you put a NIC resource in each service group, VCS monitors the same object twice, which is inefficient. Instead, you create one NIC resource and a Proxy to that resource, which just copies the state from the NIC resource. You could put the NIC resource in one application service group and the Proxy in the other, but the usual way is to put the NIC in its own "Parallel" group and have both application service groups use Proxies.

    A service group dependency is used to link 2 service groups; for example, you might configure an application service group that requires a database service group to be online first (on any system). Service group dependencies make the config more complex, so they should be avoided unless they are necessary. For instance, if the application and database must run on the same system, you should put all the application and database resources in a single service group.

    So service group dependencies and Proxies are really not similar.

    Mike
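
    A minimal main.cf sketch of the layout described above, offered as an illustration rather than a tested config. It assumes a two-node cluster with hypothetical system names NODE1 and NODE2 and a hypothetical application group APP1_SG, reuses the NIC group name and the public_nic / app1_ip / app1_pub_nic_proxy resource names from this thread, and leaves out agent attributes such as the NIC and IP MAC addresses. The Phantom resource is included because VCS uses it to report the state of a parallel group that contains only persistent resources such as NIC:

    ```
    // Hypothetical sketch: NODE1, NODE2, APP1_SG and nic_phantom are placeholder names.
    // Parallel group: the shared NIC is monitored once, on every node.
    group NIC (
        SystemList = { NODE1 = 0, NODE2 = 1 }
        Parallel = 1
        AutoStartList = { NODE1, NODE2 }
        )

        // required NIC agent attributes (e.g. MACAddress) omitted in this sketch
        NIC public_nic (
            )

        // Phantom lets VCS report the state of a parallel group
        // that contains only persistent resources such as NIC
        Phantom nic_phantom (
            )

    // Failover application group: the Proxy mirrors public_nic instead of
    // monitoring the NIC a second time, so no "requires group" line is needed.
    group APP1_SG (
        SystemList = { NODE1 = 0, NODE2 = 1 }
        AutoStartList = { NODE1 }
        )

        // IP agent attributes (Address, SubNetMask, MACAddress) omitted in this sketch
        IP app1_ip (
            )

        Proxy app1_pub_nic_proxy (
            TargetResName = public_nic
            )

        app1_ip requires app1_pub_nic_proxy
    ```

    With TargetSysName left unset, the Proxy mirrors public_nic on the node where APP1_SG is running, so the application group only reacts to the local NIC state. A second application group would repeat the same pattern with its own IP and Proxy resources.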

  • Thanks again, Mike,

    what is the difference between service group dependencies and proxies?

    /fabrizio

  • Thanks for your answer, Mike!

    As soon as I'm able to make these changes and test them (this is a production environment), I'll keep you informed! ;)

    /fabrizio

  • Yes, this is a problem:

    requires group NIC online global firm

    means that the service group under which this line appears in main.cf requires the NIC group to be online on any system, but the NIC needs to be online on the same (local) system. You shouldn't use service group dependencies for this, so you shouldn't use

    requires group NIC online local firm

    either; you should use Proxies instead. Supposing the NIC resource in the NIC group is called public_nic, then in your application service group you should remove the "requires group NIC ..." line at the bottom and add a Proxy resource that the IP resource in this group depends on (let's call the IP resource app1_ip), like:

    Proxy app1_pub_nic_proxy (
      TargetResName = public_nic)
    
    app1_ip requires app1_pub_nic_proxy

    and if you have a second service group, you do the same:

    Proxy app2_pub_nic_proxy (
      TargetResName = public_nic)
    
    app2_ip requires app2_pub_nic_proxy

    Mike

  • NEWS:

    In main.cf I found:

    ====

    requires group NIC online global firm 

    ====

     

    VCS nodes that don't seem to be affected by this issue have:

    ====

    requires group NIC online local firm

    ====

     

    Could this be the problem?

  • Hi Marianne,

    thanks for your reply.

    storport.sys: 5.2.3790.4121 (srv03_sp2_qfe.070720-0003)

    The attached .zip file contains both main.cf and 201210engine_A.txt.

    How I found the problem:

    During some night-time maintenance work (2012/10/04), while the services were active on node-2, I performed a clean shutdown of node-1, and all the services active on node-2 stopped working.

    The only way to resolve the problem was to force a reboot of node-2.

    The day after (2012/10/05), around 12:00, I was able to reproduce the failure: when I rebooted the passive node-1, all the services running on the active node-2 stopped, then VCS tried to restart them on node-2, but one service failed to start and all the services stopped again.

    To fix the problem, I cleared the fault in the VCS console and manually brought the services back online on node-2 from the VCS CLI.

     

    Thanks in Advance and Best Regards,

    /fabrizio

  • We need more info, please.

    An overview of the current cluster config will be helpful:

    <Install Drive>:\Program Files\VERITAS\cluster server\conf\config\main.cf

    Record of VCS activity (the Engine_A log on the active node):
    <Install Drive>:\Program Files\VERITAS\cluster server\log\engine_A.txt

    Please post the above as file attachments.

    Please also check the System Event Viewer logs for errors. Windows 2003 Storport drivers are notorious for causing I/O errors, system freezes, etc.
    Check the Microsoft KB for the latest Storport hotfixes.