Forum Discussion

one_who_waits
11 years ago

VCS Switch Over problem

Hi,

I have some problems when testing a cluster switchover (2-node cluster, VCS 6.2, Solaris 10). We test with init 6. The active node always hangs with:

2014/02/09 08:07:36 VCS ERROR V-16-10001-11511 (node1) Volume:vol_v1:offline:Cannot find Volume v1, either the Volume is a Volume Set or Veritas Volume Manager configuration daemon (vxconfigd) is not in enabled mode
2014/02/09 08:07:36 VCS INFO V-16-6-15002 (node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resstatechange node1 mnt_1 ONLINE OFFLINE successfully
2014/02/09 08:07:37 VCS INFO V-16-2-13716 (node1) Resource(vol_v1): Output of the completed operation (offline)
==============================================
VxVM vxprint ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible
==============================================

2014/02/09 08:07:37 VCS ERROR V-16-2-13064 (node1) Agent is calling clean for resource(vol_v1) because the resource is up even after offline completed.
2014/02/09 08:07:37 VCS ERROR V-16-10001-11511 (node1) Volume:vol_v1:clean:Cannot find Volume v1, either the Volume is a Volume Set or Veritas Volume Manager configuration daemon (vxconfigd) is not in enabled mode
2014/02/09 08:07:38 VCS INFO V-16-2-13716 (node1) Resource(vol_v1): Output of the completed operation (clean)
==============================================
VxVM vxprint ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible
==============================================

 

I noticed that the init script /etc/rc0.d/K99vxvm-shutdown stops vxconfigd and also runs "/usr/sbin/vxdctl -k stop".

My question: do I still need any vxvm init scripts now that the upgrade from 5 to 6.1 is done and the SMF services are in place, or should I increase the timeouts of the VCS stop procedures?

Thanks a lot in advance!

Cheers

  • Normally VCS is shut down first and then vxvm. My guess is that in 5.0 both VCS and vxvm were shut down by rc scripts, and in 6.1 both are shut down by SMF services. So perhaps the upgrade from 5 to 6 did not remove the vxvm rc script, and vxvm is now being shut down by that rc script before VCS is shut down by its SMF service.

    What services do you have? If you have an SMF service for vxvm, then I would disable the vxvm rc script.

    Mike
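
A minimal sketch of what "disable the vxvm rc script" could look like, assuming bash or ksh and assuming the run-control framework only executes entries whose names start with S or K. The function name disable_rc_script is made up for illustration; the renamed file stays in place so the change is easy to undo.

```shell
# Sketch only: park a legacy rc script so the rc framework skips it.
# Assumption: rc only runs entries named S* or K*, so a leading
# underscore takes the script out of play without deleting it.
disable_rc_script() {
    script=$1
    dir=$(dirname "$script")
    base=$(basename "$script")
    if [ ! -f "$script" ]; then
        echo "not found: $script" >&2
        return 1
    fi
    # Rename in the same directory and print the new path
    mv "$script" "$dir/_$base" && echo "$dir/_$base"
}
```

For the script named in the first post this would be called as disable_rc_script /etc/rc0.d/K99vxvm-shutdown, and undone by renaming the file back.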

  • Hi Mike,

    thanks a lot for your quick answer. We have the following:

     

    SMF

    # svcs -a | egrep "vcs|gab|vxfen"
    disabled       Feb_09   svc:/system/vcs-onenode:default
    online         Feb_09   svc:/system/gab:default
    online         Feb_09   svc:/system/vxfen:default
    online         Feb_09   svc:/system/vcs:default
    #
     

    init scripts

     # ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}* 2>/dev/null
    -r-x------   5 root     sys         3539 Nov 11  2009 /etc/rc0.d/K00vxrsyncd
    -r-xr-xr-x   5 root     sys         2508 Nov 11  2009 /etc/rc0.d/K02vxnm-vxnetd
    lrwxrwxrwx   1 root     root          27 Aug  4  2010 /etc/rc0.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
    lrwxrwxrwx   1 root     root          30 Aug  4  2010 /etc/rc0.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
    -r-xr-xr-x   2 root     sys         3068 Nov 11  2009 /etc/rc0.d/K99vxvm-shutdown
    -r-x------   5 root     sys         3539 Nov 11  2009 /etc/rc1.d/K00vxrsyncd
    -r-xr-xr-x   5 root     sys         2508 Nov 11  2009 /etc/rc1.d/K02vxnm-vxnetd
    lrwxrwxrwx   1 root     root          27 Aug  4  2010 /etc/rc1.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
    lrwxrwxrwx   1 root     root          30 Aug  4  2010 /etc/rc1.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
    lrwxrwxrwx   1 root     other         18 Aug  4  2010 /etc/rc2.d/K50vxvail -> /etc/init.d/vxvail
    lrwxrwxrwx   1 root     root          27 Aug  4  2010 /etc/rc2.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
    lrwxrwxrwx   1 root     other         30 Aug  4  2010 /etc/rc2.d/K75vxpal.StorageAgent -> /etc/init.d/vxpal.StorageAgent
    lrwxrwxrwx   1 root     root          30 Aug  4  2010 /etc/rc2.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
    -rwxr-xr-x   1 root     bin         2370 Nov 28  2007 /etc/rc2.d/K99vxatd
    lrwxrwxrwx   1 root     other         18 Aug  4  2010 /etc/rc2.d/S50vxvail -> /etc/init.d/vxvail
    -rwxr-xr-x   1 root     bin         2370 Nov 28  2007 /etc/rc2.d/S70vxatd
    lrwxrwxrwx   1 root     root          27 Aug  4  2010 /etc/rc2.d/S750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
    lrwxrwxrwx   1 root     other         30 Aug  4  2010 /etc/rc2.d/S75vxpal.StorageAgent -> /etc/init.d/vxpal.StorageAgent
    lrwxrwxrwx   1 root     root          30 Aug  4  2010 /etc/rc2.d/S760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
    -r-x------   2 root     root        2441 Jul 19  2008 /etc/rc2.d/S92vxdcli
    -r-xr-xr-x   5 root     sys         2508 Nov 11  2009 /etc/rc2.d/S94vxnm-vxnetd
    -r-x------   5 root     sys         3539 Nov 11  2009 /etc/rc2.d/S96vxrsyncd
    -rwxr-xr-x   1 root     sys         1310 May  6  2006 /etc/rc3.d/S99vxtf_chklog
    -r-x------   5 root     sys         3539 Nov 11  2009 /etc/rcS.d/K00vxrsyncd
    -r-xr-xr-x   5 root     sys         2508 Nov 11  2009 /etc/rcS.d/K02vxnm-vxnetd
    lrwxrwxrwx   1 root     root          27 Aug  4  2010 /etc/rcS.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
    lrwxrwxrwx   1 root     root          30 Aug  4  2010 /etc/rcS.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
     

    Thanks.

    Cheers

     

  • Hi,

     

    your description is quite unclear.

    If you do a cluster switch (i.e. hagrp -switch command) no init script will be called at all.

    As for the error, it means that the vxconfigd is either not running or not in enabled mode.

     

    What is the current OS and SF version?

    What was previous OS and SF version?

    Was there a vxconfigd coredump generated?

    Was vxconfigd killed or restarted?

    Does vxconfigd start successfully after a reboot?

    There should be errors in the syslog if vxconfigd was not able to start or core dumped.

    regards,
    Dan
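
To check the "not running or not in enabled mode" condition directly, the state of vxconfigd can be read from the output of vxdctl mode, which prints a line of the form "mode: enabled". The helper below is a small sketch (the name vxconfigd_state is hypothetical) that extracts that state from stdin so it can be used in a script.

```shell
# Sketch: extract the state from `vxdctl mode` output ("mode: <state>").
# Returns non-zero if no "mode:" line was seen on stdin.
vxconfigd_state() {
    awk -F': *' '
        $1 == "mode" { print $2; found = 1 }
        END { if (!found) exit 1 }
    '
}
```

Usage would be vxdctl mode | vxconfigd_state, comparing the result against "enabled". On Solaris, any vxconfigd start-up errors should also show up in /var/adm/messages.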

     

  • Looks to me that you have vxvm as a service AND an rc script.

    I have looked on a Solaris 10 server running 5.1 and "ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*" does not match anything, so I think you should disable all these rc scripts and re-run your test.

    You should also review your upgrade procedure. If you believe you did this as per the product guides, and you have not manually changed any rc scripts, then you should raise a support call to say that the upgrade has not worked correctly.

    I would have thought that when you removed 5.0 packages, it should have removed the rc scripts and it seems this happened for VCS, but not for vxvm.

    Mike

  • Hi Mike,

    vxvm is still 5.0 , but VCS is 6.1 :

    # pkginfo -l VRTSvxvm
       PKGINST:  VRTSvxvm
          NAME:  Binaries for VERITAS Volume Manager by Symantec
      CATEGORY:  system
          ARCH:  sparc
       VERSION:  5.0,REV=05.11.2006.17.55
       BASEDIR:  /
        VENDOR:  Symantec Corporation
          DESC:  Virtual Disk Subsystem
        PSTAMP:  Veritas-5.0_MP3_RP3.3:2009-11-12
      INSTDATE:  Aug 04 2010 14:40
       HOTLINE:  http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
         EMAIL:  support@veritas.com
        STATUS:  completely installed
         FILES:      995 installed pathnames
                      30 shared pathnames
                      13 linked files
                     112 directories
                     430 executables
                  400303 blocks used (approx)

    #

    # pkginfo -l VRTSvcs 
       PKGINST:  VRTSvcs
          NAME:  Veritas Cluster Server by Symantec
      CATEGORY:  system
          ARCH:  sparc
       VERSION:  6.0.100.000
       BASEDIR:  /
        VENDOR:  Symantec Corporation
          DESC:  Veritas Cluster Server by Symantec
        PSTAMP:  6.0.300.000-GA-2013-01-17-16.00.00
      INSTDATE:  Apr 21 2013 10:15
        STATUS:  completely installed
         FILES:      281 installed pathnames
                      26 shared pathnames
                      57 directories
                     116 executables
                  450978 blocks used (approx)

    #

     

    Thanks for your help.

    Cheers

  • Hi Mike,

    I believe that you are right: if there are no S/K scripts for vx on Solaris 10 with 5.0 of VERITAS Volume Manager, which is exactly what I have, then I should mv the scripts away and rerun the test.

    As mentioned, we always do the "init 6". To answer Dan: no, there are no core dumps of vxconfigd, and it always comes up without problems after the reboot (reset from XSCF). Previously we had VCS 5.1 and upgraded to 6.0.1 with the standard scripts from Veritas.

    Thanks for your help.

    Mike, did you run the command in bash (as I did)? Otherwise it does not work:

    # ksh
    #  ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*
    /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*: No such file or directory
    #
     

    Thanks a lot guys.

    Cheers

    p.s.

    # pkginfo -l VRTSvxvm | grep VERSION
       VERSION:  5.0,REV=05.11.2006.17.55
    #

    # pkginfo -l VRTSvcs | grep VERSION
       VERSION:  6.0.100.000
    #
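
The ksh session above did not expand the {S,K} and {vcs,...} braces, which is why the ls failed there. A sketch of the same listing that avoids brace expansion altogether, assuming bash or ksh (the function name list_vx_rc_scripts is made up, and the optional argument exists only so the function can be pointed at a test directory instead of /etc):

```shell
# Sketch: list S*/K* rc scripts whose names contain one of the
# Veritas-related substrings, without relying on brace expansion.
list_vx_rc_scripts() {
    root=${1:-/etc}
    for f in "$root"/rc?.d/[SK]*; do
        # Skip an unmatched glob, but keep dangling symlinks
        [ -e "$f" ] || [ -L "$f" ] || continue
        case ${f##*/} in
            *vcs*|*llt*|*vxfen*|*gab*|*vx*) echo "$f" ;;
        esac
    done
}
```

Unlike the brace version, each file is printed once even if it matches more than one substring (for example both vxfen and vx).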
     

  • Yes, the shell was bash, and I also did an ls in the rc directory to check what was there, but this was on a 5.1 system, not 5.0.

    I thought I had seen that you had vxvm services, but looking now you only have vxvm rc scripts, which may be normal for 5.0.

    Not sure if VCS 6.0 supports vxvm 5.0. I still think the basic problem is that vxvm is shutting down before VCS, but I don't know how Solaris orders rc scripts relative to SMF services, so this is what you should look at. If you can't get Solaris to shut down VCS first, then you will have to shut VCS down manually before rebooting systems. I presume this is just a temporary configuration until you upgrade vxvm.

    Mike

     

  • Hi Mike,

    I also have SMF vxvm services :

    # svcs -a | egrep 'vx|vcs|gab|llt'
    legacy_run     Jan_26   lrc:/etc/rc2_d/S70vxatd
    legacy_run     Jan_26   lrc:/etc/rc2_d/S750vxpal_gridnode
    legacy_run     Jan_26   lrc:/etc/rc2_d/S75vxpal_StorageAgent
    legacy_run     Jan_26   lrc:/etc/rc2_d/S760vxpal_actionagent
    legacy_run     Jan_26   lrc:/etc/rc2_d/S92vxdcli
    legacy_run     Jan_26   lrc:/etc/rc2_d/S94vxnm-vxnetd
    legacy_run     Jan_26   lrc:/etc/rc2_d/S96vxrsyncd
    disabled       Jan_26   svc:/system/vcs-onenode:default
    online         Jan_26   svc:/system/llt:default
    online         Jan_26   svc:/system/gab:default
    online         Jan_26   svc:/system/vxvm/vxvm-sysboot:default
    online         Jan_26   svc:/system/vxvm/vxvm-startup1:default
    online         Jan_26   svc:/system/vxvm/vxvm-startup2:default
    online         Jan_26   svc:/system/vxvm/vxvm-startvc:default
    online         Jan_26   svc:/system/vxvm/vxvm-reconfig:default
    online         Jan_26   svc:/system/vxpbx:default
    online         Jan_26   svc:/system/vxvm/vxvm-recover:default
    online         Jan_26   svc:/system/vxfen:default
    online         Jan_28   svc:/system/vcs:default
    #

    Do you have such a combination somewhere? I would really like to know whether I need the rc scripts or not. To me, as you already confirmed, it looks like they are obsolete.

    I have :

    # pkginfo -l VRTSvxvm VRTSvcs | egrep 'NAME|VERS'
          NAME:  Veritas Cluster Server by Symantec
       VERSION:  6.0.100.000
          NAME:  Binaries for VERITAS Volume Manager by Symantec
       VERSION:  5.0,REV=05.11.2006.17.55
    #

     

    THANKS A LOT !

    Cheers
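
To see at a glance which Veritas-related entries in a listing like the one above are legacy rc scripts rather than real SMF services, the svcs -a output can be filtered on its first column. This is a rough sketch; the function name legacy_vx_entries is hypothetical and the pattern is the same one used in the egrep above.

```shell
# Sketch: from `svcs -a` output on stdin, print only the legacy_run
# (rc script) entries that match the Veritas-related pattern.
legacy_vx_entries() {
    awk '$1 == "legacy_run" && $3 ~ /vx|vcs|gab|llt/ { print $3 }'
}
```

Usage would be svcs -a | legacy_vx_entries. In the listing above every legacy_run entry is a start (S) script, so this view alone does not show whether a K script such as K99vxvm-shutdown still runs at shutdown.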

  • I only have access to a 5.1 system.  5.0 has limited support so there are not many customers still running this - see https://sort.symantec.com/eosl/show_matrix:

    Product                            Version  Platform        Country    End of Support Life Date  End of Limited Support Date  Release Date
    Storage Foundation for UNIX/Linux  5.0 MP3  Solaris         Worldwide  2014-08-31                2012-06-07                   2008-10-06
                                                Solaris x86-64  Worldwide  2014-08-31                2012-06-07                   2008-10-06
                                                Linux           Worldwide  2014-08-31                2012-06-07                   2008-08-20
                                                AIX             Worldwide  2014-08-31                2012-06-07                   2008-10-06

    So it is quite likely that VCS 6.0 was not tested with vxvm 5.0, which could be why vxvm is shutting down before VCS. Or it could be that you have some obsolete rc scripts.

    As before, I would shut VCS down manually before rebooting the system, which is good practice in any case, even after you have upgraded vxvm to 6.0. That is, if you are shutting down node A, you should manually switch the service groups to node B (to ensure they come online successfully on node B before node A is shut down), or offline the groups if you are shutting both nodes down.

    Mike
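
The manual procedure described above could be scripted roughly as follows. This is a sketch under assumptions, not a tested runbook: the node name node2 and the group names are placeholders (the thread does not name them), the 300-second timeout is arbitrary, and the hagrp -wait form is written as I understand its syntax. With DRY_RUN left at its default of 1 the commands are only printed.

```shell
# Sketch: switch each service group to the peer node, wait for it to
# come online there, then stop VCS on the local node.
# DRY_RUN=1 (default) prints the commands instead of running them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

evacuate_node() {
    target=$1; shift            # peer node that takes over the groups
    for grp in "$@"; do
        run hagrp -switch "$grp" -to "$target" || return 1
        # Block until the group reports ONLINE on the target node
        run hagrp -wait "$grp" State ONLINE -sys "$target" -time 300 || return 1
    done
    run hastop -local
}
```

For example, evacuate_node node2 app_sg prints the three commands for a hypothetical group app_sg. Only after reviewing them, and running with DRY_RUN=0, would the node be rebooted with init 6.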

  • Hi Mike,

    thanks, you confirmed what seemed to me to be the likely problem, and I am going to take your advice.

    Thanks a lot.

    Cheers