02-10-2014 02:57 AM
Hi,
I have some problems when testing a cluster switch (2-node cluster, VCS 6.2, Solaris 10). We test with init 6. The active node always hangs with:
2014/02/09 08:07:36 VCS ERROR V-16-10001-11511 (node1) Volume:vol_v1:offline:Cannot find Volume v1, either the Volume is a Volume Set or Veritas Volume Manager configuration daemon (vxconfigd) is not in enabled mode
2014/02/09 08:07:36 VCS INFO V-16-6-15002 (node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/resstatechange node1 mnt_1 ONLINE OFFLINE successfully
2014/02/09 08:07:37 VCS INFO V-16-2-13716 (node1) Resource(vol_v1): Output of the completed operation (offline)
==============================================
VxVM vxprint ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible
==============================================
2014/02/09 08:07:37 VCS ERROR V-16-2-13064 (node1) Agent is calling clean for resource(vol_v1) because the resource is up even after offline completed.
2014/02/09 08:07:37 VCS ERROR V-16-10001-11511 (node1) Volume:vol_v1:clean:Cannot find Volume v1, either the Volume is a Volume Set or Veritas Volume Manager configuration daemon (vxconfigd) is not in enabled mode
2014/02/09 08:07:38 VCS INFO V-16-2-13716 (node1) Resource(vol_v1): Output of the completed operation (clean)
==============================================
VxVM vxprint ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible
==============================================
I noticed that the init script /etc/rc0.d/K99vxvm-shutdown stops vxconfigd and also runs "/usr/sbin/vxdctl -k stop".
My question: do I still need any VxVM init scripts now that the upgrade from 5 to 6.1 is done and we have SMF services in place, or should I increase the timeouts of the VCS stop procedures?
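For reference, this is roughly how I checked what the shutdown script does (output omitted; the script path is from our system):
# grep -l vxdctl /etc/rc0.d/K* /etc/rcS.d/K*
# egrep -n "vxdctl|vxconfigd" /etc/rc0.d/K99vxvm-shutdown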
Thanks a lot in advance!
Cheers
02-10-2014 03:39 AM
Normally VCS is shut down first and then VxVM. My guess is that in 5.0 both VCS and VxVM were shut down by rc scripts, while in 6.1 both are shut down by SMF services, so with the upgrade from 5 to 6 the VxVM rc script may not have been removed, and VxVM is then being shut down by the rc script before VCS is shut down via its SMF service.
What services do you have? If you have an SMF service for VxVM, then I would disable the VxVM rc script (a sketch follows).
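As a sketch, assuming the script path you quoted (check your own rc directories for other leftovers): Solaris init only runs rc scripts whose names begin with S or K, so renaming the kill script disables it without deleting anything:
# mv /etc/rc0.d/K99vxvm-shutdown /etc/rc0.d/_K99vxvm-shutdown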
Mike
02-10-2014 04:45 AM
Hi Mike,
thanks a lot for your quick answer; here is what we have:
SMF
# svcs -a | egrep "vcs|gab|vxfen"
disabled Feb_09 svc:/system/vcs-onenode:default
online Feb_09 svc:/system/gab:default
online Feb_09 svc:/system/vxfen:default
online Feb_09 svc:/system/vcs:default
#
init scripts
# ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}* 2>/dev/null
-r-x------ 5 root sys 3539 Nov 11 2009 /etc/rc0.d/K00vxrsyncd
-r-xr-xr-x 5 root sys 2508 Nov 11 2009 /etc/rc0.d/K02vxnm-vxnetd
lrwxrwxrwx 1 root root 27 Aug 4 2010 /etc/rc0.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
lrwxrwxrwx 1 root root 30 Aug 4 2010 /etc/rc0.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
-r-xr-xr-x 2 root sys 3068 Nov 11 2009 /etc/rc0.d/K99vxvm-shutdown
-r-x------ 5 root sys 3539 Nov 11 2009 /etc/rc1.d/K00vxrsyncd
-r-xr-xr-x 5 root sys 2508 Nov 11 2009 /etc/rc1.d/K02vxnm-vxnetd
lrwxrwxrwx 1 root root 27 Aug 4 2010 /etc/rc1.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
lrwxrwxrwx 1 root root 30 Aug 4 2010 /etc/rc1.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
lrwxrwxrwx 1 root other 18 Aug 4 2010 /etc/rc2.d/K50vxvail -> /etc/init.d/vxvail
lrwxrwxrwx 1 root root 27 Aug 4 2010 /etc/rc2.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
lrwxrwxrwx 1 root other 30 Aug 4 2010 /etc/rc2.d/K75vxpal.StorageAgent -> /etc/init.d/vxpal.StorageAgent
lrwxrwxrwx 1 root root 30 Aug 4 2010 /etc/rc2.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
-rwxr-xr-x 1 root bin 2370 Nov 28 2007 /etc/rc2.d/K99vxatd
lrwxrwxrwx 1 root other 18 Aug 4 2010 /etc/rc2.d/S50vxvail -> /etc/init.d/vxvail
-rwxr-xr-x 1 root bin 2370 Nov 28 2007 /etc/rc2.d/S70vxatd
lrwxrwxrwx 1 root root 27 Aug 4 2010 /etc/rc2.d/S750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
lrwxrwxrwx 1 root other 30 Aug 4 2010 /etc/rc2.d/S75vxpal.StorageAgent -> /etc/init.d/vxpal.StorageAgent
lrwxrwxrwx 1 root root 30 Aug 4 2010 /etc/rc2.d/S760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
-r-x------ 2 root root 2441 Jul 19 2008 /etc/rc2.d/S92vxdcli
-r-xr-xr-x 5 root sys 2508 Nov 11 2009 /etc/rc2.d/S94vxnm-vxnetd
-r-x------ 5 root sys 3539 Nov 11 2009 /etc/rc2.d/S96vxrsyncd
-rwxr-xr-x 1 root sys 1310 May 6 2006 /etc/rc3.d/S99vxtf_chklog
-r-x------ 5 root sys 3539 Nov 11 2009 /etc/rcS.d/K00vxrsyncd
-r-xr-xr-x 5 root sys 2508 Nov 11 2009 /etc/rcS.d/K02vxnm-vxnetd
lrwxrwxrwx 1 root root 27 Aug 4 2010 /etc/rcS.d/K750vxpal.gridnode -> //etc/init.d/vxpal.gridnode
lrwxrwxrwx 1 root root 30 Aug 4 2010 /etc/rcS.d/K760vxpal.actionagent -> //etc/init.d/vxpal.actionagent
Thanks.
Cheers
02-10-2014 05:01 AM
Hi,
your description is quite unclear.
If you do a cluster switch (i.e. hagrp -switch command) no init script will be called at all.
As for the error, it means that the vxconfigd is either not running or not in enabled mode.
What is the current OS and SF version?
What was previous OS and SF version?
Was there a vxconfigd coredump generated?
Was vxconfigd killed or restarted?
Does vxconfigd start successfully after a reboot?
There should be errors in the syslog if vxconfigd was not able to start or core dumped.
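A quick way to check these, using standard VxVM and Solaris commands (on a healthy node, vxdctl mode should report "mode: enabled"):
# vxdctl mode
# ps -ef | grep vxconfigd | grep -v grep
# grep -i vxconfigd /var/adm/messages | tail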
regards,
Dan
02-10-2014 05:25 AM
It looks to me like you have VxVM as an SMF service AND as rc scripts.
I have looked on a Solaris 10 server running 5.1 and "ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*" does not match anything, so I think you should disable all these rc scripts and re-run your test.
You should also review your upgrade procedure - if you believe you followed the product guides and have not manually changed any rc scripts, then you should raise a support call to report that the upgrade did not complete correctly.
I would have thought that removing the 5.0 packages should have removed the rc scripts; it seems this happened for VCS, but not for VxVM. One way to check is sketched below.
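To confirm which package delivered a leftover script (and so whether removing the 5.0 packages should have cleaned it up) - a sketch using the path from your earlier post; the output should include a "Referenced by the following packages" line naming the owning package:
# pkgchk -l -p /etc/rc0.d/K99vxvm-shutdown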
Mike
02-10-2014 06:23 AM
Hi Mike,
vxvm is still 5.0, but VCS is 6.1:
# pkginfo -l VRTSvxvm
PKGINST: VRTSvxvm
NAME: Binaries for VERITAS Volume Manager by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 5.0,REV=05.11.2006.17.55
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Virtual Disk Subsystem
PSTAMP: Veritas-5.0_MP3_RP3.3:2009-11-12
INSTDATE: Aug 04 2010 14:40
HOTLINE: http://support.veritas.com/phonesup/phonesup_ddProduct_.htm
EMAIL: support@veritas.com
STATUS: completely installed
FILES: 995 installed pathnames
30 shared pathnames
13 linked files
112 directories
430 executables
400303 blocks used (approx)
#
# pkginfo -l VRTSvcs
PKGINST: VRTSvcs
NAME: Veritas Cluster Server by Symantec
CATEGORY: system
ARCH: sparc
VERSION: 6.0.100.000
BASEDIR: /
VENDOR: Symantec Corporation
DESC: Veritas Cluster Server by Symantec
PSTAMP: 6.0.300.000-GA-2013-01-17-16.00.00
INSTDATE: Apr 21 2013 10:15
STATUS: completely installed
FILES: 281 installed pathnames
26 shared pathnames
57 directories
116 executables
450978 blocks used (approx)
#
Thanks for your help.
Cheers
02-10-2014 07:30 AM
Hi Mike,
I believe you are right: if there are no S/K scripts for vx on Solaris 10 with VERITAS Volume Manager 5.0, which is exactly what I have, then I should mv the scripts away and retry the test.
As mentioned, we always do an "init 6". To answer Dan: no, there are no core dumps of vxconfigd, and it always comes up without problems after a reboot (reset from the XSCF). Previously we had VCS 5.1 and upgraded to 6.0.1 with the standard Veritas installer scripts.
Thanks for your help.
Mike, did you run the command in bash (as I did)? Otherwise it does not work:
# ksh
# ls -al /etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*
/etc/rc?.d/{S,K}*{vcs,llt,vxfen,gab,vx}*: No such file or directory
#
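For the record, brace expansion ({S,K}) is a bash/ksh93 feature; the Solaris /bin/ksh is ksh88, which does not support it. A portable equivalent using only standard globbing (the vx* pattern also covers vxfen):
# ls -al /etc/rc?.d/[SK]*vcs* /etc/rc?.d/[SK]*llt* /etc/rc?.d/[SK]*gab* /etc/rc?.d/[SK]*vx* 2>/dev/null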
Thanks a lot guys.
Cheers
p.s.
# pkginfo -l VRTSvxvm | grep VERSION
VERSION: 5.0,REV=05.11.2006.17.55
#
# pkginfo -l VRTSvcs | grep VERSION
VERSION: 6.0.100.000
#
02-10-2014 12:19 PM
Yes, the shell was bash, and I also did an ls in the rc directories to check what was there, but this was on a 5.1 system, not 5.0.
I thought I had seen that you had VxVM SMF services, but looking again you only have VxVM rc scripts, which may be normal for 5.0.
Not sure if VCS 6.0 supports VxVM 5.0. I still think the basic problem is that VxVM is shutting down before VCS, but I don't know how Solaris orders rc scripts relative to SMF services, so that is what you should look at (a starting point is sketched below). If you can't get Solaris to shut VCS down first, then you will have to shut VCS down manually before rebooting, as I presume this is just a temporary configuration until you upgrade VxVM.
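A starting point, with standard SMF commands: legacy rc scripts appear only as lrc: entries that SMF reports but does not manage, so there is no per-script dependency ordering against real services. The first command lists those unmanaged entries; the second shows the dependencies SMF does enforce for VCS:
# svcs -a | grep legacy_run
# svcs -l svc:/system/vcs:default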
Mike
02-11-2014 12:38 AM
Hi Mike,
I also have VxVM SMF services:
# svcs -a | egrep 'vx|vcs|gab|llt'
legacy_run Jan_26 lrc:/etc/rc2_d/S70vxatd
legacy_run Jan_26 lrc:/etc/rc2_d/S750vxpal_gridnode
legacy_run Jan_26 lrc:/etc/rc2_d/S75vxpal_StorageAgent
legacy_run Jan_26 lrc:/etc/rc2_d/S760vxpal_actionagent
legacy_run Jan_26 lrc:/etc/rc2_d/S92vxdcli
legacy_run Jan_26 lrc:/etc/rc2_d/S94vxnm-vxnetd
legacy_run Jan_26 lrc:/etc/rc2_d/S96vxrsyncd
disabled Jan_26 svc:/system/vcs-onenode:default
online Jan_26 svc:/system/llt:default
online Jan_26 svc:/system/gab:default
online Jan_26 svc:/system/vxvm/vxvm-sysboot:default
online Jan_26 svc:/system/vxvm/vxvm-startup1:default
online Jan_26 svc:/system/vxvm/vxvm-startup2:default
online Jan_26 svc:/system/vxvm/vxvm-startvc:default
online Jan_26 svc:/system/vxvm/vxvm-reconfig:default
online Jan_26 svc:/system/vxpbx:default
online Jan_26 svc:/system/vxvm/vxvm-recover:default
online Jan_26 svc:/system/vxfen:default
online Jan_28 svc:/system/vcs:default
#
Do you have such a combination anywhere? I would really like to know whether I need the rc scripts or not; to me, as you already confirmed, it looks like they are obsolete.
I have:
# pkginfo -l VRTSvxvm VRTSvcs | egrep 'NAME|VERS'
NAME: Veritas Cluster Server by Symantec
VERSION: 6.0.100.000
NAME: Binaries for VERITAS Volume Manager by Symantec
VERSION: 5.0,REV=05.11.2006.17.55
#
THANKS A LOT !
Cheers
02-11-2014 09:53 AM
I only have access to a 5.1 system. 5.0 is on limited support, so there are not many customers still running it - see https://sort.symantec.com/eosl/show_matrix:
Product | Version | Platform | Country | End of Support Life Date | End of Limited Support Date | Release Date
---|---|---|---|---|---|---
Storage Foundation for UNIX/Linux | 5.0 MP3 | Solaris | Worldwide | 2014-08-31 | 2012-06-07 | 2008-10-06
Storage Foundation for UNIX/Linux | 5.0 MP3 | Solaris x86-64 | Worldwide | 2014-08-31 | 2012-06-07 | 2008-10-06
Storage Foundation for UNIX/Linux | 5.0 MP3 | Linux | Worldwide | 2014-08-31 | 2012-06-07 | 2008-08-20
Storage Foundation for UNIX/Linux | 5.0 MP3 | AIX | Worldwide | 2014-08-31 | 2012-06-07 | 2008-10-06
So it is quite likely that VCS 6.0 was never tested with VxVM 5.0, which could be why VxVM is shutting down before VCS, or it could be that you have some obsolete rc scripts.
As before, I would shut VCS down manually before rebooting a system, which is good practice in any case, even after you have upgraded VxVM to 6.0 - i.e. if you are shutting down node A, you should manually switch the service groups to node B (to ensure they online successfully on node B before node A goes down), or offline the groups if you are shutting both nodes down. A sketch of the sequence follows.
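A sketch using standard VCS commands, with mygroup and node2 as placeholders for your own names: switch the group, confirm with hastatus -sum that it is ONLINE on node2, then stop VCS on this node and reboot:
# hagrp -switch mygroup -to node2
# hastatus -sum
# hastop -local
# init 6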
Mike
02-11-2014 11:54 PM
Hi Mike,
thanks, you confirmed what seemed to me to be the likely problem, and I am going to take your advice.
Thanks a lot.
Cheers