V-35-410: Cluster Server not running on local node on Solaris

Hello,

We had a hardware failure, and after restarting the server we could not reach our mount points. We tried to start the cluster with hastart, but nothing was started, and we keep getting the error in the title above.

Kindly assist in resolving this issue.

 

 


10 Replies

Hi,

Is this a CFS server? And what exactly was the hardware failure? A local hard disk failure could lead to data loss that affects the VCS configuration.

Please paste the contents of the files below:

/etc/VRTSvcs/conf/sysname

/etc/llttab

And the output of the commands below:

lltstat -nvv active

gabconfig -a

Hello stinsong,

Thanks for your response.

The hardware failure caused a shared mount point to become unavailable. Yes, it is a CFS server. Currently we can see the filesystem on one node, but it is not coming up on the other node. When we try to start it up, it gives a new error:

VCS ERROR V-16-1-10600 Cannot connect to VCS engine.

 
Hello,

"Cannot connect to VCS engine" means your "had" process has not started or is not running.

For VCS to run, you need to ensure that components like LLT, GAB & Fencing (if configured) are running. Please paste the output of:

# lltconfig

# lltstat -vvn | head -10

# gabconfig -a

# modinfo | egrep 'gab|llt|vxfen'

# had -version

# uname -a

 

When you say that nothing was started: assuming this is a Unix system, are your rc scripts all OK? That is:

/etc/rc2.d/S70llt

/etc/rc2.d/S92gab

/etc/rc3.d/S99vcs

If the services are configured under SMF, are the SMF services in the online state?
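
A quick way to read the `gabconfig -a` output is to check which GAB ports have membership. The sketch below is only an illustration: `check_gab_ports` is a made-up helper name, and the port meanings in the comments (a = GAB, b = I/O fencing, h = the VCS engine "had") are the usual convention, so please confirm them against the documentation for your version.

```shell
# Sketch: report which expected GAB ports are missing from `gabconfig -a` output.
# check_gab_ports is a hypothetical helper; it reads the command output on stdin.
# Arguments: the port letters you expect to see (usually a = GAB,
# b = I/O fencing, h = VCS engine "had").
check_gab_ports() {
    # Collect the port letters that currently show a membership line.
    ports=$(awk '$1 == "Port" && $5 == "membership" { printf "%s ", $2 }')
    missing=""
    for p in "$@"; do
        case " $ports" in
            *" $p "*) ;;                     # port has membership
            *) missing="$missing $p" ;;      # port not listed
        esac
    done
    echo "missing:${missing:- none}"
}
```

On a cluster node this would be used as `gabconfig -a | check_gab_ports a b h`. If ports b or h are reported missing, fencing or "had" has most likely not joined the cluster.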

 

G

Hello,

These are the results:

root@ap1.gf.net # lltconfig
LLT is running
root@ap1.gf.net # lltstat -vvn | head -10
LLT node information:
    Node                 State    Link  Status  Address
   * 0 ap1          OPEN    
                                  igb2   UP      00:21:28:BB:40:3C
                                  igb3   UP      00:21:28:BB:40:3D
     1 ap2          OPEN    
                                  igb2   UP      00:21:28:BB:0F:04
                                  igb3   UP      00:21:28:BB:0F:05
     2                   CONNWAIT
                                  igb2   DOWN    
 
root@ap1.gf.net # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   68f501 membership 01
Port d gen   68f506 membership 01
 
root@ap1.gf.net # df -ah
Filesystem             size   used  avail capacity  Mounted on
rpool/ROOT/s10s_u9wos_14a
                       274G    26G   238G    10%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    53G   504K    53G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
sharefs                  0K     0K     0K     0%    /etc/dfs/sharetab
/platform/sun4v/lib/libc_psr/libc_psr_hwcap2.so.1
                       264G    26G   238G    10%    /platform/sun4v/lib/libc_psr.so.1
/platform/sun4v/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1
                       264G    26G   238G    10%    /platform/sun4v/lib/sparcv9/libc_psr.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                    53G    72K    53G     1%    /tmp
swap                    53G    72K    53G     1%    /var/run
swap                    53G     0K    53G     0%    /dev/vx/dmp
swap                    53G     0K    53G     0%    /dev/vx/rdmp
applprod1              150G    40G   109G    27%    /applprod1
applprod2               98G    39G    59G    40%    /applprod2
rpool/export           274G    23K   238G     1%    /export
rpool/export/home      274G   3.6G   238G     2%    /export/home
rpool                  274G    97K   238G     1%    /rpool
-hosts                   0K     0K     0K     0%    /net
auto_home                0K     0K     0K     0%    /home
ap1.gf.net:vold(pid2375)
                         0K     0K     0K     0%    /vol
/dev/odm                 0K     0K     0K     0%    /dev/odm
root@ap1.gf.net # cfsmount all
  Error: V-35-410: Cluster Server not running on local node: to
 
root@ap1.gf.net # modinfo | egrep 'gab|llt|vxfen'
234 7aaea000  2cf88 331   1  llt (LLT 5.1SP1)
235 7ab0e000  5a338 332   1  gab (GAB device 5.1SP1)
236 7ab4c000  6a0c8 333   1  vxfen (VRTS Fence 5.1SP1)
 
root@ap1.gf.net # had -version
Engine Version    5.1
Join Version      5.1.10.0
Build Date        Fri Oct 01 07:30:00 2010
PSTAMP            5.1.100.000-5.1SP1-2010-09-30_23.30.00
 
root@ap1.gf.net # uname -a
SunOS ap1.gf.net 5.10 Generic_147440-19 sun4v sparc sun4v
 
The scripts below do not exist on our server:

/etc/rc2.d/S70llt

/etc/rc2.d/S92gab

/etc/rc3.d/S99vcs

 

 

 

Well 'had' is not running for some reason, but most everything else seems to be...

You may not see those 'rc'-scripts because llt, gab, and vcs may be under Solaris' SMF control on your system.  Check your SMF configuration and see when 'had' (vcs) should have been started.

What run-level is your system in?

# who -r

You may be in a run-level in which SMF is not configured to run VCS ('had'), in which case you would get an error like 'Cluster Server not running on local node'.

Either manually start VCS (via 'hastart') or transition your host to the appropriate run-level. 

-HTH

 

Hello,

It is on run-level 3.

 

root@ap1.gf.net # who -r 
   .       run-level 3  Nov 27 15:52     3      0  3

 

Have you tried to 'hastart' it yet?

Make sure to report back to us any error messages that go into the VCS message log (/opt/VRTSvcs/log/engine_A.log) and the Solaris messages log (/var/adm/messages) after you run 'hastart'.

Do you have a valid VCS license? If it has expired, then you will get a message in those logs.

Run 'vxlicrep -s' and provide the output, as well as the relevant output from the message files mentioned above.
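
To pull only the relevant lines out of the engine log, something like the sketch below might help. `recent_vcs_errors` is a hypothetical helper name, and the pattern assumes the "VCS CRITICAL" / "VCS ERROR" line format seen in VCS logs. On Solaris 10 you may need `/usr/xpg4/bin/grep` or `egrep` in place of `grep -E`.

```shell
# Sketch: show the most recent CRITICAL and ERROR lines from a VCS engine log.
# recent_vcs_errors is a hypothetical helper.
#   $1 = path to the log file (for example the engine_A.log mentioned above)
#   $2 = maximum number of lines to show (defaults to 20)
recent_vcs_errors() {
    log=$1
    max=${2:-20}
    # Keep only lines tagged with the CRITICAL or ERROR severity, newest last.
    grep -E 'VCS (CRITICAL|ERROR) ' "$log" | tail -n "$max"
}
```

For example: `recent_vcs_errors /opt/VRTSvcs/log/engine_A.log 30`, run right after the 'hastart' attempt.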

-kjb

 

Also, make sure that "had -version" reports the same version on both nodes. You have only pasted output from one node above, so we can't confirm.

Also, as suggested above, try a hastart and let us know the output from engine_A.log.

 

G

Accepted Solution!

Hello,

It looks like you have configured your cluster to use I/O fencing, but fencing is not configured correctly.

Refer to the logs below:

2013/11/28 11:11:46 VCS NOTICE V-16-1-52006 UseFence=SCSI3. Fencing is enabled
2013/11/28 11:11:46 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
2013/11/28 11:12:01 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...

2013/11/28 11:12:16 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying..

 

Check whether your main.cf contains the line below:

UseFence = SCSI3

# grep -i usefence /etc/VRTSvcs/conf/config/main.cf

 

If the above line exists in main.cf, the cluster is intended to use fencing, which is not configured correctly.

I/O fencing provides data protection against cluster split-brain situations.

 

Refer to the VCS admin guide and see the article on how to configure I/O fencing. If you do not intend to use I/O fencing (which is not recommended), you can stop the cluster, remove the entry from main.cf, and start the cluster again.
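
As a rough sketch of the check and the stop/edit/start sequence described above: `usefence_mode` is a hypothetical helper name, and the commented command order is an assumption about the usual procedure, so please verify it against the admin guide for your release before running anything on a production cluster.

```shell
# Sketch: print the UseFence value from a main.cf, or "unset" if it is absent.
# usefence_mode is a hypothetical helper; $1 = path to main.cf.
usefence_mode() {
    mode=$(sed -n 's/^[[:space:]]*UseFence[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*/\1/p' "$1" | head -n 1)
    echo "${mode:-unset}"
}

# Assumed outline for running without fencing (not recommended), not executed here:
#   hastop -all                              # stop VCS on all nodes
#   (edit main.cf and remove the UseFence line)
#   hacf -verify /etc/VRTSvcs/conf/config    # syntax-check the edited configuration
#   hastart                                  # start VCS again on each node
```

For example, `usefence_mode /etc/VRTSvcs/conf/config/main.cf` prints SCSI3 when the cluster is set to use SCSI-3 fencing.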

 

Link to documentation

 

https://sort.symantec.com/documents

I/O fencing link for VCS 5.1 on Solaris:

https://sort.symantec.com/public/documents/sf/5.1/solaris/html/vcs_admin/ch_admin_fencing.html#760094

 

G


Please check your SMF services for any issues.

# svcs -a | egrep 'vxfen|vcs|llt|gab'

online         Oct_17   svc:/system/vxfen:default
online         Oct_17   svc:/system/llt:default
online         Oct_17   svc:/system/gab:default
online         Oct_17   svc:/system/vcs:default
 
If any service is not online, check the reason why:
# svcs -xv vxfen
 
You can also check the SMF logs for more details:
 
# more /var/svc/log/system-vxfen\:default.log
 
If fencing is not coming up, you can configure fencing with the vxfenconfig command.
 
# vxfenadm -d
# vxfenconfig
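
If you want to spot the problem service quickly, a small filter over the `svcs -a` output could look like the sketch below. `not_online` is a hypothetical helper name, and it assumes the usual three-column STATE / STIME / FMRI layout shown above.

```shell
# Sketch: list cluster-related SMF services that are not in the online state.
# not_online is a hypothetical helper; it reads `svcs -a` output on stdin.
not_online() {
    # Column 1 is the state, column 3 is the service FMRI.
    awk '$3 ~ /(vxfen|llt|gab|vcs)/ && $1 != "online" { print $3, "(" $1 ")" }'
}
```

Usage would be `svcs -a | not_online`. Any service it prints is a candidate for `svcs -xv`.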
 
Thanks,
Venkat