vmd not starting on second cluster node

devfx
Level 3

Hello

I am having a problem running NetBackup 7.5 on the second node of a cluster. From the logs it looks like vmd is not starting and VCS keeps restarting the nbu_server resource. The second node was added to a one-node cluster and all files were updated correctly (main.cf, /etc/llthosts, /etc/gabtab, etc.). NetBackup 7.5 was then installed successfully on the second node; during installation nbu_group was identified and the installer added the second node to the service group. Switching from the first node to the second works, but after two attempts at restarting the nbu_server resource the whole group fails back to the first node, where everything works fine.

What I noticed is that the second node is listed in EMM as a media server: the outputs of nbemmcmd -listhosts and nbemmcmd -getemmserver both show the second node with the media server machine type.

Can anyone please advise how to solve this issue? Why did the installer not add the second node to EMM as "master"? Why is vmd not starting? Could this be because the second node is listed as a media server rather than a master? Can I remove the second node from EMM and add it as master manually? Is there anything else that would need to be done after updating EMM?

Below are some logs from the cluster; I don't have any logs from the vmd daemon.

[23510] <4> Online::main: Starting Application completed with status 0.
[31993] <16> monitor:processStatus: Some Processes are DOWN while others are UP
[31993] <16> monitor:processStatus: Following Process are found DOWN: vmd
[31993] <16> monitor:processStatus: Following Process are found UP: nbevtmgr nbstserv bpnbjm nbaudit nbsl nbrmms nbemm nbrb NB_dbsrv
[32049] <4> Offline::main: Offline called with 2 Parameters
[32049] <4> Offline::main: Initializing NBCluster using /usr/openv/netbackup/bin/cluster
[32049] <4> NBClusterApp::stopApp: Executing Command : /usr/openv/netbackup/bin/bpclustel -timeout 120 15,TERM -verbose
[32050] <4> standard_shutdown:
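
When the agent log is long, it can help to pull out just the processes the monitor reports as DOWN or UP. The following is a minimal sketch, not a NetBackup tool: the regular expression is an assumption based only on the wording of the log lines quoted above, and the names `process_status` and `SAMPLE_LOG` are made up for illustration.

```python
import re

# Assumed pattern, derived only from the monitor lines quoted in this post;
# other NetBackup versions may word these messages differently.
STATUS_RE = re.compile(r"Following Process(?:es)? are found (DOWN|UP):\s*(.*)")


def process_status(log_text):
    """Collect the process names reported as DOWN and UP in the log text.

    If a state is reported more than once, the later line overwrites
    the earlier one for that state.
    """
    status = {"DOWN": [], "UP": []}
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group(1)] = match.group(2).split()
    return status


# Lines copied from the log excerpt above.
SAMPLE_LOG = """\
[31993] <16> monitor:processStatus: Some Processes are DOWN while others are UP
[31993] <16> monitor:processStatus: Following Process are found DOWN: vmd
[31993] <16> monitor:processStatus: Following Process are found UP: nbevtmgr nbstserv bpnbjm nbaudit nbsl nbrmms nbemm nbrb NB_dbsrv
"""

if __name__ == "__main__":
    print(process_status(SAMPLE_LOG)["DOWN"])  # prints ['vmd']
```

For the excerpt above this singles out vmd as the only process that is down, which matches the symptom described.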

 

 

 

ACCEPTED SOLUTION

devfx
Level 3

Hi all

The issue was resolved by running an SQL script from Symantec that cleaned up wrong entries in the database. After that, the second node of the cluster was added to the DB manually with nbemmcmd. NBU is now running fine on both nodes.

10 REPLIES 10

Marianne
Level 6
Partner    VIP    Accredited Certified

If the 2nd node was added to VCS and cluster comms have been tested, the installation should have automatically added the 2nd node. 

Have you confirmed all prerequisites before the installation, such as the requirement for rsh with no password between the nodes?

If no rsh access has been configured, the cluster config in NBU will fail.

You can use this TN to use ssh instead of rsh: http://www.symantec.com/docs/TECH160242 

It will probably be best to uninstall 2nd node and re-install.
Or else try to rerun /usr/openv/netbackup/bin/install_bp on 2nd node once you have tested and confirmed that rsh is working.

Michael_G_Ander
Level 6
Certified

/usr/openv/volmgr/debug/daemon and other volmgr logs might help

If possible, try to start ltid with the -v option on the second node.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Deb_Wilmot
Level 6
Employee Accredited Certified

I've seen this a few times in the last month when the NBU_RSP file does not have both node names in the NODES line.

I.e.

NODES=node1 node2
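
A quick way to check this is to compare the NODES line against the node names you expect. The sketch below is only an illustration: it assumes the file uses the plain `NODES=node1 node2` form shown above (optionally quoted), and the function name `missing_nodes` is hypothetical. It takes the file contents as a string, so you would read NBU_RSP yourself and pass the text in.

```python
def missing_nodes(rsp_text, expected):
    """Return the expected node names that are absent from the NODES line.

    Assumes a line of the form NODES=node1 node2 (values separated by
    whitespace, optionally wrapped in quotes). If there is no NODES line
    at all, every expected node is reported as missing.
    """
    for line in rsp_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NODES":
            listed = value.strip().strip("\"'").split()
            return [node for node in expected if node not in listed]
    return list(expected)


if __name__ == "__main__":
    # Hypothetical file contents for illustration only.
    print(missing_nodes("NODES=node1\n", ["node1", "node2"]))  # prints ['node2']
```

An empty list means both nodes are present in the NODES line.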

 

Deb

devfx
Level 3

ssh was set up and the connection tested before starting the installation, during which NBU_RSP (among other files) was downloaded from the first node over ssh. Both nodes are present in the NBU_RSP file. I forgot to mention that the installation on the second node was performed while NetBackup was fully running on the first node and nbu_group was frozen. Could this be the reason the installer did NOT add the second node to EMM? Does the installer start nbu_group on the node being installed and run some commands there, or does it run EMM commands remotely on the first node of the cluster? Is reinstallation the only option here?

Deb_Wilmot
Level 6
Employee Accredited Certified

Reinstallation would be the best method (and probably the least frustrating).

IF everything appears ok, and you don't see node2 in the nbemmcmd -listhosts output, you could try adding the node (note: I don't have the exact syntax available, so you might have to play around with this):

nbemmcmd -addhost -activenodename [node1] -clustername [cluster_name] -machinename [node2] -machinetype master -netbackupversion <level>.<major level>.<minor level>.<hot fix> -operatingsystem <hpux | linux | rs6000 | sgi | solaris | tru64 | windows>
 

Hopefully that will work. If not, go ahead and run nbemmcmd -addhost -help and experiment with the fields.

When done, nbemmcmd -listhosts should look like:


# nbemmcmd -listhosts
NBEMMCMD, Version: 7.6
The following hosts were found:
server           Virtual_name
cluster          Virtual_name
master           node1
master           node2

 

Other entries may exist.

 

If all else fails, try to reinstall that node based on the install and upgrade guides.

 

Deb
 

Marianne
Level 6
Partner    VIP    Accredited Certified

I agree with Deb.

Uninstall, test rsh (as this is what the NBU installation is using) in both directions, unfreeze the service group (as the installation needs to modify it), then reinstall.

Back in NBU 6.5 days, I managed to manually add a 2nd node to a one-node cluster.
See if anything in my experience helps: 

Add node to NetBackup Clustered master server

PS:

My experience has been that 'nbemmcmd -listhosts' will only display correct info after successful failover to 2nd node. Even if installation added 2nd node correctly.

My experience on 7.1 clustered installation in our lab some time ago:

Both nodes in cluster, active on node1. But look at this:

# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts

NBEMMCMD, Version:7.1
The following hosts were found:
server             nbumas
cluster            nbumas
master             mvdb-lnx1
Command completed successfully.

 

Shows only one node… Offline, online, still the same output.

Failover.

Only NOW is emm updated with 2nd node!

# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
NBEMMCMD, Version:7.1
The following hosts were found:
server             nbumas
cluster            nbumas
master             mvdb-lnx1
master             mvdb-lnx2
Command completed successfully.
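
To compare the two outputs above without reading them by eye, the listing can be parsed and checked for nodes that are not registered as master. This is a rough sketch under assumptions: it treats every data row as `<machine type> <host name>`, and the set of machine types is limited to the ones seen in this thread plus `media` (the exact label nbemmcmd prints for a media server is assumed here). The names `parse_listhosts` and `nodes_not_master` are hypothetical.

```python
# Machine types seen in this thread, plus "media" as an assumed label
# for a media server entry. Extend as needed for your environment.
KNOWN_TYPES = {"server", "cluster", "master", "media"}


def parse_listhosts(output):
    """Return (machine_type, host_name) pairs from nbemmcmd -listhosts text.

    Header and footer lines are skipped because their first word is not
    one of KNOWN_TYPES.
    """
    entries = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in KNOWN_TYPES:
            entries.append((parts[0], parts[1]))
    return entries


def nodes_not_master(output, nodes):
    """Return the cluster nodes that have no 'master' entry in the listing."""
    entries = parse_listhosts(output)
    return [node for node in nodes if ("master", node) not in entries]


# The two listings quoted above, before and after the failover.
BEFORE_FAILOVER = """\
NBEMMCMD, Version:7.1
The following hosts were found:
server             nbumas
cluster            nbumas
master             mvdb-lnx1
Command completed successfully.
"""

AFTER_FAILOVER = BEFORE_FAILOVER.replace(
    "Command completed", "master             mvdb-lnx2\nCommand completed"
)

if __name__ == "__main__":
    nodes = ["mvdb-lnx1", "mvdb-lnx2"]
    print(nodes_not_master(BEFORE_FAILOVER, nodes))  # prints ['mvdb-lnx2']
    print(nodes_not_master(AFTER_FAILOVER, nodes))   # prints []
```

A node that shows up with a type other than master, as in the original question, would also be reported by `nodes_not_master`.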

 

devfx
Level 3

Thanks all for your advice. I will try to fix this and will let you know the result.

 

devfx
Level 3

Hi all

The issue was resolved by running an SQL script from Symantec that cleaned up wrong entries in the database. After that, the second node of the cluster was added to the DB manually with nbemmcmd. NBU is now running fine on both nodes.

Marianne
Level 6
Partner    VIP    Accredited Certified

You never tried to reinstall?

There is no reason why the 2nd node installation would fail if the correct steps were followed...

devfx
Level 3

The correct upgrade steps were followed, but it looks like the EMM DB had some bad entries. It was also growing very fast and we were running out of space, so in the end some additional defragmentation scripts were run to optimize it.