05-29-2014 09:57 AM
Hello
I am having a problem running NetBackup 7.5 on the second node of a cluster. From the logs it looks like vmd is not starting and VCS keeps restarting the nbu_server resource. The second node was added to a one-node cluster and all files were updated correctly (main.cf, /etc/llthosts, /etc/gabtab, etc.). NetBackup 7.5 was then installed successfully on the second node; during installation nbu_group was identified and the installer added the second node to the service group. Switching from the first node to the second works, but after two attempts to restart the nbu_server resource the whole group fails back to the first node, where everything works fine.
What I noticed is that the second node is listed in EMM as a media server: the outputs of nbemmcmd -listhosts and nbemmcmd -getemmserver both show the second node with the media server type.
Can anyone please advise how to solve this issue? Why did the installer not add the second node to EMM as "master"? Why is vmd not starting? Could this be because the second node is listed as a media server rather than a master? Can I remove the second node from EMM and add it as a master manually? Is there anything else that would need to be done after updating EMM?
Below are some logs from the cluster. I don't have any logs from the vmd daemon.
[23510] <4> Online::main: Starting Application completed with status 0.
[31993] <16> monitor:processStatus: Some Processes are DOWN while others are UP
[31993] <16> monitor:processStatus: Following Process are found DOWN: vmd
[31993] <16> monitor:processStatus: Following Process are found UP: nbevtmgr nbstserv bpnbjm nbaudit nbsl nbrmms nbemm nbrb NB_dbsrv
[32049] <4> Offline::main: Offline called with 2 Parameters
[32049] <4> Offline::main: Initializing NBCluster using /usr/openv/netbackup/bin/cluster
[32049] <4> NBClusterApp::stopApp: Executing Command : /usr/openv/netbackup/bin/bpclustel -timeout 120 15,TERM -verbose
[32050] <4> standard_shutdown:
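When the agent log is long, it can help to pull out only the process names the monitor reported as DOWN. The following is a minimal sketch, assuming the log lines have the same format as the excerpt above; the function name down_processes is made up for illustration and is not part of NetBackup or VCS.

```shell
#!/bin/sh
# down_processes: read agent log text on stdin and print each process name
# that a "Following Process are found DOWN:" line mentions, one per line.
# Assumption: the log line format matches the excerpt quoted in this post.
down_processes() {
    sed -n 's/.*Following Process are found DOWN: *//p' |
        tr ' ' '\n' |      # one process name per line
        sed '/^$/d' |      # drop empty lines left by repeated spaces
        sort -u            # report each process only once
}
```

Feed it the agent log on stdin, for example `down_processes < agent.log`, where agent.log stands for whatever file the lines above were copied from.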
05-29-2014 10:18 AM
If the 2nd node was added to VCS and cluster comms have been tested, the installation should have automatically added the 2nd node.
Have you confirmed all pre-reqs before the installation, such as the requirement for rsh with no password between the nodes?
If rsh access has not been configured, the cluster config in NBU will fail.
You can use this TN to use ssh instead of rsh: http://www.symantec.com/docs/TECH160242
It will probably be best to uninstall 2nd node and re-install.
Or else try to rerun /usr/openv/netbackup/bin/install_bp on 2nd node once you have tested and confirmed that rsh is working.
05-29-2014 10:22 AM
The /usr/openv/volmgr/debug/daemon log and the other volmgr logs might help.
If possible, try to start ltid with the -v option on the second node.
05-29-2014 11:02 AM
I've seen this a few times in the last month when the NBU_RSP file does not have both node names in the NODES line.
I.e.
NODES=node1 node2
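A quick way to check this is sketched below, assuming the NODES line is a plain space-separated list as in the example above. The function name nodes_in_rsp is made up for illustration, and the file path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# nodes_in_rsp FILE NODE...
# Succeeds only if every NODE appears as a whole word on the NODES= line of FILE.
# Assumption: the line looks like "NODES=node1 node2" (optional double quotes
# around the value are stripped).
nodes_in_rsp() {
    rsp_file=$1
    shift
    # Take the first NODES= line, drop the key and any double quotes.
    nodes_line=$(sed -n 's/^NODES=//p' "$rsp_file" | head -n 1 | tr -d '"')
    for node in "$@"; do
        case " $nodes_line " in
            *" $node "*) ;;   # node name found as a whole word
            *) echo "missing from NODES line: $node" >&2
               return 1 ;;
        esac
    done
    return 0
}
```

Run it on each node as `nodes_in_rsp /path/to/NBU_RSP node1 node2`, substituting the real location of NBU_RSP on your system and the real node names.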
Deb
05-29-2014 11:33 AM
ssh was set up and the connection tested before starting the installation, during which NBU_RSP was downloaded from the first node over ssh, among other files. Both nodes are present in the NBU_RSP file. I forgot to mention that the installation on the second node was performed while NetBackup was fully running on the first node and nbu_group was frozen. Could this be the reason the installer did NOT add the second node to EMM? Does the installer start nbu_group on the node being installed and run some commands there, or does it run EMM commands remotely on the first node of the cluster? Is reinstallation the only option here?
05-29-2014 01:32 PM
Reinstallation would be the best method (and probably the least frustrating).
If everything appears OK and you don't see node2 in the nbemmcmd -listhosts output, you could try adding the node (note that I don't have the exact syntax available, so you might have to play around with this):
nbemmcmd -addhost -activenodename [node1] -clustername [cluster_name] -machinename [node2] -machinetype master -netbackupversion <<level>.<major level>.<minor level>.<hot fix>> -operatingsystem <hpux | linux | rs6000 | sgi | solaris | tru64 | windows>
Hopefully that will work. If not, go ahead and run nbemmcmd -addhost -help and experiment with the fields.
When done, nbemmcmd -listhosts should look like:
# nbemmcmd -listhosts
NBEMMCMD, Version: 7.6
The following hosts were found:
server Virtual_name
cluster Virtual_name
master node1
master node2
Other entries may exist.
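The listing can also be checked from a script. This is only a sketch: it assumes each host appears on a line of the form "master <name>" as in the sample listing, and the function name masters_listed is made up for illustration.

```shell
#!/bin/sh
# masters_listed NODE...
# Reads `nbemmcmd -listhosts` output on stdin and succeeds only if every NODE
# appears on a line whose first field is "master" and second field is NODE.
masters_listed() {
    listing=$(cat)
    for node in "$@"; do
        if ! printf '%s\n' "$listing" |
            awk -v n="$node" '$1 == "master" && $2 == n { found = 1 }
                              END { exit !found }'; then
            echo "not listed as master: $node" >&2
            return 1
        fi
    done
    return 0
}
```

For example: `/usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts | masters_listed node1 node2`. A node that is listed only as a media server makes the check fail, which is the symptom described in the original post.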
If all else fails, try to reinstall that node based on the install and upgrade guides
Deb
05-29-2014 10:26 PM
I agree with Deb.
Uninstall, test rsh (as this is what the NBU installation uses) in both directions, unfreeze the service group (as the installation needs to modify it), then reinstall.
Back in NBU 6.5 days, I managed to manually add a 2nd node to a one-node cluster.
See if anything in my experience helps:
Add node to NetBackup Clustered master server
PS:
My experience has been that 'nbemmcmd -listhosts' will only display correct info after a successful failover to the 2nd node, even if the installation added the 2nd node correctly.
My experience on 7.1 clustered installation in our lab some time ago:
Both nodes in cluster, active on node1. But look at this:
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
NBEMMCMD, Version:7.1
The following hosts were found:
server nbumas
cluster nbumas
master mvdb-lnx1
Command completed successfully.
Shows only one node… Offline, online, still the same output.
Failover.
Only NOW is emm updated with 2nd node!
# /usr/openv/netbackup/bin/admincmd/nbemmcmd -listhosts
NBEMMCMD, Version:7.1
The following hosts were found:
server nbumas
cluster nbumas
master mvdb-lnx1
master mvdb-lnx2
Command completed successfully.
06-03-2014 12:16 AM
Thanks all for your advice. I will try to fix this and will let you know the result.
08-04-2014 07:57 AM
Hi all
The issue was resolved by running an SQL script from Symantec that cleaned up wrong entries in the database; after that, the second node of the cluster was added to the database manually with nbemmcmd. NBU is now running fine on both nodes.
08-04-2014 11:21 AM
You never tried to reinstall?
There is no reason why the 2nd node installation would fail if the correct steps were followed...
08-20-2014 01:48 AM
The correct upgrade steps were followed, but it looks like the EMM DB had some bad entries. It was also growing very fast and we were running out of space, so in the end some additional defragmentation scripts were run to optimize it.