Troubleshooting EMM database error (196)
Possible quick fix: I have noticed that in many cases, if I delete all the media server entries from the bp.conf of the media server in question, leaving only the Master, the Master nodes (if it is clustered) and the media server's own name under the SERVER list, then the EMM command nbemmcmd -getemmserver normally works. This means the box is unable to see another media server in the environment. Try this first; if it doesn't work, go through the process below.
1. Tape Drive Troubleshoot
1.1 Check the output of these three commands; the Drive Serial Number, Drive Name and Path must match across them:
vmoprcmd -h <media> -shmdrive | awk '{print $38,$39,$31}'
vmoprcmd -h <media> -autoconfig -t
vmglob -listall -b | grep -i <media server>
An alternate way to check for configDB and globDB discrepancies is with:
tpautoconf -report_disc
If there is any mismatch here, update the drives with tpautoconf -replace_drive or tpconfig -update.
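To spot a mismatch quickly, the serial numbers from the globDB and configDB outputs can be diffed. A minimal sketch follows, using hypothetical sample serials standing in for the real command output you would capture from the commands above:

```shell
# Hypothetical serial lists; in practice capture them from the commands above,
# e.g. vmglob -listall -b | grep -i <media server>, and extract the serial column.
cat > /tmp/globdb_serials.txt <<'EOF'
HU1234567A
HU1234567B
EOF
cat > /tmp/configdb_serials.txt <<'EOF'
HU1234567A
HU1234567C
EOF
# uniq -u keeps only lines that appear once, i.e. serials present
# in one DB but not in the other
sort /tmp/globdb_serials.txt /tmp/configdb_serials.txt \
  | uniq -u > /tmp/serial_mismatch.txt
cat /tmp/serial_mismatch.txt
```

Any serial listed in the output points at a drive that needs tpautoconf -replace_drive or tpconfig -update.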
1.2 If they match, the next step is to ensure the library is responding. A simple robtest can confirm this; remember that robtest must be run from the Library Controller.
1.3 If the library looks healthy, then we need to know whether each drive is responding (pingable). Use the following command to ping each drive:
From the Master Server:
vmoprcmd -h <media server> -devconfig "-dev_ping -drive -path <PATH>"
From the Media Server, the same command can be run without the -h option.
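Pinging drives one at a time gets tedious with many paths. The sketch below just generates the dev_ping command lines for review from a list of drive paths; the paths and the host name "mediahost" are hypothetical placeholders:

```shell
# Hypothetical drive path list; on Solaris these typically live under /dev/rmt
cat > /tmp/drive_paths.txt <<'EOF'
/dev/rmt/0cbn
/dev/rmt/1cbn
EOF
# Emit one vmoprcmd dev_ping line per path; review the file, then run the lines
while read -r P; do
  echo "vmoprcmd -h mediahost -devconfig \"-dev_ping -drive -path $P\""
done < /tmp/drive_paths.txt > /tmp/ping_cmds.txt
cat /tmp/ping_cmds.txt
```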
1.4 For any drive that is not responding, check the SAN side: ports, fabrics, zones, fiber cables and HBAs. Depending on the HBA driver you can use lputil, emlxadm, or PowerPath for Windows.
1.5 Confirm the OS can see the drives:
/usr/openv/volmgr/bin/scan
1.6 Sometimes it is necessary to re-scan the HBAs at the OS level, try:
Solaris 10: devfsadm
AIX: cfgmgr
Windows: Go to Device Manager and Re-Scan.
2. Network Troubleshoot
2.1 If every drive is pingable, confirm that the media server is communicating properly with the Master and EMM servers.
From the media server:
traceroute <Library Controller | EMM | Master server>
ping -s -i <nic> -p 1556 <Library Controller | EMM | Master server> 65000 4 -- (solaris)
telnet <EMM | Master> 1556
nslookup <Library Controller | EMM | Master server>
getent hosts <Library Controller | EMM | Master server> -- (solaris)
bpclntcmd -hn <Library Controller | EMM | Master server>
bpclntcmd -ip <Library Controller | EMM | Master server IP>
bpclntcmd -pn
bpclntcmd -self
tpautoconf -get_gdbhost
Check the hosts file; it must contain:
127.0.0.1 localhost
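A quick sanity check of the hosts file can be scripted. The sketch below runs against a hypothetical sample file rather than the live /etc/hosts, checking the localhost entry and flagging any host name that appears twice (a common place for a typo'd duplicate):

```shell
# Hypothetical stand-in for /etc/hosts
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1   localhost
10.0.0.5    master1
10.0.0.6    media1
EOF
# Verify 127.0.0.1 maps to localhost
awk '$1=="127.0.0.1" && $2=="localhost" {ok=1}
     END {print (ok ? "localhost OK" : "localhost MISSING")}' \
  /tmp/hosts.sample > /tmp/hosts_check.txt
# Names appearing on more than one line often hide a duplicate/typo entry
awk '{for(i=2;i<=NF;i++) print $i}' /tmp/hosts.sample \
  | sort | uniq -d > /tmp/hosts_dups.txt
cat /tmp/hosts_check.txt /tmp/hosts_dups.txt
```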
Check vm.conf and lower the scan interval:
echo "SCAN_HOST_STATUS_INTERVAL = 120" >> /usr/openv/volmgr/vm.conf
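The echo above appends unconditionally; run twice it leaves duplicate lines in vm.conf. A guarded, idempotent version is sketched below on a hypothetical copy of vm.conf rather than the live file:

```shell
VMCONF=/tmp/vm.conf.test   # stand-in for /usr/openv/volmgr/vm.conf
cat > "$VMCONF" <<'EOF'
MM_SERVER_NAME = media1
EOF
add_scan_interval() {
  # Only append if the setting is not already present
  grep -q '^SCAN_HOST_STATUS_INTERVAL' "$VMCONF" || \
    echo "SCAN_HOST_STATUS_INTERVAL = 120" >> "$VMCONF"
}
add_scan_interval
add_scan_interval   # second run is a no-op thanks to the grep guard
cat "$VMCONF"
```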
From the EMM, Master Server and Library Controller:
traceroute <Media server>
ping -s -i <nic> -p 13782 <Media server> 65000 4 -- (solaris)
ping -s -i <nic> -p 13724 <Media server> 65000 4 -- (solaris)
nslookup <Media Server>
getent hosts <Media Server Name>
bpclntcmd -hn <Media Server>
bpclntcmd -ip <Media Server IP>
bpclntcmd -pn
bpclntcmd -self
2.2 Confirm at the OS and switch level that the network settings for your NIC and switch ports are correct. Look for 10/half duplex or even 100/full duplex configurations; if you have a 1 Gb NIC, ensure it is configured as such.
2.3 Run the SAS tool and keep the log ready for Symantec; this will tell you if there is any overload at the network layer between Master and Media. In many environments the Master is also a media server and clients use the production NIC for backups, overloading the throughput and causing errors in the metadata transport between the Media Server and EMM.
3. EMM Troubleshoot
3.1 If TCP/IP troubleshooting looks fine, check whether the media server is able to query the EMM DB.
From Media Server
nbemmcmd -getemmserver
nbdb_ping
NOTE: If you see something like EMM database error (196) or Database [NBDB] is not available, it is most of the time a network issue between your media server and the EMM server. Look in the hosts files or bp.conf for a typo or a REQUIRED_INTERFACE that is unable to reach the EMM DB.
3.2 Check the settings of the media server under the EMM DB
From EMM Server:
nbemmcmd -listhosts -verbose -display_server -machinename <media server> -machinetype media
The output will show several lines; you are looking for:
MachineState = not active for tape drives and disk jobs (14)
If it shows something different, such as:
MachineState = not active for disk jobs (8)
then your media server is offline and unable to communicate with your EMM box. Something that occasionally works is:
From master server:
vmoprcmd -activate_host -h <media server>
3.3 Buy yourself some time and enable VxUL verbosity on your media server now, for deeper troubleshooting later.
From Media Server enable logging:
vxlogcfg -a -p 51216 -o Default -s DebugLevel=5 -s DiagnosticLevel=5
vxlogcfg -a -p 50936 -o Default -s DebugLevel=5 -s DiagnosticLevel=5
Create bptm, bpbrm and bpcd folders under /usr/openv/netbackup/logs
Ensure VERBOSE = 5 line is set under the media server bp.conf file.
3.4 Restart NetBackup and PBX on the Media Server
From Media Server:
bp.kill_all
/opt/VRTSpbx/bin/vxpbx_exchanged stop
bpps -x -- (be sure no process is running)
/opt/VRTSpbx/bin/vxpbx_exchanged start
netbackup start
3.5 From the media server run nbemmcmd -getemmserver. If you still get EMM database error (196) or are unable to connect to the EMM DB, you may need to check every SERVER entry in the media server's bp.conf; if for any reason the media server cannot reach any of the listed boxes, you will experience communication problems and the EMM DB will be unresponsive. A good practice is:
cp /usr/openv/netbackup/bp.conf /usr/openv/netbackup/bp.conf.<date>
Remove every SERVER entry from bp.conf, leaving only the master and the current media server. This will tell you whether the issue lies with some other box that your media server cannot reach.
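The trim described above can be scripted. A minimal sketch on a hypothetical bp.conf copy, where master1 and media1 stand in for your real master and media server names:

```shell
# Hypothetical bp.conf contents; never edit the live file without a backup
cat > /tmp/bp.conf.test <<'EOF'
SERVER = master1
SERVER = othermedia1
SERVER = othermedia2
SERVER = media1
CLIENT_NAME = media1
EOF
cp /tmp/bp.conf.test /tmp/bp.conf.bak   # always back up first
# Keep every non-SERVER line, plus only the master and this media server
awk '$1!="SERVER" || $3=="master1" || $3=="media1"' \
  /tmp/bp.conf.test > /tmp/bp.conf.trimmed
cat /tmp/bp.conf.trimmed
```

Review /tmp/bp.conf.trimmed before copying anything back into place.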
Test the EMM connection again with:
nbemmcmd -getemmserver
3.6 If none of this has fixed the issue, you probably need to update the media server's entry in the EMM DB:
From EMM Server:
nbemmcmd -updatehost -machinename <media server> -machinetype media -masterserver <master>
From the Media Server:
nbemmcmd -getemmserver
3.7 Still getting EMM database error (196)? Try deleting and re-adding the media server in the EMM DB:
nbemmcmd -deletehost -machinename <media server> -machinetype media
nbemmcmd -addhost -machinename <media server> -machinetype media -masterserver <master server> -operatingsystem <OS>
3.8 Restarting the Master and EMM Server can help. If you are still having problems reaching your EMM box, it will be necessary to review some VxUL logs.
3.9 Change the scan host status interval
From Media Server
cat vm.conf
MM_SERVER_NAME = <media server name>
SCAN_HOST_STATUS_INTERVAL = 120
4. NBRB Cleanup
4.1 Even if EMM is not responding, it is good practice to clean up the NBRB cache.
Dump the NBRB tables and cache for future troubleshooting purposes:
nbrbutil -dump > /tmp/nbrb.dump.<date>
nbrbutil -dumptables > /tmp/nbrb.tables.<date>
nbrbutil -resetMediaServer <mediaserver>
Run a second dump and grep for the media server; collect all the allocation keys and reset them all:
nbrbutil -resetMDS <AllocationKey>
nbrbutil -releaseAllocHolds
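Collecting the allocation keys by hand is error-prone, so here is a sketch that pulls them out of a dump file and emits the resetMDS commands for review. The dump layout below is a hypothetical stand-in; check your real nbrbutil dump for the actual format before adapting the sed pattern:

```shell
# Hypothetical dump format: one allocation per line with an AllocationKey=<n> token
cat > /tmp/nbrb.dump.sample <<'EOF'
AllocationKey=101 mediaserver=media1
AllocationKey=102 mediaserver=media2
AllocationKey=103 mediaserver=media1
EOF
# Keep only the target media server's lines, extract the numeric key,
# and emit one resetMDS command per key for review before running
grep media1 /tmp/nbrb.dump.sample \
  | sed -n 's/.*AllocationKey=\([0-9]*\).*/\1/p' \
  | while read -r KEY; do
      echo "nbrbutil -resetMDS $KEY"
    done > /tmp/resetmds_cmds.txt
cat /tmp/resetmds_cmds.txt
```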
The last resort for cleaning up NBRB is restarting the EMM Server, which most of the time is the Master.
5. Logs
5.1 List the Symantec products on your media server; normally you will see 51216 (NBU) and 50936 (PBX):
vxlogmgr -l
5.2 In step 3.3 you already changed the log verbosity; just confirm it is set to 5.
From Media Server:
vxlogcfg -l -p 51216 -o <originator>
vxlogcfg -l -o 103 -p 50936
NOTE: Don't know the originators? Try this:
vxlogcfg -l -p 51216 | while read -r ORG; do
  vxlogcfg -l -p 51216 -o "$ORG"
done > /tmp/org.list
This little script will print the details of every originator; if you know another way, please let me know :D
5.3 Generate the proper logs for all the Originators under the Media Server
To view unified log entries within the last 90 minutes:
vxlogview -p 51216 -t 01:30:00 > nbu_all.log
To display only the error messages for the same interval:
vxlogview -p 51216 -L -E -t 01:30:00 > nbu_err.log
To view unified log entries for all products in the last 90 minutes:
vxlogview -a -t 01:30:00 -d all > products_all.log
6. BP.CONF Settings.
If your Master Server is clustered, ensure both nodes and the cluster name are at the top of your media server's bp.conf.
Always check your bp.conf settings; look for typos in the EMM server name or restrictive settings such as:
CLIENT_RESERVED_PORT_WINDOW = 1 1
RANDOM_PORTS = NO
CLIENT_PORT_WINDOW = 1024 1024
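A one-liner can flag these restrictive port settings for review. The sketch below runs against a hypothetical bp.conf sample rather than the live file:

```shell
# Hypothetical bp.conf contents with one restrictive setting present
cat > /tmp/bp.conf.sample <<'EOF'
SERVER = master1
RANDOM_PORTS = NO
CLIENT_PORT_WINDOW = 1024 1024
EOF
# Flag any of the port-restricting settings discussed above
grep -E 'RANDOM_PORTS|CLIENT(_RESERVED)?_PORT_WINDOW' \
  /tmp/bp.conf.sample > /tmp/bpconf_flags.txt
cat /tmp/bpconf_flags.txt
```

Any line printed deserves a second look; on a live system, point the grep at /usr/openv/netbackup/bp.conf instead.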
From here I would recommend opening a case with Symantec; having all these logs and every single step documented will be of great help to Symantec support.
For any other step or anything you think I missed, please post it in the article comments; many others and I will appreciate it.