Troubleshooting EMM database error (196)
Possible quick fix: I have noticed that in many cases, if I delete all the media server entries from the bp.conf of the media server in question, leaving only the Master, the Master nodes (if it is clustered) and the media server's own name under the SERVER list, then the EMM command nbemmcmd -getemmserver normally works. This means the box is unable to see another media server in the environment. Try this first; if it doesn't work, go through the process below.
1. Tape Drive Troubleshoot
1.1 Check the output of these three commands; the Drive Serial Number, Drive Name and Path must match across them:
vmoprcmd -h <media> -shmdrive | awk '{print $38,$39,$31}'
vmoprcmd -h <media> -autoconfig -t
vmglob -listall -b | grep -i <media server>
An alternate way to check for configDB and globDB discrepancies is with:
tpautoconf -report_disc
If there is any mismatch here, update the drives with tpautoconf -replace_drive or tpconfig -update.
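To spot a mismatch quickly, the serial numbers from the globDB and configDB outputs can be diffed. A minimal sketch follows, using hypothetical sample serials standing in for the real command output you would capture from the commands above:

```shell
# Hypothetical serial lists; in practice capture them from the commands above,
# e.g. vmglob -listall -b | grep -i <media server>, and extract the serial column.
cat > /tmp/globdb_serials.txt <<'EOF'
HU1234567A
HU1234567B
EOF
cat > /tmp/configdb_serials.txt <<'EOF'
HU1234567A
HU1234567C
EOF
# uniq -u keeps only lines that appear once, i.e. serials present
# in one DB but not in the other
sort /tmp/globdb_serials.txt /tmp/configdb_serials.txt \
  | uniq -u > /tmp/serial_mismatch.txt
cat /tmp/serial_mismatch.txt
```

Any serial listed in the output points at a drive that needs tpautoconf -replace_drive or tpconfig -update.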
1.2 If they match, the next step is to ensure the library is responding. A simple robtest can confirm this; remember that robtest must be run from the Library Controller.
1.3 If the library looks healthy, then we need to know whether each drive is responding (pingable). Use the following command to ping each drive:
From the Master Server:
vmoprcmd -h <media server> -devconfig "-dev_ping -drive -path <PATH>"
From the Media Server, the same command can be run without the -h option.
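Pinging drives one at a time gets tedious with many paths. The sketch below just generates the dev_ping command lines for review from a list of drive paths; the paths and the host name "mediahost" are hypothetical placeholders:

```shell
# Hypothetical drive path list; on Solaris these typically live under /dev/rmt
cat > /tmp/drive_paths.txt <<'EOF'
/dev/rmt/0cbn
/dev/rmt/1cbn
EOF
# Emit one vmoprcmd dev_ping line per path; review the file, then run the lines
while read -r P; do
  echo "vmoprcmd -h mediahost -devconfig \"-dev_ping -drive -path $P\""
done < /tmp/drive_paths.txt > /tmp/ping_cmds.txt
cat /tmp/ping_cmds.txt
```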
1.4 For any drive that is not responding, check the SAN side: ports, fabrics, zones, fiber cables and HBAs. Depending on the HBA driver you can use lputil, emlxadm, or PowerPath for Windows.
1.5 Confirm the OS can see the drives:
/usr/openv/volmgr/bin/scan
1.6 Sometimes it is necessary to re-scan the HBAs at the OS level, try:
Solaris 10: devfsadm
AIX: cfgmgr
Windows: Go to Device Manager and Re-Scan.
2. Network Troubleshoot
2.1 If every drive is pingable, confirm that the media server is communicating properly with the Master and EMM servers.
From the media server:
traceroute <Library Controller | EMM | Master server>
ping -s -i <nic> -p 1556 <Library Controller | EMM | Master server> 65000 4 -- (solaris)
telnet <EMM | Master> 1556
nslookup <Library Controller | EMM | Master server>
getent hosts <Library Controller | EMM | Master server> -- (solaris)
bpclntcmd -hn <Library Controller | EMM | Master server>
bpclntcmd -ip <Library Controller | EMM | Master server IP>
bpclntcmd -pn
bpclntcmd -self
tpautoconf -get_gdbhost
Check the hosts file; it must contain:
127.0.0.1 localhost
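A quick sanity check of the hosts file can be scripted. The sketch below runs against a hypothetical sample file rather than the live /etc/hosts, checking the localhost entry and flagging any host name that appears twice (a common place for a typo'd duplicate):

```shell
# Hypothetical stand-in for /etc/hosts
cat > /tmp/hosts.sample <<'EOF'
127.0.0.1   localhost
10.0.0.5    master1
10.0.0.6    media1
EOF
# Verify 127.0.0.1 maps to localhost
awk '$1=="127.0.0.1" && $2=="localhost" {ok=1}
     END {print (ok ? "localhost OK" : "localhost MISSING")}' \
  /tmp/hosts.sample > /tmp/hosts_check.txt
# Names appearing on more than one line often hide a duplicate/typo entry
awk '{for(i=2;i<=NF;i++) print $i}' /tmp/hosts.sample \
  | sort | uniq -d > /tmp/hosts_dups.txt
cat /tmp/hosts_check.txt /tmp/hosts_dups.txt
```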
Check vm.conf and lower the scan interval:
echo "SCAN_HOST_STATUS_INTERVAL = 120" >> /usr/openv/volmgr/vm.conf
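The echo above appends unconditionally; run twice it leaves duplicate lines in vm.conf. A guarded, idempotent version is sketched below on a hypothetical copy of vm.conf rather than the live file:

```shell
VMCONF=/tmp/vm.conf.test   # stand-in for /usr/openv/volmgr/vm.conf
cat > "$VMCONF" <<'EOF'
MM_SERVER_NAME = media1
EOF
add_scan_interval() {
  # Only append if the setting is not already present
  grep -q '^SCAN_HOST_STATUS_INTERVAL' "$VMCONF" || \
    echo "SCAN_HOST_STATUS_INTERVAL = 120" >> "$VMCONF"
}
add_scan_interval
add_scan_interval   # second run is a no-op thanks to the grep guard
cat "$VMCONF"
```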
From the EMM, Master Server and Library Controller:
traceroute <Media server>
ping -s -i <nic> -p 13782 <Media server> 65000 4 -- (solaris)
ping -s -i <nic> -p 13724 <Media server> 65000 4 -- (solaris)
nslookup <Media Server>
getent hosts <Media Server Name>
bpclntcmd -hn <Media Server>
bpclntcmd -ip <Media Server IP>
bpclntcmd -pn
bpclntcmd -self
2.2 Confirm at the OS and switch level that the network settings for your NIC and switch ports are correct. Look for 10/half duplex or even 100/full duplex configurations; if you have a 1 Gb NIC, ensure it is configured as such.
2.3 Run the SAS tool and keep the log ready for Symantec; this will tell you if there is any overload at the network layer between Master and Media. In many environments the Master is also a media server and clients use the production NIC for backups, overloading the throughput and causing errors in the metadata transport between the Media Server and EMM.
3. EMM Troubleshoot
3.1 If TCP/IP troubleshooting looks fine, check whether the media server is able to query the EMM DB.
From Media Server
nbemmcmd -getemmserver
nbdb_ping
NOTE: If you see something like EMM database error (196) or Database [NBDB] is not available, it is most of the time a network issue between your media server and the EMM server. Look in the hosts files or bp.conf for a typo or a REQUIRED_INTERFACE that is unable to reach the EMM DB.
3.2 Check the settings of the media server under the EMM DB
From EMM Server:
nbemmcmd -listhosts -verbose -display_server -machinename <media server> -machinetype media
The output will show several lines; you are looking for:
MachineState = not active for tape drives and disk jobs (14)
If it shows something different, such as:
MachineState = not active for disk jobs (8)
then your media server is offline and unable to communicate with your EMM box. Something that occasionally works is:
From master server:
vmoprcmd -activate_host -h <media server>
3.3 Buy yourself some time and enable VxUL verbosity on your media server now, for deeper troubleshooting later.
From Media Server enable logging:
vxlogcfg -a -p 51216 -o Default -s DebugLevel=5 -s DiagnosticLevel=5
vxlogcfg -a -p 50936 -o Default -s DebugLevel=5 -s DiagnosticLevel=5
Create bptm, bpbrm and bpcd folders under /usr/openv/netbackup/logs
Ensure VERBOSE = 5 line is set under the media server bp.conf file.
3.4 Restart NetBackup and PBX on the Media Server
From Media Server:
bp.kill_all
/opt/VRTSpbx/bin/vxpbx_exchanged stop
bpps -x -- (be sure no process is running)
/opt/VRTSpbx/bin/vxpbx_exchanged start
netbackup start
3.5 From the media server run nbemmcmd -getemmserver. If you still get EMM database error (196) or are unable to connect to the EMM DB, you may need to check every SERVER entry in the media server's bp.conf; if for any reason the media server cannot reach any of the listed boxes, you will experience communication problems and the EMM DB will be unresponsive. A good practice is:
cp /usr/openv/netbackup/bp.conf /usr/openv/netbackup/bp.conf.<date>
Remove every SERVER entry from bp.conf, leaving only the master and the current media server. This will tell you whether the issue lies with some other box that your media server cannot reach.
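The trim described above can be scripted. A minimal sketch on a hypothetical bp.conf copy, where master1 and media1 stand in for your real master and media server names:

```shell
# Hypothetical bp.conf contents; never edit the live file without a backup
cat > /tmp/bp.conf.test <<'EOF'
SERVER = master1
SERVER = othermedia1
SERVER = othermedia2
SERVER = media1
CLIENT_NAME = media1
EOF
cp /tmp/bp.conf.test /tmp/bp.conf.bak   # always back up first
# Keep every non-SERVER line, plus only the master and this media server
awk '$1!="SERVER" || $3=="master1" || $3=="media1"' \
  /tmp/bp.conf.test > /tmp/bp.conf.trimmed
cat /tmp/bp.conf.trimmed
```

Review /tmp/bp.conf.trimmed before copying anything back into place.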
Test the EMM connection again with:
nbemmcmd -getemmserver
3.6 If none of this has fixed the issue, you probably need to update the media server's entry in the EMM DB:
From EMM Server:
nbemmcmd -updatehost -machinename <media server> -machinetype media -masterserver <master>
From the Media Server:
nbemmcmd -getemmserver
3.7 Still getting EMM database error (196)? Try deleting and re-adding the media server in the EMM DB:
nbemmcmd -deletehost -machinename <media server> -machinetype media
nbemmcmd -addhost -machinename <media server> -machinetype media -masterserver <master server> -operatingsystem <OS>
3.8 Restarting the Master and EMM Server can help. If you are still having problems reaching your EMM box, it will be necessary to review some VxUL logs.
3.9 Change the scan host status interval
From Media Server
cat vm.conf
MM_SERVER_NAME = <media server name>
SCAN_HOST_STATUS_INTERVAL = 120
4. NBRB Cleanup
4.1 Even if EMM is not responding, it is good practice to clean up the NBRB cache.
Dump the NBRB tables and cache for future troubleshooting purposes:
nbrbutil -dump > /tmp/nbrb.dump.<date>
nbrbutil -dumptables > /tmp/nbrb.tables.<date>
nbrbutil -resetMediaServer <mediaserver>
Run a second dump and grep for the media server; collect all the allocation keys and reset them all:
nbrbutil -resetMDS <AllocationKey>
nbrbutil -releaseAllocHolds
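Collecting the allocation keys by hand is error-prone, so here is a sketch that pulls them out of a dump file and emits the resetMDS commands for review. The dump layout below is a hypothetical stand-in; check your real nbrbutil dump for the actual format before adapting the sed pattern:

```shell
# Hypothetical dump format: one allocation per line with an AllocationKey=<n> token
cat > /tmp/nbrb.dump.sample <<'EOF'
AllocationKey=101 mediaserver=media1
AllocationKey=102 mediaserver=media2
AllocationKey=103 mediaserver=media1
EOF
# Keep only the target media server's lines, extract the numeric key,
# and emit one resetMDS command per key for review before running
grep media1 /tmp/nbrb.dump.sample \
  | sed -n 's/.*AllocationKey=\([0-9]*\).*/\1/p' \
  | while read -r KEY; do
      echo "nbrbutil -resetMDS $KEY"
    done > /tmp/resetmds_cmds.txt
cat /tmp/resetmds_cmds.txt
```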
The last resort for cleaning up NBRB is restarting the EMM Server, which most of the time is the Master.
5. Logs
5.1 List the Symantec products on your media server; normally you will see 51216 (NBU) and 50936 (PBX):
vxlogmgr -l
5.2 In step 3.3 you already changed the log verbosity; just confirm it is set to 5.
From Media Server:
vxlogcfg -l -p 51216 -o <originator>
vxlogcfg -l -o 103 -p 50936
NOTE: Don't know the originators? Try this:
vxlogcfg -l -p 51216 | while read -r ORG; do
  vxlogcfg -l -p 51216 -o "$ORG"
done > /tmp/org.list
This little script will print the details of every originator; if you know another way, please let me know :D
5.3 Generate the proper logs for all the Originators under the Media Server
To view unified log entries within the last 90 minutes:
vxlogview -p 51216 -t 01:30:00 > nbu_all.log
To display only the error messages for the same interval:
vxlogview -p 51216 -L -E -t 01:30:00 > nbu_err.log
To view unified log entries for all products in the last 90 minutes:
vxlogview -a -t 01:30:00 -d all > products_all.log
6. BP.CONF Settings.
If your Master Server is clustered, ensure both nodes and the cluster name are at the top of your media server's bp.conf.
Always check your bp.conf settings; look for typos in the EMM server name or restrictive settings such as:
CLIENT_RESERVED_PORT_WINDOW = 1 1
RANDOM_PORTS = NO
CLIENT_PORT_WINDOW = 1024 1024
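A one-liner can flag these restrictive port settings for review. The sketch below runs against a hypothetical bp.conf sample rather than the live file:

```shell
# Hypothetical bp.conf contents with one restrictive setting present
cat > /tmp/bp.conf.sample <<'EOF'
SERVER = master1
RANDOM_PORTS = NO
CLIENT_PORT_WINDOW = 1024 1024
EOF
# Flag any of the port-restricting settings discussed above
grep -E 'RANDOM_PORTS|CLIENT(_RESERVED)?_PORT_WINDOW' \
  /tmp/bp.conf.sample > /tmp/bpconf_flags.txt
cat /tmp/bpconf_flags.txt
```

Any line printed deserves a second look; on a live system, point the grep at /usr/openv/netbackup/bp.conf instead.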
From here I would recommend opening a case with Symantec; having all these logs and every single step documented will be of great help to Symantec support.
For any other step or anything you think I missed, please post it in the article comments; many others and I will appreciate it.