EMM failed all of a sudden

Anton_Panyushki · ‎09-01-2008

Hello,

Master: NBU 6.0 MP6 running Solaris 9

The problem is that the EMM daemon failed and was restarted a few times over the last weekend, while NBDB stayed up and running.

There are a number of status code 252 failures in the Activity Monitor, and a message in the job status says:

NBEMM returned an extended error status: invalid error number (3000001)

I could not find this message either on the Internet or in the Troubleshooting Guide.

I think it would be nice to have at least the *.h files with error codes in the NBU distro.

Message Edited by Anton Panyushkin on 09-01-2008 06:51 AM

dami · ‎09-01-2008

For starters, EMM logs to the unified logs under /usr/openv/logs (on UNIX), which can only be viewed with the vxlogview command.

See for example: http://seer.entsupport.symantec.com/docs/280029.htm

It describes commands such as vxlogview -i 111 -t 24:00:00.

You may also need additional verbosity in these logs (I think the default is 1 or 0).

To see the current setting:

/usr/openv/netbackup/bin/vxlogcfg -p 51216 -o Default -l

To set to 6 (of course you will need to wait till the error happens again to see additional detail):

/usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o ALL -s DebugLevel=6 -s DiagnosticLevel=6

To set it back to 0:

/usr/openv/netbackup/bin/vxlogcfg -a -p 51216 -o ALL -s DebugLevel=0 -s DiagnosticLevel=0
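The set and reset commands above differ only in the level, so they can be generated by one small helper to avoid typing mistakes. This is a minimal sketch, assuming the vxlogcfg path and flags are exactly as quoted in this post and that a POSIX shell is available. The function name emm_loglevel_cmd is hypothetical, and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the vxlogcfg command quoted in
# this thread for a given level. Path and flags are copied from the post.
VXLOGCFG=/usr/openv/netbackup/bin/vxlogcfg

emm_loglevel_cmd() {
    level=$1
    # Accept a single digit 0-6, which covers the values mentioned here.
    case "$level" in
        [0-6]) ;;
        *) echo "level must be 0-6" >&2; return 1 ;;
    esac
    echo "$VXLOGCFG -a -p 51216 -o ALL -s DebugLevel=$level -s DiagnosticLevel=$level"
}
```

Running emm_loglevel_cmd 6 prints the command for review; it can then be pasted into a root shell on the master.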

It's been my experience that these logs are fairly hard to interpret and that Symantec (with their log searching tools) can often get to a root cause faster. I seem to recall loads of engineering binaries at 6.0 MP6.

sdo · ‎09-01-2008

EMM daemon will stop if free disk space falls below 1%.
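A quick way to check this is to turn the df "Capacity" column into a free percentage. The sketch below assumes a df that supports the POSIX -P option and a POSIX shell; the function names are hypothetical, and which file system EMM actually watches is not stated in this thread, so the path is left as a parameter.

```shell
#!/bin/sh
# Reads "df -P" output on stdin and prints the free percentage of the
# first file system listed (100 minus the "Capacity" column).
free_pct_from_df() {
    awk 'NR==2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Convenience wrapper: free percentage of the file system holding path $1.
free_pct() {
    df -P "$1" | free_pct_from_df
}
```

For example, free_pct /usr/openv prints a single number, and anything close to 1 would be worth investigating.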

Anton_Panyushki · ‎09-02-2008

Hello,

I have checked the EMM log, and it says that EMM first shut down in a normal manner and was then brought back online. I know that EMM was started by a colleague from the duty admin team, but why EMM shut down is a complete puzzle to me.

As for free space, I checked the df output, and every file system has plenty of free blocks.

dami · ‎09-02-2008

Maybe increase the unified logging to verbose 5 or 6 (I think they are the same anyway) and leave it running over the weekend.

Note, though, that at this verbosity these logs require a HUGE amount of space (especially on a busy NetBackup server). If you have on the order of hundreds of GB free on your /usr/openv/ partition (the logs are written to /usr/openv/logs), then fine; otherwise it could cause you even more headaches.
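Before raising the verbosity, the available space can be compared against a chosen minimum. This is a rough sketch, assuming a df that supports the POSIX -P and -k options; the function name has_free_gb is hypothetical and the 100 GB figure in the usage note is only an illustrative threshold, not a documented requirement.

```shell
#!/bin/sh
# Reads "df -Pk" output on stdin; succeeds (exit 0) if the "Available"
# column of the first file system is at least $1 gigabytes.
has_free_gb() {
    need_gb=$1
    awk -v need="$need_gb" '
        NR==2 { ok = ($4 >= need * 1024 * 1024) }   # column 4 is in KB
        END   { exit ok ? 0 : 1 }
    '
}
```

Usage would look like: df -Pk /usr/openv/logs | has_free_gb 100 || echo "not enough room for verbose logs".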

Omar_Villa · ‎09-02-2008

stop netbackup

stop PBX

clean schedules

check IPC jobs and clean them up if they are hung

start PBX

start netbackup

and let us know what comes up in the logs

regards

Anton_Panyushki · ‎09-03-2008

I suppose that by "clean schedules" you mean running rm /usr/openv/netbackup/db/jobs/pempersist.

But what do you mean by cleaning IPC jobs?

V_Dinesh · ‎09-04-2008

Hi,

Try this command.

This will start the EMM database separately.

Command : installpath\programfiles\veritas\netbackup\bin\nbdb_admin -start

Anton_Panyushki · ‎09-04-2008

I have examined the NBDB log and there were no signs of the DB going down. The problem is in the EMM daemon itself.

sdo · ‎09-05-2008

Anton, try Dami's suggestion of analyzing the VxUL logs again.

Here's a quick reference card to help with the parameters:

ftp://exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/287647.pdf

Anton_Panyushki · ‎09-07-2008

EMM has not experienced any faults since then, so I suppose I can close this thread.

Thank you very much for your answers.

Status_Code_0 · ‎09-22-2008

I realize that this thread was closed, but I have the same issue with EMM going down with an invalid error number. Please let me know if you have any updates.

Thanks.

James

Manoj_Siricilla · ‎09-22-2008

Here are the steps that you need to do.

cat /usr/openv/db/log/server.log

and check whether you see entries like these:

I. 09/22 12:14:07. Database "NBDB" (NBDB.db) stopped at Mon Sep 22 2008 12:14
I. 09/22 12:14:07. Database "utility_db" (utility_db) stopped at Mon Sep 22 2008 12:14

The lines above indicate that the database was stopped gracefully. If you do not see them, examine the last 100 or 200 lines to find out why and how the database stopped.

Every time a checkpoint is taken, it is recorded in this log.
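Since checkpoints make this log noisy, the graceful-stop entries can be filtered out rather than read by eye. A minimal sketch, assuming the line format matches the two sample entries above; the function name db_stop_lines is hypothetical.

```shell
#!/bin/sh
# Reads a server.log-style stream on stdin and prints only the lines that
# record a database stop, in the format of the samples shown above.
db_stop_lines() {
    grep 'Database ".*" (.*) stopped at'
}
```

For example, db_stop_lines < /usr/openv/db/log/server.log lists the stop entries, and tail -n 200 /usr/openv/db/log/server.log shows the surrounding context if none are found.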

----- Next....

Run this command

/usr/openv/netbackup/bin/vxlogview -o nbemm -E -L -n 1 > nbemm_error.log

The above command dumps only the error messages logged for nbemm in the last day.

Examine those logs; they should give you an idea of the failures.

Also check the disk space on the file system where EMM is mounted. If EMM runs out of disk space, it logs a critical message rather than an error message, which you can verify as follows:

/usr/openv/netbackup/bin/vxlogview -o nbemm -C -L -n 1 > nbemm_critc.log

Let me know the outcome, I will try and get you some more info.

Status_Code_0 · ‎09-22-2008

What if I am attempting to view the log file on a different server? I took the unified log off the master and moved it to another NBU machine. Is it the -K (hostname) switch?

Thanks.
