Netbackup 6.5.3 is not stable under VCS

Marcinn
Level 3

Hi all :)

 

As mentioned in the subject: in the past we had this kind of issue about once a month, but now it happens once a week :( The NetBackup VCS group goes into the PARTIAL|FAULTED state. The application keeps running and the jobs that were already started stay active, but new jobs cannot be started from the external scheduler (status 25, cannot connect on socket). To fix it (or at least as a quick workaround) we have to offline the group, kill -9 the stuck processes (as listed by bpps -a) because bp.kill_all -FORCE is not able to terminate them, clear the group, and bring it online again.

 

The master is NetBackup 6.5.3 running on Solaris 10 (SunOS 5.10), VCS Veritas-5.0MP3RP4-05.
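
For readers who want the workaround above written out, here is a minimal sketch of the sequence (offline, inspect leftover processes, clear, online). The group name `nbu_group`, the `GROUP`/`NODE`/`DRY_RUN` variables, and the `run` and `recover_group` helpers are assumptions made for illustration; they do not come from the original post. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence. GROUP and NODE are hypothetical
# placeholders; by default the script only prints what it would run.
GROUP="${GROUP:-nbu_group}"     # hypothetical VCS service group name
NODE="${NODE:-$(uname -n)}"     # cluster node the group runs on
DRY_RUN="${DRY_RUN:-1}"         # set DRY_RUN=0 to actually execute

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

recover_group() {
  # 1. Take the service group offline on this node.
  run hagrp -offline "$GROUP" -sys "$NODE"
  # 2. List the NetBackup processes that are still running; review the
  #    list and kill the stuck ones by hand (kill -9 <pid>).
  run /usr/openv/netbackup/bin/bpps -a
  # 3. Clear the FAULTED state, then bring the group back online.
  run hagrp -clear "$GROUP" -sys "$NODE"
  run hagrp -online "$GROUP" -sys "$NODE"
}

recover_group
```

Note that `hagrp -offline` returns before the group has finished going offline, so in a real run it is probably safer to execute the steps one at a time and check the group state (for example with `hagrp -state`) between them, rather than running the whole function unattended.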

 

12 REPLIES

Nicolai
Moderator
Partner    VIP   

If your NetBackup cluster switches or restarts unexpectedly, you can find the process that caused the switch in:

/usr/openv/netbackup/bin/cluster/AGENT_DEBUG.log

An example:

# grep "Detected process offline" AGENT_DEBUG.log

02/23/10 14:55:091 Detected process offline:nbemm  monitor::main
02/23/10 15:00:331 Detected process offline:vmd  monitor::main
02/23/10 15:02:331 Detected process offline:nbemm  monitor::main
02/23/10 15:42:031 Detected process offline:vmd  monitor::main
02/23/10 15:44:031 Detected process offline:nbstserv  monitor::main

Upgrading to 6.5.6 may be the best fix.
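
To see at a glance which daemon the agent flags most often, the grep above can be extended into a per-process count. This is only a sketch: the function name `count_offline` is made up for illustration, and it assumes the log lines look like the sample above (process name directly after `Detected process offline:`). On Solaris 10 you may need `nawk` in place of `awk`.

```shell
#!/bin/sh
# Count "Detected process offline" events per process in an agent log.
# Usage: count_offline /usr/openv/netbackup/bin/cluster/AGENT_DEBUG.log
count_offline() {
  awk -F'Detected process offline:' '
    NF > 1 { split($2, f, " "); n[f[1]]++ }   # f[1] is the process name
    END    { for (p in n) print n[p], p }
  ' "$1" | sort -rn
}
```

Each output line is a count followed by a process name, with the most frequently flagged process first. A single process dominating the list points at that daemon rather than at the cluster software.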

Marcinn
Level 3

Hi Nicolai, thank you for your opinion. In AGENT_DEBUG.log I mostly see the nbpem process:

 

Tue May  3 09:24:14 2011 vcs/Online.pl: Start Online.......

05/03/11 09:24:143 Common Online called online::main

05/03/11 09:24:141 Calling start command online::main

05/04/11 07:05:571 Detected process offline:nbpem  monitor::main

05/04/11 07:10:581 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:12:011 Detected process offline:nbpem  monitor::main

05/04/11 07:17:021 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:18:031 Detected process offline:nbpem  monitor::main

05/04/11 07:23:051 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:24:061 Detected process offline:nbpem  monitor::main

05/04/11 07:29:071 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:30:081 Detected process offline:nbpem  monitor::main

05/04/11 07:35:101 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:36:111 Detected process offline:nbpem  monitor::main

05/04/11 07:41:121 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 07:42:131 Detected process offline:nbpem  monitor::main

 

Wed May  4 07:54:33 2011 vcs/Online.pl: Start Online.......

05/04/11 07:54:333 Common Online called online::main

05/04/11 07:54:331 Calling start command online::main

05/04/11 11:11:041 Detected process offline:nbpem  monitor::main

05/04/11 11:16:051 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 11:17:061 Detected process offline:nbpem  monitor::main

05/04/11 11:22:081 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 11:23:081 Detected process offline:nbpem  monitor::main

05/04/11 11:28:101 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 11:29:101 Detected process offline:nbpem  monitor::main

05/04/11 11:34:121 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 11:35:131 Detected process offline:nbpem  monitor::main

05/04/11 11:40:141 detected process online when in OFFLINE state: nbevtmgr nbstserv vmd bprd bpdbm nbjm nbemm nbrb NB_dbsrv  monitor::main

05/04/11 11:41:161 Detected process offline:nbpem  monitor::main

 

Wed May  4 11:41:16 2011 clean.pl: Start clean.......

05/04/11 11:43:493 Common Offline : Stop command completed. offline::main

 

Wed May  4 11:43:50 2011 vcs/Online.pl: Start Online.......

05/04/11 11:43:503 Common Online called online::main

05/04/11 11:43:511 Calling start command online::main

 

Wed May  4 12:12:53 2011 offline.pl: Start Offline.......

05/04/11 12:16:241 Common offline: Stop command failed.  offline::main

Wed May  4 12:12:53 2011 offline.pl: offline exited with 99

 

Wed May  4 12:19:20 2011 vcs/Online.pl: Start Online.......

05/04/11 12:19:203 Common Online called online::main

05/04/11 12:19:201 Calling start command online::main

Anton_Panyushki
Level 6
Certified

Please consider upgrading to 6.5.6, and do install the patched nbpem: http://www.symantec.com/docs/TECH141606 . There is indeed a memory leak.

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Seems the memory leak is INTRODUCED with 6.5.6!!

 

BUG REPORT: Nbpem memory leak with NetBackup 6.5.6.

Anton_Panyushki
Level 6
Certified

I meant that if they upgrade to 6.5.6, this patch must be applied to eliminate the memory leak. It is quite logical.

Marianne
Level 6
Partner    VIP    Accredited Certified

Really? INTRODUCED in 6.5.6 does not by default mean that the memory leak exists prior to 6.5.6... Well, not according to my logic! ;)

 

I do agree that there seems to be a problem with nbpem. VCS is merely reacting to nbpem terminating for some reason or other...

We have seen customers experience problems with nbpem core dumping and leaving defunct processes; all were resolved once they upgraded to 7.0.1.

watsons
Level 6

When it comes to Solaris 10, the lesson I got from Support is always to "disable tcp_fusion" first:

http://www.symantec.com/docs/TECH62004

tcp_fusion appears to cause many issues with Netbackup...

Nicolai
Moderator
Partner    VIP   

This is an issue we had at our site. NBPEM does leak memory, but releases it when it reaches a quiet state. In our case NBPEM always had something to do, so it kept allocating memory until it reached 4 GB and dumped core.
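
If you want to watch for this growth before the core dump happens, something along these lines could be run from cron. It is a rough sketch, not a supported tool: the function name `check_vsz` and the `LIMIT_KB` threshold are assumptions chosen for illustration, and the function expects `pid vsz comm` columns as printed by `ps -e -o pid,vsz,comm` (VSZ in kilobytes). On Solaris 10 you may need `nawk` in place of `awk`.

```shell
#!/bin/sh
# Report PID and virtual size (KB) of processes with a given command name,
# reading "pid vsz comm" lines as produced by: ps -e -o pid,vsz,comm
# LIMIT_KB is a hypothetical alert threshold, set below the 4 GB mark.
LIMIT_KB="${LIMIT_KB:-3500000}"

check_vsz() {
  # $1 = process name to look for (e.g. nbpem)
  awk -v name="$1" -v limit="$LIMIT_KB" '
    { n = split($3, parts, "/"); base = parts[n] }   # strip any path prefix
    base == name {
      status = ($2 + 0 > limit + 0) ? "OVER LIMIT" : "ok"
      print $1, $2, status
    }
  '
}

# Example: ps -e -o pid,vsz,comm | check_vsz nbpem
```

Logging the output at regular intervals gives a simple growth curve for nbpem, which is also useful evidence to attach to a support case.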

Marcinn
Level 3

Thank you all. Yes, I will try to check it with a Symantec engineer ;)

Anton_Panyushki
Level 6
Certified

I'm sorry, I was not precise enough when writing my previous posts. I wanted to say that they should install the patch AFTER the upgrade to 6.5.6. I didn't imply that the bug exists in 6.5.3. 6.5.3 is quite an outdated version, so, in my opinion, they should upgrade and THEN install the nbpem patch to fix the memory leak bug found in 6.5.6. Sorry for the confusion.

KEELIN_HART_2
Level 4

There is an EEB that fixes the memory leak introduced with 6.5.6:

http://www.symantec.com/docs/TECH141606

Nicolai
Moderator
Partner    VIP   

Marcinn - It seems the NetBackup Policy Execution Manager is the culprit, not VCS.

Do you see any core files on the master server? (Is core file creation enabled?)

There are a number of EEBs for NBPEM; I had nbpem version 6 when I ran 6.5.3. It sounds like the best solution is to upgrade to 6.5.6. NBPEM is very stable in this version (see my other note on when NBPEM leaks memory).