Master Server performance problem

cimo
Level 4

Hello!

I need some help understanding a performance problem on my Master Server.

My Master Server manages more than 350k backup images and more than 1000 clients.

root@michelangelo #  /usr/openv/netbackup/bin/admincmd/bpimagelist -idonly -d "01/01/1970 00:00:00" | wc -l
  374437

Yesterday I had to restart it because scheduled jobs were not starting for any client. The "top" command showed 0% idle CPU, all of it consumed by 71 running bpdbm processes. After the restart the scheduled jobs started again, and I investigated to identify the problem.

I noticed that the Automatic jobs started for one particular Oracle client were frozen at the last step, "Validating Image", after all the "Default-Application" jobs had completed:

Info bpbrm(pid=18471) validating image for client

This client is a special case: it has a huge Oracle DB, and archive log backups are scheduled every 30 minutes.

For each bpbrm process on the media server there was a corresponding bpdbm process on the Master Server searching the backup image catalog. So I counted the backup images for this client and realized that it has more than 90k images!

 

root@michelangelo # /usr/openv/netbackup/bin/admincmd/bpimagelist -client renetta.unix.t-systems.it -idonly -d "01/01/1970 00:00:00" | wc -l
  90918
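
To see which clients dominate the catalog, the same count can be tallied per client. This is only a minimal sketch: it assumes you already have a plain list of backup IDs, one per line, in the client_ctime form that NetBackup uses for backup IDs. How to extract that list from bpimagelist output is not shown here, and the helper name count_by_client is hypothetical.

```shell
# Hypothetical helper: read backup IDs (client_ctime) on stdin and
# print "count client" pairs, largest count first.
count_by_client() {
    # Drop the trailing _<ctime> so only the client name remains,
    # then count identical names.
    sed 's/_[0-9][0-9]*$//' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Example run on a few made-up backup IDs (otherclient is illustrative)
printf '%s\n' \
    renetta.unix.t-systems.it_1364705777 \
    renetta.unix.t-systems.it_1364705762 \
    otherclient_1364700000 | count_by_client
```

A client that sits far above the rest in this list is a likely candidate for slow catalog searches.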

So I tried renaming the client in bp.conf and in all its policies, and I noticed that the "validating image" step finished in a few seconds: the new client name has only a few backup images.

 

Some useful information about my environment:

Master Server version: 7.1.0.4

Master Server OS: Solaris 10

Master Server RAM: 32 GB

Master Server CPUs: 40

Media Server version: 7.1.0.4

 

The question is: how can I improve the performance of the backup image catalog? Is it possible that image cleanup is not working properly? The backup images for that client should expire after 6 months. Do I need to take this problem into account during the upgrade to version 7.5.0.5 that we are planning (with the backup image metadata migration)?

Thank you very much!

Simone

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

cimo
Level 4

Support closed the case; no corruption was detected.

 

As for the Oracle client backup, after replacing the client name with a new one the backup was successful. In the meantime, a long cleanup run deleted a large number of old backup images for that client.

 

Thank you.


12 REPLIES

Nicolai
Moderator
Partner    VIP   

Try running bpimage -cleanup -allclients

This will initiate the catalog cleanup process manually.

Nicolai
Moderator
Partner    VIP   

See also "Catalog maintenance and performance optimization" in the NetBackup Admin manual vol 1:

http://www.symantec.com/docs/DOC5334

cimo
Level 4

Hello,

Thank you for your answer.

I noticed that during the "image cleanup" process there are a lot of errors in the bpdbm log file for the client mentioned above, renetta.unix.t-systems.it, like these:

 

09:15:50.029 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705777') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.031 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp  WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705777') AND (ClientType = 4) AND (OperationID = 3) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.337 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.339 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp  WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 3) (rc=100) ErrMsg , ErrCode 0, SqlState
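
To judge how widespread these errors are, they can be tallied per backup ID. The following is a rough sketch, not a supported tool: it assumes each entry occupies a single line in the actual log file and contains the text "executeSql: Error" and a quoted BackupID as in the excerpt. The function name errors_per_backup_id and the log path in the usage comment are assumptions.

```shell
# Hypothetical helper: read a bpdbm log on stdin and print
# "count backup_id" for every executeSql error, most frequent first.
errors_per_backup_id() {
    grep 'executeSql: Error' |
        # Keep only the value between the quotes after "BackupID = "
        sed -n "s/.*BackupID = '\([^']*\)'.*/\1/p" |
        sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Usage (the path and file name are assumptions, adjust to your log directory):
#   errors_per_backup_id < /usr/openv/netbackup/logs/bpdbm/log.MMDDYY
```

If only a handful of backup IDs show up, the problem is likely limited to those images; thousands of distinct IDs would point to something broader.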

 

Could there be some kind of corruption?

Simone

revarooo
Level 6
Employee

You could try a bpdbm -consistency to check the integrity of your backups; otherwise consider upgrading.

mph999
Level 6
Employee Accredited

With that many bpdbm processes all related to DB activities for the same client, it looks like there is some issue that needs addressing.

You are at 7.1.0.4, so the header files are still header files (they get imported into NBDB at 7.5).  bpdbm -consistency 2 will check the images DB, the header files being part of this.

It won't (well, at least I don't believe it does) check the NBDB ...  which is where these errors are coming from ...

09:15:50.337 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002)

AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState
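
The numeric suffix of a backup ID is the backup's creation time in seconds since the Unix epoch, so it is easy to check whether the images in these errors are old or recent. A minimal sketch, assuming GNU date is available (the stock Solaris 10 date does not accept -d); the function name backup_id_date is hypothetical.

```shell
# Hypothetical helper: print the creation time encoded in a backup ID.
backup_id_date() {
    # Everything after the last underscore is the epoch timestamp
    ts=${1##*_}
    # GNU date: "@<seconds>" means seconds since 1970-01-01 UTC
    date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'
}

backup_id_date renetta.unix.t-systems.it_1364705762
```

Comparing that date with the retention period tells you whether cleanup is touching expired images or current ones.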

I'm not sure how vital DBM_MAIN.DBM_ImageChangeLog is, but are there other issues in the DB for this client that are causing the hang?

I can only suggest that it needs investigating.

I partly agree with revaroo: sometimes an upgrade to a later version is good, but as there are big DB changes in 7.5 I would not upgrade while there are potential DB errors.

Martin

revarooo
Level 6
Employee

Indeed. Are these SQL errors a cause or a symptom of the bpdbm processes?

I'd recommend suspending scheduling, firing a few manual jobs and seeing if these messages continue. If they don't, THEN consider upgrading.

cimo
Level 4

I will open a case with support.

I will update you asap...

 

Thank you for the support.

Simone

 

cimo
Level 4

Hello!

Support is still analyzing the bpdbm log file...  In the meantime I tried the upgrade on my DR site, where I had the same errors in the bpdbm log files.

The upgrade was successful, and that error disappeared. I'm fairly confident about upgrading my production site.

Any further hints?

 

Thank you very much

Simone

huanglao2002
Level 6

Hi cimo

Can you check the Oracle backup scripts: does the backup command's format contain the _%t option? This option avoids a large catalog search after the Oracle backup completes.

 

Oracle Backup Format

Ensure that the format specified for all RMAN backup piece names, except for autobackups of the control file, ends with a _%t as documented in the NetBackup for Oracle manual. Failure to add the timestamp results in a series of extended queries that can cause significant performance degradation. These Oracle best practices and others can be found in the article below:

http://www.symantec.com/docs/TECH49868

cimo
Level 4

It is correct:

 

BACKUP
    $BACKUP_TYPE
    FILESPERSET 1
    FORMAT 'bk-hot-$ORACLE_SID-$BCK_TYPE-$rman_time-s%s_p%p_t%t'
    database;

 

Thank you.

Simone

cimo
Level 4

Support closed the case; no corruption was detected.

 

As for the Oracle client backup, after replacing the client name with a new one the backup was successful. In the meantime, a long cleanup run deleted a large number of old backup images for that client.

 

Thank you.

Omar_Villa
Level 6
Employee
Try moving the image that the log complains about to another folder and run a catalog backup. Also, what did support do? Did they run NBCC?