catalog backup policy does not work

Kasra_Hashemi
Level 5

Hi,

Recently I found an issue with my NetBackup catalog backup policy: it takes too long to run and finally makes no progress.

Here are the related settings, in case they help:

Max jobs per client: 10

1 tape library with 15 drives (so I have 15 STUs)

All policies have the "Policy storage" option set to "Any Available". (I have 20 policies.)

 

Netdigest
1 ACCEPTED SOLUTION

Marianne
Moderator
Partner    VIP    Accredited Certified

Please look at the suggestions in @Anshu_Pathak's post. 

Other than that, time to log a support call with Veritas. 

12 REPLIES

Marianne
Moderator
Partner    VIP    Accredited Certified

Firstly - a catalog backup cannot be multiplexed or multi-streamed.

How big is the images folder on your master server?
Are you backing up to tape directly attached to the master?

How many jobs do you see for the catalog backup?
There should be 4: a parent job plus 3 child jobs.
1st child job stages the NBDB to a staging area (should be relatively quick).
2nd child job backs up the staging area (also relatively quick).
3rd child job backs up the images folder and other config files. (This job could take a long time, depending on the size of the images folder, read speed from disk and path to tape.)

How many jobs do you see? 
Can you please locate all of them? 
Copy the text in the Details tab and post it here (a lot easier if you post text rather than screenshots).

Marianne
Moderator
Partner    VIP    Accredited Certified

Oh! One more thing: 

1 tape library with 15 drives ( so I have 15 STUs )

If all drives are in the same robot, same type/density, attached to same media server, then you should have 1 (ONE) storage unit with number of drives set to 15. 

But 15 tape drives attached to a single server are way too many. 
I have not seen any media server that was capable of writing to more than 4 tape drives simultaneously at a constant speed of about 100 MB/sec.

With 15 tape drives, you should ideally have 4 or 5 media servers.
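
As a rough sizing sketch of the rule of thumb above: if one media server can keep about 4 drives streaming, the number of media servers for a given drive count is a ceiling division. The function name and the drives-per-server default are assumptions for illustration, not NetBackup settings.

```python
import math

def media_servers_needed(drive_count: int, drives_per_server: int = 4) -> int:
    """Return how many media servers are needed if each one feeds at most
    `drives_per_server` tape drives (a rule-of-thumb value, not a product limit)."""
    if drive_count < 0 or drives_per_server <= 0:
        raise ValueError("drive_count must be >= 0 and drives_per_server > 0")
    return math.ceil(drive_count / drives_per_server)

if __name__ == "__main__":
    print(media_servers_needed(15))     # assuming 4 drives per server
    print(media_servers_needed(15, 3))  # a more conservative 3 drives per server
```

With 15 drives this gives 4 servers at 4 drives each and 5 servers at 3 drives each, which lines up with the "4 or 5 media servers" estimate.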

How many jobs do you see? 4, as you said: one parent and 3 child jobs.
Can you please locate all of them? Yes.

Feb 6, 2019 9:37:07 AM - Info nbjm (pid=4256) starting backup job (jobid=3456) for client co-itsrv10.int, policy CatalogBK, schedule Full
Feb 6, 2019 9:37:07 AM - Info nbjm (pid=4256) requesting CATALOG_BACKUP_RESOURCE resources from RB for backup job (jobid=3456, request id:{00A951DB-56AE-45C4-802A-DD17600CB09A})
Feb 6, 2019 9:37:07 AM - requesting resource co-itsrv10.int.NBU_CATALOG.MAXJOBS
Feb 6, 2019 9:37:07 AM - granted resource co-itsrv10.int.NBU_CATALOG.MAXJOBS
Feb 6, 2019 9:37:07 AM - estimated 0 kbytes needed
Feb 6, 2019 9:37:07 AM - begin Parent Job
Feb 6, 2019 9:37:07 AM - begin Catalog Backup: Start Notify Script
Feb 6, 2019 9:37:07 AM - Info RUNCMD (pid=6548) started
Feb 6, 2019 9:37:07 AM - Info RUNCMD (pid=6548) exiting with status: 0
Operation Status: 0
Feb 6, 2019 9:37:07 AM - end Catalog Backup: Start Notify Script; elapsed time 0:00:00
Feb 6, 2019 9:37:07 AM - begin Catalog Backup: Database Manager Query
Operation Status: 0
Feb 6, 2019 9:44:06 AM - end Catalog Backup: Database Manager Query; elapsed time 0:06:59
Feb 6, 2019 9:44:06 AM - begin Catalog Backup: Validate Image
Operation Status: 0
Feb 6, 2019 9:44:06 AM - end Catalog Backup: Validate Image; elapsed time 0:00:00
Feb 6, 2019 9:44:06 AM - begin Catalog Backup: End Notify Script
Feb 6, 2019 9:44:06 AM - Info RUNCMD (pid=7812) started
Feb 6, 2019 9:44:07 AM - Info RUNCMD (pid=7812) exiting with status: 0

-----------------

I have one HP EML with one robot and 15 drives.

Please look at the two attached pictures.

policy.jpg, STUs.jpg

Netdigest
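
The parent job details above can be checked mechanically for phases that log a "begin" line but no matching "end" line. The following is a minimal sketch that assumes the Details text uses the "begin X" / "end X; elapsed time ..." wording seen in this thread; the function name is made up for illustration.

```python
import re

# Matches lines such as
#   "Feb 6, 2019 9:37:07 AM - begin Catalog Backup: Validate Image"
#   "Feb 6, 2019 9:44:06 AM - end Catalog Backup: Validate Image; elapsed time 0:00:00"
PHASE_RE = re.compile(r" - (begin|end) ([^;]+)(?:; (.*))?$")

def summarize_phases(details: str):
    """Return (finished, unfinished). `finished` maps phase name to the text
    after the semicolon on its 'end' line; `unfinished` lists phases that
    have a 'begin' line but no 'end' line, in the order they started."""
    started, finished = [], {}
    for line in details.splitlines():
        match = PHASE_RE.search(line.strip())
        if not match:
            continue
        kind, name, tail = match.groups()
        name = name.strip()
        if kind == "begin":
            started.append(name)
        else:
            finished[name] = tail
    unfinished = [name for name in started if name not in finished]
    return finished, unfinished

# Excerpt of the parent job details posted above.
SAMPLE = """\
Feb 6, 2019 9:37:07 AM - begin Parent Job
Feb 6, 2019 9:37:07 AM - begin Catalog Backup: Start Notify Script
Feb 6, 2019 9:37:07 AM - end Catalog Backup: Start Notify Script; elapsed time 0:00:00
Feb 6, 2019 9:37:07 AM - begin Catalog Backup: Database Manager Query
Feb 6, 2019 9:44:06 AM - end Catalog Backup: Database Manager Query; elapsed time 0:06:59
Feb 6, 2019 9:44:06 AM - begin Catalog Backup: Validate Image
Feb 6, 2019 9:44:06 AM - end Catalog Backup: Validate Image; elapsed time 0:00:00
Feb 6, 2019 9:44:06 AM - begin Catalog Backup: End Notify Script
Feb 6, 2019 9:44:07 AM - Info RUNCMD (pid=7812) exiting with status: 0
"""

if __name__ == "__main__":
    done, pending = summarize_phases(SAMPLE)
    for phase, tail in done.items():
        print(f"finished:   {phase} ({tail})")
    for phase in pending:
        print(f"unfinished: {phase}")
```

For this excerpt the unfinished phases are the parent job itself and the End Notify Script phase. That shows where the log stops, not why the job hangs there.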

Marianne
Moderator
Partner    VIP    Accredited Certified

Okay - that looks like the parent job. 

Can you please post the child jobs as well? 

(Will deal later with STU config.)

Feb 6, 2019 9:37:08 AM - Info bpdbm (pid=8160) staging relational database files for catalog backup
Feb 6, 2019 9:37:08 AM - Info bpdbm (pid=8160) staging NBAZDB backup to C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:37:08 AM - Info bpdbm (pid=8160) done staging NBAZDB backup to C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:37:08 AM - Info bpdbm (pid=8160) staging NBDB backup to C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:37:12 AM - Info bpdbm (pid=8160) done staging NBDB backup to C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:39:42 AM - Info bpdbm (pid=8160) validating NBAZDB backup in C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:39:43 AM - Info bpdbm (pid=8160) done validating NBAZDB backup in C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:39:43 AM - Info bpdbm (pid=8160) validating NBDB backup in C:\Program Files\Veritas\NetBackupDB\staging
Feb 6, 2019 9:39:46 AM - Info bpdbm (pid=8160) done validating NBDB backup in C:\Program Files\Veritas\NetBackupDB\staging
the requested operation was successfully completed (0)

-------------------------------------------------------------------------------------------------------------------------

Feb 6, 2019 9:38:04 AM - Info nbjm (pid=4256) starting backup job (jobid=3459) for client co-itsrv10.int..com, policy CatalogBK, schedule Full
Feb 6, 2019 9:38:04 AM - Info nbjm (pid=4256) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3459, request id:{FC85DCDB-AF4E-4921-B067-DE5CAE3DE3E4})
Feb 6, 2019 9:38:04 AM - requesting resource Any
Feb 6, 2019 9:38:04 AM - requesting resource co-itsrv10.int..com.NBU_CLIENT.MAXJOBS.co-itsrv10.int..com
Feb 6, 2019 9:38:04 AM - requesting resource co-itsrv10.int..com.NBU_POLICY.MAXJOBS.CatalogBK
Feb 6, 2019 9:38:04 AM - granted resource co-itsrv10.int..com.NBU_CLIENT.MAXJOBS.co-itsrv10.int..com
Feb 6, 2019 9:38:04 AM - granted resource co-itsrv10.int..com.NBU_POLICY.MAXJOBS.CatalogBK
Feb 6, 2019 9:38:04 AM - granted resource PW8003
Feb 6, 2019 9:38:04 AM - granted resource HP.ULTRIUM4-SCSI.001
Feb 6, 2019 9:38:04 AM - granted resource Drive1
Feb 6, 2019 9:38:04 AM - estimated 0 kbytes needed
Feb 6, 2019 9:38:04 AM - Info nbjm (pid=4256) started backup (backupid=co-itsrv10.int..com_1549433284) job for client co-itsrv10.int..com, policy CatalogBK, schedule Full on storage unit Drive1
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) co-itsrv10.int..com is the host to backup data from
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) reading file list for client
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) listening for client connection
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) INF - Client read timeout = 300
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) accepted connection from client
Feb 6, 2019 9:38:05 AM - Info bpbrm (pid=8540) start bpbkar on client
Feb 6, 2019 9:38:05 AM - started process bpbrm (pid=8540)
Feb 6, 2019 9:38:05 AM - connecting
Feb 6, 2019 9:38:05 AM - connected; connect time: 0:00:00
Feb 6, 2019 9:38:06 AM - Info dbclient (pid=5780) Backup started
Feb 6, 2019 9:38:06 AM - Info dbclient (pid=8160) Backup started
Feb 6, 2019 9:38:06 AM - Info dbclient (pid=8160) change time comparison:<disabled>
Feb 6, 2019 9:38:06 AM - Info dbclient (pid=8160) archive bit processing:<enabled>
Feb 6, 2019 9:38:06 AM - Info bptm (pid=7216) start
Feb 6, 2019 9:38:06 AM - Info bptm (pid=7216) using 65536 data buffer size
Feb 6, 2019 9:38:07 AM - Info bptm (pid=7216) setting receive network buffer to 263168 bytes
Feb 6, 2019 9:38:07 AM - Info bptm (pid=7216) using 30 data buffers
Feb 6, 2019 9:38:07 AM - Info bptm (pid=7216) start backup
Feb 6, 2019 9:38:07 AM - Info bptm (pid=7216) Waiting for mount of media id PW8003 (copy 1) on server co-itsrv10.int..com.
Feb 6, 2019 9:38:07 AM - mounting PW8003
Feb 6, 2019 9:39:00 AM - Info bptm (pid=7216) media id PW8003 mounted on drive index 1, drivepath {2,0,1,0}, drivename HP.ULTRIUM4-SCSI.001, copy 1
Feb 6, 2019 9:39:00 AM - mounted PW8003; mount time: 0:00:53
Feb 6, 2019 9:39:05 AM - Warning bptm (pid=7216) overwriting ANSI-format data on media id PW8003, allowed by host property settings
Feb 6, 2019 9:39:05 AM - positioning PW8003 to file 1
Feb 6, 2019 9:39:18 AM - positioned PW8003; position time: 0:00:13
Feb 6, 2019 9:39:18 AM - begin writing
Feb 6, 2019 9:39:22 AM - Info dbclient (pid=8160) bpbkar waited 97 times for empty buffer, delayed 97 times.
Feb 6, 2019 9:39:22 AM - Info dbclient (pid=8160) done. status: 0
Feb 6, 2019 9:39:22 AM - Info bptm (pid=7216) waited for full buffer 105 times, delayed 228 times
Feb 6, 2019 9:39:29 AM - Info bptm (pid=7216) EXITING with status 0 <----------
Feb 6, 2019 9:39:29 AM - Info bpbrm (pid=8540) validating image for client co-itsrv10.int..com
Feb 6, 2019 9:39:40 AM - Error bpbrm (pid=8540) cannot send mail to root on client co-itsrv10.int..com
Feb 6, 2019 9:39:40 AM - Info dbclient (pid=8160) done. status: 0: the requested operation was successfully completed
Feb 6, 2019 9:39:40 AM - end writing; write time: 0:00:22
the requested operation was successfully completed (0)

-----------------------------------------------------------------------------------------------------------------

Feb 6, 2019 9:39:52 AM - Info nbjm (pid=4256) starting backup job (jobid=3460) for client co-itsrv10.int..com, policy CatalogBK, schedule Full
Feb 6, 2019 9:39:52 AM - Info nbjm (pid=4256) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=3460, request id:{26F212BD-0899-4B36-9738-0B0BD7A4CB4F})
Feb 6, 2019 9:39:52 AM - requesting resource Any
Feb 6, 2019 9:39:52 AM - requesting resource co-itsrv10.int..com.NBU_CLIENT.MAXJOBS.co-itsrv10.int..com
Feb 6, 2019 9:39:52 AM - requesting resource co-itsrv10.int..com.NBU_POLICY.MAXJOBS.CatalogBK
Feb 6, 2019 9:39:52 AM - granted resource co-itsrv10.int..com.NBU_CLIENT.MAXJOBS.co-itsrv10.int..com
Feb 6, 2019 9:39:52 AM - granted resource co-itsrv10.int..com.NBU_POLICY.MAXJOBS.CatalogBK
Feb 6, 2019 9:39:52 AM - granted resource PW8002
Feb 6, 2019 9:39:52 AM - granted resource HP.ULTRIUM4-SCSI.005
Feb 6, 2019 9:39:52 AM - granted resource Drive1
Feb 6, 2019 9:39:52 AM - estimated 0 kbytes needed
Feb 6, 2019 9:39:52 AM - Info nbjm (pid=4256) started backup (backupid=co-itsrv10.int..com_1549433392) job for client co-itsrv10.int..com, policy CatalogBK, schedule Full on storage unit Drive1
Feb 6, 2019 9:39:53 AM - Info bpbrm (pid=8584) co-itsrv10.int..com is the host to backup data from
Feb 6, 2019 9:39:53 AM - Info bpbrm (pid=8584) reading file list for client
Feb 6, 2019 9:39:53 AM - Info bpbrm (pid=8584) starting bpbkar32 on client
Feb 6, 2019 9:39:53 AM - Info bpbkar32 (pid=2928) Backup started
Feb 6, 2019 9:39:53 AM - Info bpbkar32 (pid=2928) change time comparison:<disabled>
Feb 6, 2019 9:39:53 AM - Info bpbkar32 (pid=2928) archive bit processing:<enabled>
Feb 6, 2019 9:39:53 AM - Info bptm (pid=5408) start
Feb 6, 2019 9:39:53 AM - Info bptm (pid=5408) using 65536 data buffer size
Feb 6, 2019 9:39:53 AM - Info bptm (pid=5408) setting receive network buffer to 263168 bytes
Feb 6, 2019 9:39:53 AM - Info bptm (pid=5408) using 30 data buffers
Feb 6, 2019 9:39:53 AM - started process bpbrm (pid=8584)
Feb 6, 2019 9:39:53 AM - connecting
Feb 6, 2019 9:39:53 AM - connected; connect time: 0:00:00
Feb 6, 2019 9:39:54 AM - Info bptm (pid=5408) start backup
Feb 6, 2019 9:39:54 AM - Info bptm (pid=5408) Waiting for mount of media id PW8002 (copy 1) on server co-itsrv10.int..com.
Feb 6, 2019 9:39:54 AM - mounting PW8002
Feb 6, 2019 9:40:46 AM - Info bptm (pid=5408) media id PW8002 mounted on drive index 5, drivepath {2,0,15,0}, drivename HP.ULTRIUM4-SCSI.005, copy 1
Feb 6, 2019 9:40:46 AM - mounted PW8002; mount time: 0:00:52
Feb 6, 2019 9:40:52 AM - Warning bptm (pid=5408) overwriting ANSI-format data on media id PW8002, allowed by host property settings
Feb 6, 2019 9:40:52 AM - positioning PW8002 to file 1
Feb 6, 2019 9:41:06 AM - positioned PW8002; position time: 0:00:14
Feb 6, 2019 9:41:06 AM - begin writing
Feb 6, 2019 9:43:45 AM - Info bpbkar32 (pid=2928) bpbkar waited 487 times for empty buffer, delayed 563 times.
Feb 6, 2019 9:43:45 AM - Info bptm (pid=5408) waited for full buffer 7002 times, delayed 9452 times
Feb 6, 2019 9:44:02 AM - Info bptm (pid=5408) EXITING with status 0 <----------
Feb 6, 2019 9:44:03 AM - Info bpbrm (pid=8584) validating image for client co-itsrv10.int..com
Feb 6, 2019 9:44:05 AM - Error bpbrm (pid=8584) cannot send mail to root on client co-itsrv10.int..com
Feb 6, 2019 9:44:05 AM - Info bpbkar32 (pid=2928) done. status: 0: the requested operation was successfully completed
Feb 6, 2019 9:44:05 AM - end writing; write time: 0:02:59
the requested operation was successfully completed (0)

Netdigest
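
The child job details above include buffer wait counters from the reader (dbclient or bpbkar32) and the tape writer (bptm). The sketch below pulls those counters out of pasted Details text so they can be compared across jobs. It assumes the exact wording of the log lines in this thread, and the function name is hypothetical.

```python
import re

# Reader side: "... bpbkar waited 487 times for empty buffer, delayed 563 times."
EMPTY_RE = re.compile(r"waited (\d+) times for empty buffer, delayed (\d+) times")
# Writer side: "... waited for full buffer 7002 times, delayed 9452 times"
FULL_RE = re.compile(r"waited for full buffer (\d+) times, delayed (\d+) times")

def buffer_waits(details: str) -> dict:
    """Return {'empty': (waits, delays), 'full': (waits, delays)} for the
    counters found in a job's Details text; a key is absent if not logged."""
    result = {}
    for line in details.splitlines():
        match = EMPTY_RE.search(line)
        if match:
            result["empty"] = (int(match.group(1)), int(match.group(2)))
        match = FULL_RE.search(line)
        if match:
            result["full"] = (int(match.group(1)), int(match.group(2)))
    return result

# The two counter lines from job 3460 as posted above.
JOB_3460 = """\
Feb 6, 2019 9:43:45 AM - Info bpbkar32 (pid=2928) bpbkar waited 487 times for empty buffer, delayed 563 times.
Feb 6, 2019 9:43:45 AM - Info bptm (pid=5408) waited for full buffer 7002 times, delayed 9452 times
"""

if __name__ == "__main__":
    print(buffer_waits(JOB_3460))
```

As a general reading, and not something established in this thread, a writer that waits for full buffers far more often than the reader waits for empty ones tends to point at the data source rather than the tape drive. Both jobs here still completed with status 0.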

The parent job is still trying to finish even though all the child jobs have finished. (For your information, when I cancel the parent job, nothing happens. Today, after I rebooted the server and reran the catalog policy manually, the child jobs completed successfully, but for the whole of last week the child jobs were stuck in the "running" state. It seems that one restart solved half of my issue.)

This issue is new in my environment; I have not seen it before.

It makes me nervous.

Netdigest

Marianne
Moderator
Partner    VIP    Accredited Certified

Maybe a good idea to log a Support call with Veritas. 

There was a similar issue over here: https://vox.veritas.com/t5/NetBackup/Catalog-backup-stuck-NBU-7-7-3/td-p/827121  but the user stopped responding. 

About your STUs - I think you are missing the logic between physical devices and Storage Units. 

Under Devices, you will see the robot and tape drives that are configured in NBU.

A Storage unit is a collection of tape drives in the same robot, that are the same type/density, attached to the same media server. 

The 1st STU in your screenshot (co-itsrv10.int...... ) is correct - seems there are 16 drives. 

The rest of the STUs (Drive1 - Drive16) are unnecessary - you cannot pin an STU to a specific tape drive. If you check the properties of these STUs, you will see that everything is the same - 1 drive in robot 0 with density hcart.

All of these unnecessary STUs put additional, needless strain on the resource broker and EMM.
Please get rid of them.
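
To illustrate the rule above (one storage unit per combination of media server, robot and density), here is a small sketch that groups a drive inventory and counts the drives in each group. The `Drive` type, the host name `master01` and the 15-drive inventory are hypothetical and are not read from NetBackup.

```python
from collections import Counter
from typing import NamedTuple

class Drive(NamedTuple):
    name: str
    robot: int
    density: str
    media_server: str

def storage_units(drives):
    """Group drives by (media server, robot, density) and count the drives in
    each group. Under the rule described above, each group is one storage unit."""
    return Counter((d.media_server, d.robot, d.density) for d in drives)

if __name__ == "__main__":
    # Hypothetical inventory: 15 identical drives in robot 0 on one media server.
    drives = [Drive(f"Drive{i}", 0, "hcart", "master01") for i in range(1, 16)]
    for (server, robot, density), count in storage_units(drives).items():
        print(f"STU on {server}: robot {robot}, density {density}, {count} drives")
```

With identical drives on one server the grouping collapses to a single storage unit with a drive count of 15, instead of one STU per drive.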

Thank you @Marianne, you are absolutely right. I've corrected my STU configuration as you recommended.

 

Netdigest

Anshu_Pathak
Level 5

@Kasra_Hashemi

Additional things that happen in a catalog backup are DR file and DR package creation, writing them to an NFS share or locally on another partition, and optionally sending an email with these files as attachments. This can also cause delayed or hung catalog backup jobs if there is an issue with the filesystem, network filesystem access or the email client.
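
One quick way to rule out the filesystem part is to time a small test write to the DR file destination from the master server. This is a minimal sketch with a made-up function name. It is not a NetBackup tool, and the path you pass in is whatever your catalog policy's disaster recovery settings point to.

```python
import os
import tempfile
import time

def check_dr_path(path: str) -> float:
    """Write and remove a small temporary file in `path` and return the
    seconds taken. Raises OSError if the directory is missing or not writable."""
    start = time.monotonic()
    with tempfile.NamedTemporaryFile(dir=path, prefix="dr_check_") as handle:
        handle.write(b"catalog DR path write test")
        handle.flush()
        os.fsync(handle.fileno())  # force the data out to the filesystem
    return time.monotonic() - start

if __name__ == "__main__":
    # The system temp directory stands in for the real DR destination here.
    seconds = check_dr_path(tempfile.gettempdir())
    print(f"test write took {seconds:.3f} s")
```

Note that a hung network mount may cause the call itself to block rather than fail, which is a useful symptom in its own right.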

Marianne
Moderator
Partner    VIP    Accredited Certified

@Kasra_Hashemi

What has happened to your catalog backup in the meantime?
Does the parent job complete successfully? 

Nothing.

Unfortunately, I am encountering the same problem on some other policies too.

 

begin catalog backup: start notify script

info RUNCMD ( PID=8620) started

info RUNCMD ( PID=8620) exiting with status:0

 

Netdigest

Marianne
Moderator
Partner    VIP    Accredited Certified

Please look at the suggestions in @Anshu_Pathak's post. 

Other than that, time to log a support call with Veritas.