Tanmoy1
AIR Replication Jobs are getting queued.
Over the last two weeks we have been seeing a huge AIR replication job queue, and many of the active jobs are running very slowly. We have also observed that a few of the oldest replications are failing with status code 227. My questions are:
1. Is there any way to improve the performance of the replication jobs, given the environment settings below?
2. Is there any way to control the number of replication jobs that are active at any one time?
Please help. Thanks in advance.
Environment details:
Master/media server: NetBackup Appliance 5220, version 2.5.2
NetBackup version: 7.5.0.5
Replication: AIR (Auto Image Replication)
Replication bandwidth limit:
master1:/disk/etc/puredisk # cat agent.cfg | grep bandwidth
# A bandwidth limit, in KiB/sec.
bandwidthlimit=1280
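To put this cap in perspective, the arithmetic below converts bandwidthlimit=1280 (KiB/sec, as the agent.cfg comment says) into a minimum transfer time and a daily ceiling. It is a back-of-envelope sketch that assumes, purely for illustration, that replication traffic runs at exactly the configured cap with no protocol overhead and no deduplication savings; the function names are hypothetical.

```python
# Back-of-envelope arithmetic for the agent.cfg bandwidth cap.
# Assumption (illustrative only): replication traffic runs at exactly the
# configured cap, with no protocol overhead and no deduplication savings.

KIB_PER_GIB = 1024 * 1024
SECONDS_PER_DAY = 86400


def transfer_seconds(size_kib: float, limit_kib_per_s: float) -> float:
    """Minimum time to push size_kib through a link capped at limit_kib_per_s."""
    return size_kib / limit_kib_per_s


def daily_ceiling_gib(limit_kib_per_s: float) -> float:
    """Most data (GiB) that can cross the link in 24 hours at the cap."""
    return limit_kib_per_s * SECONDS_PER_DAY / KIB_PER_GIB


if __name__ == "__main__":
    limit = 1280  # bandwidthlimit from agent.cfg, KiB/sec
    print(f"1 GiB takes at least {transfer_seconds(KIB_PER_GIB, limit):.1f} s")
    print(f"daily ceiling: {daily_ceiling_gib(limit):.2f} GiB/day")
```

Under that assumption the link can carry at most about 105 GiB per day, so if more new data than that has to replicate each day, the queue would be expected to keep growing.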
SLP parameters:
master1:/usr/openv/netbackup/db/config # cat LIFECYCLE_PARAMETERS
AUTO_CREATE_IMPORT_SLP = 1
MAX_GB_SIZE_PER_DUPLICATION_JOB = 100
MIN_GB_SIZE_PER_DUPLICATION_JOB = 25
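The two size parameters bound how much image data the SLP manager groups into one duplication/replication job, which in turn affects how many jobs exist at once. The sketch below is a deliberately simplified, hypothetical model of that batching, not NetBackup's actual logic: it assumes images are packed greedily in arrival order up to the maximum, and that a final batch below the minimum is held back. The real product also forces small batches out after a timeout (MAX_MINUTES_TIL_FORCE_SMALL_DUPLICATION_JOB in the 7.x documentation), which is not modelled here; batch_images is an illustrative name.

```python
from typing import List, Tuple

# Simplified, hypothetical model of SLP job batching (not NetBackup's code).
# Assumptions: images are packed in arrival order; a single image larger than
# the maximum still gets its own job; a trailing batch below the minimum is
# held back instead of being submitted (no timeout is modelled).


def batch_images(
    sizes_gb: List[float], min_gb: float, max_gb: float
) -> Tuple[List[List[float]], List[float]]:
    """Return (jobs, held_back) for a list of image sizes in GB."""
    jobs: List[List[float]] = []
    current: List[float] = []
    total = 0.0
    for size in sizes_gb:
        # Close the current batch if this image would push it past the maximum.
        if current and total + size > max_gb:
            jobs.append(current)
            current, total = [], 0.0
        current.append(size)
        total += size
    # Release the final partial batch only if it has reached the minimum.
    if current and total >= min_gb:
        jobs.append(current)
        current = []
    return jobs, current


if __name__ == "__main__":
    # Values from LIFECYCLE_PARAMETERS: MIN = 25, MAX = 100 (GB).
    jobs, held = batch_images([40, 30, 50, 10, 5], min_gb=25, max_gb=100)
    print("jobs:", jobs, "held back:", held)
```

In this model, raising the maximum yields fewer but larger jobs. Under the cap-only assumption used for the bandwidth limit, a full 100 GB batch would need at least about 22.8 hours (81920 s) to send at 1280 KiB/sec.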
Job details of a replication that failed with status 227:
06/17/2013 01:11:48 - requesting resource LCM_stu_disk_master1
06/17/2013 01:11:48 - Info nbrb (pid=21872) Limit has been reached for the logical resource LCM_stu_disk_master1
07/10/2013 02:42:21 - granted resource LCM_stu_disk_master1
07/10/2013 02:42:23 - started process RUNCMD (pid=6070)
07/10/2013 02:42:24 - Info bpdm (pid=6101) started
07/10/2013 02:42:24 - started process bpdm (pid=6101)
07/10/2013 02:42:24 - requesting resource @aaaac
07/10/2013 02:42:24 - reserving resource @aaaac
07/10/2013 02:42:24 - resource @aaaac reserved
07/10/2013 02:42:24 - granted resource MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=dp_disk_master1;Path=PureDiskVolume;StorageServer=master1;MediaServer=master1
07/10/2013 02:44:42 - Info master1 (pid=6101) Using OpenStorage to replicate backup id Client1-db_1371393521, media id @aaaac, storage server master1, disk volume PureDiskVolume
07/10/2013 02:44:42 - Info master1 (pid=6101) Replicating images to target storage server hkx1bak03.apac.experian.local, disk volume PureDiskVolume
07/17/2013 11:19:46 - Info master1 (pid=6101) StorageServer=PureDisk:master1; Report=PDDO Stats for (master1): scanned: 24790571 KB, CR sent: 24838491 KB, CR sent over FC: 0 KB, dedup: 0.0%
07/17/2013 11:19:46 - Info bpdm (pid=6101) EXITING with status 0
07/17/2013 11:19:46 - Replicated backup id Client1-db_1371393521 successfully
07/17/2013 11:19:47 - Info bpdm (pid=3444) started
07/17/2013 11:19:47 - started process bpdm (pid=3444)
07/17/2013 11:19:47 - requesting resource @aaaac
07/17/2013 11:19:47 - granted resource MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=dp_disk_master1;Path=PureDiskVolume;StorageServer=master1;MediaServer=master1
07/17/2013 11:21:29 - Info master1 (pid=3444) Using OpenStorage to replicate backup id Client1-db_1371393625, media id @aaaac, storage server master1, disk volume PureDiskVolume
07/17/2013 11:21:30 - Info master1 (pid=3444) Replicating images to target storage server hkx1bak03.apac.experian.local, disk volume PureDiskVolume
07/19/2013 02:55:00 - Info master1 (pid=3444) StorageServer=PureDisk:master1; Report=PDDO Stats for (master1): scanned: 4 KB, CR sent: 1 KB, CR sent over FC: 0 KB, dedup: 75.0%
07/19/2013 02:55:00 - Info bpdm (pid=3444) EXITING with status 0
07/19/2013 02:55:00 - Error nbreplicate (pid=6070) Failed to update image copy state for BID Client1-db_1371393625, replica copy 102. EMM error code = 2020005. Replication WAS successful
no entity was found (227)
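To quantify the delays in the job details, the sketch below parses the timestamps and the "CR sent" figure for the first image. It assumes the MM/DD/YYYY HH:MM:SS timestamp format shown in the log and treats the log's KB as KiB; the helper names are hypothetical.

```python
import re
from datetime import datetime

# Job-detail timestamps look like "07/10/2013 02:44:42" (MM/DD/YYYY HH:MM:SS).
TS_FORMAT = "%m/%d/%Y %H:%M:%S"
# Captures the "CR sent" figure (in KB) from a "PDDO Stats" line.
CR_SENT = re.compile(r"CR sent: (\d+) KB")

# Lines copied from the job details above (first image only).
REQUESTED = "06/17/2013 01:11:48 - requesting resource LCM_stu_disk_master1"
GRANTED = "07/10/2013 02:42:21 - granted resource LCM_stu_disk_master1"
REPL_START = ("07/10/2013 02:44:42 - Info master1 (pid=6101) Replicating images "
              "to target storage server hkx1bak03.apac.experian.local, "
              "disk volume PureDiskVolume")
REPL_END = ("07/17/2013 11:19:46 - Info master1 (pid=6101) "
            "StorageServer=PureDisk:master1; Report=PDDO Stats for (master1): "
            "scanned: 24790571 KB, CR sent: 24838491 KB, "
            "CR sent over FC: 0 KB, dedup: 0.0%")


def parse_ts(line: str) -> datetime:
    """Parse the leading 19-character timestamp of a job-detail line."""
    return datetime.strptime(line[:19], TS_FORMAT)


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Wall-clock seconds between two job-detail lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


def cr_sent_kb(stats_line: str) -> int:
    """Extract the 'CR sent' KB value from a PDDO Stats line."""
    match = CR_SENT.search(stats_line)
    if match is None:
        raise ValueError("no 'CR sent' figure in line")
    return int(match.group(1))


if __name__ == "__main__":
    wait = elapsed_seconds(REQUESTED, GRANTED)
    busy = elapsed_seconds(REPL_START, REPL_END)
    rate = cr_sent_kb(REPL_END) / busy
    print(f"queued for LCM resource: {wait / 86400:.1f} days")
    print(f"replication wall time:   {busy / 86400:.1f} days")
    print(f"average CR sent rate:    {rate:.1f} KB/s")
```

For this image the job waited about 23.1 days for the LCM_stu_disk_master1 resource, then averaged roughly 39 KB/s over about 7.4 days of replication, with no deduplication saving (dedup: 0.0%). The rate is averaged over the job's whole wall time, so it includes any time spent waiting rather than sending, and it is well below the 1280 KiB/sec cap.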
You would need to check the state of that image on both sites to determine whether it is OK.
You can also run nbstlutil pendimplist on the target site and nbstlutil repllist on the sending (source) site to see whether the image is classed as complete.
Also run nbstlutil stlilist -backupid Client1-db_1371393625 on both sites to see what it reports for the image in question.
Finally, make sure the image appears on the target site when you do a verify in the catalog, and that it appears in the BAR GUI.
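The per-site checks suggested above can be collected into a small helper that prints, and optionally runs, the commands on whichever master server it is executed on. This is a sketch: it assumes the usual UNIX install path /usr/openv/netbackup/bin/admincmd/nbstlutil, uses only the subcommands named in this reply with no extra options, and site_checks/run_checks are hypothetical names. Run it once on the source master with role "source" and once on the target master with role "target".

```python
import subprocess
from typing import List

# Assumed default location of nbstlutil on UNIX/Linux masters and appliances.
NBSTLUTIL = "/usr/openv/netbackup/bin/admincmd/nbstlutil"


def site_checks(role: str, backup_id: str) -> List[List[str]]:
    """Build the nbstlutil commands suggested for one site.

    role is "source" (the sending site) or "target" (the AIR import site).
    """
    # Image-level SLP state; the reply suggests running this on both sites.
    cmds = [[NBSTLUTIL, "stlilist", "-backupid", backup_id]]
    if role == "source":
        # Replication list on the sending site (spelled "repllist" in the
        # NetBackup Commands Reference).
        cmds.append([NBSTLUTIL, "repllist"])
    elif role == "target":
        # Images waiting to be imported on the target site.
        cmds.append([NBSTLUTIL, "pendimplist"])
    else:
        raise ValueError("role must be 'source' or 'target'")
    return cmds


def run_checks(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            # check=False: keep going even if one command returns non-zero.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run_checks(site_checks("source", "Client1-db_1371393625"))
```

The default dry run only prints the command lines, so the script can be reviewed before anything is executed on a production master.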