AIR Replication Jobs are getting queued.

Question

We have seen a large AIR replication job queue build up over the last two weeks, and many of the active jobs are running very slowly. A few of the oldest replications are also failing with error code 227. My questions are:

1. Is there any way to improve the performance of the replication jobs, given the environment settings below?
2. Is there any way to control the number of active replication jobs running at any point in time?

Please help. Thanks in advance.

Environment Details:
Master / Media Server: NetBackup Appliance 5220 2.5.2
NetBackup Version: 7.5.0.5
Replication with AIR

Replication bandwidth limitation:
master1:/disk/etc/puredisk # cat agent.cfg | grep bandwidth
# A bandwidth limit, in KiB/sec.
bandwidthlimit=1280

SLP parameters:
master1:/usr/openv/netbackup/db/config # cat LIFECYCLE_PARAMETERS
AUTO_CREATE_IMPORT_SLP = 1
MAX_GB_SIZE_PER_DUPLICATION_JOB = 100
MIN_GB_SIZE_PER_DUPLICATION_JOB = 25

Replication failed with 227 (detailed log):
06/17/2013 01:11:48 - requesting resource LCM_stu_disk_master1
06/17/2013 01:11:48 - Info nbrb (pid=21872) Limit has been reached for the logical resource LCM_stu_disk_master1
07/10/2013 02:42:21 - granted resource  LCM_stu_disk_master1
07/10/2013 02:42:23 - started process RUNCMD (pid=6070)
07/10/2013 02:42:24 - Info bpdm (pid=6101) started
07/10/2013 02:42:24 - started process bpdm (pid=6101)
07/10/2013 02:42:24 - requesting resource @aaaac
07/10/2013 02:42:24 - reserving resource @aaaac
07/10/2013 02:42:24 - resource @aaaac reserved
07/10/2013 02:42:24 - granted resource  MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=dp_disk_master1;Path=PureDiskVolume;StorageServer=master1;MediaServer=master1
07/10/2013 02:44:42 - Info master1 (pid=6101) Using OpenStorage to replicate backup id Client1-db_1371393521, media id @aaaac, storage server master1, disk volume PureDiskVolume
07/10/2013 02:44:42 - Info master1 (pid=6101) Replicating images to target storage server hkx1bak03.apac.experian.local, disk volume PureDiskVolume
07/17/2013 11:19:46 - Info master1 (pid=6101) StorageServer=PureDisk:master1; Report=PDDO Stats for (master1): scanned: 24790571 KB, CR sent: 24838491 KB, CR sent over FC: 0 KB, dedup: 0.0%
07/17/2013 11:19:46 - Info bpdm (pid=6101) EXITING with status 0
07/17/2013 11:19:46 - Replicated backup id Client1-db_1371393521 successfully
07/17/2013 11:19:47 - Info bpdm (pid=3444) started
07/17/2013 11:19:47 - started process bpdm (pid=3444)
07/17/2013 11:19:47 - requesting resource @aaaac
07/17/2013 11:19:47 - granted resource  MediaID=@aaaac;DiskVolume=PureDiskVolume;DiskPool=dp_disk_master1;Path=PureDiskVolume;StorageServer=master1;MediaServer=master1
07/17/2013 11:21:29 - Info master1 (pid=3444) Using OpenStorage to replicate backup id Client1-db_1371393625, media id @aaaac, storage server master1, disk volume PureDiskVolume
07/17/2013 11:21:30 - Info master1 (pid=3444) Replicating images to target storage server hkx1bak03.apac.experian.local, disk volume PureDiskVolume
07/19/2013 02:55:00 - Info master1 (pid=3444) StorageServer=PureDisk:master1; Report=PDDO Stats for (master1): scanned: 4 KB, CR sent: 1 KB, CR sent over FC: 0 KB, dedup: 75.0%
07/19/2013 02:55:00 - Info bpdm (pid=3444) EXITING with status 0
07/19/2013 02:55:00 - Error nbreplicate (pid=6070) Failed to update image copy state for BID Client1-db_1371393625, replica copy 102.  EMM error code = 2020005.  Replication WAS successful
no entity was found  (227)
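
To see where the time in this job went, the timestamps in the log above can simply be subtracted from each other. The following is a minimal sketch in Python, not a NetBackup tool: the timestamps are copied by hand from the log, and the dictionary keys are illustrative labels chosen here.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"  # timestamp layout used in the job log

# Timestamps copied from the detailed log above. The keys are
# illustrative labels, not NetBackup terms.
EVENTS = {
    "requested":    "06/17/2013 01:11:48",  # requesting resource LCM_stu_disk_master1
    "granted":      "07/10/2013 02:42:21",  # granted resource LCM_stu_disk_master1
    "image1_start": "07/10/2013 02:44:42",  # replicate Client1-db_1371393521
    "image1_done":  "07/17/2013 11:19:46",  # PDDO stats reported for that image
    "image2_start": "07/17/2013 11:21:30",  # replicate Client1-db_1371393625
    "image2_done":  "07/19/2013 02:55:00",  # PDDO stats reported for that image
}


def elapsed(start_key, end_key):
    """Return the time between two labelled log events as a timedelta."""
    start = datetime.strptime(EVENTS[start_key], FMT)
    end = datetime.strptime(EVENTS[end_key], FMT)
    return end - start


queued = elapsed("requested", "granted")
first_image = elapsed("image1_start", "image1_done")
second_image = elapsed("image2_start", "image2_done")

print("waiting for logical resource:", queued)
print("first image replication:     ", first_image)
print("second image replication:    ", second_image)
```

With these values the job waited about 23 days for the logical resource, then spent roughly 7 days on the first image and a little over 1.5 days on the second.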

mark_solutions · Accepted Answer

You would need to check for that image on both sites to see what state it is in and determine whether it is OK.
You can also try nbstlutil pendimplist on the target site and nbstlutil replist on the "sending" site to see if the image is classed as complete.
Also run nbstlutil stlilist -backupid Client1-db_1371393625 on both sites to see what it says about the image in question.
Finally, make sure the image appears in the target site when doing a verify in the catalog, and that it appears in the BAR GUI.

mark_solutions · Answer

If you throttle it too much, everything can just hang.
You usually find that if you cancel them all, then when the SLP kicks back in they will start to fly through again for a while.
Your bandwidth limit of 1280 KiB/sec may be too small for it to make use of. Try increasing it (this needs a service restart). Opinion seems to vary, but 1280 KiB/sec equates to roughly 1.25 MiB/sec.
Either way, once you have hit this state you need to cancel them all and let them fire off again to get any throughput out of the system (you can cancel just the active ones, which usually lets the queued ones run).
Hope this helps
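
As a rough sanity check on the bandwidth limit, the figures posted in the question can be put side by side. This is a back-of-the-envelope sketch, not a diagnosis: it assumes the "KB" in the PDDO stats line means KiB (the log does not say), it treats the whole limit as available to one job, and the variable names are made up here.

```python
# Figures taken from the question; names are illustrative only.
BANDWIDTH_LIMIT_KIB_S = 1280      # bandwidthlimit in agent.cfg (KiB/sec)
CR_SENT_KB = 24838491             # "CR sent" for Client1-db_1371393521
# 07/10/2013 02:44:42 -> 07/17/2013 11:19:46 is 7 days 08:35:04
ELAPSED_S = 7 * 86400 + 8 * 3600 + 35 * 60 + 4

# The configured limit expressed in MiB/sec.
limit_mib_s = BANDWIDTH_LIMIT_KIB_S / 1024

# Best case: time to send the image if it had the whole limit to itself
# (assumes the PDDO "KB" figure is in KiB).
hours_at_limit = CR_SENT_KB / BANDWIDTH_LIMIT_KIB_S / 3600

# Average rate the job actually achieved over its run.
actual_kb_s = CR_SENT_KB / ELAPSED_S

print(f"limit:              {limit_mib_s:.2f} MiB/sec")
print(f"best case at limit: {hours_at_limit:.1f} hours")
print(f"observed average:   {actual_kb_s:.1f} KB/sec")
```

Under those assumptions the image would need about 5.4 hours at the full limit, while the observed average over the week-long run was around 39 KB/sec. The limit is shared by whatever jobs are active, so this comparison only shows that the single job ran far below the configured ceiling; it does not say why.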

tanmoy1 · Answer

Thanks a lot, Mark, for your comments. We have a Symantec case open for this issue. Based on the logs and command outputs we provided, they confirmed that some replication jobs are keeping the catalog busy, which is the reason for the replication queue. They suggested applying SYMC_NBAPP_EEB_ET3105408-2.5.2.0-1.x86_64.rpm on the appliance. After installing the EEB and restarting services, the SLP replication jobs have started queuing once again. It will take some time before I can comment on the improvements. We are still getting some job failures with 227, which state the following:
Error nbreplicate (pid=6070) Failed to update image copy state for BID Client1-db_1371393625, replica copy 102.  EMM error code = 2020005.  Replication WAS successful
no entity was found  (227)
Can you please tell me what this means? Did the replication complete successfully for all the image IDs, or only partially?
