Duplication job between DDs fails at a certain point

dinu4010
Level 3

Hi experts,

I run multiple duplication jobs between two DDs, one in my DC and the other at a remote site. Data from multiple client machines is duplicated between these DDs using SLP without any issues, but the backup of one client machine gives trouble during duplication.

When the duplication starts, it runs for several hours, copies over 1TB of data, and then fails with status 191. Even now I have a duplication running for this one client machine's data, and it is giving the same trouble. Please see the detailed status of the job below and advise.

I have also attached a screenshot for reference.

07/06/2018 02:39:32 - requesting resource LCM_brsnsdd6300_1-stu
07/06/2018 02:39:34 - granted resource  LCM_brsnsdd6300_1-stu
07/06/2018 02:39:34 - started process RUNCMD (pid=9592)
07/06/2018 02:39:34 - ended process 0 (pid=9592)
07/06/2018 02:39:35 - begin Duplicate
07/06/2018 02:39:35 - requesting resource brsnsdd6300_1-stu
07/06/2018 02:39:35 - reserving resource @aaabm
07/06/2018 02:39:37 - resource @aaabm reserved
07/06/2018 02:39:37 - granted resource  MediaID=@aaabo;DiskVolume=BRS_NEW_DD;DiskPool=brsnsdd6300_1-dp;Path=BRS_NEW_DD;StorageServer=brsnsdd6300_1.na.ad.crbard.com;MediaServer=mhlwabkpms02p.na.ad.crbard.com
07/06/2018 02:39:37 - granted resource  brsnsdd6300_1-stu
07/06/2018 02:39:38 - Info Duplicate (pid=9592) Initiating optimized duplication from @aaabm to @aaabo
07/06/2018 02:39:38 - requesting resource @aaabm
07/06/2018 02:39:38 - granted resource  MediaID=@aaabm;DiskVolume=MHL_NEW_DD_1;DiskPool=mhlnsdd6300_1-DP;Path=MHL_NEW_DD_1;StorageServer=MHLNSDD6300_1.na.ad.crbard.com;MediaServer=mhlwabkpms02p.na.ad.crbard.com
07/06/2018 02:39:39 - Info bpduplicate (pid=9592) Suspend window close behavior is not supported for optimized duplications
07/06/2018 02:39:39 - Info bpduplicate (pid=9592) window close behavior: Continue processing the current image
07/06/2018 02:39:42 - Info bpdm (pid=26300) started
07/06/2018 02:39:42 - started process bpdm (pid=26300)
07/06/2018 02:39:44 - Info bpdm (pid=26300) requesting nbjm for media
07/06/2018 02:39:48 - begin writing
07/06/2018 05:49:30 - end writing; write time: 3:09:42
07/06/2018 05:49:31 - begin writing
07/06/2018 13:37:17 - end writing; write time: 7:47:46
07/06/2018 13:37:18 - begin writing
07/07/2018 01:12:54 - Critical bpdm (pid=26300) sts_copy_extent failed: error 2060046 plugin error
07/07/2018 01:12:54 - Critical bpdm (pid=26300) image copy failed: error 2060046: plugin error
07/07/2018 01:12:54 - Error bpdm (pid=26300) cannot copy image from disk, bytesCopied = 18446744073709551615
07/07/2018 01:12:54 - Critical bpdm (pid=26300) sts_get_image_prop failed: error 2060046: plugin error
07/07/2018 01:12:54 - Critical bpdm (pid=26300) Invalid storage device: BRS_NEW_DD file not found
07/07/2018 01:12:55 - Critical bpdm (pid=26300) Invalid image copy for mhlvassql01_1530835561_C3_F1_R1: error 2060018
07/07/2018 01:12:56 - Error bpduplicate (pid=9592) host mhlwabkpms02p.na.ad.crbard.com backup id mhlvassql01_1530835561 optimized duplication failed, media manager - system error occurred (174).
07/07/2018 01:12:57 - Error bpduplicate (pid=9592) Duplicate of backupid mhlvassql01_1530835561 failed, media manager - system error occurred (174).
07/07/2018 01:12:57 - Error bpduplicate (pid=9592) Status = no images were successfully processed.
07/07/2018 01:12:57 - end Duplicate; elapsed time 22:33:22
no images were successfully processed  (191)
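
For anyone triaging a detailed status like the one above, the Critical and Error lines and the overall elapsed time can be pulled out with a short script. The following is a minimal Python sketch, not a NetBackup tool: it assumes the job details were saved as plain text and that timestamps are in MM/DD/YYYY HH:MM:SS form as in this log. The function names (parse_status, failures, elapsed) are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# "07/07/2018 01:12:54 - Critical bpdm (pid=26300) sts_copy_extent failed: ..."
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")


def parse_status(text):
    """Return (timestamp, message) pairs for every timestamped line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Assumption: month comes first, as in the log shown in this thread.
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            entries.append((stamp, match.group(2)))
    return entries


def failures(entries):
    """Keep only the entries whose message starts with Critical or Error."""
    return [(ts, msg) for ts, msg in entries
            if msg.startswith(("Critical", "Error"))]


def elapsed(entries, begin="begin Duplicate", end="end Duplicate"):
    """Time between the first 'begin Duplicate' and first 'end Duplicate' line."""
    start = next(ts for ts, msg in entries if msg.startswith(begin))
    stop = next(ts for ts, msg in entries if msg.startswith(end))
    return stop - start


# A few lines copied from the job details above.
sample = """\
07/06/2018 02:39:35 - begin Duplicate
07/07/2018 01:12:54 - Critical bpdm (pid=26300) sts_copy_extent failed: error 2060046 plugin error
07/07/2018 01:12:54 - Error bpdm (pid=26300) cannot copy image from disk, bytesCopied = 18446744073709551615
07/07/2018 01:12:57 - end Duplicate; elapsed time 22:33:22
"""

entries = parse_status(sample)
for stamp, message in failures(entries):
    print(stamp, message)
print("elapsed:", elapsed(entries))
```

On the sample lines, the computed elapsed time agrees with the 22:33:22 that the job itself reports.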

 

7 REPLIES

Tape_Archived
Moderator
   VIP   
  1. What are your DD models and versions?
  2. What is your NetBackup version?
  3. What kind of replication is set between the two DDs at the DD level (file-based replication or snapshot)?
  4. Are both of your disk pools set for optimized duplication? Use the nbdevquery command to find the details about the disk pool.
  5. Is your DD1 set as source and DD2 as target? Optimized duplication will only work if this is set correctly.
  6. An optimized duplication job usually completes in a very short time. A job that has run for 23 hours and is still duplicating is actually transferring the data to DD2 over the WAN tunnel, which is not an efficient way to do it.

 

Hi, please find my answers below.

  1. What are your DD models and versions? Both are DD6300 running 6.0.2.0.
  2. What is your NetBackup version? 7.7.3
  3. What kind of replication is set between the two DDs at the DD level (file-based replication or snapshot)? File-based (MTree).
  4. Are both of your disk pools set for optimized duplication? Use the nbdevquery command to find the details about the disk pool. Yes, they are flagged for optimized image.
  5. Is your DD1 set as source and DD2 as target? Optimized duplication will only work if this is set correctly. Yes.
  6. An optimized duplication job usually completes in a very short time. A job that has run for 23 hours and is still duplicating is actually transferring the data to DD2 over the WAN tunnel, which is not an efficient way to do it. Yes, it runs through the WAN.

Note: other backup images duplicate fine between these two DDs. Only the backup images of a few client machines fail to duplicate, and these images are somewhat large. Even these duplications run for more than 20 hours and transfer over 1TB of data, but after some time they end with the error shown in my first post.

Marianne
Moderator
Partner    VIP    Accredited Certified

Have you tried to check the DD logs for 7 July on the target Storage Server brsnsdd6300_1? 

It almost seems as if the remote storage stopped responding: 

Invalid storage device: BRS_NEW_DD file not found
Invalid image copy for mhlvassql01_1530835561_C3_F1_R1: error 2060018

All errors point to a DD plugin error:

sts_copy_extent failed: error 2060046 plugin error

 

The image size mentioned in the job details seems excessive:

cannot copy image from disk, bytesCopied = 18446744073709551615


Is this really the size? Can you verify the image on the source? 
Is there any way that the backup can be broken up into smaller 'chunks'?
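
A side note on the bytesCopied value: 18446744073709551615 is exactly 2^64 - 1, which is what -1 looks like when it is printed as an unsigned 64-bit integer. It is therefore more likely an error or "unknown" marker than a real image size, although that reading is an inference from the number itself and is not confirmed by anything in the logs shown here. A quick check in Python (the helper name as_signed_64 is hypothetical):

```python
# Value reported in the job details.
bytes_copied = 18446744073709551615


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - 2**64 if value >= 2**63 else value


print(bytes_copied == 2**64 - 1)   # is it the largest unsigned 64-bit value?
print(as_signed_64(bytes_copied))  # the same bit pattern read as signed
```

If that reading is right, the actual image size has to come from the catalog on the source, not from this message.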

Logging needed for troubleshooting at NBU end:
bptm and bpdm on source and destination media servers. I suggest level 3 logs. 

Hi Marianne,

I don't find any abnormal errors logged on my source or target DD, and other jobs over 1TB in size complete fine during the same period, though they take time because of the WAN link.

I am not sure why the image size in the detailed status shows up that large. The actual size of an image for that client machine is not more than 1.6TB. Please see the examples below.

C:\Program Files\Veritas\NetBackup\bin\admincmd>bpimagelist.exe -backupid mhlvassql01_1530576498 -U
Backed Up         Expires        Files          KB  C  Sched Type       On Hold  Policy
----------------  ----------  --------  ----------  -  ---------------  -------  ------------
07/02/2018 20:08  07/23/2018      1335  1663914265  N  Cumulative Incr  0        MHL_VMware_Replicated_Daily_SQL

C:\Program Files\Veritas\NetBackup\bin\admincmd>bpimagelist.exe -backupid mhlvassql01_1530144532 -U
Backed Up         Expires        Files          KB  C  Sched Type       On Hold  Policy
----------------  ----------  --------  ----------  -  ---------------  -------  ------------
06/27/2018 20:08  07/18/2018      1804  1656929202  N  Cumulative Incr  0        MHL_VMware_Replicated_Daily_SQL
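
To put the KB column into more familiar units, a small conversion sketch in Python follows. It assumes the KB column of bpimagelist counts units of 1024 bytes, which this thread does not confirm. The helper name kb_to_tib is hypothetical.

```python
KIB_PER_TIB = 1024 ** 3  # KiB -> MiB -> GiB -> TiB


def kb_to_tib(kilobytes):
    """Convert a size in KB (assumed to be 1024-byte units) to TiB."""
    return kilobytes / KIB_PER_TIB


# Backup IDs and KB values copied from the bpimagelist output above.
images = {
    "mhlvassql01_1530576498": 1663914265,
    "mhlvassql01_1530144532": 1656929202,
}

for backup_id, kilobytes in images.items():
    print(f"{backup_id}: {kb_to_tib(kilobytes):.2f} TiB")
```

Both images come out at roughly 1.5 TiB, in line with the "not more than 1.6TB" estimate.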

Since we take a VM-level backup of this client, breaking it up into smaller chunks is not possible.

As of now, the duplication job for this client has been running for over 45 hours in my Activity Monitor. It appears to be running fine, but I am not sure whether it will succeed or fail later. I will keep tracking it and post the details if it fails.

Tape_Archived
Moderator
   VIP   

Log on to your DD and run the ddboost file-replication command with its display options (e.g. detailed-file-history) to see whether the backup image has been replicated, is in progress, or has failed.

Can you also check the status of file-based replication in the console under Replication -> DD Boost -> File Replication? Is there any pending file replication? Currently active jobs may show as pending, so look specifically for the images that are failing to duplicate.

Hi,

I checked the File Replication tab on my target DD. All the backup image IDs are in a success state, except for the duplication jobs that are currently running. No specific error is listed in the tab you mentioned.

Tape_Archived
Moderator
   VIP   

Check the stats and performance of your file replication; that will give you an idea of whether it is working as expected. You can use the same command or the GUI to check those details.

Wait for a while. If the duplication fails again, check the DD logs and the bpdm logs on the media server; they should give you more details.