cancel
Showing results for 
Search instead for 
Did you mean: 

Duplications failed with error code 191

Sisso
Level 5

Hello,

 

We are running Netbackup 7.5.0.6 on windows server 2008 R2 SP1.

All our policies backup to SLP. All the SLP in our environment is built up like the following exmaple:

First Operation: Backup - "Backup the DB/FS/Vmware.. to OST disks"
Second Operaion: Duplication - "run a duplication job that copy the backups to LTO"
Third Operation: Duplication - "run a duplication job that make 3rd copy to another OST disk in a different controller"

Most of the SLPs finish with status 0 but everyday we got some that failed with error code 191.
This is the Job status of the failed job:

12/22/2013 5:58:52 PM - requesting resource LCM_Robot2
12/22/2013 6:00:41 PM - granted resource LCM_Robot2
12/22/2013 6:00:42 PM - started process RUNCMD (8708)
12/22/2013 6:00:42 PM - ended process 0 (8708)
12/22/2013 6:00:42 PM - begin Duplicate
12/22/2013 6:00:47 PM - requesting resource Robot2
12/22/2013 6:00:47 PM - requesting resource @aaab2
12/22/2013 6:00:47 PM - reserving resource @aaab2
12/22/2013 6:05:45 PM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/22/2013 6:34:01 PM - awaiting resource @aaab2 Reason: Disk pool is unavailable, Media Server: N/A,
  Robot Number: NONE, Robot Type: NONE, Media ID: @aaab2, Drive Name: N/A,
  Volume Pool: N/A, Storage Unit: N/A, Drive Scan Host: N/A
 
12/22/2013 6:35:35 PM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/23/2013 12:04:08 AM - awaiting resource Robot2 - No drives are available
12/23/2013 12:14:20 AM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/23/2013 12:17:11 AM - awaiting resource Robot2 - No drives are available
12/23/2013 12:24:05 AM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/23/2013 12:26:55 AM - awaiting resource Robot2 - No drives are available
12/23/2013 12:40:49 AM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/23/2013 12:42:46 AM - awaiting resource Robot2 - No drives are available
12/23/2013 1:17:04 AM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media7,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media7-Robot2, Drive Scan Host: N/A
 
12/23/2013 3:06:43 AM - reserved resource @aaab2
12/23/2013 3:06:43 AM - granted resource B01602
12/23/2013 3:06:43 AM - granted resource QNT-0303
12/23/2013 3:06:43 AM - granted resource Media1-Robot2
12/23/2013 3:06:43 AM - granted resource MediaID=@aaab2;DiskVolume=DB2_Store_Backup;DiskPool=StoreOnce_DB2_Backups;Path=DB2_Store_Backup;Stor...
12/23/2013 3:06:46 AM - Info bptm(pid=9104) start           
12/23/2013 3:06:46 AM - started process bptm (9104)
12/23/2013 3:06:49 AM - Info bptm(pid=9104) start backup          
12/23/2013 3:06:51 AM - Info bpdm(pid=13748) started           
12/23/2013 3:06:51 AM - started process bpdm (13748)
12/23/2013 3:06:52 AM - Info bpdm(pid=13748) reading backup image         
12/23/2013 3:06:53 AM - Info bpdm(pid=13748) using 256 data buffers        
12/23/2013 3:06:53 AM - Info bpdm(pid=13748) requesting nbjm for media        
12/23/2013 3:06:54 AM - Info bptm(pid=9104) media id B01602 mounted on drive index 35, drivepath {1,0,43,0}, drivename QNT-0303, copy 2
12/23/2013 3:06:54 AM - Info bptm(pid=9104) INF - Waiting for positioning of media id B01602 on server Media1 for writing.
12/23/2013 3:06:58 AM - begin reading
12/23/2013 3:06:58 AM - Critical bpdm(pid=13748) sts_read_image failed: error 2060022 software error      
12/23/2013 3:06:58 AM - Critical bpdm(pid=13748) image read failed: error 2060022: software error     
12/23/2013 3:06:58 AM - Error bpdm(pid=13748) cannot read image from disk, Invalid argument     
12/23/2013 3:07:11 AM - Error bpduplicate(pid=8708) host Media1 backup id db2server1_1387527793 read failed, media read error (85). 
12/23/2013 3:07:11 AM - Error bpduplicate(pid=8708) host Media1 backupid db2server1_1387527793 write failed, termination requested by administrator (150). 
12/23/2013 3:07:11 AM - Error bpduplicate(pid=8708) db_IMAGE() failed: images are in process (1519)     
12/23/2013 3:07:11 AM - Error bpduplicate(pid=8708) Duplicate of backupid db2server1_1387527793 failed, termination requested by administrator (150).  
12/23/2013 3:07:11 AM - requesting resource @aaab2
12/23/2013 3:07:16 AM - granted resource MediaID=@aaab2;DiskVolume=DB2_Store_Backup;DiskPool=StoreOnce_DB2_Backups;Path=DB2_Store_Backup;Stor...
12/23/2013 3:07:17 AM - Info bptm(pid=12560) start           
12/23/2013 3:07:18 AM - started process bptm (12560)
12/23/2013 3:07:18 AM - requesting resource Media1-Robot2
12/23/2013 3:07:19 AM - granted resource B01602
12/23/2013 3:07:19 AM - granted resource QNT-0303
12/23/2013 3:07:19 AM - granted resource Media1-Robot2
12/23/2013 3:07:22 AM - Info bptm(pid=12560) start backup          
12/23/2013 3:07:24 AM - Info bpdm(pid=14620) started           
12/23/2013 3:07:24 AM - started process bpdm (14620)
12/23/2013 3:07:26 AM - Info bpdm(pid=14620) reading backup image         
12/23/2013 3:07:26 AM - Info bpdm(pid=14620) using 256 data buffers        
12/23/2013 3:07:26 AM - Info bpdm(pid=14620) requesting nbjm for media        
12/23/2013 3:07:26 AM - Info bptm(pid=12560) media id B01602 mounted on drive index 35, drivepath {1,0,43,0}, drivename QNT-0303, copy 2
12/23/2013 3:07:27 AM - Info bptm(pid=12560) INF - Waiting for positioning of media id B01602 on server Media1 for writing.
12/23/2013 3:07:29 AM - begin reading
12/23/2013 3:07:29 AM - Critical bpdm(pid=14620) sts_read_image failed: error 2060022 software error      
12/23/2013 3:07:30 AM - Critical bpdm(pid=14620) image read failed: error 2060022: software error     
12/23/2013 3:07:30 AM - Error bpdm(pid=14620) cannot read image from disk, Invalid argument     
12/23/2013 3:07:43 AM - Error bpduplicate(pid=8708) host Media1 backup id db2server1_1387528330 read failed, media read error (85). 
12/23/2013 3:07:43 AM - Error bpduplicate(pid=8708) host Media1 backupid db2server1_1387528330 write failed, termination requested by administrator (150). 
12/23/2013 3:07:44 AM - Error bpduplicate(pid=8708) db_IMAGE() failed: images are in process (1519)     
12/23/2013 3:07:44 AM - Error bpduplicate(pid=8708) Duplicate of backupid db2server1_1387528330 failed, termination requested by administrator (150).  
12/23/2013 3:07:44 AM - requesting resource @aaab2
12/23/2013 3:07:46 AM - granted resource MediaID=@aaab2;DiskVolume=DB2_Store_Backup;DiskPool=StoreOnce_DB2_Backups;Path=DB2_Store_Backup;Stor...
12/23/2013 3:07:48 AM - Info bptm(pid=13584) start           
12/23/2013 3:07:48 AM - started process bptm (13584)
12/23/2013 3:07:49 AM - requesting resource Media1-Robot2
12/23/2013 3:07:50 AM - granted resource B01602
12/23/2013 3:07:50 AM - granted resource QNT-0303
12/23/2013 3:07:50 AM - granted resource Media1-Robot2
12/23/2013 3:07:53 AM - Info bptm(pid=13584) start backup          
12/23/2013 3:07:55 AM - Info bpdm(pid=18304) started           
12/23/2013 3:07:55 AM - started process bpdm (18304)
12/23/2013 3:07:56 AM - Info bpdm(pid=18304) reading backup image         
12/23/2013 3:07:56 AM - Info bpdm(pid=18304) using 256 data buffers        
12/23/2013 3:07:56 AM - Info bpdm(pid=18304) requesting nbjm for media        
12/23/2013 3:07:57 AM - Info bptm(pid=13584) media id B01602 mounted on drive index 35, drivepath {1,0,43,0}, drivename QNT-0303, copy 2
12/23/2013 3:07:57 AM - Info bptm(pid=13584) INF - Waiting for positioning of media id B01602 on server Media1 for writing.
12/23/2013 3:07:59 AM - begin reading
12/23/2013 3:08:00 AM - Critical bpdm(pid=18304) sts_read_image failed: error 2060022 software error      
12/23/2013 3:08:00 AM - Critical bpdm(pid=18304) image read failed: error 2060022: software error     
12/23/2013 3:08:00 AM - Error bpdm(pid=18304) cannot read image from disk, Invalid argument     
12/23/2013 3:08:15 AM - Error bpduplicate(pid=8708) host Media1 backup id db2server1_1387529027 read failed, media read error (85). 
12/23/2013 3:08:15 AM - Error bpduplicate(pid=8708) host Media1 backupid db2server1_1387529027 write failed, termination requested by administrator (150). 
12/23/2013 3:08:15 AM - Error bpduplicate(pid=8708) db_IMAGE() failed: images are in process (1519)     
12/23/2013 3:08:15 AM - Error bpduplicate(pid=8708) Duplicate of backupid db2server1_1387529027 failed, termination requested by administrator (150).  
12/23/2013 3:08:15 AM - requesting resource @aaab2
12/23/2013 3:08:21 AM - granted resource MediaID=@aaab2;DiskVolume=DB2_Store_Backup;DiskPool=StoreOnce_DB2_Backups;Path=DB2_Store_Backup;Stor...
12/23/2013 3:08:23 AM - Info bptm(pid=18400) start           
12/23/2013 3:08:23 AM - started process bptm (18400)
12/23/2013 3:08:23 AM - requesting resource Media1-Robot2
12/23/2013 3:08:25 AM - granted resource B01602
12/23/2013 3:08:25 AM - granted resource QNT-0303
12/23/2013 3:08:25 AM - granted resource Media1-Robot2
12/23/2013 3:08:29 AM - Info bptm(pid=18400) start backup          
12/23/2013 3:08:30 AM - Info bpdm(pid=15492) started           
12/23/2013 3:08:30 AM - started process bpdm (15492)
12/23/2013 3:08:31 AM - Info bpdm(pid=15492) reading backup image         
12/23/2013 3:08:32 AM - Info bpdm(pid=15492) using 256 data buffers        
12/23/2013 3:08:32 AM - Info bpdm(pid=15492) requesting nbjm for media        
12/23/2013 3:08:33 AM - Info bptm(pid=18400) media id B01602 mounted on drive index 35, drivepath {1,0,43,0}, drivename QNT-0303, copy 2
12/23/2013 3:08:33 AM - Info bptm(pid=18400) INF - Waiting for positioning of media id B01602 on server Media1 for writing.
12/23/2013 3:08:34 AM - begin reading
12/23/2013 3:08:34 AM - Critical bpdm(pid=15492) sts_read_image failed: error 2060022 software error      
12/23/2013 3:08:35 AM - Critical bpdm(pid=15492) image read failed: error 2060022: software error     
12/23/2013 3:08:36 AM - Error bpdm(pid=15492) cannot read image from disk, Invalid argument     
12/23/2013 3:08:51 AM - Error bpduplicate(pid=8708) host Media1 backup id db2server2_1387529660 read failed, media read error (85). 
12/23/2013 3:08:51 AM - Error bpduplicate(pid=8708) host Media1 backupid db2server2_1387529660 write failed, termination requested by administrator (150). 
12/23/2013 3:08:51 AM - Error bpduplicate(pid=8708) db_IMAGE() failed: images are in process (1519)     
12/23/2013 3:08:51 AM - Error bpduplicate(pid=8708) Duplicate of backupid db2server2_1387529660 failed, termination requested by administrator (150).  
12/23/2013 3:08:51 AM - Error bpduplicate(pid=8708) Status = no images were successfully processed.     
no images were successfully processed(191)
12/23/2013 3:08:51 AM - end Duplicate; elapsed time: 09:08:09

5 REPLIES 5

RamNagalla
Moderator
Moderator
Partner    VIP    Certified

12/23/2013 3:06:58 AM - Critical bpdm(pid=13748) sts_read_image failed: error 2060022 software error      
12/23/2013 3:06:58 AM - Critical bpdm(pid=13748) image read failed: error 2060022: software error     
12/23/2013 3:06:58 AM - Error bpdm(pid=13748) cannot read image from disk, Invalid argument  

 

what is the OST device that you are using...?

does the  Media Server: Media7  is the read server for the operation? if yes.. does it have access to the OST disk ?

Sisso
Level 5

Hi Nagalla,

 

I am using HP StoreOnce B6200 as an OST device.

The OST plugin is installed in Media7 and this media server have access to the OST disks

RonCaplinger
Level 6

Based on your snippet of the job output:

Your duplication from OST disk to tape is starting at 6:00pm, and it is allocating the backup image/OST storage device at that time.  But the duplication job is waiting for an available tape drive for 9 more hours and as soon as one is available, the job fails as soon as the tape is positioned and ready for writing. 

Based on my experience with OST images on our Data Domain, there is an internal timeout value on the OST device that basically says, "if the read operation does not request any data within 'X' seconds, consider the job abandoned and release the allocation."  This is a feature of the appliance that prevents things like cancelled or failed jobs from allocating and holding thread resources forever, causing slowdowns and poor performance.  This situation would not be seen by NetBackup until the backup job actually does attempt to read data, exactly after the destination STU was made available. 

If you have some duplications with this SLP and these storage units working fine, but some are not, I would check to see if it is only the ones waiting for a really, really long time for a tape drive.  If so, I would recommend reconfiguring your storage units, duplications, etc. to reduce or eliminate the long waits for resources.  Only as a last resort, check with the HP docs or their Tech Support to see how you can extend the OST timeout.  (On the Data Domain, it involved going into a "service tech" mode and making a change on the appliance.)  You really, really only want to make this change if you have no alternative, as this will cause problems with resource allocations when you really do have a lot of abandoned job threads.

Sisso
Level 5

Hi RonCaplinger,

Firstly, Thank you for advising.

Secondly, I dont know if this is the reason as you described. I got more duplications failed with status code 191 and it wasn't take many hours like happend in the job that I published here.

 

There is another Job Detais for more examples:

1/1/2014 3:29:17 PM - requesting resource LCM_Robot2
1/1/2014 3:29:27 PM - granted resource LCM_Robot2
1/1/2014 3:29:28 PM - begin Duplicate
1/1/2014 3:29:28 PM - started process RUNCMD (34528)
1/1/2014 3:29:28 PM - ended process 0 (34528)
1/1/2014 3:29:29 PM - requesting resource Robot2
1/1/2014 3:29:29 PM - requesting resource PG9547
1/1/2014 3:29:29 PM - reserving resource PG9547
1/1/2014 3:29:42 PM - awaiting resource Robot2 - No drives are available
1/1/2014 4:29:54 PM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media2,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media2-Robot2-lto5, Drive Scan Host: N/A
 
1/1/2014 4:32:31 PM - awaiting resource Robot2 - No drives are available
1/1/2014 4:37:33 PM - awaiting resource Robot2 Reason: Drives are in use, Media Server: Media2,
  Robot Number: 3, Robot Type: TLD, Media ID: N/A, Drive Name: N/A,
  Volume Pool: Duplicate, Storage Unit: Media2-Robot2-lto5, Drive Scan Host: N/A
 
1/1/2014 4:41:05 PM - awaiting resource Robot2 - No drives are available
1/1/2014 4:43:03 PM - reserved resource PG9547
1/1/2014 4:43:03 PM - granted resource B02286
1/1/2014 4:43:03 PM - granted resource QNT-0317
1/1/2014 4:43:03 PM - granted resource Media2-Robot2-lto5
1/1/2014 4:43:03 PM - granted resource PG9547
1/1/2014 4:43:03 PM - granted resource DLG-1131
1/1/2014 4:43:07 PM - Info bptm(pid=9160) start           
1/1/2014 4:43:07 PM - started process bptm (9160)
1/1/2014 4:43:09 PM - Info bptm(pid=9160) start backup          
1/1/2014 4:43:10 PM - Info bptm(pid=7500) start           
1/1/2014 4:43:10 PM - started process bptm (7500)
1/1/2014 4:43:10 PM - Info bpdm(pid=7500) reading backup image         
1/1/2014 4:43:11 PM - Info bptm(pid=9160) Waiting for mount of media id B02286 (copy 2) on server Media2.
1/1/2014 4:43:11 PM - Info bptm(pid=7500) using 16 data buffers        
1/1/2014 4:43:11 PM - started process bptm (9160)
1/1/2014 4:43:11 PM - mounting B02286
1/1/2014 4:43:11 PM - Info bptm(pid=7500) Waiting for mount of media id PG9547 (copy 1) on server Media2.
1/1/2014 4:43:11 PM - Info bptm(pid=9160) INF - Waiting for mount of media id B02286 on server Media2 for writing.
1/1/2014 4:43:11 PM - started process bptm (7500)
1/1/2014 4:43:11 PM - mounting PG9547
1/1/2014 4:43:11 PM - Info bptm(pid=7500) INF - Waiting for mount of media id PG9547 on server Media2 for reading.
1/1/2014 4:43:18 PM - mounted; mount time: 00:00:07
1/1/2014 4:43:18 PM - Info bptm(pid=7500) PG9547           
1/1/2014 4:43:18 PM - Info bptm(pid=7500) INF - Waiting for positioning of media id PG9547 on server Media2 for reading.
1/1/2014 4:43:18 PM - positioning PG9547 to file 5
1/1/2014 4:43:19 PM - positioned PG9547; position time: 00:00:01
1/1/2014 4:43:19 PM - begin reading
1/1/2014 5:11:33 PM - Error bptm(pid=9160) error requesting media, TpErrno = Robot operation failed    
1/1/2014 5:11:33 PM - Warning bptm(pid=9160) media id B02286 load operation reported an error    
1/1/2014 5:11:33 PM - current media B02286 complete, requesting next resource Any
1/1/2014 5:42:38 PM - Error nbjm(pid=34412) NBU status: 98, EMM status: The media server reported a system error
1/1/2014 5:42:39 PM - Info bptm(pid=9160) EXITING with status 98 <----------       
1/1/2014 5:42:45 PM - Error bpduplicate(pid=34528) host Media2 backup id mssqlserver01_1387890776 read failed, termination requested by administrator (150).
1/1/2014 5:42:45 PM - Error bpduplicate(pid=34528) host Media2 backupid mssqlserver01_1387890776 write failed, error requesting media (tpreq) (98). 
1/1/2014 5:42:45 PM - Error bpduplicate(pid=34528) db_IMAGE() failed: images are in process (1519)     
1/1/2014 5:42:45 PM - Error bpduplicate(pid=34528) Duplicate of backupid mssqlserver01_1387890776 failed, error requesting media (tpreq) (98).  
no images were successfully processed(191)
1/1/2014 5:42:45 PM - Error bpduplicate(pid=34528) Status = no images were successfully processed.     
1/1/2014 5:42:45 PM - end Duplicate; elapsed time: 02:13:17

 

 

RonCaplinger
Level 6

This job failure is not the same as the previous one, this one appears to be related to your tape drives:

1/1/2014 4:43:19 PM - begin reading

1/1/2014 5:11:33 PM - Error bptm(pid=9160) error requesting media, TpErrno = Robot operation failed

Based on the time between these two messages, this looks like a timeout due to a communication issue between the media server and the tape drive, possibly because the tape drive path defined on the media server is not valid anymore (if you reboot a media server and do not have persistent binding enabled on your SCSI card, the drives all come back online in a different order and some of the tape drives probably won't work until you delete them from the NetBackup GUI and let NetBackup rediscover them).  In any event, you seem to have more than one problem with your NetBackup setup.