cancel
Showing results for 
Search instead for 
Did you mean: 

191 errors again after placing MP5

Fred2010
Level 6
Hi All,
 
I'm running Netbackup 6.0 on Windows 2003 R2 32 bits Enterprise server. Due to another problem recently, support has asked me to upgrade to MP5.
 
Since then, I am seeing 191 errors when my diskstaging images are being 'de-staged'.
 
The first group of messaged states the destaging went fine.
 
Unfortunately the second group of messages shows errors, because Backupid fsr0024_1188079270 copy 2 already exists  
    
Anybody have any idea why I am seeing these (Again? I think these were also present in MP3?!?)
 
Any help is appreciated !
 
Below the messages I received...
 
 
Details for a job handling image fsr0024_1188079270:
 
26/8/2007 6:16:18 AM - begin Duplicate
26/8/2007 6:16:18 AM - requesting resource app0101-hcart3-destaging
26/8/2007 6:16:18 AM - awaiting resource app0101-hcart3-destaging - will retry physical later
26/8/2007 7:11:54 AM - started process bptm (4424)
26/8/2007 7:11:52 AM - granted resource S00305
26/8/2007 7:11:52 AM - granted resource STK.T10000A.000.0.1.1.2
26/8/2007 7:11:52 AM - granted resource app0101-hcart3-destaging
26/8/2007 7:11:59 AM - started process bpdm (5644)
26/8/2007 7:12:01 AM - begin reading
26/8/2007 7:16:37 AM - end reading; read time: 00:04:36
26/8/2007 7:16:38 AM - begin reading
26/8/2007 7:21:12 AM - end reading; read time: 00:04:34
26/8/2007 7:21:13 AM - begin reading
26/8/2007 7:28:46 AM - end reading; read time: 00:07:33
26/8/2007 7:28:47 AM - begin reading
26/8/2007 7:33:03 AM - end reading; read time: 00:04:16
26/8/2007 7:33:04 AM - begin reading
26/8/2007 7:35:18 AM - end reading; read time: 00:02:14
26/8/2007 7:35:19 AM - begin reading
26/8/2007 7:36:48 AM - end reading; read time: 00:01:29
26/8/2007 7:36:49 AM - begin reading
26/8/2007 7:38:57 AM - end reading; read time: 00:02:08
26/8/2007 7:38:58 AM - begin reading
26/8/2007 7:42:13 AM - end reading; read time: 00:03:15
26/8/2007 7:42:14 AM - begin reading
26/8/2007 7:45:24 AM - end reading; read time: 00:03:10
26/8/2007 7:45:26 AM - begin reading
26/8/2007 7:51:07 AM - end reading; read time: 00:05:41
26/8/2007 7:51:08 AM - begin reading
26/8/2007 7:55:19 AM - end reading; read time: 00:04:11
26/8/2007 7:55:20 AM - begin reading
26/8/2007 8:00:41 AM - end reading; read time: 00:05:21
26/8/2007 8:00:42 AM - begin reading
26/8/2007 8:04:58 AM - end reading; read time: 00:04:16
26/8/2007 8:05:00 AM - begin reading
26/8/2007 8:07:18 AM - end reading; read time: 00:02:18
26/8/2007 8:07:20 AM - begin reading
26/8/2007 8:08:50 AM - end reading; read time: 00:01:30
26/8/2007 8:08:51 AM - begin reading
26/8/2007 8:10:08 AM - end reading; read time: 00:01:17
26/8/2007 8:10:09 AM - begin reading
26/8/2007 8:12:15 AM - end reading; read time: 00:02:06
26/8/2007 8:12:17 AM - begin reading
26/8/2007 8:15:28 AM - end reading; read time: 00:03:11
26/8/2007 8:15:30 AM - begin reading
26/8/2007 8:17:48 AM - end reading; read time: 00:02:18
26/8/2007 8:17:49 AM - begin reading
26/8/2007 8:21:15 AM - end reading; read time: 00:03:26
26/8/2007 8:21:16 AM - begin reading
26/8/2007 8:24:46 AM - end reading; read time: 00:03:30
26/8/2007 8:24:47 AM - begin reading
26/8/2007 8:28:28 AM - end reading; read time: 00:03:41
26/8/2007 8:28:29 AM - begin reading
26/8/2007 8:29:35 AM - end reading; read time: 00:01:06
26/8/2007 8:29:51 AM - end Duplicate; elapsed time: 02:13:33
the requested operation was successfully completed(0)

Details for a (later) job also handling image fsr0024_1188079270:

26/8/2007 7:16:14 AM - begin Duplicate
26/8/2007 7:16:15 AM - requesting resource app0101-hcart3-destaging
26/8/2007 7:16:15 AM - awaiting resource app0101-hcart3-destaging - will retry physical later
26/8/2007 8:49:07 AM - Error bpduplicate(pid=4588) Backupid fsr0024_1188079270 copy 2 already exists      
26/8/2007 8:49:08 AM - Error bpduplicate(pid=4588) Duplicate of backup id fsr0024_1188079270 failed, the entity already exists (226). 
26/8/2007 8:49:08 AM - Error bpduplicate(pid=4588) Status = no images were successfully processed.     
26/8/2007 8:49:06 AM - granted resource S00305
26/8/2007 8:49:06 AM - granted resource STK.T10000A.000.0.1.1.2
26/8/2007 8:49:06 AM - granted resource app0101-hcart3-destaging
26/8/2007 8:49:08 AM - end Duplicate; elapsed time: 01:32:54
no images were successfully processed(191)
 
Thanks!
 
Fredster
 
16 REPLIES 16

Anton_Panyushki
Level 6
Certified
It looks like you have got both vaulting and DSSU in your environment and both vaulting and destaging have similar destination. This happens when vaulting finishes prior to destaging. bpduplicate wants to destage image and then discovers that destination has already have a copy of image it wants to destage. 

Brad_Rodgers
Level 3

I found the same issues with de-staging after upgrading to MP5.  My setup is similar to yours (Win2K3-SP2, 6.0 Ent. MP5).  I’ve opened a case with support and it’s already been escalated (once they finally got back to me from my initial contact).  Apparently there is already an ETRACK on this issue.

 

What I found was that a de-staging job would start and determine what images to process.  Before MP5 the parent job (backup job that starts the duplications) would continue to run until all duplications were complete.  Now with MP5 the parent job completes after all duplications have started or are queued.  So if a second de-staging job is started either by schedule or manually it will try to determine which images to process.  If the first de-staging is not complete the second one will include images that were already selected in the first job but not complete. So now one of the images is setup to de-stage in both jobs.  It will complete for one of the jobs but throw a 191 error for the other one.  My logs are very similar to yours.

 

Seems to me like a pretty obvious bug that wasn’t tested very thoroughly.

 

Brad

 

Fred2010
Level 6
Hi Brad,
 
I've opened a case with support as well, and they did mention another person having logged a very similar case as well.
 
Wonder if they were referring to your case...
 
The case has been escalated to L3 as well and they are currently looking into the logs I've provided...
 
No real news thusfar...
 
Do you have the etrack number you mentioned for me, please?
 
I will keep you posted in this thread.
 
Fred

Fred2010
Level 6
Hi Brad,
 
We've received some new info from support regarding the 191 errors we get.
 
Maybe it's useful for you as well.
 
They have requested some new log files for final determination, but I will have to wait until a full backup run (Sunday) before they appear again.
 
Anyway, here's what I've received from them:
 
We have a theory of what's happening, but don't quite have all of the evidence to support it.  We think we are seeing a situation where DSSU duplication jobs are launching, and the parent job is going to Done, while the child jobs continue to process.  For those child jobs that are longer in duration, when another DSSU duplication job launches, they backupID's of the dup jobs still in progress are added to the new DSSU duplication job "bid file" or job list - thereby generating status 191's for those particular jobs.
 
Hope this helps :)
 
Fred

Brad_Rodgers
Level 3
Fred:
I haven't heard back from support at all (seems to be more and more common).  I was going to ask about the ETRACK number.
Their "theory" is exactly what is happening on my system and the description is almost exactly as I described it to them almost two weeks ago.  They told me that a change was made in the de-staging process to "kill" the parent job after a few minutes to take care of another issue.  Fix one thing .... break another.
 
Brad

Fred2010
Level 6
Hey Brad,
 
I just sent in the extra logs they required (It only happens during the full backup run on Sundays with me).
 
I will keep you posted on any news I might receive on my end, ok?
 
We'll get there ;)
 
With kind regards
 
Fred

Fred2010
Level 6
Hi Brad,
 
I've had some news from support:
 
They do confirm a new bug being introduced in MP5. They have made some suggestion to change the duplication schedule. Maybe you can try their workaround and see if it is a temporary solution for you as well?
 
I've asked for an ETA for the bug fix as well...
 
Anyway, this it what was said:
We are thinking this issue is indeed the same as we suspected.  We would like you to try something if it is possible.  We want you to change the Frequency of your DSSU duplication schedules.  Currently they run with a frequency of every 1 hour.  Can this be reduced to run every 2 hours or every three hours? 
We think this will greatly reduce the Status 191 events.  
This request is a temporary workaround, not a final solution.  This behavior is something which surfaced due to some scheduling changes introduced in 6.0 MP5.  We are still designing a way to resolve the problem within the code but the solution is not yet complete.
 
The root of the problem is that an imaged duplication which lasts longer then 1 hour - are considered eligible for duplication again when the scheduler launches 1 hour later because there is not yet a completed "copy 2'.  However, there is a lock file which is associated with the actively duplicating image which causes the next duplication job to terminate with Status 191.  We are thinking that expanding the duration between duplications will allow most duplication jobs to complete before the next dup job launches thereby completing the creation of "copy 2" and avoiding the problem.
Hope this helps and keep me posted :)
 
With kind regards,
 
Fred

Brad_Rodgers
Level 3

Fred:

I also had a call from a backline engineer Monday afternoon (9/10) and we had a long conversation about the problem.  I’m sure my issue is exactly the same as yours.  I did expand the frequency of the schedule on the DSU’s over the last week and this does help eliminate the 191 errors.  The engineer would not admit that is was a bug but  that it was operating as designed.  My problem is that if it’s not really an error then an error status shouldn’t be returned in activity monitor.

Although I can adjust my schedule for the DSU’s to eliminate some 191 errors, I can never predict exactly how long a duplication job will take some of mine run for 20 or more hours.    If the schedule frequency is increased to a very high value than I may miss or delay duplicating some images.

 I finally got him to admit that the behavior had changed in MP5 and this was a result of a change they made to fix another problem.  He said engineering would discuss this and make a determination if a fix would be made.

He also told me that this "feature" would also work the same in verson 6.5. Too bad, I was hoping to upgrade soon, maybe after the first MP release. 

Good Luck

Brad

 



Message Edited by Brad Rodgers on 09-13-2007 05:17 AM

Fred2010
Level 6
Hi Brad,
 
Strange, as I've actually received a patch for this "feature"...
 
The patch consists of a new file "bpbrmds.exe" which has solved all my 191 issues fully!
 
The patch is 'alpha'  stage and not thoroughly tested (An official fix is targeted for 6.0 MP6. It is currently due in November.)
 
If you are willing to try the patch as well, leave your email and I can send it to you...
 
With kind regards,
 
Fred
 
 

Brad_Rodgers
Level 3
Fred:
I got a patch (new bpbrmds.exe-ET1119211) on Wednesday also.  I've had it i in place over the weekend and it does look like it has solved the problems that I was having.  I guess we are the "de facto" testers for this patch.
 
Thanks.
 


Message Edited by Brad Rodgers on 10-01-2007 05:24 AM

Matt_Rademach1
Level 4
I am seeing these same errors, could you please send me that patch?
 
thanks,
Matt Rademacher
{Removed personal info}
 


Message Edited by John_B on 10-10-2007 07:37 AM

Daryl_Kinnaird
Level 4
Could you send me the patch as well.  I am having the same issue.
{Removed personal info}

Thanks


Message Edited by John_B on 10-10-2007 07:37 AM

OptimusPrime
Level 5


@Daryl Kinnaird wrote:
Could you send me the patch as well.  I am having the same issue.
{Removed personal info}

Thanks


Message Edited by John_B on 10-10-2007 07:37 AM

All, if you need to exchange info please use the forums messanger system.  I have removed your e-mail address to prevent spam.

Thank You

Stumpr2
Level 6
Please do NOT exchange netbackup binaries. All binaries should be gotten directly from symantec support. Open up a support ticket and request the binary. This helps symantec keep count of the number of customers that are effected by the MP5 and also allows them to track the installation history for your site. When you replace a binary in a NB 6.0 MP5 installation then you no longer are NB 6.0 MP5 and instead you become a NB 6.0 MP5/variation.
 
Also future upgrades to newer MP's may require for you to first remove the binary that you inserted into your environment. If you had opened up a support call to symantec then they will have the history in their database and can be more helpful in correcting problems that may or may not be introduced by upgrading to future MP's.
 
Please get your binaries directly from symantec!
 
 
 
 

Stafford
Level 4
To those that have experienced this, were you also having a problem canceling queued duplication jobs?  I'm experiencing the 191's like everyone else but also having a problem canceling anything in queue as well.  Does the new binary change the behavior so the parent jobs stay active or does it fix it another way?


Petros
Level 2

Hey Guys,

 

I recently upgraded to 6.5  and now.3 and i'm still having 191 errors.  See blow, anyone has any ideas that would?? i would greately appreciated.

 

Thanks Pete....

 

 

2/13/2009 7:21:23 PM - end reading; read time: 00:08:49
2/13/2009 7:46:11 PM - Error bpduplicate(pid=4516) socket read failed: errno = 10054 - An existing connection was forcibly closed by the remote host.
2/13/2009 7:46:11 PM - Error bpduplicate(pid=4516) host chi-nt-bkup04 backup id uschifetdbs02_1233875716 read process failed, file read failed (13).
2/13/2009 7:46:12 PM - Error bpduplicate(pid=4516) Duplicate of backupid uschifetdbs02_1233875716 failed, file read failed (13).   
2/13/2009 7:46:12 PM - Error bpduplicate(pid=4516) Status = no images were successfully processed.     
2/13/2009 7:46:13 PM - end Duplicate; elapsed time: 03:19:30