08-26-2007 11:15 AM
26/8/2007 6:16:18 AM - begin Duplicate
26/8/2007 6:16:18 AM - requesting resource app0101-hcart3-destaging
26/8/2007 6:16:18 AM - awaiting resource app0101-hcart3-destaging - will retry physical later
26/8/2007 7:11:54 AM - started process bptm (4424)
26/8/2007 7:11:52 AM - granted resource S00305
26/8/2007 7:11:52 AM - granted resource STK.T10000A.000.0.1.1.2
26/8/2007 7:11:52 AM - granted resource app0101-hcart3-destaging
26/8/2007 7:11:59 AM - started process bpdm (5644)
26/8/2007 7:12:01 AM - begin reading
26/8/2007 7:16:37 AM - end reading; read time: 00:04:36
26/8/2007 7:16:38 AM - begin reading
26/8/2007 7:21:12 AM - end reading; read time: 00:04:34
26/8/2007 7:21:13 AM - begin reading
26/8/2007 7:28:46 AM - end reading; read time: 00:07:33
26/8/2007 7:28:47 AM - begin reading
26/8/2007 7:33:03 AM - end reading; read time: 00:04:16
26/8/2007 7:33:04 AM - begin reading
26/8/2007 7:35:18 AM - end reading; read time: 00:02:14
26/8/2007 7:35:19 AM - begin reading
26/8/2007 7:36:48 AM - end reading; read time: 00:01:29
26/8/2007 7:36:49 AM - begin reading
26/8/2007 7:38:57 AM - end reading; read time: 00:02:08
26/8/2007 7:38:58 AM - begin reading
26/8/2007 7:42:13 AM - end reading; read time: 00:03:15
26/8/2007 7:42:14 AM - begin reading
26/8/2007 7:45:24 AM - end reading; read time: 00:03:10
26/8/2007 7:45:26 AM - begin reading
26/8/2007 7:51:07 AM - end reading; read time: 00:05:41
26/8/2007 7:51:08 AM - begin reading
26/8/2007 7:55:19 AM - end reading; read time: 00:04:11
26/8/2007 7:55:20 AM - begin reading
26/8/2007 8:00:41 AM - end reading; read time: 00:05:21
26/8/2007 8:00:42 AM - begin reading
26/8/2007 8:04:58 AM - end reading; read time: 00:04:16
26/8/2007 8:05:00 AM - begin reading
26/8/2007 8:07:18 AM - end reading; read time: 00:02:18
26/8/2007 8:07:20 AM - begin reading
26/8/2007 8:08:50 AM - end reading; read time: 00:01:30
26/8/2007 8:08:51 AM - begin reading
26/8/2007 8:10:08 AM - end reading; read time: 00:01:17
26/8/2007 8:10:09 AM - begin reading
26/8/2007 8:12:15 AM - end reading; read time: 00:02:06
26/8/2007 8:12:17 AM - begin reading
26/8/2007 8:15:28 AM - end reading; read time: 00:03:11
26/8/2007 8:15:30 AM - begin reading
26/8/2007 8:17:48 AM - end reading; read time: 00:02:18
26/8/2007 8:17:49 AM - begin reading
26/8/2007 8:21:15 AM - end reading; read time: 00:03:26
26/8/2007 8:21:16 AM - begin reading
26/8/2007 8:24:46 AM - end reading; read time: 00:03:30
26/8/2007 8:24:47 AM - begin reading
26/8/2007 8:28:28 AM - end reading; read time: 00:03:41
26/8/2007 8:28:29 AM - begin reading
26/8/2007 8:29:35 AM - end reading; read time: 00:01:06
26/8/2007 8:29:51 AM - end Duplicate; elapsed time: 02:13:33
the requested operation was successfully completed(0)
26/8/2007 7:16:14 AM - begin Duplicate
26/8/2007 7:16:15 AM - requesting resource app0101-hcart3-destaging
26/8/2007 7:16:15 AM - awaiting resource app0101-hcart3-destaging - will retry physical later
26/8/2007 8:49:07 AM - Error bpduplicate(pid=4588) Backupid fsr0024_1188079270 copy 2 already exists
26/8/2007 8:49:08 AM - Error bpduplicate(pid=4588) Duplicate of backup id fsr0024_1188079270 failed, the entity already exists (226).
26/8/2007 8:49:08 AM - Error bpduplicate(pid=4588) Status = no images were successfully processed.
26/8/2007 8:49:06 AM - granted resource S00305
26/8/2007 8:49:06 AM - granted resource STK.T10000A.000.0.1.1.2
26/8/2007 8:49:06 AM - granted resource app0101-hcart3-destaging
26/8/2007 8:49:08 AM - end Duplicate; elapsed time: 01:32:54
no images were successfully processed(191)
08-29-2007 07:32 AM
08-30-2007 07:48 AM
I found the same issues with de-staging after upgrading to MP5. My setup is similar to yours (Win2K3-SP2, 6.0 Ent. MP5). I’ve opened a case with support and it’s already been escalated (once they finally got back to me from my initial contact). Apparently there is already an ETRACK on this issue.
What I found was that a de-staging job would start and determine what images to process. Before MP5 the parent job (backup job that starts the duplications) would continue to run until all duplications were complete. Now with MP5 the parent job completes after all duplications have started or are queued. So if a second de-staging job is started either by schedule or manually it will try to determine which images to process. If the first de-staging is not complete the second one will include images that were already selected in the first job but not complete. So now one of the images is setup to de-stage in both jobs. It will complete for one of the jobs but throw a 191 error for the other one. My logs are very similar to yours.
Seems to me like a pretty obvious bug that wasn’t tested very thoroughly.
Brad
09-04-2007 02:22 AM
09-07-2007 05:33 AM
We have a theory of what's happening, but don't quite have all of the evidence to support it. We think we are seeing a situation where DSSU duplication jobs are launching, and the parent job is going to Done, while the child jobs continue to process. For those child jobs that are longer in duration, when another DSSU duplication job launches, they backupID's of the dup jobs still in progress are added to the new DSSU duplication job "bid file" or job list - thereby generating status 191's for those particular jobs.
09-07-2007 12:22 PM
09-10-2007 12:56 AM
09-11-2007 12:38 AM
We are thinking this issue is indeed the same as we suspected. We would like you to try something if it is possible. We want you to change the Frequency of your DSSU duplication schedules. Currently they run with a frequency of every 1 hour. Can this be reduced to run every 2 hours or every three hours?
We think this will greatly reduce the Status 191 events.
This request is a temporary workaround, not a final solution. This behavior is something which surfaced due to some scheduling changes introduced in 6.0 MP5. We are still designing a way to resolve the problem within the code but the solution is not yet complete.The root of the problem is that an imaged duplication which lasts longer then 1 hour - are considered eligible for duplication again when the scheduler launches 1 hour later because there is not yet a completed "copy 2'. However, there is a lock file which is associated with the actively duplicating image which causes the next duplication job to terminate with Status 191. We are thinking that expanding the duration between duplications will allow most duplication jobs to complete before the next dup job launches thereby completing the creation of "copy 2" and avoiding the problem.
09-13-2007 05:07 AM
Fred:
I also had a call from a backline engineer Monday afternoon (9/10) and we had a long conversation about the problem. I’m sure my issue is exactly the same as yours. I did expand the frequency of the schedule on the DSU’s over the last week and this does help eliminate the 191 errors. The engineer would not admit that is was a bug but that it was operating as designed. My problem is that if it’s not really an error then an error status shouldn’t be returned in activity monitor.
Although I can adjust my schedule for the DSU’s to eliminate some 191 errors, I can never predict exactly how long a duplication job will take some of mine run for 20 or more hours. If the schedule frequency is increased to a very high value than I may miss or delay duplicating some images.
I finally got him to admit that the behavior had changed in MP5 and this was a result of a change they made to fix another problem. He said engineering would discuss this and make a determination if a fix would be made.
He also told me that this "feature" would also work the same in verson 6.5. Too bad, I was hoping to upgrade soon, maybe after the first MP release.
Good Luck
Brad
09-29-2007 03:23 PM
10-01-2007 05:23 AM
10-01-2007 07:03 AM
10-10-2007 06:45 AM
10-10-2007 07:38 AM
All, if you need to exchange info please use the forums messanger system. I have removed your e-mail address to prevent spam.
@Daryl Kinnaird wrote:
Could you send me the patch as well. I am having the same issue.
{Removed personal info}
Thanks
Message Edited by John_B on 10-10-2007 07:37 AM
11-09-2007 08:37 AM
12-19-2007 10:50 AM
02-13-2009 09:46 PM
Hey Guys,
I recently upgraded to 6.5 and now.3 and i'm still having 191 errors. See blow, anyone has any ideas that would?? i would greately appreciated.
Thanks Pete....
2/13/2009 7:21:23 PM - end reading; read time: 00:08:49
2/13/2009 7:46:11 PM - Error bpduplicate(pid=4516) socket read failed: errno = 10054 - An existing connection was forcibly closed by the remote host.
2/13/2009 7:46:11 PM - Error bpduplicate(pid=4516) host chi-nt-bkup04 backup id uschifetdbs02_1233875716 read process failed, file read failed (13).
2/13/2009 7:46:12 PM - Error bpduplicate(pid=4516) Duplicate of backupid uschifetdbs02_1233875716 failed, file read failed (13).
2/13/2009 7:46:12 PM - Error bpduplicate(pid=4516) Status = no images were successfully processed.
2/13/2009 7:46:13 PM - end Duplicate; elapsed time: 03:19:30