
Slow SLP Duplication Jobs

Hi,

I'm using SLP policies (backup to local disk, then duplicate to tape on an IBM TS library). Recently the duplication jobs have been taking much longer to complete (around 10-12 hours each). When I check NetBackup, I can see that it finds the right media right away and loads it into the tape drive very quickly, but the duplication job then hangs at "Waiting for positioning of Media ID" for about 10 hours.

Any help here would be appreciated.

Thanks!

NBRB logs attached to this post.

NetBackup 7.7

Windows Server 2012 R2

IBM TS Series LTO-6

 

23 Replies

Re: Slow SLP Duplication Jobs

It's happening to all duplicated backups (VM, Windows, ...), but when I run a backup directly to tape everything works fine.


Re: Slow SLP Duplication Jobs

This sounds like a hardware issue.
I have seen something similar where the tape drives (same make/model) had different levels of firmware.
Media that had been written by the newer firmware was later loaded in a drive with older firmware.
The drive with older firmware struggled to position the media and eventually failed with 'media position error'.

Can you confirm firmware levels on tape drives?
How old are the tapes? Can you see different behaviour between old (reused) tapes and new media? 

By the time you see 'waiting for positioning', the bptm process on the media server is waiting for the tape drive to position the media and report back success.
We need the bptm log to see why positioning is taking so long and why no error is produced much earlier.

The bptm log on the media server should be at a fairly high logging level - 3 at a minimum. Level 5 will probably be best.

Ensure that you also have a VERBOSE entry in vm.conf on the media server.
Restart the Device Management service after adding this entry.
Hardware issues will now be logged to the Event Viewer System and Application logs.
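
For anyone who wants to script the vm.conf check, here is a minimal sketch in Python. It only works on the text of the file; the function name `ensure_verbose` is made up for illustration, and the location of vm.conf on your media server is not assumed here (check your own install path). It does not restart the Device Management service.

```python
def ensure_verbose(vm_conf_text: str) -> str:
    """Return vm.conf content that contains a VERBOSE entry.

    Hypothetical helper: it treats a line as the entry only when the
    whole stripped line is exactly "VERBOSE".
    """
    lines = vm_conf_text.splitlines()
    if any(line.strip() == "VERBOSE" for line in lines):
        # Entry already present, leave the content untouched.
        return vm_conf_text
    lines.append("VERBOSE")
    return "\n".join(lines) + "\n"
```

Read the file, pass its content through the function, and write it back only if the result differs. The service restart still has to be done afterwards.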


Re: Slow SLP Duplication Jobs

Thanks for your prompt reply.

Actually this is possible, because the library came with one drive and a couple of months later I added an additional drive to it.

But backup to tape works fine; it's only the duplication jobs that have this problem.


Re: Slow SLP Duplication Jobs

We still need bptm log - as well as Job details for backup job and Job details for duplication job.

Job details will give us timestamps and bptm PIDs that we can use to trace each of the jobs in bptm log. 
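
As a rough illustration of tracing by PID, the sketch below pulls out the Job details lines that mention a given bptm PID. It assumes the `(pid=NNNN)` notation that appears in Job details output; the bptm log itself may format PIDs differently, so the pattern would need adjusting there. The function name `lines_for_pid` is hypothetical.

```python
import re


def lines_for_pid(log_lines, pid):
    """Return only the lines that mention the given PID as "(pid=NNNN)"."""
    # The closing parenthesis in the pattern prevents 15316 from
    # also matching a longer PID such as 153160.
    pattern = re.compile(r"\(pid=%d\)" % pid)
    return [line for line in log_lines if pattern.search(line)]


if __name__ == "__main__":
    sample = [
        "11/05/2016 16:52:55 - started process bptm (pid=15316)",
        "11/05/2016 16:52:55 - mounting P523L6",
    ]
    for line in lines_for_pid(sample, 15316):
        print(line)
```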

nbrb log shows at 10:09 that the robot/tape STU is busy and resource can therefore not be allocated. 

cannot allocate, resource is busy LCM_servernbsrv01-hcart2-robot-tld-0


Same 'resource busy' message at 10:11, 10:14, 10:19, 10:21, 10:24, 10:29, 10:31, 10:34, 10:39, 10:41, 10:44, 10:49, 10:51, etc.
This goes on and on...
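
To see how often each resource is reported busy, something like the following Python sketch could be run over the nbrb log lines. It assumes only that the message text "cannot allocate, resource is busy" is followed by the resource name on the same line, as in the excerpt above; the rest of the nbrb line format is not assumed, and `count_busy` is a made-up name.

```python
from collections import Counter

BUSY_MARKER = "cannot allocate, resource is busy"


def count_busy(log_lines):
    """Count 'resource is busy' messages per resource name."""
    counts = Counter()
    for line in log_lines:
        # partition() returns an empty marker when the text is absent.
        _, marker, rest = line.partition(BUSY_MARKER)
        if marker:
            parts = rest.split()
            resource = parts[0] if parts else "(unknown)"
            counts[resource] += 1
    return counts
```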

This does not seem to be related to your issue with the lengthy wait for positioning, as at that stage resources have already been allocated and the job is waiting for the tape drive to report successful positioning.

What else is active and using the tape drives at this point in time? 

Just giving us nbrb log is not helping.... 


Re: Slow SLP Duplication Jobs

If I remember correctly, 10 hours is a default timeout for tape operations.

Normally I would have guessed at a changed tape device file order, but then the backups shouldn't work either.

Have you tried a manual duplication of images from disk to tape?

Did you change anything on the SLP destination tape storage unit other than the number of concurrent drives?

 

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Re: Slow SLP Duplication Jobs

I don't know what this device is:

LCM_servernbsrv01-hcart2-robot-tld-0

I don't have any storage unit or robotic library in NetBackup with this label; at least I can't see anything in the Administration Console.

Where can I find and remove this STU?

The library is managed by NetBackup and no other device or software has access to it.


Re: Slow SLP Duplication Jobs

LCM_servernbsrv01-hcart2-robot-tld-0 is the SLP name for storage unit servernbsrv01-hcart2-robot-tld-0

Is your SLP pointing at servernbsrv01-hcart2-robot-tld-0 ?

 


Re: Slow SLP Duplication Jobs

Hi,

No, nothing changed. After this problem appeared I removed the library and drives from NetBackup and re-added them.


Re: Slow SLP Duplication Jobs

Then your problem may be that the SLP is pointing to an old storage unit. There is a technote on changing the storage in all versions of an SLP.


Re: Slow SLP Duplication Jobs

As per Michael's previous post - did you update the Storage Unit properties to 2 drives after the additional drive was added?

I think Michael is on the right track - probably a problem with SLP versioning... 


Re: Slow SLP Duplication Jobs

Yes, I updated my library to 2 drives and it was working fine for a couple of months.

Also, after this problem appeared I removed the library and drives and added them again.

 


Re: Slow SLP Duplication Jobs

No, not anymore.

I removed the library "servernbsrv01-hcart2-robot-tld-0", but the STU for that library is still there (I can't delete it, and I also can't delete the old SLPs that point to it). I created a new STU for the library named "servernbsrv01-hcart3-robot-tld-0" and created new SLPs that point to it.

 


Re: Slow SLP Duplication Jobs

I have two kinds of LTO tapes in my library, LTO-5 and LTO-6; my drives are LTO-6.

Should I change the media type of the LTO-5 tapes to something else, e.g. HCART? (The media type for all tapes is HCART3 and the drives are configured as HCART3.)


Re: Slow SLP Duplication Jobs

That indicates that the old SLP still contains images with incomplete status.

As you have different media types, you should have different densities for both tapes and tape drives.

For example

HCART2 for LTO5 media and tape drives

HCART3 for LTO6 media and tape drives

If you only have LTO6 tape drives, you should configure some of them as HCART2 to match the LTO5 media in this setup.
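
The matching idea can be sketched in a few lines of Python. This is only a model of the rule that media are mounted in drives configured with the same density; the drive names and the `mountable_drives` function are invented for the example and do not come from any NetBackup tool.

```python
# Hypothetical layout: two physical LTO6 drives, one configured per density.
DRIVE_DENSITY = {
    "drive0": "hcart2",  # reserved for LTO5 media
    "drive1": "hcart3",  # reserved for LTO6 media
}

MEDIA_DENSITY = {
    "LTO5": "hcart2",
    "LTO6": "hcart3",
}


def mountable_drives(media_generation, drives=DRIVE_DENSITY):
    """List the drives whose configured density matches the media."""
    wanted = MEDIA_DENSITY[media_generation]
    return [name for name, density in drives.items() if density == wanted]
```

With this layout each tape generation has exactly one candidate drive, so a duplication to LTO6 media can never be sent to the drive configured for LTO5.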


Re: Slow SLP Duplication Jobs

Could that be my problem?


Re: Slow SLP Duplication Jobs

I'm having some difficulty separating the LTO5 and LTO6 tapes. I can change the media type and the drive type, but what should I do with the library?

The library is HCART3. Is it possible to use HCART2 and HCART3 drives in an HCART3 library?

Or should I partition the library into two different libraries: one with HCART2 density, an HCART2 drive and HCART2 tapes, and another with HCART3 density, HCART3 drives and HCART3 tapes?

Re: Slow SLP Duplication Jobs

I found multiple versions of the SLP that I can't see in the NBU Administration Console.

How can I remove them?


Re: Slow SLP Duplication Jobs

After separating the tapes into two media types, HCART (LTO5) and HCART3 (LTO6), changing one of the drives to HCART, and also changing the disk fragment size to 5000 MB, NBU now passes "Waiting for positioning of Media ID" quickly.

But now the duplication job is stuck at:

11/05/2016 16:52:55 - Info bptm (pid=15316) Waiting for mount of media id P523L6 (copy 2) on server servernbsrv01.DOMAIN.local.
11/05/2016 16:52:55 - started process bptm (pid=15316)
11/05/2016 16:52:55 - mounting P523L6
11/05/2016 16:52:55 - Info bptm (pid=15316) INF - Waiting for mount of media id P523L6 on server servernbsrv01.DOMAIN.local for writing.
11/05/2016 16:52:56 - begin reading
11/05/2016 16:53:45 - Info bptm (pid=15316) media id P523L6 mounted on drive index 1, drivepath {4,0,2,0}, drivename IBM.ULT3580-HH6.001, copy 2
11/05/2016 16:53:45 - Info bptm (pid=15316) INF - Waiting for positioning of media id P523L6 on server servernbsrv01.DOMAIN.local for writing.
11/05/2016 16:58:08 - end reading; read time: 0:05:12
11/05/2016 16:58:08 - begin reading
11/05/2016 17:01:48 - end reading; read time: 0:03:40
11/05/2016 17:01:48 - begin reading
11/05/2016 17:04:50 - end reading; read time: 0:03:02
11/05/2016 17:04:50 - begin reading
11/05/2016 17:07:46 - end reading; read time: 0:02:56
11/05/2016 17:07:46 - begin reading
11/05/2016 17:10:58 - end reading; read time: 0:03:12
11/05/2016 17:10:59 - begin reading
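
To get a feel for how much time goes into the read phases shown above, a small Python sketch can add up the "read time" values from the Job details. It assumes the `end reading; read time: H:MM:SS` wording exactly as it appears in this excerpt; `total_read_time` is a made-up helper name.

```python
import re
from datetime import timedelta

READ_TIME = re.compile(r"end reading; read time: (\d+):(\d{2}):(\d{2})")


def total_read_time(job_lines):
    """Sum every 'read time' value found in the Job details lines."""
    total = timedelta()
    for line in job_lines:
        match = READ_TIME.search(line)
        if match:
            hours, minutes, seconds = map(int, match.groups())
            total += timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return total
```

Comparing this total with the wall-clock span of the job gives a rough idea of whether the job is mostly reading from disk or mostly waiting on something else. It does not say anything about the tape write side.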


Re: Slow SLP Duplication Jobs

You need a verbose bptm log to monitor the progress of PID 15316.