
How to tell what a queued duplication job relates to.

We have several queued duplication jobs, but when you click on them in the Activity Monitor there are no details about which policy or clients they are for, so how do you find this out?

Mike


10 Replies

What is generating the duplication jobs? Vault, SLP, manual?


SLP - so the policy of the duplication job itself is "SLP-<storage_life_policy_being_used>".

Mike

Accepted Solution!


You can use nbstlutil stlilist -U



SLPs are not policy-based; they are storage-based. You could check the images.

Clicking on the job in the Activity Monitor, the Job Overview shows a File List: - this should contain the backup ID(s) of your image(s).

If you run: bpimagelist -backupid <backupid> -l

On the line starting with IMAGE, the 7th field is the policy. Alternatively, if you change -l to -L in the bpimagelist command above, you will get a long listing.


bpimagelist -backupid <backupid> -L | grep Policy:   (Unix)

bpimagelist -backupid <backupid> -L | findstr Policy:  (Windows)


This will give you the policy related to the image/s in that duplication job.
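If you have several backup IDs to resolve, the commands above can be wrapped in a small loop. A minimal sketch: `bpimagelist` is stubbed here with a shell function (emitting just the one "Policy:" line of real `-L` output) so the sketch runs without a master server, and the policy name and backup IDs are made up; delete the stub to run it on a real NetBackup master.

```shell
#!/bin/sh
# Map each backup ID to its policy name via "bpimagelist -backupid <id> -L".
# Stub for the real NetBackup command so this sketch is runnable anywhere;
# remove this function on a real master server.
bpimagelist() {
    printf 'Policy:            PROD_Oracle_Daily\n'   # hypothetical policy
}

# Hypothetical backup IDs, one per line (NetBackup format: <client>_<epoch>).
while read -r id; do
    policy=$(bpimagelist -backupid "$id" -L | grep 'Policy:' | awk '{print $2}')
    printf '%s -> %s\n' "$id" "$policy"
done <<'EOF'
client1_1357000000
client2_1357000100
EOF
```

Each input line prints as `<backupid> -> <policy>`; on a real master the stub goes away and the loop reads real backup IDs from the SLP output.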


Thanks for your replies.

First, a bit more information on the issue. Currently we have about 900 duplication jobs queued, 34 active and about 2,200 done. All backups are to disk, and they are also replicated to a second disk at DR and duplicated to tape (8 drives) at prod, so in effect we have the original copy, two disk copies and a tape copy. The tape copy is somewhat overkill, so we are not at risk from such a large backlog of duplications (there are no queued disk backups), but I still need to sort it out.

A couple of months ago NBU was restarted and the tape backups didn't work after it came back up. This wasn't spotted initially, and it took Symantec a couple of days to figure out how to get tape backups working. So a few weeks ago we had about 700 duplication jobs queued, which were going down and reached 500, but in the last week they have started going up quickly. We have tried cancelling jobs that are more than 5 days old, but they just seem to reschedule.

Back to responses:

In the Activity Monitor the File List is blank - it only gets populated once the job is active/done:

job-overview-blanked.png

The detailed status doesn't give any info either:

job-detailed.png


bpimagelist doesn't work:

# bpimagelist -backupid 353490
VxSS authentication failed

"nbstlutil stlilist -U" is the most useful; however, there are a few issues:

It gives output for over 15,000 images, while there are only about 3,200 jobs in the Activity Monitor, many of which are disk backups. In the nbstlutil output, 8,000 say they are "NOT_STARTED", but the Activity Monitor says there are only about 900 queued.

There is no job ID in the nbstlutil output, so there is no way to relate these entries to the Activity Monitor. If I kill a duplication job in the Activity Monitor, I don't know what I am killing, other than how long it has been queued for.

There is no policy (the policy of the job being duplicated) in the nbstlutil output, so I don't know if it is O/S, filesystem, DB or archive logs, and some clients back up multiple databases under different policies.

I am reading up on nbstlutil, but haven't got anything more useful out of it yet.

Thanks

Mike



Hi Mike,

Can you give some background on why you're going down this track? There might be a better approach.

If you cancel an SLP job it will just come back; they're like killer zombie robots.


I need to reduce the duplication backlog and stop it increasing, but as the Activity Monitor only gives the job ID for queued jobs and zero detail about what they are duplicating, this is impossible to investigate from the Activity Monitor alone.

nbstlutil has helped a lot, and I have now reduced the initial 15,000+ lines to 2,500 lines with the command:

nbstlutil list -rt I -image_state 2

This command gives the information I require to identify what each duplication is for, namely "Client" and "Policy", but I still don't know why I have 2,500 lines when only about 900 jobs are queued, given that the "-image_state 2" flag only lists images in the IN_PROCESS state.

The nbstlutil output does not list the job ID - it uses the backup ID - while the Activity Monitor and its CLI equivalent, bpdbjobs, only use the job ID and don't list the backup ID, so I cannot correlate the two.

The only way I can currently correlate the output of bpdbjobs with nbstlutil is to use "nbstlutil list -jobid", but I can only do this one job at a time, and job ID is evidently not indexed, as this command takes over 2 minutes, which means it would take over 30 hours to run for the 900+ jobs in the Activity Monitor. Also, "nbstlutil list -jobid" does NOT give the vital "Policy" information that "nbstlutil list -backupid" gives.

So if someone can tell me how to correlate backup ID to job ID, then I may be able to investigate the ever-growing list of duplication jobs.

You said if you cancel an SLP job it will just come back, so how can I permanently cancel a duplication?

Mike


Finally getting somewhere by running:

nbstlutil list -rt IC -image_state 2

The critical change is using "-rt IC" rather than "-rt I", which gives me the COPY record as well as the IMAGE record; the COPY record contains the job ID. It is not quite so straightforward, as there is more than one COPY record, so I need to look only at the DUPLICATION copy record. Even then I get different backup IDs with the same job ID, which I haven't figured out yet, so I then have to select the last backup ID. I do all this as follows:

On the Solaris NBU master:

nbstlutil list -rt IC -image_state 2 > nbstl-list-image_state_2-rt_IC

Copy the file to Linux, as Solaris does not have gawk installed (gawk's strftime function can convert epoch times to a readable format), and then on Linux run:

awk 'BEGIN { dt = "%y/%m/%d-%T"
             print "Backup_ID Client Backup_Time Policy Storage_Lifecycle_Policy Time_In_Process Job_ID" }
     # IMAGE record: save the fields we want to report
     $2 == "I" { image = $4" "$5" "strftime(dt,$6)" "$7" "$10" "strftime(dt,$12) }
     # COPY record of type 2 (duplication): print the saved image fields plus the job ID ($10)
     $2 == "C" && $5 == "2" { print image, $10 }
' nbstl-list-image_state_2-rt_IC | awk '
     # keep only the last line of each run of identical job IDs ($NF)
     NR > 1 && jobid != $NF { print saved }
     { saved = $0; jobid = $NF }
     END { print saved }'


So the first awk above prints fields from the IMAGE record plus the job ID from the COPY record of type 2 (duplication), and the second awk filters that down to only the last backup ID for each job ID.
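The dedup step can be seen in isolation on made-up input (an NR > 1 guard is used here so the header line doesn't produce a stray blank line on the first record):

```shell
# Keep only the last line of each run of identical job IDs, where the
# job ID is the last field ($NF). Input lines are hypothetical samples
# in "<backupid> <jobid>" form.
awk 'NR > 1 && jobid != $NF { print saved }
     { saved = $0; jobid = $NF }
     END { print saved }' <<'EOF'
client1_1357000000 1001
client1_1357000100 1001
client2_1357000200 1002
EOF
# prints:
# client1_1357000100 1001
# client2_1357000200 1002
```

Both lines for job 1001 collapse to the last one, which is exactly the "select the last backup ID per job ID" behaviour described above.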

Mike


Hi Mike,

 

Sounds like there is too high demand for duplicates and not enough resources.


You can cancel them using the commands below, but note that once cancelled they won't start again; you'll need to duplicate manually.

nbstlutil cancel -backupid <backupid>
nbstlutil cancel -lifecycle <lifecycle>


What I think you need to do is suspend the SLPs (stop secondary operations like duplication) and then enable them one by one. That way you can see what is generating the bulk.
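A sketch of that suspend-and-re-enable approach, assuming the `nbstlutil inactive`/`nbstlutil active` subcommands available in NetBackup 7.x. The lifecycle names are the example names from this post, and `nbstlutil` is stubbed with a shell function so the sketch runs without a master server; remove the stub on a real master.

```shell
#!/bin/sh
# Deactivate every SLP, then re-enable them one at a time while watching
# the duplication backlog in the Activity Monitor.
# Stub so this sketch runs without a NetBackup master; remove on a real one.
nbstlutil() { echo "nbstlutil $*"; }

for slp in FDC_APP_Daily FDC_FS_Daily JDC_APP_Daily JDC_FS_Daily; do
    nbstlutil inactive -lifecycle "$slp"   # stop secondary operations
done

# ...later, re-enable one SLP and see how much it contributes to the queue:
nbstlutil active -lifecycle FDC_APP_Daily
```

Re-enabling one lifecycle at a time makes it obvious which SLP is generating the bulk of the queued duplications.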


For the SLP design, is it very granular? I find it's easier to manage with a well laid-out design.

Below is a simple example split by site, application type and frequency. From there you can expand to more detail on the application (SQL, Exchange, Oracle, NDMP), etc. That way, when the SLPs run, you don't just see SLP_Daily with no clue who it's for.

FDC_APP_Daily
FDC_APP_Weekly
FDC_APP_Monthly
FDC_APP_Yearly
FDC_FS_Daily
FDC_FS_Weekly
FDC_FS_Monthly
FDC_FS_Yearly
JDC_APP_Daily
JDC_APP_Weekly
JDC_APP_Monthly
JDC_APP_Yearly
JDC_FS_Daily
JDC_FS_Weekly
JDC_FS_Monthly
JDC_FS_Yearly



Thanks Riaan.

There is more information in https://www-secure.symantec.com/connect/forums/how-tell-what-queued-duplication-job-relates#comment-... for anyone else trying to get information about queued duplication jobs.