Forum Discussion

10 Replies

  • Finally getting somewhere by running:

    nbstlutil list -rt IC -image_state 2

    The critical change is using "-rt IC" rather than "-rt I": this gives me the COPY records as well as the Image record, and the COPY record contains the job ID. It is not quite so straightforward, as there is more than one COPY record, so I need to look only at the "DUPLICATION" copy record. Even then I get different backup IDs with the same job ID, which I haven't figured out yet, so I then have to select the last backup ID. I do all this as follows:

    On the Solaris NBU master:

    nbstlutil list -rt IC -image_state 2 > nbstl-list-image_state_2-rt_IC

    Copy the file to Linux, as Solaris does not have gawk installed (gawk's strftime function converts epoch time to a readable format), and then on Linux run:

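    gawk 'BEGIN {dt="%y/%m/%d-%T"; print "Backup_ID Client Backup_Time Policy Storage_Lifecycle_Policy Time_In_Process Job_ID"}
            # Image record: remember backup ID, client, backup time, policy, SLP, time in process
            $2 == "I"              {image=$4" "$5" "strftime(dt,$6)" "$7" "$10" "strftime(dt,$12)}
            # Copy record of type 2 (duplication): print the image fields plus the job ID
            $2 == "C" && $5 == "2" {print image, $10}
    ' nbstl-list-image_state_2-rt_IC | awk '
            # Print only the last line of each run of lines sharing a job ID; NR > 1 skips the empty "saved" on line 1
            NR > 1 && jobid != $NF {print saved}
            {saved=$0; jobid=$NF}
            END {print saved}'

    So the first awk above prints fields from the Image record plus the job ID from the copy record of type 2 (duplication), and the second awk filters this to print only the last backup ID for each group of consecutive lines with the same job ID.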
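
    If the lines for one job ID are not guaranteed to be adjacent in the nbstlutil output, the second awk can print the same job more than once, because it only collapses consecutive runs. A minimal sketch of an alternative, assuming (as in the pipeline above) that the job ID is the last field and line 1 is the header: an associative array keeps the last line seen for each job ID and prints them in the order the job IDs first appear. The function name last_per_jobid is hypothetical.

```shell
# last_per_jobid: keep only the last line for each job ID (the last field),
# even when lines for the same job ID are not adjacent.
# Line 1 (the header printed by the first awk) is passed through unchanged.
# Reads the files given as arguments, or stdin if there are none.
last_per_jobid() {
    awk '
        NR == 1 { print; next }                 # header line
        !($NF in last) { order[++n] = $NF }     # remember first-seen order
        { last[$NF] = $0 }                      # later lines overwrite earlier ones
        END { for (i = 1; i <= n; i++) print last[order[i]] }
    ' "$@"
}
```

    To use it, pipe the output of the first gawk into last_per_jobid instead of into the second awk.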

    Mike

  • SLP: so the policy of the duplication job itself is "SLP-<storage_life_policy_being_used>".

    Mike

  • SLPs are not policy based; they are storage based. You could check the images.

    Click on the job in Activity Monitor; in the Job Overview there is a File List, which should contain the backup ID of your image or images.

    If you run: bpimagelist -backupid <backupid> -l

    In the line starting with IMAGE, the 7th field is the policy. Alternatively, if you change -l to -L in the bpimagelist command above, you will get a long listing, which you can filter for the policy:

    bpimagelist -backupid <backupid> -L | grep Policy:   (Unix)

    bpimagelist -backupid <backupid> -L | findstr Policy:  (Windows)

    This will give you the policy related to the image/s in that duplication job.
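
    A small helper for the 7th-field approach above, assuming the IMAGE line layout described in this reply (policy in field 7). It reads bpimagelist -l output on stdin, so it can also be looped over a list of backup IDs. Both function names are hypothetical, and the loop needs bpimagelist to authenticate successfully on the host where it runs.

```shell
# image_policy: print the policy (7th field) of the IMAGE line in
# "bpimagelist -l" output read from stdin.
image_policy() {
    awk '$1 == "IMAGE" { print $7 }'
}

# policies_for_backupids: read backup IDs (one per line) on stdin and print
# "<backupid> <policy>" for each one, using bpimagelist.
policies_for_backupids() {
    while read -r id; do
        [ -n "$id" ] || continue            # skip blank lines
        # </dev/null stops bpimagelist from consuming the list of IDs
        printf '%s %s\n' "$id" "$(bpimagelist -backupid "$id" -l </dev/null | image_policy)"
    done
}
```

    For example: policies_for_backupids < backupids.txt, where backupids.txt is a hypothetical file with one backup ID per line.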

  • Thanks for your replies.

    First, a bit more information on the issue. Currently we have about 900 duplication jobs queued, 34 active and about 2200 done. All backups are to disk, and they are also replicated to a second disk at DR and duplicated to tape (8 drives) at prod. So in effect we have the actual copy, 2 disk copies and a tape copy; the tape copy is somewhat overkill, so we are not at risk from such a large backlog of duplications (there are no queued disk backups), but I still need to sort it out.

    A couple of months ago, NBU was restarted and the tape backups didn't work after NBU came back up. This wasn't spotted initially, and it took Symantec a couple of days to figure out how to get tape backups working. So a few weeks ago we had about 700 duplication jobs queued, which dropped to 500, but in the last week they have started going up quickly. We have tried cancelling jobs that are more than 5 days old, but they just seem to reschedule.

    Back to responses:

    In Activity Monitor the File List is blank; it only gets populated once the job is Active or Done:

    job-overview-blanked.png

    The detailed status doesn't give any info either:

    job-detailed.png

    bpimagelist doesn't work:

    # bpimagelist -backupid 353490
    VxSS authentication failed

    "nbstlutil stlilist -U" is the most useful; however, there are a few issues:

    It gives output for over 15,000 jobs, but there are only about 3200 in Activity Monitor and many of these are disk backups. In the nbstlutil output 8000 say they are "NOT_STARTED", but Activity Monitor shows about 900 queued.

    There is no job ID in the nbstlutil output, so there is no way to relate these jobs to Activity Monitor: if I kill a duplication job in Activity Monitor, I don't know what I am killing, other than how long it has been queued for.

    There is no policy (the policy of the job being duplicated) in the nbstlutil output, so I don't know whether it is O/S, filesystem, DB or archive logs, and some clients back up multiple databases in different policies.

    I am reading up on nbstlutil, but haven't got anything more useful out of it yet.

    Thanks

    Mike

  • Hi Mike,

    Can you give some background on why you're going down this track? There might be a better approach.

    If you cancel an SLP job it will just come back; they're like killer zombie robots.

  • I need to reduce the duplication backlog and stop it increasing, but as Activity Monitor only gives the job ID for queued jobs and zero details about what is being duplicated, this is impossible to investigate from Activity Monitor.

    nbstlutil has helped a lot, and I have now reduced the initial 15,000+ lines to 2,500 lines with the command:

    nbstlutil list -rt I -image_state 2

    This command gives the information I need to identify what the duplication is for, namely "Client" and "Policy", but I still don't know why I have 2500 lines when I only have about 900 jobs queued, as I have used the flag "-image_state 2", which only lists jobs that are in the IN_PROCESS state.
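
    One possible reason for more image lines than queued jobs is that a single duplication job can cover several images (the File List mentioned elsewhere in this thread can hold more than one backup ID). A minimal sketch for checking that, assuming the "nbstlutil list -rt IC" field positions used by the awk in the first reply ($2 = record type, $5 = copy type with 2 for duplication, $10 = job ID on copy records): it counts image records and distinct duplication job IDs. The function name count_dup_jobs is hypothetical.

```shell
# count_dup_jobs: summarise "nbstlutil list -rt IC" output (file args or stdin).
# Assumed fields: $2 = record type (I or C); on C records $5 = copy type
# (2 = duplication) and $10 = job ID.
count_dup_jobs() {
    awk '
        $2 == "I" { images++ }
        $2 == "C" && $5 == "2" && !($10 in seen) { seen[$10] = 1; jobs++ }
        END { printf "images=%d duplication_jobs=%d\n", images, jobs }
    ' "$@"
}
```

    For example: nbstlutil list -rt IC -image_state 2 | count_dup_jobs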

    The nbstlutil output does not list the job ID; it uses the backup ID, while Activity Monitor and its CLI equivalent, bpdbjobs, only use the job ID and don't list the backup ID, so I cannot correlate the information.

    The only way I can currently correlate the output from bpdbjobs with nbstlutil is to use "nbstlutil list -jobid", but I can only do this one job at a time, and the job ID is evidently not indexed, as this command takes over 2 minutes. For the 900+ jobs in Activity Monitor that would take over 30 hours (900 × 2 min = 1,800 min). Also, "nbstlutil list -jobid" does NOT give the vital "Policy" information that "nbstlutil list -backupid" gives.
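
    Once a job ID column exists (the "-rt IC" approach in the first reply adds one), correlating with Activity Monitor becomes a single join on job ID rather than one slow "nbstlutil list -jobid" call per job. A minimal sketch, assuming you have saved the job IDs you care about one per line in a file (for example the queued duplication jobs from bpdbjobs; extracting them depends on your bpdbjobs output format and is not shown), and that the job ID is the last field of each table row. rows_for_jobids is a hypothetical name.

```shell
# rows_for_jobids IDS_FILE TABLE_FILE
# Print the rows of TABLE_FILE whose last field (the job ID) appears in
# IDS_FILE, which holds one job ID per line.
rows_for_jobids() {
    awk '
        FILENAME == ARGV[1] { if (NF) want[$1] = 1; next }   # first file: collect job IDs
        ($NF in want)                                        # second file: print matching rows
    ' "$1" "$2"
}
```

    Checking FILENAME rather than the common NR == FNR idiom keeps the join correct even if the ID file is empty.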

    So if someone can tell me how to correlate backup ID to job ID, then I may be able to investigate the ever-growing list of duplication jobs.

    You said if you cancel an SLP job it will just come back, so how can I permanently cancel a duplication?

    Mike

  • Hi Mike,

    It sounds like there is too much demand for duplications and not enough resources.

    You can cancel them with the commands below, but note that once cancelled they won't start again; you'll need to duplicate manually.

    nbstlutil cancel -backupid <backupid>
    nbstlutil cancel -lifecycle <lifecycle>
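
    If you do go down the cancel route for a list of images, a cautious sketch is to generate the per-backup-ID commands first and review them, since cancelled images are not picked up again. It assumes one backup ID per line on stdin; cancel_backupids and the DRY_RUN variable are hypothetical, and only the "nbstlutil cancel -backupid" syntax comes from this reply.

```shell
# cancel_backupids: read backup IDs (one per line) on stdin and cancel the
# remaining SLP processing for each one with "nbstlutil cancel -backupid".
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
# Cancelled images are NOT picked up again, so they must be duplicated manually.
cancel_backupids() {
    while read -r id; do
        [ -n "$id" ] || continue                    # skip blank lines
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "nbstlutil cancel -backupid $id"
        else
            nbstlutil cancel -backupid "$id" </dev/null
        fi
    done
}
```

    For example, run cancel_backupids < old-backupids.txt to review the commands, then DRY_RUN=0 cancel_backupids < old-backupids.txt to execute them (old-backupids.txt is a hypothetical file).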

     

    What I think you need to do is suspend the SLPs (stop secondary operations like duplication) and then enable them one by one. That way you can see what is generating the bulk.
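
    A sketch of that approach, assuming the nbstlutil inactive and active subcommands with -lifecycle (check the NetBackup Commands Reference for your release before running anything). It only prints the commands: every listed SLP is made inactive, then only the first argument is re-activated, so you can watch what that one SLP generates. The function name and argument order are hypothetical.

```shell
# slp_one_at_a_time ACTIVE_SLP SLP...
# Print the commands to make every listed SLP inactive (suspending secondary
# operations such as duplication), then re-activate only ACTIVE_SLP.
# Nothing is executed; review the output, then pipe it to sh if it looks right.
# The ( ) body runs in a subshell so its variables do not leak.
slp_one_at_a_time() (
    active=$1
    shift
    for slp in "$@"; do
        echo "nbstlutil inactive -lifecycle $slp"
    done
    echo "nbstlutil active -lifecycle $active"
)
```

    Repeat with the next SLP as ACTIVE_SLP once you have seen how much the current one generates.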

     

    For the SLP design, is it very granular? I find it's easier to manage if you have a well laid-out design.

    Below is a simple example split by site, application type and frequency. From there you can expand it further, going into more detail on the application (SQL, Exchange, Oracle, NDMP, etc.). That way, when the SLPs run, you don't just see SLP_Daily and have no clue who it's for.

    FDC_APP_Daily
    FDC_APP_Weekly
    FDC_APP_Monthly
    FDC_APP_Yearly
    FDC_FS_Daily
    FDC_FS_Weekly
    FDC_FS_Monthly
    FDC_FS_Yearly
    JDC_APP_Daily
    JDC_APP_Weekly
    JDC_APP_Monthly
    JDC_APP_Yearly
    JDC_FS_Daily
    JDC_FS_Weekly
    JDC_FS_Monthly
    JDC_FS_Yearly
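
    The example list above is every combination of site (FDC, JDC), application type (APP, FS) and frequency, so it can be generated rather than typed. A minimal sketch; slp_names is a hypothetical helper, and adding more application types (for example SQL or NDMP) to the middle loop extends the scheme in the way this reply describes.

```shell
# slp_names: print SITE_TYPE_FREQUENCY names for every combination,
# site outermost and frequency innermost, in the same order as the list above.
# The ( ) body runs in a subshell so the loop variables do not leak.
slp_names() (
    for site in FDC JDC; do
        for apptype in APP FS; do
            for freq in Daily Weekly Monthly Yearly; do
                echo "${site}_${apptype}_${freq}"
            done
        done
    done
)
```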