Forum Discussion

Daryl_Gawn
Level 4
15 years ago
Solved

constant duplication failure

hi

I have just started a new job where they are using NetBackup 7.0.1, which is failing to complete duplications. Backup to disk is fine, but duplication to tape isn't.

Some duplication jobs are failing with this error: Critical bpdm(pid=7924) image open failed: error 2060013: no more entries. The job then starts up again later in the day and tries again, delaying other jobs from starting.

I think the error is happening because it either can't find the image on disk to duplicate, or the image on disk has errors?

If that is correct, I'm not really interested in troubleshooting why the image has errors; I just want to prevent the job from starting up again.

I have done some research but need some further assistance.

I think these are old jobs that have failed to duplicate. The reason I think this is that I ran /bin/admincmd/nbstlutil stlilist -incomplete and it gave me a lot of entries. I know they are old jobs because the policy they ran under is for jobs run earlier in June.

The job IDs in the logs are also the ones showing in the file list (in the Activity Monitor) of the same duplication jobs that are failing.

I found the article below on how to clear the jobs, which I have done. The jobs temporarily disappeared after I deleted them; I re-ran the command and they were no longer in the list.

http://www.symantec.com/business/support/index?page=content&id=HOWTO36392

I thought that had worked, but when I logged in later in the day I found that all the same duplication jobs had started and failed again.

Then I ran the above command again and the jobs are in there again, along with more old ones.

Is there something else I need to do, or something I did wrong? I just want to clear all the old jobs so they don't start again and don't delay other jobs from starting.

 

Any help is much appreciated.

Thanks Daryl

  •  

    *** I've re-run this test, which has changed my summary. Apologies: I think something was amiss with the first test, which led me to give incorrect results.
     
     
    The status of the SLP Image/Copy/Fragment will differ at different stages of the process depending on whether the SLP is fixed or capacity managed, though that should not be an issue here.
     
    Once cancelled, the backup id will exist in the EMM DB tables mentioned below until the image actually expires ...
     
    I ran an SLP-controlled job (womble_1309779201 is capacity managed).
     
    root@womble netbackup $ nbstlutil list
     
    V6.5 I womble womble_1309779201 womble 1309779201 SLP_Test 0 0 SLP_Test 1 0 *NULL* 1
    V6.5 C womble womble_1309779201 1 2147483647 1310384001 *NULL* 3 0 0 0 0 *NONE* 2147483647
    V6.5 F womble womble_1309779201 1 1 0 @aaaac womble *NULL* 0 6 1 33587200 1 @aaaac
     
    root@womble netbackup $ nbstlutil stlilist
    V6.5 I womble_1309779201 SLP_Test 1
     
     
    I cancelled it using the nbstlutil cancel -backupid command:
     
    Checking the status again :
     
    root@womble netbackup $ nbstlutil list
     
    V6.5 I womble womble_1309779201 womble 1309779201 SLP_Test 0 0 SLP_Test 3 0 *NULL* 1
    V6.5 C womble womble_1309779201 1 1310384001 1310384001 *NULL* 3 0 0 0 0 *NONE* 1310384001
    V6.5 F womble womble_1309779201 1 1 0 @aaaac womble *NULL* 0 6 1 33587200 1 @aaaac
     
    root@womble netbackup $ nbstlutil stlilist
     
    V6.5 I womble_1309779201 SLP_Test 3
     
    We see that the status of the Image/Copy/Fragments has changed, e.g. stlilist shows state (3) = complete. That is what nbstlutil cancel does: it sets the state in the database to 'complete'.
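
As a rough sketch of how the cancel step could be prepared for several images at once, the function below reads stlilist-style lines and prints (but does not run) one cancel command for each image whose state is not 3. The column layout is assumed from the sample output in this post, `print_cancel_commands` is a hypothetical helper name, and the second sample line is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "nbstlutil stlilist"-style lines on stdin and
# print one cancel command per image whose state is not 3 (complete).
# Assumed columns (taken from the sample output in this thread):
#   <version> I <backup_id> <slp_name> <state>
print_cancel_commands() {
    awk '$2 == "I" && $5 != 3 { print "nbstlutil cancel -backupid " $3 }'
}

# Dry run against two sample lines; nothing is cancelled here.
# The second backup id is invented purely for the demonstration.
printf '%s\n' \
    'V6.5 I womble_1309779201 SLP_Test 1' \
    'V6.5 I womble_1309779999 SLP_Test 3' |
    print_cancel_commands
```

On a master server the input would presumably come from `nbstlutil stlilist -image_incomplete`, and the printed commands should be reviewed before any of them are run.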
     
     
    Checking the DB tables for the backup id (nbdb_unload) I find :
     
    /tmp/db/745.dat:'1000002','womble_1309779201','womble','1309779201','SLP_Test','0','0','SLP_Test','3','0','','1','2011-07-04 11:33:27.432567','2011-07-04 11:36:39.160555'
    /tmp/db/746.dat:'1000002','womble_1309779201','1','1310384001','1310384001','','3','0','0','0','0','0','*NONE*','0','0','1310384001','2011-07-04 11:33:27.449117','2011-07-04 11:36:39.135633'
    /tmp/db/747.dat:'1000002','womble_1309779201','1','1','0','@aaaac','1000002','0','0','6','1','33587200','1','@aaaac','2011-07-04 11:33:27.469689','2011-07-04 11:33:27.469701'
     
    This is expected, as the nbstlutil commands are still showing output. However, the SLP is marked as complete, so it will not rerun.
     
    Now I really expire the image ...
     
    bpexpdate -d 0 -force -backupid 
     
    Then the commands show no output ...
     
    root@womble netbackup $ nbstlutil stlilist
    root@womble netbackup $ nbstlutil list
     
     
    More importantly ... (dump the EMM DB and search the output for the backup id)
     
    nbdb_unload /tmp/db_unload 
     
    root@womble ftp $ ls /tmp/db_unload |head -4
    704.dat
    705.dat
    706.dat
    707.dat    .etc ...
     
     
    ... the grep command no longer finds the backup id in the DB tables (745/746/747.dat).
     
    cd /tmp/db_unload
    grep womble_1309779201 *
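
The grep check above can be wrapped in a small function. This is only a sketch: `backupid_in_dump` is a hypothetical name, and the demonstration runs against a throwaway directory with made-up `.dat` contents rather than a real nbdb_unload dump.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the backup id appears in any
# .dat file of an unloaded-database directory, and list the matching files.
backupid_in_dump() {
    dump_dir=$1
    backup_id=$2
    # -l prints only file names; -F treats the backup id as a fixed string.
    grep -lF -- "$backup_id" "$dump_dir"/*.dat 2>/dev/null
}

# Demonstration with a temporary directory standing in for /tmp/db_unload.
# The file contents are invented sample rows, not real EMM data.
demo_dir=$(mktemp -d)
printf "'1000002','womble_1309779201','womble'\n" > "$demo_dir/745.dat"
printf "'1000002','other_1309770000','womble'\n" > "$demo_dir/746.dat"

if backupid_in_dump "$demo_dir" womble_1309779201; then
    echo "still present in the dump"
else
    echo "not found in the dump"
fi
rm -rf "$demo_dir"
```

Using the exit status rather than reading the grep output by eye makes it easier to repeat the check before and after expiring an image.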
     
    In summary, I have not seen an SLP image that is in 'Complete' state come back to life.
     
    If the backup id is not shown in the nbstlutil command output, then as demonstrated here, that image is not in the EMM DB tables that control SLPs, so there is no way it can 'spring back' to life that I can see. SLP can only act on images in the database tables EMM_Image/ImageCopy/ImageFragment.
     
     
    So, for your cancelled backup id (just one will do), does it appear in the nbdb_unload tables after it has been cancelled?
     
    1.  If the backup id appears in the three tables as demonstrated after it is cancelled, AND the nbstlutil commands do not show it in their output, then there may be some issue with the DB, provided the images are definitely expired. To be totally accurate, we also need to run nbdelete -allvolumes (deleting the images sets the fragments in the DB to 'to be deleted', and nbdelete cleans these up).
     
    1a. If the backup id is cancelled (from SLP) but is NOT expired, I would expect to see the backup id in these DB tables, but it should be showing in the cancelled state, in which case the SLP will not duplicate it.
     
    1b. If the backup id is in the EMM DB tables, has NOT expired, but does not show in the nbstlutil command outputs, I think something could be up with the DB entries, but this would need further investigation to confirm.
     
     
    2.  If the backup id does not appear in the DB after the backup id is cancelled (and nbdelete -allvolumes has been run), then clearly there is still something going on if it re-appears, but I cannot say what, and more investigation would be required.
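
The cases above can be summarised as a small decision function. This is a sketch of the reasoning in this post only, not a NetBackup tool: `classify_slp_image` is a hypothetical name, and the three yes/no answers have to be worked out by hand from the nbdb_unload dump, the nbstlutil output and the image expiry.

```shell
#!/bin/sh
# Hypothetical helper summarising the cases above.
# Each argument is "yes" or "no":
#   $1  backup id found in the unloaded EMM DB tables
#   $2  backup id shown by the nbstlutil list commands
#   $3  image has expired
classify_slp_image() {
    in_db=$1
    shown=$2
    expired=$3
    if [ "$in_db" = no ]; then
        echo "case 2: not in the DB; a re-appearing job needs more investigation"
    elif [ "$shown" = no ] && [ "$expired" = yes ]; then
        echo "case 1: possible issue with the DB"
    elif [ "$shown" = no ]; then
        echo "case 1b: DB entries look suspect; needs further investigation"
    elif [ "$expired" = no ]; then
        echo "case 1a: expected; the SLP should not duplicate the image"
    else
        echo "not covered by the cases above"
    fi
}

# Example: cancelled but not expired, still visible everywhere.
classify_slp_image yes yes no
```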
     
     
     
    Martin

5 Replies

  • I have run the command /bin/admincmd/nbstlutil stlilist -image_incomplete and there are even more old entries in there now. Is this coming from a database? Is there a command I can run to clear all the old incomplete jobs from the database?

  • It's best to contact Symantec support if your aim is to clear those "pending incomplete images" that are bound to start. I believe they will provide you with an SQL script to remove those entries from the EMM database.

    Another thing to check is the SLP versions, to make sure all versions point to the same settings, especially the destination storage unit. SLP versioning was introduced in 6.5.5: when you modify an SLP, the older version is kept. Because of this, an incomplete image may still be associated with an older version. If a storage unit was changed between versions, the incomplete image can still point to the "unavailable" storage unit, causing a duplication error. I'm not sure whether that is causing the error you have seen, though.

    To list all SLP versions:

    # nbstl -L -all_versions

    To modify the storage unit of an SLP version:

    # nbstl  <policyname>  -modify_version -version <version#> -residence oldSTU,newSTU

    The storage unit is one of the attributes you can change in an SLP version. You can refer to the doc at http://www.symantec.com/docs/TECH72969

  • For the quick answer,

    Yes, it's coming from the EMM DB.

    If nbstlutil cancel does not work, you cannot just clear them ... well, not selectively.

     

    Martin

  • hi Martin

    Thanks for the info. I can't do much of the above until next weekend due to various reasons.

     

    All the info you have provided looks like what I am after.

     

    Thanks a lot for your help.

     

    Cheers Daryl