constant duplication failure
Hi,
I have just started a new job and they are using NetBackup 7.0.1, which is failing to complete duplication. Backup to disk is fine, but duplication to tape isn't.
Some duplication jobs are failing with this error: Critical bpdm(pid=7924) image open failed: error 2060013: no more entries. The job then starts up again later in the day and tries again, delaying other jobs from starting.
I think the error is happening because it either can't find the image on disk to duplicate, or the image on disk has errors?
If that is correct, I'm not really interested in troubleshooting why the image has errors. I just want to prevent the job from starting up again.
I have done some research but need some further assistance.
I think these are old jobs that have failed to duplicate. The reason I think this is that I ran /bin/admincmd/nbstlutil stlilist -incomplete and it gave me a lot of entries. I know they are old jobs because the policy they ran under is for jobs run earlier in June.
The job IDs in the logs are also the ones showing in the file list (in Activity Monitor) of the same duplication jobs that are failing.
I found the article below on how to clear the jobs, which I have done. The jobs temporarily disappeared after I deleted them: when I re-ran this command, they were no longer in the list.
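One way to confirm that the incomplete entries really are old is to decode the backup IDs. The sketch below is only an illustration under two assumptions that are not confirmed by any documentation quoted in this thread: that each image line of stlilist output has the five-field layout seen in the sample in the reply further down (version, record type I, backup ID, SLP name, state), and that the number at the end of a backup ID is a Unix timestamp. The function name parse_stlilist is hypothetical, and SLP names containing spaces would not parse with this simple split.

```python
from datetime import datetime, timezone


def parse_stlilist(output):
    """Turn `nbstlutil stlilist` text into (backup_id, slp_name, state, backup_time) tuples."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout, copied from the sample in this thread:
        #   <version> I <backup_id> <slp_name> <state>
        if len(fields) != 5 or fields[1] != "I" or not fields[4].isdigit():
            continue
        backup_id, slp_name, state = fields[2], fields[3], int(fields[4])
        # Assumption: the backup ID ends in _<Unix timestamp of the backup>.
        stamp = backup_id.rpartition("_")[2]
        if not stamp.isdigit():
            continue
        backup_time = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        entries.append((backup_id, slp_name, state, backup_time))
    return entries


if __name__ == "__main__":
    sample = "V6.5 I womble_1309779201 SLP_Test 1\n"
    for backup_id, slp_name, state, backup_time in parse_stlilist(sample):
        print(backup_id, slp_name, state, backup_time.date())
```

Feeding it the saved output of the stlilist command would show at a glance which entries date from June and what state each one is in.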
http://www.symantec.com/business/support/index?page=content&id=HOWTO36392
I thought that worked, but then I logged in later in the day to find that all the same duplication jobs had started and failed again.
Then I ran the above command again: the jobs are in there again, and there are more old ones.
Is there something else I need to do, or something I did wrong? I just want to clear all the old jobs so they don't start again and don't delay other jobs from starting.
Any help is much appreciated.
Thanks,
Daryl
(I've re-run this test, which has changed my summary. Apologies, something was amiss with the first test, I think, which led me to give incorrect results.)

The status of the SLP Image / Copy / Fragment will differ at different stages of the process, depending on whether the SLP is fixed or capacity managed, though that should not be an issue here.

Once cancelled, the backup id will exist in the EMM DB tables mentioned below until the image actually expires.

I ran an SLP-controlled job (womble_1309779201 is capacity managed):

root@womble netbackup $ nbstlutil list
V6.5 I womble womble_1309779201 womble 1309779201 SLP_Test 0 0 SLP_Test 1 0 *NULL* 1
V6.5 C womble womble_1309779201 1 2147483647 1310384001 *NULL* 3 0 0 0 0 *NONE* 2147483647
V6.5 F womble womble_1309779201 1 1 0 @aaaac womble *NULL* 0 6 1 33587200 1 @aaaac

root@womble netbackup $ nbstlutil stlilist
V6.5 I womble_1309779201 SLP_Test 1

I cancelled it using the nbstlutil cancel -backupid command.

Checking the status again:

root@womble netbackup $ nbstlutil list
V6.5 I womble womble_1309779201 womble 1309779201 SLP_Test 0 0 SLP_Test 3 0 *NULL* 1
V6.5 C womble womble_1309779201 1 1310384001 1310384001 *NULL* 3 0 0 0 0 *NONE* 1310384001
V6.5 F womble womble_1309779201 1 1 0 @aaaac womble *NULL* 0 6 1 33587200 1 @aaaac

root@womble netbackup $ nbstlutil stlilist
V6.5 I womble_1309779201 SLP_Test 3

We see that the status of the Image / Copy / Fragment records has changed. For example, stlilist now shows state 3 (complete), which is what nbstlutil cancel does: it sets the database entry to 'complete'.

Checking the DB tables for the backup id (nbdb_unload), I find:

/tmp/db/745.dat:'1000002','womble_1309779201','womble','1309779201','SLP_Test','0','0','SLP_Test','3','0','','1','2011-07-04 11:33:27.432567','2011-07-04 11:36:39.160555'
/tmp/db/746.dat:'1000002','womble_1309779201','1','1310384001','1310384001','','3','0','0','0','0','0','*NONE*','0','0','1310384001','2011-07-04 11:33:27.449117','2011-07-04 11:36:39.135633'
/tmp/db/747.dat:'1000002','womble_1309779201','1','1','0','@aaaac','1000002','0','0','6','1','33587200','1','@aaaac','2011-07-04 11:33:27.469689','2011-07-04 11:33:27.469701'

This is expected, as the nbstlutil commands are still showing output. However, the SLP is marked as complete, so it will not rerun.

Now I really expire the image:

bpexpdate -d 0 -force -backupid

Then the commands show no output:

root@womble netbackup $ nbstlutil stlilist
root@womble netbackup $ nbstlutil list

More importantly, dump the EMM DB and search the output for the backup id:

nbdb_unload /tmp/db_unload

root@womble ftp $ ls /tmp/db_unload | head -4
704.dat
705.dat
706.dat
707.dat
etc ...

The grep command no longer returns the backup id from the db tables (745 / 746 / 747.dat):

cd /tmp/db_unload
grep womble_1309779201 *

In summary, I have not seen an SLP image that is in the 'Complete' state come back to life.

If the backup id is not shown in the nbstlutil command output then, as demonstrated here, that image is not in the EMM DB tables that control SLPs, so there is no way it can 'spring back' to life that I can see. SLP can only act on images in the database tables EMM_Image / ImageCopy / ImageFragment.

So, for your cancelled backup id (just one will do), does it appear in the nbdb_unload tables after it has been cancelled?

1. If the backup id appears in the three tables as demonstrated after it is cancelled, AND the nbstlutil commands do not show it in their output, then there may be some issue with the DB, provided the images are definitely expired. To be totally accurate, we also need to run nbdelete -allvolumes (deleting the images sets the fragments in the db 'to be deleted'; nbdelete cleans these up).

1a. If the backup id is cancelled (from SLP) but is NOT expired, I would expect to see the backup id in these DB tables, but it should be showing in the cancelled state, in which case the SLP will not duplicate it.

1b. If the backup id is in the EMM DB tables, has NOT expired, but does not show in the nbstlutil command outputs, I think something could be up with the DB entries, but this would need further investigation to confirm.

2. If the backup id does not appear in the db after the backup id is cancelled (and nbdelete -allvolumes has been run), then clearly there is still something going on if it re-appears, but I cannot say what, and more investigation would be required.

Martin
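If there are many cancelled backup ids to check, the grep step can be scripted. The following is a minimal sketch, assuming the database has already been dumped with nbdb_unload into a directory of .dat files as in the test above. The function name files_containing is hypothetical, and the script only reports which dump files mention the id; it does not interpret the columns.

```python
from pathlib import Path


def files_containing(unload_dir, backup_id):
    """Return the names of the .dat files under unload_dir that mention backup_id."""
    hits = []
    for path in sorted(Path(unload_dir).glob("*.dat")):
        # Undecodable bytes are replaced so that one odd table does not stop the scan.
        text = path.read_text(errors="replace")
        if backup_id in text:
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Directory and backup id taken from the test above; substitute your own.
    print(files_containing("/tmp/db_unload", "womble_1309779201"))
```

An empty list for a cancelled and expired backup id corresponds to case 2 above; any file names returned point to the tables worth inspecting by hand.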