10-14-2017 11:00 PM - edited 10-14-2017 11:07 PM
this started with my pending backup images suddenly disappearing after 3 days in the queue. please note that they hadn't been processed for duplication yet. why? because there is this undocumented "feature" that deletes backup images after 3 days in a queue.
so now my problem is: how do i know that the SLP, which was working for years before, is still working? how do you check?
regards,
10-16-2017 09:57 AM
how can you be so sure that there is an undocumented feature?
A backup image will never get deleted while it has pending operations, unless it has reached the longest retention defined in the SLP.
if you are sure that it is expiring the images before duplication and before reaching the retention, you need to open a support case with enough details in hand.
Note: removing the BID files is different from removing the backup images; i am hoping that you are clear on this part.
10-17-2017 12:07 AM - edited 10-17-2017 12:21 AM
hi,
i have provided the links that explain why i say that. if there is a way to resubmit the removed duplication jobs for the still-to-be-duplicated backup images, kindly inform us.
as for the BID files, this is the first time i've heard of it.
edit: also, when i try to view the backup in the catalog, it doesn't appear anymore.
10-17-2017 01:45 PM
The SLP will automatically submit the duplication job if the operation is incomplete; no manual work is required.
if you are not able to see the image in the catalog, these are the possibilities i see:
1) The backup image met its longest retention and expired.
2) Someone cancelled the SLP operation for this image, so NetBackup considers all pending operations completed and expires the image when it reaches that specific copy's expiration date.
3) A wrong search in the catalog while you were looking for the images.
if you believe none of the above possibilities applies in your case, i would suggest you open a support ticket with enough details.
10-17-2017 02:29 PM
Not sure where my previous reply disappeared to ....
OK, I'm concerned if you say the images are not in the catalog. NBU shouldn't expire images until the expire time, and for an SLP image that has not yet duplicated, the expire time is infinity.
Personally, I would run nbdb_unload <some empty dir> and search the image table for the ctime of the backup image.
A simpler check - nbstlutil stlilist - then search the output for the backup ID. Is it listed? If so, does it have a 1, 2 or a 3 at the end?
If an image has been manually removed from SLP, it cannot be resubmitted, and would have to be manually duplicated. If the image still exists and is under SLP control (it will appear in nbstlutil stlilist output, and have a 1 or 2 against it), then investigation as to why it is 'stuck' would be required.
Another check
vxlogcfg -a -p 51216 -o 226 -s DebugLevel=6 -s DiagnosticLevel=6
Leave log running for a while (keep an eye on disk space) - ideally 24 hours
vxlogcfg -a -p 51216 -o 226 -s DebugLevel=1 -s DiagnosticLevel=1
Logs will be in /usr/openv/logs/nbstlutil - grep for the backup ID
These are the raw logs, which are good enough for a quick check. To make them more readable:
vxlogview -p 51216 -o 226 -t 24:00 >226.txt
This will give the last 24 hours, counted back from when you run the command.
Debug level 6 logs will be quite large, depending on how busy the system is. I think the log lines that should give a clue are at DebugLevel 4, so you could get away with that as a start, but to get the complete story, 6 will give more detail.
10-17-2017 10:27 PM
unfortunately, someone forgot to renew the support license. still under process.
10-17-2017 10:28 PM - edited 10-17-2017 10:32 PM
wow that's a lot. will check.
in the meantime, can i ask how one troubleshoots an SLP that runs one day and not the next?
today i'm expecting to see my oracle SLP in the queue but it's not there. it was there, however, the previous day.
you see, there should be an SLP run daily, but what i get is a run every other day: Oct 15, 17. where is the 16th? maybe this is related to the error 12 i experienced with the SLP?
10-17-2017 10:47 PM
i can't run a diag level of 6 due to space (another story, i only have one tape drive running now; no big thanks to Oracle taking 25 days now to deliver the replacement part) so i ran the simple check "nbstlutil stlilist" and grepped for a few of my backup IDs at random. none were found. :(
so what does that tell us?
note: afaik, no one manually cancelled/deleted anything
10-17-2017 10:57 PM
ok i'm officially worried now. my disk space has gone down, the Oct 16th SLP isn't there, and shouldn't SLP pick up and run automatically when a backup is done? when it doesn't, how do i coax it into doing the missed SLP job?
10-17-2017 11:22 PM
If the image is not showing in nbstlutil stlilist output, it's either not under SLP control anymore, or has been completed.
The command shows three states for an image
1 Not started
2 In process
3 Completed
Images that have completed eventually drop out of the list. Images that are either 1 or 2 are not complete, and will stay in the list until they are.
(2 - In process does not necessarily mean the image is duplicating at that time; it just means SLP has started to process it, added it to a batch to be duplicated, etc.)
Yes, SLP should automatically pick up and duplicate images. It can be paused, or given a duplication schedule, to control when this happens (like a backup window), but by default images will be duplicated automatically within a short time (depending on how busy the system is, how many drives are available, etc.).
If an image is not in the list, it can't be added, you would have to manually duplicate it.
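As a rough illustration of the three states above, here is a minimal Python sketch that looks up a backup ID in saved `nbstlutil stlilist` output. It relies on two assumptions taken only from this thread, not from documentation: that the backup ID appears as a whitespace-separated token on the image line, and that the state digit is the last token on that line. The sample lines are made up for the example and are not real command output, so check the layout against your own system first.

```python
# State codes as described in this thread for `nbstlutil stlilist` output.
SLP_STATES = {"1": "not started", "2": "in process", "3": "completed"}

NOT_LISTED = "not listed (completed and dropped out, or no longer under SLP control)"


def slp_state(stlilist_output: str, backup_id: str) -> str:
    """Return a readable SLP state for backup_id.

    ASSUMPTION: the image line holds the backup ID as its own token and
    ends with the state digit. Verify this against real output.
    """
    for line in stlilist_output.splitlines():
        tokens = line.split()
        if backup_id in tokens and tokens[-1] in SLP_STATES:
            return SLP_STATES[tokens[-1]]
    return NOT_LISTED


# Hypothetical sample text, only shaped like "ID somewhere, state at the end".
SAMPLE_OUTPUT = """\
I clientA_1500000000 ExampleSLP 2
I clientB_1500000100 ExampleSLP 3
"""

print(slp_state(SAMPLE_OUTPUT, "clientA_1500000000"))
print(slp_state(SAMPLE_OUTPUT, "clientZ_1500000200"))
```

A "not listed" result cannot tell the two cases apart, which is why the catalog check is still needed.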
Suggest running nbdb_unload <some area>
This will give a load of .dat files, and a reload.sql file.
Search in the reload file for INPUT INTO "DBM_MAIN"."DBM_Image"
Just under that line you will see something like:
FROM '/netbackup/db/829.dat' (it will vary depending on the directory you used and the version of NBU)
grep in that file for the ctime of the backup ID - let's see if there is any trace of the image in the catalog.
10-17-2017 11:46 PM
in 7.7.3 unix, i don't have that "nbdb_unload" command. i already did a "find" and nothing came up.
10-18-2017 03:15 AM
You should have it; it will be here:
/usr/openv/db/bin/nbdb_unload
10-18-2017 03:33 AM
ok found it. grepped for the backup IDs and none were found :(
10-18-2017 06:22 AM
No, don't grep for the backup ID as it won't be found. The client name is in a different field than the ctime, so, just grep for the ctime.
10-18-2017 11:21 PM
not sure where the 'ctime' field is, so i grepped for the policy name and looked for rows on or close to the backup date. found none.
'462275','1000002',,,,,,'ruh1siebdb01','109','Daily_ruh1siebdb01','1501885700','0','2','4','13','0','0','0','107296','1','1','0','12','9','0','0','0','0','0','0','2','0','8','2147483647','1','0','0','0','0','0','0','0','0','0','0','611393','1','0','0','0','oracle','Default-Application-Backup','','SBPROD_3198601498','Daily_ruh1siebdb01_1501885700_UBAK.f','657','','','614204966','0','0','1501885700','1501885700','12','1','',0x00000000000000000000000000000000,'Oracle_Daily_Policy','3','0','1501885889','Gold','19','0','0','0','1','255','1','0','0','0','0','0','0','2017-08-04 22:28:20.824536','2017-08-06 10:16:40.759505','0','','0'
'465916','1000002',,,,,,'ruh1siebdb01','109','Daily_ruh1siebdb01','1507752038','0','2','4','13','0','0','0','523777568','11','2','0','12','9','0','0','0','0','0','0','1','0','13127','1508356838','1','0','0','0','0','0','0','0','0','0','0','617389','0','0','0','0','oracle','Default-Application-Backup','','SBPROD_3198601498','Daily_ruh1siebdb01_1507752038_UBAK.f','655','','','628509734','0','0','1507752038','1507752038','12','1','',0x00000000000000000000000000000000,'Oracle_Daily_Policy','2','0','1507765406','Gold','19','0','0','0','1','255','1','0','0','0','0','0','0','2017-10-11 20:00:38.936380','2017-10-18 07:03:06.930102','0','','0'
i'm looking for Oct 1 2017 and it's not appearing there.
10-18-2017 11:51 PM
The client name is the 8th field, the ctime is the 11th field.
'462275','1000002',,,,,,'ruh1siebdb01','109','Daily_ruh1siebdb01','1501885700','0','2','4','13','0','0','0','107296'
So, taking the 8th and 11th fields and sticking an _ between them, you get the backup ID.
ruh1siebdb01_1501885700
If the line does not appear in the image table for a given backup, it has been expired.
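To make the field positions concrete, here is a small Python sketch that applies the rule above (8th field plus "_" plus 11th field) to the partial row quoted in this post, and converts the ctime to a readable date. The function name is made up for the example, and treating the ctime as Unix epoch seconds is an assumption, although it agrees with the '2017-08-04 22:28:20' timestamp in the full row posted earlier.

```python
import csv
from datetime import datetime, timezone

# Partial row from the image table, as quoted in this thread.
ROW = (
    "'462275','1000002',,,,,,'ruh1siebdb01','109','Daily_ruh1siebdb01',"
    "'1501885700','0','2','4','13','0','0','0','107296'"
)


def backup_id_from_row(row: str):
    """Build the backup ID from the 8th and 11th fields of an image row.

    Fields are comma-separated and single-quoted, so the csv module is
    used with a single quote as the quote character.
    """
    fields = next(csv.reader([row], quotechar="'"))
    client = fields[7]        # 8th field: client name
    ctime = int(fields[10])   # 11th field: ctime (assumed epoch seconds)
    when = datetime.fromtimestamp(ctime, tz=timezone.utc)
    return f"{client}_{ctime}", when


backup_id, when = backup_id_from_row(ROW)
print(backup_id)         # ruh1siebdb01_1501885700
print(when.isoformat())  # 2017-08-04T22:28:20+00:00
```

Converting the ctime this way also shows quickly whether a row belongs to the date you are looking for.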
10-23-2017 12:48 AM
Have you tried running bpimagelist -L -backupid ruh1siebdb01_1501885700 ?
10-23-2017 12:59 AM
yes. and also for the current backup id whose duplication is only now being run (after a week).
both returned "no entity was found".
10-23-2017 01:04 AM
10-23-2017 01:06 AM
the affected images are daily, so retention for them is one week on tape, with deletion after duplication.