02-26-2009 05:54 AM
Hi Gents,
I need your kind help/advice/instructions to cure my big problems with a NetBackup 3.4 server running on Solaris 7 with an Overland tape library. I have one NetBackup 3.4 server and a few media servers running under it. I used to receive status code 83 from backups to the tapes under one of the media servers. I suspected media errors and changed the tapes under this media server group. But tapes under another media server then also returned status code 83, along with a lot of errors as follows. PLS PLS HELP ME OUT OF THIS TROUBLE SINCE I AM GOING CRAZY.
/var/adm/messages
Feb 26 09:55:52 netbkpserver002 tldcd[22074]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
( Is this due to the backup tape library? Pls. advise )
Feb 26 09:55:52 netbkpserver002 tldcd[22074]: TLD(2) Move_medium error: CHECK CONDITION
( Is it really due to a media error? )
Feb 26 10:02:25 netbkpserver002 tldcd[22386]: TLD(2) cannot dismount drive 2, slot 91 already is full
Feb 26 10:02:25 netbkpserver002 tldcd[22386]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Feb 26 10:02:25 netbkpserver002 tldcd[22386]: TLD(2) Move_medium error: CHECK CONDITION
Feb 26 10:19:55 netbkpserver002 tldcd[22962]: TLD(2) expected barcode (media03 ) in slot 85, found barcode (--------)
Feb 26 10:23:42 netbkpserver002 tldcd[23000]: TLD(2) expected barcode (media03 ) in slot 85, found barcode (media01 )
Feb 26 10:27:28 netbkpserver002 tldcd[23034]: TLD(2) expected barcode (media03 ) in slot 85, found barcode (media01 )
Feb 26 10:28:13 netbkpserver002 tldcd[23055]: TLD(2) expected barcode (media03 ) in slot 85, found barcode (media01 )
Feb 26 10:30:19 netbkpserver002 tldcd[23222]: TLD(2) cannot dismount drive 2, slot 91 already is full
Feb 26 10:31:48 netbkpserver002 ID[RICHPse.monlog.2000]: Disk state amber entered, Action: Move load from busy disks to idle disks
Feb 26 10:33:48 netbkpserver002 ID[RICHPse.monlog.2000]: Disk state white entered, Action: No activity
( I have no idea about the disk states amber and white. What do they mean? )
Feb 26 20:14:50 netbkpserver002 tldd[2402]: TLD(2) drive 1 (device 0) is being DOWNED, status: Robotic mount failure
( The LTO tape drives keep going DOWN every 5 to 10 minutes even though I changed the tape library drives. Why? )
/usr/openv/netbackup/db/error
1235615750 1 4 16 production 26131 0 0 bkpwindowsATC001 bpsched backup of client bkpwindowsATC001 exited with status 83 (media open error)
1235615750 1 4 16 netbkpserver002 0 0 0 *NULL* bpsched scheduler exiting - media open error (83)
( The robot inventory / media database is not in sync with the physical slots in the tape library. )
1235651678 1 4 4 netbkpserver002 0 0 0 netbkpserver002 bpsched skipping backup of client netbkpserver002, class bkp_web2, schedule incrbkps because it has exceeded the configured number of tries
1235651678 1 4 4 netbkpserver002 0 0 0 bkpunix006 bpsched skipping backup of client bkpunix006, class bkp_web3, schedule incrbkps because it has exceeded the configured number of tries
( Why are so many tapes unable to mount? Drive error or media error? )
1235650486 1 132 16 netbkpserver002 26153 0 0 bkpClass bptm FREEZING media id bkp02, it is unmountable and cannot be used for backups
1235650488 1 132 16 netbkpserver002 26153 0 0 bkpClass bptm FREEZING media id bkp07, it is unmountable and cannot be used for backups
( The tapes freeze every now and then. Is it due to media errors? )
************************************************************************************************
bpmedialist
Feb 26 20:14:52 netbkpserver002 vmd[2385]: media ID bkp04 has expired
Feb 26 20:14:52 netbkpserver002 vmd[2385]: media ID bkp05 has expired
Feb 26 20:14:58 netbkpserver002 vmd[2385]: media ID bkp04 has expired
Feb 26 20:14:58 netbkpserver002 vmd[2385]: media ID bkp05 has expired
( How can I put those media back into normal backup use? )
I really, really need your help, gentlemen, since I am getting a lot of problems with this old NetBackup 3.4 server. Pls. help me out.
Thanks
02-26-2009 06:15 AM
Looks like there are a LOT (sorry, had to put a bit of red there!) of mismatches between what the library knows/thinks is in its slots & what NetBackup thinks is there.
Can you get the library to re-register which tapes are where (power off/on)? You may have to manually unload any tapes that are in any drives (robtest).
Then do an inventory of the library in NetBackup with update configuration (I presume that's possible in 3.4?) to try & get the two in sync again.
There are some disk errors in your logs, but I'm not sure whether this is a NetBackup issue.
There may still be some issues to sort out once the library & NetBackup are back in sync, methinks! The tapes were probably frozen because of these mismatches, so it's probably OK just to unfreeze them. I don't know how 3.4 copes with swapping of tape drives, but there may be something to look at there also.
02-26-2009 06:32 AM
Andy, thanks for your quick response. In fact, I have already killed all the backup jobs and rebooted the tape library a few times. After each reboot I also need to go to the GUI and update the volume configuration. This is under media management -> robot -> use inventory to use volume configuration -> update volume configuration. After that, all tapes are in sync with the media database again.
How can I sort out this disk error with the help of the tape library vendor? They say the tape drives are OK and blame NetBackup whenever they come. To unfreeze tapes I use "bpmedia -unfreeze -ev ( media id ) -h ( media server id )". But tapes freeze quite often (more than 10 times), even within a single day. REALLY APPRECIATE YOUR COMMENTS.
Regards
02-26-2009 06:51 AM
If the same tapes are freezing again & again then I would suspect 'faulty' tapes & I would get them replaced.
If it is different tapes each time but the same tape drive, then I would suggest the tape drive(s) are at fault.
If it is any tape & any drive then I would suspect some NetBackup configuration issue, but I know nothing of 3.4 (started at 4.5 & now running 6.5.1 so things have changed dramatically since then!).
It may be an idea to keep a log of all tape issues & which drives were involved; then you can use this as evidence to get tapes or drives replaced.
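The rule of thumb above (same tapes, same drive, or anything anywhere) could be applied to such a log with something like the following. It is only a rough sketch: the list of `(media_id, drive)` pairs is assumed to be something you record by hand, one pair per freeze or mount failure, and the function name, the threshold and the drive numbers in the usage note are hypothetical.

```python
from collections import Counter

def triage(events, min_repeat=3):
    """Give a rough suspect from a list of (media_id, drive) failure events.

    min_repeat is an arbitrary assumption: tapes count as "the same tapes
    again and again" when each failing tape has failed, on average, at
    least min_repeat times.
    """
    if not events:
        return "no data"
    media = Counter(media_id for media_id, _ in events)
    drives = Counter(drive for _, drive in events)

    few_tapes = len(media) * min_repeat <= len(events)
    one_drive = len(drives) == 1

    if few_tapes and one_drive:
        return "tapes or drive"   # the log cannot separate the two yet
    if few_tapes:
        return "tapes"            # same tapes failing on several drives
    if one_drive:
        return "drive"            # different tapes, always the same drive
    return "configuration"        # any tape, any drive
```

For example, `triage([("bkp02", 1), ("bkp07", 1), ("bkp04", 1)])` points at the drive, because three different tapes all failed in drive 1.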
As far as the disk errors are concerned, I would imagine that this is more of a hardware issue, but I must admit that I don't recognise the error messages. I would hazard a guess at a monitoring application (RICHPse or SE Toolkit) that's running on your media server, checking the hardware - nothing to do with NetBackup.
02-26-2009 06:57 AM
Hi Andy,
Thanks. As I mentioned, I changed some tapes under the specific media server that was having problems. I suspect the tape drives could also be an issue. But the vendor just came and changed them again and again, and blamed the NetBackup software and a config error.
Can your HW monitoring toolkits (RICHPse or SE Toolkit) be used with the Overland tape library 4000 series?
Appreciate your comments,
Regards,
02-26-2009 07:12 AM
Maybe someone who has a bit more knowledge about 3.4 could guide you through a few checks of your tape drive config, as it may need looking at.... (Bob?)
As far as the SE Toolkit is concerned, I've never used it so couldn't comment on its interaction with either NetBackup or your tape library - but I don't see it as being a problem, it only appears to be notifying you of a possible issue?
Sorry I can't help any further.
02-26-2009 08:21 AM
I would start by checking the tape drives outside of NetBackup using the mt and tar commands.
Mount a tape, write using tar, unmount tape, mount tape, read tar, unmount tape.
I would do that for every drive on every media server.
Prove the tape drives, device files, connections and media are good at the OS level.
Then, once I know I have a good base to work with, I would troubleshoot the library.
Again, outside of NetBackup: do manual manipulations of inventory, moving tapes, mounting tapes, etc.
Then I would jump to NetBackup and run robtest.
That's a start :)
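The write/read cycle with mt and tar could be written down per drive roughly like this. It is a dry-run sketch that only builds and prints the commands and does not touch any hardware. The device path `/dev/rmt/0cbn` and the test file `/etc/hosts` are assumptions, so substitute your own device files. Physically mounting and unmounting the tape is left to the library front panel, and the sketch rewinds between the write and the read instead.

```python
def drive_test_plan(device, test_file="/etc/hosts"):
    """Return the OS-level commands for one write/read cycle on one drive.

    device is assumed to be a no-rewind tape device file; test_file is any
    small file that is safe to archive.
    """
    return [
        ["mt", "-f", device, "status"],    # is a tape loaded and the drive ready?
        ["mt", "-f", device, "rewind"],
        ["tar", "cvf", device, test_file], # write a small archive to the tape
        ["mt", "-f", device, "rewind"],
        ["tar", "tvf", device],            # read the archive listing back
        ["mt", "-f", device, "offline"],   # rewind and take the drive offline
    ]

# Dry run: print the plan for one (assumed) device so it can be reviewed
# before anything is run by hand on the media server.
for cmd in drive_test_plan("/dev/rmt/0cbn"):
    print(" ".join(cmd))
```

Repeating the plan for every drive on every media server gives a simple checklist of which drive and device file combinations pass at the OS level.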
02-26-2009 06:03 PM
Hi Andy,
Thanks for the advice. In fact, versions 3.4 and 4.xx are not that different, I think. Do you feel like something is wrong with the NetBackup configuration? Pls. advise.
Thanks.
02-27-2009 05:18 AM
@NMT screen wrote: Do you feel like something is wrong with the NetBackup configuration? Pls. advise.
Thanks.
Yes. You most definitely have a configuration problem, and that is exactly why I would go through the painful steps of proving the environment outside of NetBackup with the simple OS-level commands. That is what I would do.
Steps to verify device configuration using robtest.
http://support.veritas.com/docs/264193
02-27-2009 06:29 AM
Hi,
Sorry for my late reply. I replaced the tapes which resulted in status code 83 with new tapes. After that, whenever I run a backup to these tapes (I did not kill the jobs, I just replaced the tapes in the tape library), the mount takes a very long time and the backup never runs for any of the new tapes.
02/27/2009 00:14:40 - connecting
02/27/2009 00:14:40 - connected; connect
02/27/2009 00:14:40 - mounting media01
started : 02/27/09 00:14:40
Elapsed : 021:06:40
Ended :
/var/log/syslog
, relay=bkpserver001.pureIT.net. [203.166.10 .131], dsn=2.0.0, stat=Sent (n1RDXdm19900 Message accepted for delivery)
Feb 27 21:32:40 bkpserver002 sendmail[21261]: n1RDWao21259: to=bkmail@pureIT.net, ctladdr=root (0/1), delay=00:00:04, xdelay=00:00:04, mailer=relay, pri=120320, relay=bkpserver001.pureIT.net. [203.166.10 .131], dsn=2.0.0, stat=Sent (n1RDXdm19901 Message accepted for delivery)
Feb 27 21:32:41 bkpserver002 sendmail[21290]: n1RDWfH21290: from=root, size=320, class=0, nrcpts=1, msg id=<200902271332.n1RDWfH21290@bkpserver002.>, relay=root@localhost
Feb 27 21:32:42 bkpserver002 sendmail[21292]: n1RDWfH21290: to=bkmail@pureIT.net, ctladdr=root (0/1), delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=120320, relay=bkpserver001.pureIT.net. [203.166.10 .131], dsn=2.0.0, stat=Sent (n1RDXfm19907 Message accepted for delivery)
Feb 27 21:45:01 bkpserver002 sendmail[21788]: n1RDj1421788: from=root, size=271, class=0, nrcpts=1, msg id=<200902271345.n1RDj1421788@bkpserver002.>, relay=root@localhost
Feb 27 21:45:02 bkpserver002 sendmail[21790]: n1RDj1421788: to=root, ctladdr=root (0/1), delay=00:00:01 , xdelay=00:00:00, mailer=local, pri=120271, relay=local, dsn=2.0.0, stat=Sent
/var/adm/messages
Feb 27 21:51:27 bkpserver002 tldcd[21943]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Feb 27 21:51:27 bkpserver002 tldcd[21943]: TLD(2) Move_medium error: CHECK CONDITION
Feb 27 21:52:10 bkpserver002 ID[RICHPse.monlog.2000]: Disk state amber entered, Action: Move load from busy disks to idle disks
Feb 27 21:53:38 bkpserver002 tldcd[22020]: TLD(2) cannot dismount drive 1, slot 77 already is full
Feb 27 21:53:38 bkpserver002 tldcd[22020]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
/usr/openv/netbackup/logs/bptm/logs
21:57:18 [22110] <2> bptm: EXITING with status 0 <----------
21:58:20 [22166] <2> bptm: INITIATING: -U
21:58:20 [22166] <2> bptm: EXITING with status 0 <----------
21:58:21 [22169] <2> bptm: INITIATING: -U
21:58:21 [22169] <2> bptm: EXITING with status 0 <----------
21:59:00 [22187] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 2
21:59:00 [22187] <2> bptm: EXITING with status 0 <----------
22:00:03 [22332] <2> bptm: INITIATING: -mlist -cmd
22:00:03 [22332] <2> bptm: EXITING with status 0 <----------
22:01:01 [22385] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 2
22:01:01 [22385] <2> bptm: EXITING with status 0 <----------
22:01:34 [22403] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 2 -stunit LT0-DNS2 -den 6 -mt 2 -masterversion 340000
22:01:34 [22403] <2> bptm: EXITING with status 0 <----------
22:01:35 [22405] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 2 -stunit LTO-bkpserver002 -den 6 -mt 2 -masterversion 340000
22:01:35 [22405] <2> bptm: EXITING with status 0 <----------
22:01:36 [22407] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 2 -stunit LTO-iaccssportal -den 6 -mt 2 -masterversion 340000
22:01:36 [22407] <2> bptm: EXITING with status 0 <----------
22:02:23 [22434] <2> bptm: INITIATING: -count -cmd -rt 8 -rn 0 -stunit DLT -den 13 -mt 2 -masterversion 340000
22:02:23 [22434] <2> bptm: EXITING with status 0 <----------
/usr/openv/netbackup/logs/bpdbm/logs
22:03:45 [22455] <2> image_db: Q_IMAGE_ADD_FRAGMENT (locking)
22:03:45 [22455] <4> bpdbm: request complete: exit status 0
22:03:45 [22456] <4> connected_peer: Connection from host bkpmediaserver01, 203.166.10.100, on non-reserved port 44299
22:03:45 [22456] <2> error_db: Q_ERRADD
22:03:45 [22456] <4> bpdbm: request complete: exit status 0
22:03:46 [22457] <4> connected_peer: Connection from host bkpmediaserver01, 203.166.10.100, on non-reserved port 44300
22:03:46 [22457] <2> image_db: Q_IMAGE_VALIDATE
22:03:46 [22457] <16> Default Retention: No user retention file
22:03:46 [22457] <4> bpdbm: request complete: exit status 0
22:03:47 [22458] <4> connected_peer: Connection from host bkpserver002, 203.166.10.100, on non-reserved port 58603
22:03:47 [22458] <2> error_db: Q_ERRADD
22:03:48 [22458] <4> bpdbm: request complete: exit status 0
/usr/openv/netbackup/logs
Pls. let me know if you want any logs from the following.
bkpserver002# ls
bpbkar/ bpcd/ bphdb/ bpsched/ dbbackup/
bpbrm/ bpdbm/ bprd/ bptm/ user_ops/
Regards,
02-27-2009 07:49 AM
@NMT screen wrote:Hi,
Sorry for my late reply. I replaced the tapes which resulted status code 83 with new tapes .
This certainly appears to indicate a config issue - I would follow Bob's suggestion to use robtest to try & identify these misconfigurations.
Also, as a matter of interest how do you load tapes into your library? Do you use the individual 'mail slots' & inventory with "empty media access port" selected or do you manually fill the empty slots?
If the latter, you must take care as not all empty slots are 'empty' - the tape(s) may be loaded into one of your drives for a backup & so if these slots are filled with 'new' tapes the ones in the drives can no longer be returned to their original locations which could explain:
@NMT screen wrote:
/var/adm/messages
Feb 27 21:51:27 bkpserver002 tldcd[21943]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Feb 27 21:51:27 bkpserver002 tldcd[21943]: TLD(2) Move_medium error: CHECK CONDITION
Feb 27 21:53:38 bkpserver002 tldcd[22020]: TLD(2) cannot dismount drive 1, slot 77 already is full
Feb 27 21:53:38 bkpserver002 tldcd[22020]: TLD(2) key = 0x0, asc = 0x0, ascq = 0x0, NO ADDITIONAL SENSE INFORMATION
Our operators did this once (& only once!!) & it caused no end of issues.
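To see which home slots have been filled while their tapes were in a drive, the "cannot dismount" lines in /var/adm/messages can be collected with a small script. This is a sketch that assumes the tldcd message format is exactly as it appears in the log excerpts in this thread. The function name is made up for illustration.

```python
import re

# Matches lines such as:
#   ... tldcd[22020]: TLD(2) cannot dismount drive 1, slot 77 already is full
DISMOUNT_RE = re.compile(
    r"TLD\((\d+)\) cannot dismount drive (\d+), slot (\d+) already is full"
)

def blocked_home_slots(lines):
    """Return sorted, unique (robot, drive, slot) tuples from syslog lines."""
    found = set()
    for line in lines:
        match = DISMOUNT_RE.search(line)
        if match:
            found.add(tuple(int(group) for group in match.groups()))
    return sorted(found)
```

Each tuple names a drive that is holding a tape and the slot that tape cannot return to. The tape that is now sitting in that slot is the one to remove before re-running the inventory.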
05-05-2009 08:16 AM
It sounds like you might not have the tape drives configured correctly.
Example with 2 drives:
On the server you have rmt0 and rmt1.
You configure the drives in NetBackup:
robot drive 1 is rmt1 and robot drive 2 is rmt0.
So the robot mounts a tape in robot drive 1, and the media server is looking for a tape to show up in rmt1....
but....
robot drive 1 is NOT REALLY rmt1, it really is rmt0....
so the media server NEVER sees the tape mount in rmt1.
Try to compare the serial numbers of the drives in your media server to the drive locations in the library and make sure they match.
In my example robot drive 1 should be rmt0 and robot drive 2 should be rmt1.
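The serial number comparison can be sketched as a simple lookup. The three dictionaries are assumed to be filled in by hand from the NetBackup device configuration, from the drives as seen by the media server, and from the library's own drive listing. The serial numbers in the usage note are made up for the example.

```python
def find_swapped_drives(configured, actual_serials, library_serials):
    """Find robot drives that are configured against the wrong device.

    configured:      robot drive number -> device name entered in NetBackup
    actual_serials:  device name -> serial number reported by that device
    library_serials: robot drive number -> serial number shown by the library
    Returns {robot_drive: (configured_device, correct_device)}.
    """
    device_by_serial = {serial: dev for dev, serial in actual_serials.items()}
    wrong = {}
    for robot_drive, device in configured.items():
        correct = device_by_serial.get(library_serials[robot_drive])
        if correct != device:
            wrong[robot_drive] = (device, correct)
    return wrong
```

With `configured = {1: "rmt1", 2: "rmt0"}`, `actual_serials = {"rmt0": "SN-A", "rmt1": "SN-B"}` and `library_serials = {1: "SN-A", 2: "SN-B"}`, the result is `{1: ("rmt1", "rmt0"), 2: ("rmt0", "rmt1")}`, which is the swap described in the example above.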
05-06-2009 05:33 AM
Andy Welburn - would suspect some NetBackup configuration issue
Andy Welburn - I would follow Bobs suggestion to use robtest & try & identify these mis-configurations.
Stumpr - You most definitely have a configuration problem
J. Hinchcliffe - It sounds like you might not have the tape drives configured correctly.
Steps to verify device configuration using robtest.
http://support.veritas.com/docs/264193