End of tape operation

nitinajgaonkar · 06-27-2012

Dear All

The following is the issue we are facing. When we start the backup process, data gets written to the cartridges, but before all the data is written an end-of-tape error occurs and the tape cartridge goes into a frozen state.

The following troubleshooting has been done:

1) Checked the media IDs on the management page of the autoloader, and they are correct; that is, the tape drive is not misreading the media ID.

2) Checked the write-protect notch; data can be written to the cartridge.

3) Cleaned the tape drive head using a cleaning cartridge.

4) The data on the tape is in a format that NetBackup can understand.

5) Also checked that no data from the NetBackup catalog (catalog backup) is written on the tape cartridge.

Kindly provide a solution to the issue.

nitin

Marianne · 06-27-2012

Please provide the following info:

NBU version on media server

OS (including version number and bit level)

Drivers used for tape drives (especially when Windows media server is used)

OS Event and/or System log (these differ between the supported OSes).

Please bear the following in mind when troubleshooting media/devices:

As an application, NetBackup has no direct access to a device; instead, it relies on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and to report back the success or failure of that operation. If there is a failure, NetBackup merely reports that a failure occurred, so any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes: OS configuration, a problem on the SCSI path, or a problem with the device.


mph999 · 06-27-2012

What is the exact error? Is it "Physical end of media has been reached" (or similar)?

If so, this is the Technote:

http://www.symantec.com/docs/TECH139306

Note: EEB 2182228 is an update binary, and it is the one that should be requested.

Martin

nitinajgaonkar · 06-28-2012

Dear Marianne madam

Thanks for the response

NBU version - Symantec NetBackup 7.1

OS - RHEL 5.3 Server 64 bit edition

For the tape drive - it is a Dell autoloader (Powervault T4000). It is connected to the EMC VNX 5300 (NDMP host).

Thanks and Regards

Nitin

mph999 · 06-28-2012

You have completely ignored my post, and have not provided all the details Marianne mentioned ...

We still cannot answer the question, as there are no details of the issue.

(1)

What is the exact error message you see? Does it match the TN / details I posted?

(2)

Please supply the system logs also.

Do part (1) first, then (2).

Then ...

(3) Go to https://www-secure.symantec.com/connect/forums/netbackup-basics-and-how-make-your-life-easier

Read parts (B) and (V)

Many thanks,

Martin

nitinajgaonkar · 06-28-2012

Dear Martin,

As stated, the error message that we get is:

MEDIA WRITE ERROR MESSAGE

I am attaching the bptm logs as requested.

Thanks and regards

Nitin

mph999 · 06-28-2012

OK, thank you for the logs. It is always important to give the exact error message as you have now done.

I see in the log that, as you explained, the backup is written successfully to the previous media, and the issue happens when the next media is loaded.

03:23:01.926 [23918] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 23915)

03:23:01.926 [23918] <2> io_ioctl: command (5)MTWEOF 1 0x0 from (bptm.c.19634) on drive index 0

03:23:04.620 [23918] <2> io_ioctl: command (2)MTBSF 1 0x0 from (bptm.c.19671) on drive index 0

03:23:07.094 [23918] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/drive_IBM.ULT3580-HH5.000, from bptm.c.19700

03:23:07.096 [23918] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: Saving ndmp public and ndmpagent sessions (save flag is set)

03:23:07.096 [23918] <2> write_backup: block position check: actual 37936725, expected 37936725

03:23:07.096 [23918] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0

EOM here ...

03:25:29.990 [23918] <16> io_write_block: ndmp write error on media id 0010L5, drive index 0, writing header block, error code 7 (NDMP_IO_ERR)

03:25:29.990 [23918] <2> send_MDS_msg: DEVICE_STATUS 1 32 gngbnsws4 0010L5 4000007 IBM.ULT3580-HH5.000 2000008 WRITE_ERROR 0 0

03:25:29.990 [23918] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1370: 0: found in cache name: 127.0.0.1

03:25:29.990 [23918] <2> vnet_cached_getaddrinfo_and_update: ../../libvlibs/vnet_addrinfo.c.1371: 0: found in cache service: NULL

03:25:30.004 [23918] <2> log_media_error: successfully wrote to error file - 06/27/12 03:25:29 0010L5 0 WRITE_ERROR IBM.ULT3580-HH5.000

03:25:30.004 [23918] <2> check_error_history: just tpunmount: called from bptm line 22004, EXIT_Status = 84

03:25:30.004 [23918] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/drive_IBM.ULT3580-HH5.000, from bptm.c.16264

03:25:37.004 [23918] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: Saving ndmp public and ndmpagent sessions (save flag is set)
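As a rough aid, the sequence visible in this excerpt (an `EOM encountered` line, then an NDMP write error on the next media, then `EXIT_Status = 84`) can be searched for in a full bptm log with a short Python sketch. The regular expressions are assumptions based only on the lines quoted in this thread; other NetBackup versions may word these messages differently.

```python
import re

# Patterns copied from the bptm lines quoted in this thread (assumption:
# other NetBackup versions may phrase these messages differently).
EOM_RE = re.compile(r"write_backup: EOM encountered")
WRITE_ERR_RE = re.compile(r"io_write_block: ndmp write error on media id (\S+),")
EXIT_84_RE = re.compile(r"EXIT_Status = 84\b")


def find_eom_then_84(lines):
    """Return (eom_idx, error_idx, exit_idx, media_id) for each EOM that is
    followed by an NDMP write error and then an exit status of 84."""
    hits = []
    eom_idx = err_idx = media = None
    for i, line in enumerate(lines):
        if EOM_RE.search(line):
            # A new EOM restarts the search.
            eom_idx, err_idx, media = i, None, None
            continue
        if eom_idx is None:
            continue
        m = WRITE_ERR_RE.search(line)
        if m:
            err_idx, media = i, m.group(1)
        elif err_idx is not None and EXIT_84_RE.search(line):
            hits.append((eom_idx, err_idx, i, media))
            eom_idx = err_idx = media = None
    return hits


# Three lines taken from the bptm excerpt above.
SAMPLE = [
    "03:23:07.096 [23918] <2> write_backup: EOM encountered --- Fragmenting, TWIN_INDEX 0",
    "03:25:29.990 [23918] <16> io_write_block: ndmp write error on media id 0010L5, "
    "drive index 0, writing header block, error code 7 (NDMP_IO_ERR)",
    "03:25:30.004 [23918] <2> check_error_history: just tpunmount: called from bptm "
    "line 22004, EXIT_Status = 84",
]

if __name__ == "__main__":
    print(find_eom_then_84(SAMPLE))  # [(0, 1, 2, '0010L5')]
```

Passing the lines of the whole bptm log instead of `SAMPLE` would list every EOM that ended this way, together with the media ID that failed.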

The (drive) path used throughout the log is the same:

03:23:07.394 [12947] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:24:44.800 [12947] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:24:44.983 [23918] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:25:29.896 [23918] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:25:37.235 [13025] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:26:18.009 [13025] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:26:18.824 [13071] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:27:02.050 [13071] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:28:24.013 [13139] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:30:10.239 [13139] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:30:10.426 [13071] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:30:54.348 [13071] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:32:20.217 [13283] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

03:34:02.535 [13283] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0

which I guess rules out a previously known issue where the wrong path was used after EOM. In that case the drives were, I think, attached directly to the filer, but after a new tape was loaded an OS path was used in error ... we do not see that here.
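The check done by eye above (every `open_ndmp_device` line names `/ndmp:c32t0l0`, so no other path was picked up after EOM) could be automated roughly like this. It is a minimal sketch; the line format is assumed from the excerpt above.

```python
import re
from collections import Counter

# Matches the open_ndmp_device lines quoted above (assumed format).
DRIVE_RE = re.compile(r"open_ndmp_device: ndmp_drive_name is (\S+)")


def drive_paths(lines):
    """Count how many times each NDMP drive path is opened in a bptm log."""
    counts = Counter()
    for line in lines:
        m = DRIVE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def single_drive_path(lines):
    """True when every open_ndmp_device line names the same drive path."""
    return len(drive_paths(lines)) == 1


SAMPLE = [
    "03:23:07.394 [12947] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0",
    "03:24:44.983 [23918] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0",
    "03:25:29.896 [23918] <2> open_ndmp_device: ndmp_drive_name is /ndmp:c32t0l0",
]

if __name__ == "__main__":
    print(drive_paths(SAMPLE))        # Counter({'/ndmp:c32t0l0': 3})
    print(single_drive_path(SAMPLE))  # True
```

If a second path ever showed up after an EOM, `drive_paths` would list it with its count, which is the symptom of the older wrong-path issue.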

But ...

There is a bug at 7.1 where NDMP backups end in status 84 after EOM has occurred; this matches exactly.

S2132186 2131806 NDMP backups would end with a Status 84, when EOM was invoked.

It is eTrack 2131806.

So first thing is to check out the details of this eTrack.

Martin

mph999 · 06-28-2012

OK, we would need the ndmpagent log ...

Set it up like this:

vxlogcfg -a -p 51216 -o 134 -s DebugLevel=6 -s DiagnosticLevel=6

BUT ...

This eTrack / bug appears to ONLY affect BlueArc filers - you have an EMC VNX 5300.

Now, is it possible the EMC device is a rebadged BlueArc? I do not know. If it is, it is quite likely that this issue is as per the eTrack.

HDS bought BlueArc, I believe - and HDS make disk arrays sold by HP (as the XP range), so it is quite possible that they sell kit to EMC ???

Anyhow, at 7.1 the log should be in /usr/openv/logs/ndmpagent. If you post the log files you have that match the date of the error (06/27/12), maybe we will see something. If you do not have the logs, or they are not detailed enough, you'll have to increase the log level (command above) and wait for the error to happen again. Then send in the log AND a new bptm log to match (the logs must ALWAYS cover the same time period).
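Because the logs must cover the same time period, a quick sanity check before posting them is to compare the first and last timestamps of each file. A minimal Python sketch, assuming the timestamp formats seen in the lines quoted in this thread (bptm: time of day only, with the date supplied by hand; ndmpagent: day-first date plus time):

```python
import re
from datetime import date, datetime, time

# bptm lines start with HH:MM:SS.mmm only; the date is assumed to be known
# from the log file name, so the caller passes it in.
BPTM_TS = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3}) ")
# ndmpagent lines start with DD/MM/YYYY HH:MM:SS.mmm (day-first, as in the
# lines quoted in this thread; assumed to depend on locale).
NDMP_TS = re.compile(r"^(\d{2})/(\d{2})/(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3}) ")


def bptm_span(lines, log_date):
    """(earliest, latest) timestamp in a bptm log, or None if none found."""
    stamps = []
    for line in lines:
        m = BPTM_TS.match(line)
        if m:
            h, mi, s, ms = map(int, m.groups())
            stamps.append(datetime.combine(log_date, time(h, mi, s, ms * 1000)))
    return (min(stamps), max(stamps)) if stamps else None


def ndmpagent_span(lines):
    """(earliest, latest) timestamp in an ndmpagent log, or None."""
    stamps = []
    for line in lines:
        m = NDMP_TS.match(line)
        if m:
            d, mo, y, h, mi, s, ms = map(int, m.groups())
            stamps.append(datetime(y, mo, d, h, mi, s, ms * 1000))
    return (min(stamps), max(stamps)) if stamps else None


def overlaps(a, b):
    """True when two (start, end) spans share at least one instant."""
    return a is not None and b is not None and a[0] <= b[1] and b[0] <= a[1]


SAMPLE_BPTM = [
    "03:23:01.926 [23918] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 23915)",
    "03:25:37.004 [23918] <2> NdmpMediaSession_close_public_and_ndmpagent[0]: Saving",
]
SAMPLE_NDMP = [
    "27/06/2012 03:23:01.813 [Diagnostic] NB 51216 ndmpagent 134",
    "27/06/2012 03:25:37.005 [Application] NB 51216 ndmpagent 134",
]

if __name__ == "__main__":
    span_b = bptm_span(SAMPLE_BPTM, date(2012, 6, 27))
    span_n = ndmpagent_span(SAMPLE_NDMP)
    print(overlaps(span_b, span_n))  # True
```

A bptm log from a different day would not overlap the ndmpagent span, which is the mismatch this check is meant to catch before the logs are sent.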

Having just said all that, I see in the eTrack logs that bptm freezes the media with the "external event caused rewind" message, which I don't see in your log (but I have only looked at one so far; you posted two).

This may not mean the bug doesn't match; on different systems the symptoms may be slightly different. In my view, "it's close enough" is reason enough to continue investigating it as a possible cause. Maybe you spend ages investigating it only to prove it is not the issue after all, but that's how it is with troubleshooting such a complex product: some you win, some you lose. If nothing else, you eliminate it from the list of possible causes.

If this is not a rebadged BlueArc filer, I guess you could still have the same or a similar issue. So far it has only been seen on one BlueArc filer (only one customer), but who is to say you are not the second ...

From the eTrack, this is what we would be looking for in the ndmpagent log ...

The ndmpagent log shows that it receives the signal to stop, and then attempts to reposition media, presuming a rewind, with NDMP_TAPE_MTIO, followed by an ABORT message received

I can't put any more details than that up on the public forum, but it's easy enough to spot in the log.

Look for a line containing NDMP_TAPE_MTIO followed afterwards (possibly a few lines down) by a line containing ABORT.
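That search can be scripted. The sketch below assumes the ndmpagent log has already been turned into readable text (for example with vxlogview) and only matches the two literal strings mentioned above; the `window` size is an assumed reading of "a few lines down", and `scan_file` is a hypothetical helper name.

```python
def mtio_then_abort(lines, window=10):
    """Return (mtio_idx, abort_idx) pairs where a line containing
    NDMP_TAPE_MTIO is followed within `window` lines by one containing ABORT.

    `window` is an assumed reading of "a few lines down"; widen it if needed.
    """
    pairs = []
    for i, line in enumerate(lines):
        if "NDMP_TAPE_MTIO" not in line:
            continue
        # Look only at the lines after the MTIO line, up to the window size.
        for j in range(i + 1, min(i + 1 + window, len(lines))):
            if "ABORT" in lines[j]:
                pairs.append((i, j))
                break
    return pairs


def scan_file(path, window=10):
    """Read a text ndmpagent log and return 1-based (mtio, abort) line numbers."""
    with open(path, errors="replace") as fh:
        lines = fh.readlines()
    return [(i + 1, j + 1) for i, j in mtio_then_abort(lines, window)]
```

An empty result does not rule the eTrack out: at a low log level the NDMP_TAPE_MTIO lines may simply not be written.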

You can also consider this ...

What is the chance that you have an NDMP device showing the same symptoms as a known issue, at the same version of NBU (7.1), but with a different cause ???

I don't know - but I would certainly say that the possibility of this issue being the same should be investigated.

Apologies if it turns out not to be - but until you look you will never know. I have learnt the hard way not to say, yeah, it's close but probably isn't the same problem, let's look elsewhere ... only to come back some time later and find it is the cause ...

It's a gamble ... are you feelin lucky ... :0)

If it is the same, you will have to log a call, as the EEB that was released was written specifically for one customer and cannot be given out via a technote; it has to be cleared by backline / engineering, or a new EEB may even have to be produced.

Martin

nitinajgaonkar · 06-29-2012

Dear Martin sir

Thanks for your detailed observations. As requested, I am attaching the ndmpagent logs so that the problem can be analyzed further.

Thanks for your prompt help regarding this issue

Nitin

mph999 · 06-29-2012

Not that much in here - I guess the log level is not very high ...

917 TID:47841130630512 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 End of tape reached. Load next tape.

27/06/2012 03:23:01.813 [Diagnostic] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] 2 V-134-38 [NdmpBackupManager::NotifyMoverPaused] received MOVER_PAUSED reason = 1 (NDMP_MOVER_PAUSE_EOM)

27/06/2012 03:23:01.873 [Diagnostic] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateTotalKbytes] Records 5237873 KbytesThisPath 335223872 KbytesThisFragment 335223872

27/06/2012 03:25:37.004 [Debug] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] 1 [XmServerControl::ProcessControlMessage] Received ABORT request

27/06/2012 03:25:37.004 [Diagnostic] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] 1 V-134-19 [NdmpAgent::SetErrorAndHalt] XmServerControl.cpp(1,121) - error code 150 (termination requested by administrator)

27/06/2012 03:25:37.004 [Diagnostic] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateTotalKbytes] Records 5237873 KbytesThisPath 335223872 KbytesThisFragment 335223872

27/06/2012 03:25:37.005 [Application] NB 51216 ndmpagent 134 PID:23917 TID:47841130630512 File ID:134 [No context] [Error] V-134-32 NDMP backup failed, path = /NFS_03a

I don't see any NDMP_TAPE_MTIO lines, but this could be due to the log level ...

I do see ABORT though, so this seems to match the eTrack.
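The excerpt also gives the timing: the mover paused for EOM at 03:23:01.813 and the ABORT request arrived at 03:25:37.004, about 155 seconds later and a few seconds after bptm logged the write error on 0010L5 (03:25:29.990). A small Python sketch to pull that interval out of a longer ndmpagent log, assuming the day-first timestamp format shown above:

```python
import re
from datetime import datetime

# Day-first timestamp as in the ndmpagent lines quoted above (assumption).
TS_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def stamp(line):
    """Parse the leading timestamp of an ndmpagent line, or return None."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S.%f") if m else None


def eom_to_abort_seconds(lines):
    """Seconds from the first NDMP_MOVER_PAUSE_EOM line to the next
    'Received ABORT request' line, or None if either is missing."""
    eom = None
    for line in lines:
        if eom is None:
            if "NDMP_MOVER_PAUSE_EOM" in line:
                eom = stamp(line)
        elif "Received ABORT request" in line:
            abort = stamp(line)
            return (abort - eom).total_seconds() if abort else None
    return None


SAMPLE = [
    "27/06/2012 03:23:01.813 [Diagnostic] NB 51216 ndmpagent 134 PID:23917 "
    "V-134-38 [NdmpBackupManager::NotifyMoverPaused] received MOVER_PAUSED "
    "reason = 1 (NDMP_MOVER_PAUSE_EOM)",
    "27/06/2012 03:25:37.004 [Debug] NB 51216 ndmpagent 134 PID:23917 "
    "[XmServerControl::ProcessControlMessage] Received ABORT request",
]

if __name__ == "__main__":
    print(eom_to_abort_seconds(SAMPLE))  # 155.191
```

Lining this interval up with the bptm timestamps shows the ABORT arriving after the write error, not before it.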

So, adding this lot up ... I think you should log a call for further investigation, as it seems to be heading towards needing backline (BL) involvement.

I recommend copying my details from this post into the call; there is no point in someone doing the work twice.

Please post case number up here.

Regards,

Martin
