02-28-2014 04:59 AM
NetBackup Env: Windows Server 2008 R2 + 4 media servers,
IBM Robot, EMC Celerra filers,
Master + Media NB version 7.5.0.7
I am trying to back up some EMC filers to tape through NDMP, using the IBM robot. We have 3 filers; 2 are working but this one is not.
I attached the job log. If you want any other information, please just ask (if you ask for other types of logs, please provide their location).
Error bptm() io_ioctl_ndmp (MTFSF) failed on media id IBM037, drive index 7, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.7061)
Error ndmpagent() connection 0000000000AA8970 ndmp_message_process_one failed, status = 18 (NDMP_XDR_DECODE_ERR)
Error ndmpagent() NDMP backup failed, path = UNKNOWN
Info bptm() EXITING with status 86 <----------
Info ndmpagent() done. status: 86: media position error
end writing
media position error(86
Thank You
//George
02-28-2014 05:29 AM
Are the tape drives connected to the filer?
If yes, please post the NDMPd logs from the filer.
Are the same tape drives attached to the other 2 filers, which are working fine?
Did you check with other tapes? I am seeing the failures on IBM037.
Try with other tapes too, and other drives.
02-28-2014 07:19 AM
I activated "Allow multiple data streams" in the policy.
The result is fantastic:
The policy has 4 CIFS shares, and each resulted in a child job:
Thanks
//George
02-28-2014 07:32 AM
So, it is a matter of isolation.
Take out the volume that is giving the error, run the backup for the other 3 CIFS shares without multiple data streams, and see how it goes.
If that is successful, move the share with the issue to a test (isolated) policy and let the other 3 run in the existing one.
Then trigger the backup for the problem share and collect the ndmpd log from the filer admin.
03-03-2014 01:55 AM
So the job finished the same way as in my first test, as I said above:
Application Event:
TLD(0) [5288] Drive 1 (device 3) has not become ready. Last status: Data error (cyclic redundancy check).
TLD(0) [5288] Could not get tape parameters for drive 1 (device 3): Data error (cyclic redundancy check).
TLD(0) [12968] Drive 1 (device 5) has not become ready. Last status: The requested resource is in use.
I did not find anything regarding any of the tape IDs in <Install_dir>\VERITAS\NetBackup\db\media\errors
I searched <Install_dir>\VERITAS\NetBackup\logs\bptm\<1.3.2014> for roughly the same time frame; attachment: bptm - time frame log.txt
03-03-2014 02:10 AM
I see the following in the job log:
2014-03-01 01:55:03 - granted resource IBM068
2014-03-01 01:55:03 - granted resource IBM.ULT3580-TD5.Drive1
The tape filled up, and a new tape was mounted. For some reason, a different tape drive was chosen to carry on with the backup:
2014-03-01 08:39:16 - granted resource IBM047
2014-03-01 08:39:16 - granted resource IBM.ULT3580-TD5.Drive3
So, we have 2 new backup resources - media as well as a different tape drive.
The error reported by ndmpagent is 'Medium error'.
2014-03-01 08:39:28 - Error ndmpagent(pid=11460) x: Medium error
Try to re-use this tape by adding it to a test pool, and create a small test policy for the Windows media server to back up to its own STU using this test pool.
Let us know what the result is.
PS:
The bptm log was from the wrong media server or from the wrong date.
There is no evidence of jobid 430915 or bptm PID 5420.
03-03-2014 06:12 AM
One tape drive failed with:
03/01/14 08:39:29 IBM047 6 TAPE_ALERT IBM.ULT3580-TD5.Drive3 0x00100000 0x00000000
I did not find how to decode this HERE:
0x00100000 0x00000000
I will test those tapes and get back with a full report.
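
Since the question is whether the faults follow particular tapes or particular drives, a quick tally of the error file can help. The following is only a minimal sketch: it assumes each line of db\media\errors has the layout seen in the entries quoted in this thread (date, time, media ID, drive index, error type, drive name, optional extra values), and the function name `tally_media_errors` is made up for illustration.

```python
from collections import Counter

# Sample entries copied from the error-file lines quoted in this thread.
SAMPLE = """\
02/28/14 13:33:16 IBM037 7 POSITION_ERROR IBM.ULT3580-TD5.Drive5
02/28/14 13:51:10 IBM022 5 TAPE_ALERT IBM.ULT3580-TD5.Drive2 0x10000000 0x02000200
03/01/14 08:39:29 IBM047 6 TAPE_ALERT IBM.ULT3580-TD5.Drive3 0x00100000 0x00000000
"""

def tally_media_errors(text):
    """Count errors per (drive, type) and per (media, type).

    Assumes whitespace-separated fields in the order:
    date time media_id drive_index error_type drive_name [extra...]
    """
    by_drive, by_media = Counter(), Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or unexpected lines
        _date, _time, media, _index, kind, drive = parts[:6]
        by_drive[(drive, kind)] += 1
        by_media[(media, kind)] += 1
    return by_drive, by_media

if __name__ == "__main__":
    drives, media = tally_media_errors(SAMPLE)
    for key, count in drives.most_common():
        print("drive", key, count)
    for key, count in media.most_common():
        print("media", key, count)
```

If the counts pile up on one drive or one tape, that component is the first suspect; if they are spread evenly across drives and media, something shared in the data path is more likely.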
03-03-2014 06:21 AM
Just WAY too many errors on different tape drives, different media....
13:28:41.887 [9328.12576] <2> io_read_block: read error on media id IBM046, drive index 8 reading header block, len = 0; No more data is on the tape. (1104)
13:28:41.887 [9328.12576] <16> io_position_for_write: cannot position media id IBM046 for write
13:28:41.887 [9328.12576] <2> send_MDS_msg: DEVICE_STATUS 1 118099786 x IBM046 4002703 IBM.ULT3580-TD5.Drive4 2000498 POSITION_ERROR 0 0
13:33:16.760 [12420.11092] <16> io_ioctl: io_ioctl_ndmp (MTFSF) failed on media id IBM037, drive index 7, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.7061)
13:33:16.760 [12420.11092] <2> send_MDS_msg: DEVICE_STATUS 1 118108332 x IBM037 4002694 IBM.ULT3580-TD5.Drive5 2000497 POSITION_ERROR 0 0
13:33:16.775 [12420.11092] <2> log_media_error: successfully wrote to error file - 02/28/14 13:33:16 IBM037 7 POSITION_ERROR IBM.ULT3580-TD5.Drive5
13:43:59.262 [10776.12196] <2> io_read_block: read error on media id IBM069, drive index 7 reading header block, len = 0; No more data is on the tape. (1104)
13:43:59.262 [10776.12196] <16> io_position_for_write: cannot position media id IBM069 for write
13:43:59.262 [10776.12196] <2> send_MDS_msg: DEVICE_STATUS 1 118111934 x IBM069 4002726 IBM.ULT3580-TD5.Drive5 2000497 POSITION_ERROR 0 0
13:44:45.376 [8980.9836] <2> io_read_block: read error on media id XO0312, drive index 6 reading header block, len = 0; No more data is on the tape. (1104)
13:44:45.376 [8980.9836] <16> io_position_for_write: cannot position media id XO0312 for write
13:44:45.376 [8980.9836] <2> send_MDS_msg: DEVICE_STATUS 1 118099301 x XO0312 4002639 IBM.ULT3580-TD5.Drive3 2000496 POSITION_ERROR 0 0
13:51:10.822 [9340.4780] <2> send_MDS_msg: DEVICE_STATUS 1 118106543 x IBM022 4002679 IBM.ULT3580-TD5.Drive2 2000495 TAPE_ALERT 268435456 33554944
13:51:10.837 [9340.4780] <2> log_media_error: successfully wrote to error file - 02/28/14 13:51:10 IBM022 5 TAPE_ALERT IBM.ULT3580-TD5.Drive2 0x10000000 0x02000200
13:51:10.837 [9340.4780] <16> process_tapealert: TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM022
13:51:10.837 [9340.4780] <8> process_tapealert: TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM022
13:51:10.837 [9340.4780] <16> process_tapealert: TapeAlert Code: 0x37, Type: Critical, Flag: LOADING FAILURE, from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM022
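
On the question of converting the two hex values in a TAPE_ALERT entry: the log lines above give one worked case, where 0x10000000 0x02000200 is reported by process_tapealert as codes 0x04, 0x27 and 0x37. That is consistent with a 64-bit mask in which flag 1 is the most significant bit of the first word and flag 33 is the most significant bit of the second. The sketch below uses that mapping; note that it is inferred from these log lines, not taken from documentation, and the function name `tapealert_flags` is hypothetical.

```python
def tapealert_flags(word1: int, word2: int) -> list[int]:
    """Return the TapeAlert flag numbers set in the two 32-bit words.

    Assumption (inferred from the bptm lines in this thread): flag 1 is
    the most significant bit of word1, flag 32 its least significant bit,
    and word2 carries flags 33..64 in the same order.
    """
    flags = []
    for offset, word in ((0, word1), (32, word2)):
        for pos in range(1, 33):
            if word & (1 << (32 - pos)):
                flags.append(offset + pos)
    return flags

if __name__ == "__main__":
    # Matches the codes NetBackup itself logged for this entry.
    print([hex(f) for f in tapealert_flags(0x10000000, 0x02000200)])
    # The entry that was not decoded in the earlier post.
    print([hex(f) for f in tapealert_flags(0x00100000, 0x00000000)])
```

Under that assumption, 0x00100000 0x00000000 corresponds to a single flag, 0x0c. Its meaning should be looked up in the drive vendor's TapeAlert documentation rather than taken from this sketch.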
03-03-2014 06:41 AM
Just way too many hardware errors.
At this point in time, I cannot see that ALL of those drives can be faulty.
Probably a good idea to log a call with your hardware and server support team so that they can troubleshoot together.
Check every piece in the data path - from the server HBA up to the library and tape drives (including drivers).
What we can say with certainty is that NBU is merely reporting the errors, not causing them.
Please check Event Viewer System log as well.
**** EDIT ****
How would you like to add event logs?
Save them as text files and upload as attachments.
03-03-2014 06:53 AM
For each media error, I added the corresponding db\media\errors entry in the xlsx file; the file is in the attachment.
How would you like to add event logs?
Log Name: Application
Source: NetBackup Tape Manager
Date: 2014-03-01 08:09:17
Event ID: 0
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: x
Description: TapeAlert Code: 0x37, Type: Critical, Flag: LOADING FAILURE, from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM069
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="NetBackup Tape Manager" />
    <EventID Qualifiers="0">0</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2014-03-01T07:09:17.000000000Z" />
    <EventRecordID>8797608</EventRecordID>
    <Channel>Application</Channel>
    <Computer>x</Computer>
    <Security />
  </System>
  <EventData>
    <Data>TapeAlert Code: 0x37, Type: Critical, Flag: LOADING FAILURE, from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM069</Data>
  </EventData>
</Event>
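
If many of these events need to be reviewed, the exported XML can be reduced to one record per TapeAlert. This is a rough sketch only: it assumes every event has the same shape as the one above, and the helper name `parse_tapealert_event` and the shortened sample constant are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespace taken from the Event element shown in this thread.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Pattern for the TapeAlert message text as it appears in the event data.
PATTERN = re.compile(
    r"TapeAlert Code: (?P<code>0x[0-9a-fA-F]+), Type: (?P<type>\w+), "
    r"Flag: (?P<flag>.+?), from drive (?P<drive>\S+) "
    r"\(index (?P<index>\d+)\), Media Id (?P<media>\S+)"
)

# Shortened copy of the event above (only the elements the parser reads).
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="NetBackup Tape Manager" />
    <TimeCreated SystemTime="2014-03-01T07:09:17.000000000Z" />
  </System>
  <EventData>
    <Data>TapeAlert Code: 0x37, Type: Critical, Flag: LOADING FAILURE, from drive IBM.ULT3580-TD5.Drive2 (index 5), Media Id IBM069</Data>
  </EventData>
</Event>"""

def parse_tapealert_event(xml_text):
    """Return a dict with time, code, type, flag, drive, index and media,
    or None if the event text is not a TapeAlert message."""
    root = ET.fromstring(xml_text)
    when = root.find("e:System/e:TimeCreated", NS).get("SystemTime")
    data = root.find("e:EventData/e:Data", NS).text or ""
    match = PATTERN.search(data)
    if match is None:
        return None
    return {"time": when, **match.groupdict()}

if __name__ == "__main__":
    print(parse_tapealert_event(EVENT_XML))
```

Collecting these records per drive and per media ID makes it easier to hand a concise list to the hardware support team.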
04-14-2014 03:39 AM
Thank you all for your replies. I have taken this problem up with IBM and Symantec support.
//George
04-14-2014 04:10 AM
So what is the problem area?
It may be useful for anyone looking for a solution to a similar issue.
04-14-2014 04:14 AM
Once we have pinned down the problem, I will post the solution. So far we have found tape HEADER errors, and the next step is to duplicate the data and erase the tapes with this issue.
We are also investigating the NDMP configuration, the drive paths, and the IBM drives.
//George
04-14-2014 04:41 AM
When you find the REAL solution, please feel free to unmark my post as solution.
We can then mark your own post as Solution.