Forum Discussion

Herb_P's avatar
Herb_P
Level 3
9 years ago
Solved

NetBackup 7.7.2 NDMP Accelerator error file read failed (13)

I recently upgraded my environment (master server running RedHat 6.4, media servers running CentOS 6.4 and NetApp Filers 8.2.1) from 7.6.0.4 to 7.7.2. I have converted 13 of my existing policies to start using the NDMP accelerator and 11 of the work fine but 2 of them are failing with read failed (13)  in the middle of the backup. The backup will run successfully if the accelerator option is deselected.

Here is the output from the job run:

03/09/2016 12:16:45 - Info nbjm (pid=22654) starting backup job (jobid=102658) for client ntap07a-bkup.idt.net, policy ntap07a-vol0, schedule ntap07a-vol0-full

03/09/2016 12:16:45 - Info nbjm (pid=22654) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=102658, request id:{B2FCC714-E61A-11E5-BCE4-8242B446BE8E})

03/09/2016 12:16:45 - requesting resource sombu01_msdp

03/09/2016 12:16:45 - requesting resource pybu01.idt.net.NBU_CLIENT.MAXJOBS.ntap07a-bkup.idt.net

03/09/2016 12:16:45 - requesting resource pybu01.idt.net.NBU_POLICY.MAXJOBS.ntap07a-vol0

03/09/2016 12:16:45 - granted resource  pybu01.idt.net.NBU_CLIENT.MAXJOBS.ntap07a-bkup.idt.net

03/09/2016 12:16:45 - granted resource  pybu01.idt.net.NBU_POLICY.MAXJOBS.ntap07a-vol0

03/09/2016 12:16:45 - granted resource  MediaID=@aaact;DiskVolume=PureDiskVolume;DiskPool=sombu01_pd;Path=PureDiskVolume;StorageServer=sombu01.idt.net;MediaServer=sombu01.idt.net

03/09/2016 12:16:45 - granted resource  sombu01_msdp

03/09/2016 12:16:45 - estimated 9254388 kbytes needed

03/09/2016 12:16:45 - Info nbjm (pid=22654) started backup (backupid=ntap07a-bkup.idt.net_1457543805) job for client ntap07a-bkup.idt.net, policy ntap07a-vol0, schedule ntap07a-vol0-full on storage unit sombu01_msdp

03/09/2016 12:16:46 - Info bpbrm (pid=6933) ntap07a-bkup.idt.net is the host to backup data from

03/09/2016 12:16:46 - Info bpbrm (pid=6933) reading file list for client

03/09/2016 12:16:46 - started process bpbrm (pid=6933)

03/09/2016 12:16:47 - Info bpbrm (pid=6933) accelerator enabled

03/09/2016 12:16:49 - Info bpbrm (pid=6933) starting ndmpagent on client

03/09/2016 12:16:49 - Info ndmpagent (pid=6943) Backup started

03/09/2016 12:16:49 - connecting

03/09/2016 12:16:49 - connected; connect time: 0:00:00

03/09/2016 12:16:50 - Info bpbrm (pid=6933) bptm pid: 6946

03/09/2016 12:16:50 - Info ndmpagent (pid=6943) PATH(s) found in file list = 1

03/09/2016 12:16:50 - Info ndmpagent (pid=6943) PATH[1 of 1]: /vol/vol0

03/09/2016 12:16:50 - Info bptm (pid=6946) start

03/09/2016 12:16:50 - Info bptm (pid=6946) using 512 data buffers

03/09/2016 12:16:50 - Info bptm (pid=6946) using 1048576 data buffer size

03/09/2016 12:16:51 - Info bptm (pid=6946) start backup

03/09/2016 12:16:54 - Info ndmpagent (pid=6943) accelerator optimization is <on>

03/09/2016 12:16:54 - Info ndmpagent (pid=6943) NDMP Remote disk

03/09/2016 12:16:54 - begin writing

03/09/2016 12:16:55 - Info ndmpagent (pid=6943) BASE_DATE will be used to determine changed files for accelerator

03/09/2016 12:16:55 - Info ndmpagent (pid=6943) please ignore references to LEVEL in future messages

03/09/2016 12:16:55 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: creating "/vol/vol0/../snapshot_for_backup.34020" snapshot.

03/09/2016 12:16:59 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: Using inowalk incremental dump for Full Volume 

03/09/2016 12:17:20 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: Date of this level 1 dump: Wed Mar  9 12:16:56 2016.

03/09/2016 12:17:20 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: Date of last level 0 dump: Sun Mar  6 00:00:35 2016.

03/09/2016 12:17:20 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: Dumping /vol/vol0 to NDMP connection

03/09/2016 12:17:21 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: mapping (Pass I)[regular files] 

03/09/2016 12:17:21 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: mapping (Pass II)[directories]

03/09/2016 12:17:21 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: estimated 2094269 KB.

03/09/2016 12:17:21 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: dumping (Pass III) [directories]

03/09/2016 12:17:22 - Info ndmpagent (pid=6943) 0 entries sent to bpdbm

03/09/2016 12:17:22 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: dumping (Pass IV) [regular files]

03/09/2016 12:18:53 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: ENHANCED_DAR_ENABLED is 'T'

03/09/2016 12:18:53 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: ACL_START is '2144524288'

03/09/2016 12:18:53 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: dumping (Pass V) [ACLs]

03/09/2016 12:18:53 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: 2094269 KB

03/09/2016 12:18:54 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: DUMP IS DONE

03/09/2016 12:19:00 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP: Deleting "/vol/vol0/../snapshot_for_backup.34020" snapshot.

03/09/2016 12:19:09 - Info ndmpagent (pid=6943) ntap07a-bkup.idt.net: DUMP_DATE is '10047478408'

03/09/2016 12:19:09 - Info ndmpagent (pid=6943) accelerator sent 1369162240 bytes out of 7888437248 bytes to server, optimization 82.6%

03/09/2016 12:19:20 - Info ndmpagent (pid=6943) NDMP backup successfully completed, path = /vol/vol0

03/09/2016 12:19:21 - Error bpbrm (pid=6933) db_FLISTsend failed: file read failed (13)

03/09/2016 12:19:22 - Info ndmpagent (pid=0) done

03/09/2016 12:19:26 - Error bptm (pid=6946) media manager terminated by parent process

03/09/2016 12:19:26 - Info sombu01.idt.net (pid=6946) StorageServer=PureDisk:sombu01.idt.net; Report=PDDO Stats for (sombu01.idt.net): scanned: 7704554 KB, CR sent: 11267 KB, CR sent over FC: 0 KB, dedup: 99.9%, cache disabled

03/09/2016 12:19:27 - Info ndmpagent (pid=0) done. status: 13: file read failed

03/09/2016 12:19:27 - end writing; write time: 0:02:33

file read failed  (13)

 

Has anyone else come across this yet and know what the issue is? I am planning on opening a support ticket for the issue.

 

Thank you.

 

  • Version 2 of EEB #3890236 has resolved my issue. There have been no failures since it was applied.

  • Further update: various EEBs were applied to our 7.7.2 environment and we were still getting the same error. I was requested to upgrade to 7.7.3 and apply EEB 3890236 but the issue still persisted. Engineering created a newer version of this EEB and it has been installed and running ok since 7/26.

    I will provide more feedback as this progresses. Let's hope this does the trick.

    • Herb_P's avatar
      Herb_P
      Level 3

      Version 2 of EEB #3890236 has resolved my issue. There have been no failures since it was applied.

  • Hopefully you have logged a Support call by now.

    We see that bpbrm is faulting with this message:

    Error bpbrm (pid=6933) db_FLISTsend failed: file read failed (13) 

    Do you have level 3 or higher bpbrm log on the media server?

  • Hi Marianne,

    Yes I do have a case open with Support. I have been working wih them since I posted this mesage. They have not seen this particular issue before. At this point they are requesting that I open a case with NetApp to understand some of the error messages they are seeing in the ndmpd logs.

    I will update this thread once a resolution to the issue is found.

  • Latest update: this appears to be a bug and there are some other cases open with the same issue. We are waiting for the engineers to determine the cause and come up with a fix. I will provide additional information as I get it.

  • Hi,

     

    It is also failing for me. I have also this error on the bpdbm:

     

    03:40:52.795 [9476.7900] <8> NBE_CatImage::ndmp_addNode: duplicate NODE for inode 7838707

    03:40:52.795 [9476.7900] <16> NBE_CatImage::DataMatches: createtime changed for inode 7838707 (old: 1461588871, new 1461919572)

    03:40:52.920 [9476.7900] <32> Function: NBE_CatImage::ndmp_addNode: File: ../nbe_cat_image.cpp Line: 1803: invalid input data,

    03:40:52.920 [9476.7900] <16> add_files_alt: Error adding file rec 13

    03:40:52.982 [9476.7900] <2> process_request: request complete: exit status 13 file read failed; query type: 78

     

    I have an open case scalated to BL and they have told me there are also other customers having the same problem.

  • I was just informed that an EEB was released for this issue. It is 7.7.2.ET3875450 and I just installed it today. I will let you know if the problem still happens.

  • I have received also the EEB, but is for 2.7.2 so we are upgrading from 2.7.1 first.

    I am also curious to know if it works :-)

  • So far on the 5 policies being tested there have not been any new incidents encountered, but I want to let it run for a bit before stating that it is resolved as backups have only been running since 6/1.

    After I installed the EEB I was informed that the only track log data had to be removed from the Master server and the Media server in order to completely remove any of the old informaion. For me the Master server (Unix) files are located under /usr/openv/netbackup/db/track and the Media server (Unix) files are located under /usr/openv/netbackup/track. From there they are under Maseter_server/Media_server/NDMP_server/Policy_name.