Status: 23: socket read failed - NDMP Backups

ali_hassan · 01-05-2015

Hi Guys,

I recently changed my tape drives from LTO5 to LTO6, and ever since then I have been having issues backing up the NDMP filer data. I initially selected the entire volume for backup, and when I back up the information it fails somewhere in the middle with:

==========================

02/01/2015 21:12:33 - Info ndmpagent(pid=7832) iaesha1501: DUMP: Tape write failed.
02/01/2015 21:12:33 - Info ndmpagent(pid=7832) iaesha1501: DUMP: DUMP IS ABORTED
02/01/2015 21:12:36 - Info ndmpagent(pid=7832) iaesha1501: DUMP: Deleting "/vol/Replicated/../snapshot_for_backup.27" snapshot.
02/01/2015 21:12:37 - Error ndmpagent(pid=7832) iaesha1501: DATA: Operation terminated: EVENT: I/O ERROR (for /vol/Replicated/)
02/01/2015 21:12:37 - Error ndmpagent(pid=7832) NDMP backup failed, path = /vol/Replicated/
02/01/2015 21:12:37 - Error bptm(pid=8108) none of the NDMP backups for client iaesha1501 completed successfully
02/01/2015 21:13:39 - Info bptm(pid=8108) EXITING with status 99 <----------
02/01/2015 21:13:39 - Info ndmpagent(pid=0) done. status: 99: NDMP backup failure
02/01/2015 21:13:39 - end writing; write time: 5:57:18
NDMP backup failure(99)

===========================

So I decided to break the backup into chunks and back up each folder in a separate job, and now I get this message:

=============================

05/01/2015 14:57:17 - Info ndmpagent(pid=10460) iaesha1502: ENHANCED_DAR_ENABLED is 'T'
05/01/2015 14:57:17 - Info ndmpagent(pid=10460) iaesha1502: ACL_START is '731361428480'
05/01/2015 14:57:23 - Info ndmpagent(pid=10460) iaesha1502: DUMP: dumping (Pass V) [ACLs]
05/01/2015 14:57:23 - Info ndmpagent(pid=10460) iaesha1502: DUMP: 714224243 KB
05/01/2015 14:57:23 - Info ndmpagent(pid=10460) iaesha1502: DUMP: DUMP IS DONE
05/01/2015 14:57:24 - Info ndmpagent(pid=10460) iaesha1502: DUMP: Deleting "/vol/Replicated/../snapshot_for_backup.29" snapshot.
05/01/2015 14:57:26 - Info ndmpagent(pid=10460) NDMP backup successfully completed, path = /vol/Replicated/Organisation
05/01/2015 14:57:26 - Info ndmpagent(pid=10460) 1934523 entries sent to bpdbm
05/01/2015 14:57:38 - Info bptm(pid=8840) EXITING with status 23 <----------
05/01/2015 14:57:38 - Critical bpbrm(pid=11040) unexpected termination of client iaesha1502
05/01/2015 14:57:38 - Info ndmpagent(pid=0) done. status: 23: socket read failed
05/01/2015 14:57:38 - end writing; write time: 2:32:31
socket read failed(23)

=============================

From the above message, it looks like the data backup completes and then the job errors out at the end?
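One way to check that observation against the job details is to look for the point where the dump finished and for the final status line. The Python sketch below is a hypothetical helper, not a NetBackup tool; the markers it searches for are assumptions taken from the two logs in this thread. It flags a job whose dump completed but which still ended with a non-zero status.

```python
import re
from typing import Iterable, Optional

# Final status line from the job details, e.g.
# "05/01/2015 14:57:38 - Info ndmpagent(pid=0) done. status: 23: socket read failed"
STATUS_RE = re.compile(r"done\. status: (\d+): (.+)$")

# Messages that, in the logs above, mark the filer-side dump as finished (assumed markers)
DATA_DONE_MARKERS = ("DUMP: DUMP IS DONE", "NDMP backup successfully completed")


def summarize_job(lines: Iterable[str]) -> dict:
    """Report whether the dump finished and which status the job ended with."""
    lines = [line.rstrip() for line in lines]
    data_done = any(m in line for line in lines for m in DATA_DONE_MARKERS)
    status: Optional[int] = None
    message: Optional[str] = None
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # keep the last status line if there are several
            status, message = int(match.group(1)), match.group(2).strip()
    return {
        "data_phase_done": data_done,
        "status": status,
        "message": message,
        # True when the dump completed but the job still failed, i.e. the
        # error was raised after the data had been written
        "failed_after_data": data_done and status not in (None, 0),
    }


if __name__ == "__main__":
    sample = [
        "05/01/2015 14:57:23 - Info ndmpagent(pid=10460) iaesha1502: DUMP: DUMP IS DONE",
        "05/01/2015 14:57:38 - Info bptm(pid=8840) EXITING with status 23 <----------",
        "05/01/2015 14:57:38 - Info ndmpagent(pid=0) done. status: 23: socket read failed",
    ]
    print(summarize_job(sample))
```

Run against the first job's details (which contain "DUMP IS ABORTED" rather than "DUMP IS DONE"), the same helper reports the dump as unfinished with status 99, which separates the two failures in this thread.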

Thanks

A

Abesama · 01-05-2015

Hi Ali,

I would like to raise two points.

Firstly, given that the error started after the tape drive change, it is possible that your environment does not meet hardware compatibility requirements.
Regardless of what the documentation and support matrix say, in real life some devices simply do not work well together.
I know from experience that going back to check hardware that has already been purchased and installed with so much effort does not sound very attractive.
But last month I had a similar issue (socket read failed with an NDMP backup) that was fixed by replacing the HBA module in the NAS box, so I thought I'd bring it to your attention.

Secondly, and more easily, try the steps in this tech article if you haven't yet: http://www.symantec.com/docs/TECH205899. This is much less trouble than backing out the installed hardware.

Hope this helps and good luck.

AK

P.S. The "I/O ERROR" with the full volume backup also makes it sound like this is caused by a communication problem between the filer and the tape drive. Also check the system log on the NAS box itself and engage hardware vendor support. If you log a ticket with Symantec tech support, they would probably ask you to do the same.

ali_hassan · 01-05-2015

Hi,

Yes, I completely agree with you. I had similar issues with the drives when they were installed at a different site, so I had to revert to LTO5, and now everything is back to normal there.

The thing is that I use a Quantum i80 library, and in one of my libraries the LTO6 drives I installed are working just fine, so I think they are compatible, but for some reason they are giving me issues elsewhere.

The firmware of the library and the drives is the same as at the site where it's working, and I did look at the TECH note and created the files, but still no luck. I have now ordered new Quantum OM3 fiber optic cables to see if it is a communication issue. I already use OM3 cables but wanted to replace them, as that's the only thing left to do.

ali_hassan · 01-06-2015

I enabled bptm logging, and in that log I see this error:

=========

14:57:31.094 [8840.6244] <4> write_backup: successfully wrote backup id iaesha1502_1420446153, copy 1, fragment 1, 714224384 Kbytes at 78155.478 Kbytes/sec
14:57:31.656 [8840.6244] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1059] getaddrinfo() failed RV=11001 NAME=10.104.96.24 SVC=0
14:57:31.656 [8840.6244] <8> retry_getaddrinfo: [vnet_addrinfo.c:894] retry_getaddrinfo_for_real failed RV=11001 NAME=10.104.96.24 SVC=0
14:57:31.656 [8840.6244] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1643] retry_getaddrinfo() failed RV=11001 NAME=10.104.96.24 SVC=NULL
14:57:31.656 [8840.6244] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1271] vnet_cached_getaddrinfo_and_update() failed 6 0x6
14:57:31.687 [8840.6244] <2> vnet_bind_to_port_addr_extra_ipi: vnet_bind.c.195: 0: Function failed: 10 0x0000000a
14:57:31.702 [8840.6244] <16> NdmpMoverSession[0]: ERROR Start failed
14:57:31.702 [8840.6244] <16> check_and_process_mover_tasks: NDMP Mover Client Setup failed

========
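To pull just the warnings and errors out of a long bptm log, a small filter can help. This is a sketch that assumes the legacy log line layout shown above (time, [pid.thread], <severity>, function, message) and assumes that <8> marks warnings and <16> marks errors; the helper names are hypothetical. For reference, on Windows 11001 is the Winsock error WSAHOST_NOT_FOUND, so the RV=11001 entries report getaddrinfo() returning "host not found" for 10.104.96.24.

```python
import re
from dataclasses import dataclass

# Legacy debug-log line, e.g.
# "14:57:31.702 [8840.6244] <16> NdmpMoverSession[0]: ERROR Start failed"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\.(\d+)\] <(\d+)> ([^:]+): (.*)$")
# getaddrinfo() failures reported by the vnet_* functions
GAI_RE = re.compile(r"getaddrinfo\(\) failed RV=(\d+) NAME=(\S+)")


@dataclass
class LogEntry:
    time: str
    pid: int
    thread: int
    level: int
    function: str
    message: str


def problems(lines, min_level=8):
    """Return parsed entries at or above min_level (assumed: 8 = warning, 16 = error)."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and int(m.group(4)) >= min_level:
            found.append(LogEntry(m.group(1), int(m.group(2)), int(m.group(3)),
                                  int(m.group(4)), m.group(5), m.group(6)))
    return found


def failed_lookups(entries):
    """Collect distinct (return value, name) pairs from getaddrinfo() failures."""
    pairs = set()
    for e in entries:
        m = GAI_RE.search(e.message)
        if m:
            pairs.add((int(m.group(1)), m.group(2)))
    return sorted(pairs)
```

Feeding it the excerpt above leaves the four <8> lookup warnings and the two <16> mover errors, with a single failed lookup for 10.104.96.24.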

Also, on a side note: I ran a different backup job on the same volume where the folder size was very small, and it succeeded, but somehow the large jobs fail.

ali_hassan · 01-06-2015

I have just modified the ndmp.cfg file under the /db/config folder, adding the line NDMP_MOVER_CLIENT_DISABLE, and I am running the backup again, so let's see.

Although I don't really understand the real impact of this procedure. They say the backup should not be impacted, but other things might be? Because it does block-level file transfer as opposed to streaming!

Not sure what that means..:-/
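The change described in this post amounts to one extra line in ndmp.cfg. A minimal Python sketch that adds it only if it is not already there follows; add_setting is a hypothetical helper, not a NetBackup tool, and the file location is an assumption (the db/config folder under your NetBackup install path).

```python
from pathlib import Path

SETTING = "NDMP_MOVER_CLIENT_DISABLE"


def add_setting(cfg_path: Path, setting: str = SETTING) -> bool:
    """Append `setting` on its own line unless an identical line already exists.

    Creates the file if it is missing. Returns True if the file was changed.
    """
    text = cfg_path.read_text() if cfg_path.exists() else ""
    if setting in (line.strip() for line in text.splitlines()):
        return False  # already present, leave the file untouched
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new setting on its own line
    cfg_path.write_text(text + setting + "\n")
    return True
```

For example, on a UNIX media server (path assumed) `add_setting(Path("/usr/openv/netbackup/db/config/ndmp.cfg"))` returns True the first time and False on later runs, so it is safe to repeat.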

watsons · 01-06-2015

Both the jobs that failed with errors 99 and 23 took only a short time to fail, so my first guess is that it is not a timeout issue.

bptm shows a "mover client setup" failure, so it looks related to the agent communication. You can try enabling DebugLevel=6 for the ndmpagent (OID=134) logs to see if there are any error messages in them.
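For reference, ndmpagent writes unified (VxUL) logs, which are normally configured with the vxlogcfg command rather than by creating a legacy log folder. The sketch below only assembles, and optionally runs, that command. The product ID 51216 and the exact flags are assumptions to check against the NetBackup logging documentation for your release, and the helper names are hypothetical.

```python
import subprocess

NB_PRODUCT_ID = "51216"  # assumed NetBackup unified-logging product ID; verify for your release
NDMPAGENT_OID = "134"    # originator ID for ndmpagent, as given in the reply above


def vxlogcfg_command(debug_level: int = 6) -> list:
    """Build the vxlogcfg call that sets ndmpagent's DebugLevel (assumed syntax)."""
    if not 0 <= debug_level <= 6:
        raise ValueError("DebugLevel must be between 0 and 6")
    return ["vxlogcfg", "-a", "-p", NB_PRODUCT_ID, "-o", NDMPAGENT_OID,
            "-s", f"DebugLevel={debug_level}"]


def raise_ndmpagent_logging(debug_level: int = 6, dry_run: bool = True) -> list:
    """Return the command; run it only when dry_run is False."""
    cmd = vxlogcfg_command(debug_level)
    if not dry_run:
        # vxlogcfg must be on PATH (it lives in the NetBackup bin directory)
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping dry_run as the default lets you review the exact command before changing the logging level, and remember to lower it again once the failing job has been captured.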
