NDMP Backup failing

steve75
Level 3

Hi

I have an issue with NDMP backups failing, which I'm hoping to find a solution for. I have configured a new appliance running 7.6.0.3 and added it to an existing master/media configuration. The master is Windows running 7.6.0.2 and the other media server is RHEL running 7.6.0.2. The NDMP host is a NetApp that is already configured, and I can successfully write NDMP backups from it to a disk storage unit on the master/media, which is a Quantum DXI disk pool.

The new appliance has an MSDP pool, configured as a disk storage unit, and I am trying to write NDMP backups to it. I have added a hosts file entry on the appliance so that it uses the correct NetApp interface. I can telnet to the NetApp on that interface from the appliance on port 10000, and tpautoconf -verify also works from the appliance. When I start a backup, I can see a new connection on the NetApp by running ndmpd status, so I know it's making a connection. However, the backup fails with error 99. Below are snippets from the ndmpagent, bptm and bpbrm logs.

bpbrm

14:03:24.075 [118024] <2> logparams: -backup -S att16bck01 -c ATT16NET01 -ct 19 -ru root -cl DXG16-NMDP-VNAS01 -sched Full -bt 1427979801 -dt 0 -st 0 -b ATT16NET01_1427979801 -mediasvr att16nba01 -jobid 358756 -jobgrpid 358756 -masterversion 760000 -maxfrag 51200 -bpstart_time 1427980103 -reqid -1427812073 -mt 0 -to 0 -stunit att16nba01-shared -rl 2 -rp 1814400 -eari 0 -p PureDiskVolume -mst 6 -flags 2 -storagesvr att16nba01 -sts_type PureDisk -use_ofb -use_otm -jm -secure 1 -kl 28 -rg other -fso -dmplevel 0 -connect_options 16974338
14:03:24.075 [118024] <4> bpbrm main: ensured stderr cannot be used
14:03:24.088 [118024] <2> vnet_pbxConnect: pbxConnectEx Succeeded
14:03:24.088 [118024] <2> logconnections: BPRD CONNECT FROM 10.32.231.12.48233 TO 10.32.231.10.1556 fd = 4
14:03:24.320 [118024] <2> brm_update_local_resiliency: changed = 0
14:03:24.320 [118024] <2> bpbrm main: max_entries_per_add = 5000
14:03:24.320 [118024] <2> ConnectionCache::connectAndCache: Acquiring new connection for host att16bck01, query type 223
14:03:24.328 [118024] <2> vnet_pbxConnect: pbxConnectEx Succeeded
14:03:24.329 [118024] <2> logconnections: BPDBM CONNECT FROM 10.32.231.12.32967 TO 10.32.231.10.1556 fd = 4
14:03:24.407 [118024] <2> db_CLIENTsend: reset client protocol version from 0 to 8
14:03:24.628 [118024] <2> db_getCLIENT: db_CLIENTreceive: no entity was found 227
14:03:24.628 [118024] <2> ConnectionCache::connectAndCache: Acquiring new connection for host att16bck01, query type 223
14:03:24.638 [118024] <2> vnet_pbxConnect: pbxConnectEx Succeeded
14:03:24.638 [118024] <2> logconnections: BPDBM CONNECT FROM 10.32.231.12.47907 TO 10.32.231.10.1556 fd = 4
14:03:24.747 [118024] <2> db_CLIENTsend: reset client protocol version from 0 to 8
14:03:24.979 [118024] <2> db_getCLIENT: db_CLIENTreceive: no entity was found 227
14:03:24.979 [118024] <2> verify_client: ../bpbrm.c.41740: db_getCLIENT failed for CLIENT: att16nba01
 

14:03:25.812 [118024] <2> bpbrm main: ESTIMATE -1 -1 ATT16NET01_1427979801
14:03:34.720 [118024] <2> bpbrm mm_sig: received ready signal from media manager
14:03:34.911 [118024] <2> bpbrm main: client protocol version 2
14:06:05.852 [118024] <2> bpbrm main: client ATT16NET01 EXIT STATUS = 25: cannot connect on socket
14:06:05.852 [118024] <2> bpbrm kill_child_process: start
14:06:05.852 [118024] <2> bpbrm wait_for_child: start
14:06:07.463 [118024] <2> bpbrm wait_for_child: child exit_status = 99 signal_status = 0
14:06:07.464 [118024] <2> bpbrm Exit: client backup EXIT STATUS 99: NDMP backup failure
 

bptm

14:21:32.063 [122725] <2> Media_siginfo_print: 1: delay 2 signo SIGHUP:1 code 0 pid 122701
14:21:32.064 [122725] <2> Media_library_signal_poll: 1:Terminate detected
14:21:32.064 [122725] <8> vnet_sock_ready_abs: [vnet.c:537] errno=11 sock=29
14:21:32.064 [122725] <2> vnet_sock_ready_abs: vnet.c.539: 0: Function failed: 10 0x0000000a
14:21:32.064 [122725] <16> get_exactly_n_bytes_or_eof_abs: select on socket failed: Resource temporarily unavailable (11)
14:21:32.064 [122725] <2> ts_get_long_abs: error reading long from socket: Resource temporarily unavailable (11)
14:21:32.064 [122725] <2> ts_get_adaptable_string: error reading long from socket: Resource temporarily unavailable (11)
14:21:32.064 [122725] <2> db_getdata: ts_get_string_handle() failed: Resource temporarily unavailable (11) network read error (-3)
14:21:32.064 [122725] <2> db_end: no DONE from db_getreply(): network read failed
14:21:32.064 [122725] <2> cleanup_image_disk: image cleanup failed: 42
14:21:32.065 [122725] <2> bptm: EXITING with status 99 <----------
14:21:32.065 [122725] <2> set_job_details: Tfile (358758): LOG 1427980892 4 bptm 122725 EXITING with status 99 <----------

14:21:32.065 [122725] <2> send_job_file: job ID 358758, ftype = 3 msg len = 64, msg = LOG 1427980892 4 bptm 122725 EXITING with status 99 <----------

ndmpagent

0,51216,134,134,22,1427980739695,122724,139941325879040,0:,63:ndmp_connect_to_server: hostname = ATT16NET01, portname = 10000,18:NdmpGlueLogTraceCb,1
0,51216,134,134,23,1427980739719,122724,139941325879040,0:,59:ndmp_authenticate_connection: using NDMP protocol version 4,18:NdmpGlueLogTraceCb,1
0,51216,134,134,24,1427980739757,122724,139941325879040,0:,70:ndmp_enable_extensions: enabling class_id = 0x2050 class_version = 0x3,18:NdmpGlueLogTraceCb,1
0,51216,134,134,25,1427980739757,122724,139941325879040,0:,70:ndmp_enable_extensions: enabling class_id = 0x2051 class_version = 0x1,18:NdmpGlueLogTraceCb,1
0,51216,134,134,26,1427980739757,122724,139941325879040,0:,70:ndmp_enable_extensions: enabling class_id = 0x7ff0 class_version = 0x1,18:NdmpGlueLogTraceCb,1
0,51216,137,134,2,1427980739758,122724,139941325879040,0:,75:[vnet_addrinfo.c:1602] in failed file cache ERR=-2 NAME=ATT16NET01 SVC=NULL,34:vnet_cached_getaddrinfo_and_update,0
0,51216,137,134,3,1427980739758,122724,139941325879040,0:,72:[vnet_addrinfo.c:1271] vnet_cached_getaddrinfo_and_update() failed 6 0x6,23:vnet_cached_getaddrinfo,0
0,51216,134,134,27,1427980739758,122724,139941325879040,0:,87:CreateServer: vnet_cached_getaddrinfo failed, host = ATT16NET01, status = 6, error = -2,22:NdmpAgentLmLogger::Log,1
0,51216,134,134,28,1427980739758,122724,139941325879040,0:,41:Failed to create a listen socket for IPv6,29:NdmpMoverSideShm::MoverListen,1
0,51216,134,134,29,1427980739758,122724,139941325879040,0:,21:Retrying with IPv4...,33:NdmpBackupManager::StartOperation,1
0,51216,134,134,30,1427980739758,122724,139941325879040,0:,37:Using local IP address: 172.31.251.30,22:NdmpAgentLmLogger::Log,1
0,51216,134,134,31,1427980889898,122724,139941325879040,0:,56:ndmp_data_connect failed, status = 23 (NDMP_CONNECT_ERR),28:NdmpDataSideNas::DataConnect,1
2,51216,134,134,32,1427980889898,122724,139941325879040,0:,0:,0:,2,(91|i32:-1|i32:-1|)
1,51216,134,134,33,1427980890333,122724,139941325879040,0:,0:,26:NdmpAgent::SetErrorAndHalt,1,(19|S24:../NdmpBackupManager.cpp|i32:778|S2:25|S24:cannot connect on socket|)
1,51216,134,134,34,1427980890333,122724,139941325879040,0:,0:,36:NdmpBackupManager::UpdateTotalKbytes,4,(24|u64:0|u64:0|u64:0|)
2,51216,134,134,35,1427980890333,122724,139941325879040,0:,0:,0:,2,(32|S22:/vol/DXG16VNAS01_VL01/|)
0,51216,134,134,36,1427980890734,122724,139941325879040,0:,48:Sending EXIT STATUS 25: cannot connect on socket,31:ConnectionToBrm::SendExitStatus,1
1,51216,134,134,37,1427980890734,122724,139941325879040,0:,0:,12:MainShutdown,2,(6|S2:25|u32:122724|)
 

Any help very welcome

Thanks

Steve
 

1 ACCEPTED SOLUTION

mnolan
Level 6
Employee Accredited Certified

Your error is the following:

 

0,51216,134,134,31,1427980889898,122724,139941325879040,0:,56:ndmp_data_connect failed, status = 23 (NDMP_CONNECT_ERR),28:NdmpDataSideNas::DataConnect,1

 

Which is the same as the one I solved over at:  https://www-secure.symantec.com/connect/forums/ndmp-backup-error-ndmpdataconnectv3-failed#comment-10915881

 

Here is a partial quote of the relevant information, adjust for your usage.

"

Port 2859 is within the default range for SERVER_PORT_WINDOW, which is 1025-5000, this range is used as random to open a server for other devices to connect to without keeping it at a constant port that can be attacked.

Per the info "ndmp_data_connect failed, status = 23 (NDMP_CONNECT_ERR)"

Your filer attempted to connect to 157.253.77.49 port 2859 and failed.

Please ensure that both software and hardware firewalls allow this filer to connect to that address on the TCP bidirectional port range 1025-5000"

 

Per our firewall documentation: http://www.symantec.com/docs/TECH136090

"The SERVER_PORT_WINDOW is used inbound from the filer to the media server for remote NDMP and can also be used for efficient catalog file (TIR data) movement with local and 3-way NDMP. "

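If you want to test the firewall path before re-running a backup, a plain TCP probe from a host on the filer's network segment toward the media server can help. What follows is only a rough sketch in Python, not a NetBackup tool: `in_window` and `probe` are hypothetical helper names, and the 1025-5000 default is simply the range quoted above. Nothing normally listens on these ports outside a running backup, so a result of "refused" still suggests the packets reached the media server, while "timeout" suggests something on the path is dropping them.

```python
import socket

# Default SERVER_PORT_WINDOW range as quoted in the reply above.
SERVER_PORT_WINDOW = (1025, 5000)


def in_window(port, window=SERVER_PORT_WINDOW):
    """Return True if port falls inside the inclusive port window."""
    low, high = window
    return low <= port <= high


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome as a string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listening.
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and so on.
        return f"error: {exc}"


# Illustrative usage (host name and port are placeholders to adjust):
#   print(in_window(2859), probe("att16nba01", 2859))
```

Returning a short status string rather than raising keeps the helper easy to loop over a handful of ports in the window.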

6 REPLIES

RamNagalla
Moderator
Partner    VIP    Certified

Is 172.31.251.30 the IP address it is supposed to use for NDMP backups?

0,51216,134,134,30,1427980739758,122724,139941325879040,0:,37:Using local IP address: 172.31.251.30,22:NdmpAgentLmLogger::Log,1
0,51216,134,134,31,1427980889898,122724,139941325879040,0:,56:ndmp_data_connect failed, status = 23 (NDMP_CONNECT_ERR),28:NdmpDataSideNas::DataConnect,1
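As a side note on reading these two lines: the sixth comma-separated field looks like an epoch timestamp in milliseconds, and the text fields appear to be length-prefixed (for example `56:` in front of a 56-character message). That layout is inferred only from the excerpts in this thread, not from any documentation, so treat the Python below as a rough sketch. `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Split one ndmpagent log line into (timestamp_ms, text_fields)."""
    head = line.split(",", 8)        # eight plain fields, then the remainder
    timestamp_ms = int(head[5])      # assumed: epoch time in milliseconds
    rest = head[8]
    fields = []
    while rest:
        prefix, sep, tail = rest.partition(":")
        if not sep or not prefix.isdigit():
            break                    # trailing field without a length prefix
        length = int(prefix)
        fields.append(tail[:length])  # commas inside the text are preserved
        rest = tail[length:].lstrip(",")
    return timestamp_ms, fields


# The two lines quoted above, copied verbatim from the ndmpagent log.
LOCAL_IP = (
    "0,51216,134,134,30,1427980739758,122724,139941325879040,0:,"
    "37:Using local IP address: 172.31.251.30,22:NdmpAgentLmLogger::Log,1"
)
CONNECT_FAIL = (
    "0,51216,134,134,31,1427980889898,122724,139941325879040,0:,"
    "56:ndmp_data_connect failed, status = 23 (NDMP_CONNECT_ERR),"
    "28:NdmpDataSideNas::DataConnect,1"
)

start_ms, start_fields = parse_line(LOCAL_IP)
fail_ms, fail_fields = parse_line(CONNECT_FAIL)
elapsed = (fail_ms - start_ms) / 1000

print(start_fields[1])
print(fail_fields[1])
print(f"{elapsed:.2f} seconds between the two lines")
```

If the timestamp assumption holds, the two lines are roughly 150 seconds apart, which looks more like a connection attempt timing out than one being refused straight away.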

steve75
Level 3

Hi,

Yes, it is. This is the interface on the appliance for NDMP backups.

 

sdo
Moderator
Partner    VIP    Certified

1) Is there a firewall between the appliance and the NetApp?

2) If so, is only port 10000 open?

3) Is the area on the NetApp that you are trying to back up quite large?

4) What are the client_read_timeouts for Win master, RHEL media and appliance?  Use:

> bpgetconfig -M servername client_read_timeout

watsons
Level 6

vnet_cached_getaddrinfo_and_update() failed 6 

..looks like a name resolution issue.

You said tpautoconf -verify is working, but is it confirmed working from both the master and the media server? Note that your master server is Windows-based, so hostname case does not matter, but your media server is Linux, where hostnames are case sensitive, so ATT16NET01 is different from att16bck01.

On the media server, check /etc/hosts and make sure the hostname resolves to its IP address. If necessary, run "bpclntcmd -clear_host_cache" after you fix the resolution.
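A quick way to see what the media server's operating system resolver returns for each spelling of the name is a few lines of Python. This is only a sketch: `resolve` is a hypothetical helper, it asks the OS resolver (hosts file and DNS) rather than NetBackup's own host cache, and the lowercase spelling in the usage comment is just an illustration.

```python
import socket


def resolve(name):
    """Return the sorted unique addresses the OS resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []                    # the name did not resolve at all
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


# Illustrative usage on the media server (names are placeholders):
#   for name in ("ATT16NET01", "att16net01"):
#       print(name, resolve(name))
```

An empty list for one spelling but not the other would point at the hosts file entry; identical results for both would suggest looking elsewhere.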

steve75
Level 3

Thanks very much

That fixed it :)