Oracle Backup failing with 54

sai_1411 · ‎12-02-2018

Hello All,

Oracle backups are failing for one of our servers with error 54.

The client is a virtual IP hosted on one of the physical machines, and it backs up to a network media server.

A few streams complete successfully without any issue, but others fail with 54.

Running bptestbpcd against that virtual IP with the verbose option shows the following:

20:42:32.818 [27678] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1082] getaddrinfo() failed RV=8 NAME=uspebrms01 SVC=0 errno=0
20:42:32.818 [27678] <8> retry_getaddrinfo: [vnet_addrinfo.c:922] retry_getaddrinfo_for_real failed RV=8 NAME=uspebrms01 SVC=0
20:42:32.818 [27678] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1732] retry_getaddrinfo() failed RV=8 NAME=uspebrms01 SVC=NULL
20:42:32.821 [27678] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6
20:42:32.821 [27678] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2971] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME1=uspebrms01
20:42:34.150 [27678] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1082] getaddrinfo() failed RV=8 NAME=uspebr02 SVC=0 errno=0
20:42:34.150 [27678] <8> retry_getaddrinfo: [vnet_addrinfo.c:922] retry_getaddrinfo_for_real failed RV=8 NAME=uspebr02 SVC=0
20:42:34.150 [27678] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1732] retry_getaddrinfo() failed RV=8 NAME=uspebr02 SVC=NULL
20:42:34.153 [27678] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6
20:42:34.153 [27678] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2971] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME1=uspebr02
20:42:34.157 [27678] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1637] in failed cache ERR=8 NAME=uspebrms01 SVC=NULL
20:42:34.157 [27678] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6
20:42:34.157 [27678] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2978] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=uspebrms01
20:42:34.157 [27678] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1637] in failed cache ERR=8 NAME=uspebr02 SVC=NULL
20:42:34.157 [27678] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1360] vnet_cached_getaddrinfo_and_update() failed 6 0x6
20:42:34.157 [27678] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2978] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=uspebr0220:42:34.627 [27678] <8> vnet_vnetd_push_ipaddr: [vnet_vnetd.c:1820] bptr[i] 135 0x87
20:42:34.628 [27678] <8> vnet_vnetd_push_ipaddr: [vnet_vnetd.c:1820] bptr[i] 209 0xd1
20:42:34.628 [27678] <8> vnet_vnetd_push_ipaddr: [vnet_vnetd.c:1820] bptr[i] 14 0xe
20:42:34.628 [27678] <8> vnet_vnetd_push_ipaddr: [vnet_vnetd.c:1820] bptr[i] 11 0xb

Could someone suggest what might be the issue?

I have verified the following:

The virtual IP exists on the host. It's not an app_cluster; it's a single-node database.

All entries exist in the bp.conf file. Ping works, telnet connects to all NBU ports, traceroute succeeds, and the client is on the same subnet as the media server.

Services are fine. All ports are in the listen state.

File level backups for physical host are completing fine.

media server: Solaris, NBU version 7.7.2

Please let me know if any further information is required.

NBU35 · ‎12-03-2018

Can you please check name resolution for all interfaces of the master and media servers from the client, and vice versa, using the bpclntcmd and nslookup commands?

I suspect a name resolution issue here, which can be fixed with DNS or /etc/hosts entries.
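
A rough sketch of that check, runnable on the client and again on each server. The host list defaults to the two names from the log above and is only a placeholder, and the bpclntcmd path is the usual UNIX install location, which may differ on your system. getent is used for the OS resolver lookup; swap in nslookup if you want to query DNS directly.

```shell
#!/bin/sh
# Sketch: forward and reverse lookup for each NetBackup host name.
# HOSTS is an assumption; list your master, media server, client and virtual names.
HOSTS="${HOSTS:-uspebrms01 uspebr02}"
# Usual UNIX install path; adjust if NetBackup is installed elsewhere.
BPCLNTCMD="${BPCLNTCMD:-/usr/openv/netbackup/bin/bpclntcmd}"

check_host() {
    name=$1
    # Forward lookup through the OS resolver (hosts file and/or DNS).
    ip=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$ip" ]; then
        echo "$name: forward lookup FAILED"
        return 1
    fi
    # Reverse lookup of the address that came back.
    rev=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')
    echo "$name: $ip -> ${rev:-no reverse entry}"
    # NetBackup's own view of the same name and address, if the binary is present.
    if [ -x "$BPCLNTCMD" ]; then
        "$BPCLNTCMD" -hn "$name"
        "$BPCLNTCMD" -ip "$ip"
    fi
    return 0
}

check_all() {
    for h in $HOSTS; do
        check_host "$h"
    done
}
```

Call check_all after setting HOSTS. Any name that reports a failed forward lookup, or resolves differently on the client than on the servers, is a candidate for a DNS or /etc/hosts fix.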

Marianne · ‎12-03-2018

I am missing something - why would the Oracle server have a virtual IP if not part of a cluster?

bptestbpcd does not seem to show all of the output.
Could you please share everything? (including the initial command that shows client hostname)

Where did you run bptestbpcd from?
Master or media server?
Important to run command from master and media server.
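
Something along these lines captures the full output on each server. This is only a sketch: the admincmd path is the usual UNIX location, and the client name you pass in is whatever name the Oracle policy uses.

```shell
#!/bin/sh
# Sketch: run bptestbpcd against a client and keep the complete output.
# Usual UNIX install path; adjust if NetBackup is installed elsewhere.
BPTESTBPCD="${BPTESTBPCD:-/usr/openv/netbackup/bin/admincmd/bptestbpcd}"

test_bpcd() {
    client=$1
    if [ ! -x "$BPTESTBPCD" ]; then
        echo "bptestbpcd not found at $BPTESTBPCD"
        return 1
    fi
    # Record which server the test ran from, since results can differ per server.
    echo "### bptestbpcd -client $client (run on $(uname -n))"
    "$BPTESTBPCD" -client "$client" -verbose -debug 2>&1
}
```

Run test_bpcd with the client name once on the master and once on the media server, and post both outputs in full.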

The bptestbpcd snippet contains 3 hostnames:
uspebrms01
uspebr02
uspebr0220

Could you tell us the role of each of these names?

Have you checked output of these commands on the client?
bpclntcmd -pn
bpclntcmd -self

If all hostname lookup/connection is good, then there is one more thing to check:
Timeouts on master and media server.

For large databases, it is sometimes necessary to increase CLIENT_CONNECT_TIMEOUT and CLIENT_READ_TIMEOUT on master and media server.
Try 1800 (default is 300).


Genericus · ‎12-14-2018

You may need to edit bp.conf directly to set the extended timeout - the GUI limits the value, but bp.conf will allow you to go to 1800.
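
A minimal sketch of such an edit, assuming bp.conf is in its usual UNIX location and uses the "KEY = value" layout. The helper name set_bpconf_entry is made up for this example, and 1800 is simply the value suggested in this thread.

```shell
#!/bin/sh
# Sketch: set or update a "KEY = value" entry in a bp.conf-style file.
# Usual UNIX location; adjust if NetBackup is installed elsewhere.
BPCONF="${BPCONF:-/usr/openv/netbackup/bp.conf}"

# set_bpconf_entry is a hypothetical helper, not a NetBackup command.
set_bpconf_entry() {
    file=$1 key=$2 value=$3
    tmp="$file.tmp.$$"
    # Keep every line whose key (text before "=", blanks stripped) differs.
    awk -F= -v k="$key" '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' \
        "$file" > "$tmp"
    # Append the new setting.
    echo "$key = $value" >> "$tmp"
    # Keep a backup, then overwrite in place so ownership and mode are preserved.
    cp "$file" "$file.bak"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, `set_bpconf_entry "$BPCONF" CLIENT_CONNECT_TIMEOUT 1800` on the master and on the media server. The previous file is left as bp.conf.bak so the change is easy to roll back.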

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS
