
Strange nonexistent FQDNs in the bpdbm logs

liuyl
Level 5

Environment: NBU 8.1.1 on RHEL 6.9

Symptom:

1) The bpdbm logs contain many vnet_cached_getaddrinfo hostname resolution entries for FQDNs that do not exist:
Note: The master and media servers are listed in bp.conf by their short hostnames, and those short names resolve via /etc/hosts.

20:30:23.383 [753] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db4.sh.sgcc.com.cn SVC=NULL
20:30:32.866 [944] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db4.sh.sgcc.com.cn SVC=NULL
20:30:35.731 [978] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb2.sh.sgcc.com.cn SVC=NULL
20:30:43.873 [1067] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db4.sh.sgcc.com.cn SVC=NULL
20:30:47.148 [1094] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb2.sh.sgcc.com.cn SVC=NULL
20:31:00.222 [1228] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb4.sh.sgcc.com.cn SVC=NULL
20:31:06.028 [1323] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db2.sh.sgcc.com.cn SVC=NULL
20:31:12.725 [1427] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb4.sh.sgcc.com.cn SVC=NULL
20:31:21.955 [1532] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb2.sh.sgcc.com.cn SVC=NULL
20:31:23.532 [1548] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb2.sh.sgcc.com.cn SVC=NULL
20:31:23.944 [1559] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db2.sh.sgcc.com.cn SVC=NULL
20:31:39.127 [1652] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db2.sh.sgcc.com.cn SVC=NULL
20:31:51.366 [1696] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db2.sh.sgcc.com.cn SVC=NULL
20:33:20.655 [2047] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=gw2db1.sh.sgcc.com.cn SVC=NULL
20:33:25.077 [2093] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=gw2db1.sh.sgcc.com.cn SVC=NULL
20:33:42.105 [2161] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=shzycdb1.sh.sgcc.com.cn SVC=NULL
20:33:45.389 [2175] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=gw2db1.sh.sgcc.com.cn SVC=NULL
20:33:59.405 [2236] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=jcsjdb3.sh.sgcc.com.cn SVC=NULL
20:34:00.868 [2253] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=jcsjdb3.sh.sgcc.com.cn SVC=NULL

SERVER = jcbak <--- master server

MEDIA_SERVER = jcsjdw1
MEDIA_SERVER = jcsjdb2
MEDIA_SERVER = jcsjdb3
MEDIA_SERVER = sh2db1
MEDIA_SERVER = sh2db2
MEDIA_SERVER = sh2db3
MEDIA_SERVER = sh2db4
MEDIA_SERVER = sh2db5
MEDIA_SERVER = gmdfwq1
MEDIA_SERVER = gmdfwq2
MEDIA_SERVER = gmdfwq3
MEDIA_SERVER = shzycdb1
MEDIA_SERVER = shzycdb2
MEDIA_SERVER = gw3db3
MEDIA_SERVER = gw3db4
MEDIA_SERVER = ecmweb1
MEDIA_SERVER = gw3db1
MEDIA_SERVER = gw3db2



#
# nbemmcmd -listhosts -verbose -display_server -machinename jcbak -machinetype master
NBEMMCMD, Version: 8.1.1
jcbak
ClusterName = ""
MachineName = "jcbak"
FQName = "jcbak"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x77
MachineNbuType = master (3)
MachineState = active for tape and disk jobs (14)
NetBackupVersion = 8.1.1.0 (811000)
OperatingSystem = linux (16)
ScanAbility = 5
Command completed successfully.
#
#
#
# nbemmcmd -listhosts -verbose -display_server -machinename jcsjdb3 -machinetype media
NBEMMCMD, Version: 8.1.1
jcsjdb3
ClusterName = ""
MachineName = "jcsjdb3"
FQName = "jcsjdb3"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x71
MachineNbuType = media (1)
MachineState = active for tape and disk jobs (14)
MasterServerName = "jcbak"
NetBackupVersion = 8.1.1.0 (811000)
OperatingSystem = linux (16)
ScanAbility = 5
Command completed successfully.
#
#
#
# grep -Ei "jcbak|sjdb3" /etc/hosts
10.131.33.60 jcbak
10.131.29.105 jcsjdb3
#
#


2) The bpdbm logs also contain vnet_cached_getaddrinfo hostname resolution entries for short names:
Note: These short hostnames also resolve via /etc/hosts, and they appear in the bpdbm logs as short names, as expected:

20:30:20.145 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=srmprd2 SVC=NULL
20:30:20.146 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=srmprd1 SVC=NULL
20:30:20.146 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=erp-audit SVC=NULL
20:30:20.156 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=SCERPDB1 SVC=NULL
20:30:20.157 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=EDWETL SVC=NULL

MEDIA_SERVER = srmprd1
MEDIA_SERVER = srmprd2
MEDIA_SERVER = scerpdb1


#
#
# nbemmcmd -listhosts -verbose -display_server -machinename srmprd2 -machinetype media
NBEMMCMD, Version: 8.1.1
srmprd2
ClusterName = ""
MachineName = "srmprd2"
FQName = "srmprd2"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x13
MachineNbuType = media (1)
MachineState = not reachable by master (4)
MasterServerName = "jcbak"
NetBackupVersion = 8.1.1.0 (811000)
OperatingSystem = rs6000 (5)
ScanAbility = 5
Command completed successfully.
#
#
#
# nbemmcmd -listhosts -verbose -display_server -machinename scerpdb1 -machinetype media
NBEMMCMD, Version: 8.1.1
SCERPDB1
ClusterName = ""
MachineName = "SCERPDB1"
FQName = "SCERPDB1"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0x17
MachineNbuType = media (1)
MachineState = not active (0)
MasterServerName = "jcbak"
NetBackupVersion = 7.5.0.0 (750000)
OperatingSystem = rs6000 (5)
ScanAbility = 5
Command completed successfully.
#
#
#
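
To compare the two kinds of entries above, the log lines can be tallied per NAME value. This is only a minimal sketch, assuming the line format shown in the excerpts; the function name `tally_failed_names` is hypothetical and not part of NetBackup. If the ERR field carries the getaddrinfo return code (an assumption), then -2 is the glibc value of EAI_NONAME, meaning the name did not resolve.

```python
import re
from collections import Counter

# Matches the ERR= and NAME= fields of the vnet_cached_getaddrinfo_and_update lines.
NAME_RE = re.compile(
    r"vnet_cached_getaddrinfo_and_update:.*\bERR=(-?\d+)\s+NAME=(\S+)"
)

def tally_failed_names(lines):
    """Count failed-cache entries per host name, split into FQDNs and short names."""
    fqdns, short = Counter(), Counter()
    for line in lines:
        m = NAME_RE.search(line)
        if not m:
            continue  # not a failed-cache entry
        name = m.group(2).lower()
        # A dot in the name is used here as a rough "is this an FQDN" test.
        (fqdns if "." in name else short)[name] += 1
    return fqdns, short

sample = [
    "20:30:23.383 [753] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db4.sh.sgcc.com.cn SVC=NULL",
    "20:30:32.866 [944] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=sh2db4.sh.sgcc.com.cn SVC=NULL",
    "20:30:20.145 [710] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1751] in failed file cache ERR=-2 NAME=srmprd2 SVC=NULL",
]
fqdns, short = tally_failed_names(sample)
print("FQDN entries:", fqdns.most_common())
print("Short-name entries:", short.most_common())
```

Running the same function over the full bpdbm log before and after a configuration change gives a quick way to see whether the FQDN entries are going away.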

9 REPLIES

davidmoline
Level 6
Employee

Hi @liuyl 

So? What's the problem? 

In relation to name resolution: what order does nsswitch.conf dictate for name lookups? (Hint: if files isn't first in the list, then DNS will be used.)

What's the output of the command "getent hosts 10.131.29.105", which you expect to return jcsjdb3?

Have you tried clearing the NetBackup host cache on the server (bpclntcmd -clear_host_cache)?
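
The nsswitch.conf check can be sketched in a few lines. This is an illustrative helper only (the name `hosts_lookup_order` is hypothetical); it assumes the usual `hosts: source [action] source ...` layout and simply ignores bracketed action items.

```python
import re

def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line of nsswitch.conf, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Remove action items such as [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []

# Example content; on a real system, read /etc/nsswitch.conf instead.
example = "passwd: files\nhosts: files dns myhostname\n"
order = hosts_lookup_order(example)
print(order)
print("files is consulted first:", bool(order) and order[0] == "files")
```

If `dns` appears anywhere in the returned list, the OS resolver may still consult DNS for names that /etc/hosts does not answer.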

Cheers
David

#
#
# getent hosts 10.131.29.105
10.131.29.105 jcsjdb3
#
#

Question: So where do the nonexistent FQDNs in the bpdbm logs come from, and why do they appear?

Yes, I have already run -clear_host_cache.

 

Can you answer my first question - what is the problem you are trying to resolve?

My end goal is the following:

How can I force NetBackup to use only /etc/hosts for hostname resolution within the NetBackup layer?

jnardello
Level 6
   VIP    Certified

NetBackup does not do its own hostname resolution; it asks the OS to do that for it.

/etc/nsswitch.conf controls what resources the OS uses to do hostname resolution. Look for the line that says something like:

hosts: files dns myhostname

If you only want it to do resolution via /etc/hosts entries, change it to just say:

hosts: files

 

Then clear your hostname cache (bpclntcmd -clear_host_cache) and you should be good.
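
Since the lookup is delegated to the OS, one way to see what the OS resolver returns for a given name is to call getaddrinfo directly, outside NetBackup. A minimal sketch using Python's standard `socket` module; the helper name `os_lookup` is hypothetical, and this only approximates what any given NetBackup process would see.

```python
import socket

def os_lookup(name):
    """Ask the OS resolver for a name and report the canonical name and addresses."""
    try:
        infos = socket.getaddrinfo(
            name, None, type=socket.SOCK_STREAM, flags=socket.AI_CANONNAME
        )
    except socket.gaierror as exc:
        # exc.args[0] is the EAI_* error code from getaddrinfo
        return {"canonname": None, "addresses": [], "error": exc.args[0]}
    return {
        "canonname": infos[0][3] or name,  # canonical name is on the first entry
        "addresses": sorted({info[4][0] for info in infos}),
        "error": None,
    }

# A numeric address is used here so the example runs anywhere;
# on the master server, try a media server short name instead.
print(os_lookup("127.0.0.1"))
```

If the canonical name comes back with a domain appended to a short name, that domain was added by the OS resolver configuration rather than by NetBackup.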

StoneRam-Simon
Level 6
Partner    VIP    Accredited Certified

If you do not restrict the OS to use only the hosts file, as @jnardello suggests, I would also check what you have in your /etc/resolv.conf. You may have additional "search domains" specified, which get appended to the short name when the OS resolves a name, and this is what NBU would then see...

In many cases the contents of resolv.conf are managed by other network configuration settings or tools. If so, you need to identify where the "search domains" are being set and update them at the source, or they will keep coming back...
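
How a search domain turns a short name into an FQDN can be sketched as follows. The search domain `sh.sgcc.com.cn` used here is an assumption inferred from the logged names, not something confirmed from the poster's resolv.conf, and both helper names are hypothetical. The ordering assumes the glibc default of `ndots:1`, where a name without dots is tried with the search list first.

```python
def search_domains(resolv_text):
    """Return the domains from the last 'search' (or 'domain') line of resolv.conf."""
    domains = []
    for raw in resolv_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] in ("search", "domain"):
            domains = fields[1:]  # a later line overrides an earlier one
    return domains

def candidate_names(short_name, domains):
    """Names the resolver may try for a dotless name: search domains first, then the bare name."""
    return [f"{short_name}.{d}" for d in domains] + [short_name]

# Assumed example content; on a real system, read /etc/resolv.conf instead.
example = "search sh.sgcc.com.cn\n"
print(candidate_names("sh2db4", search_domains(example)))
```

With that assumed search line, the first candidate for `sh2db4` is `sh2db4.sh.sgcc.com.cn`, which matches the form of the names in the bpdbm log.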

 

 

Hi @liuyl 

Interesting choice of words. Other than seeing FQDNs in your logs, what is the problem you are trying to solve?

If there is no problem, stop wasting your time and ours. If there is a real problem you are trying to solve, can you let us know what isn't working?

Thanks
David

Sometimes many backup jobs fail at random with status code 40, "network connection broken"!

After I disabled DNS and then ran -clear_host_cache, the thousands of domain host entries dropped sharply to just a few...!

So why are there still a few domain host entries in bpdbm.log after DNS has been disabled?