02-21-2016 12:23 PM
Hello,
I am facing an issue with device configuration after upgrading from 7.6.1.2 to 7.7.1. When we start the Device Configuration Wizard, we get the error below:
unable to retrieve the list of device hosts for Media Manager host Master Server Name.
Were there any changes in 7.7 that could cause this?
We are working in a clustered environment. Please share your views on how to resolve this issue.
We have upgraded only the master server, not the media servers (robot control hosts).
Thanks,
Sagar
Solved!
02-22-2016 03:36 PM
Now, I know you know NetBackup... and so I don't mean to teach you to suck eggs...
...but sometimes we can't see the wood for the trees, and yes it has happened to me more than once too...
...anyway, here's what I would do in an attempt to completely rule out naming issues...
...and yes, the commands which are done twice need to be done twice, in order to spot rotating DNS and/or name entries...
...so, on each NetBackup Server node (i.e. all masters and all medias, whether clustered or not, and whether active or passive), do the following:
bpclntcmd -clear_host_cache
cat /etc/hosts
cat /etc/resolv.conf
egrep -v "^#|^$" /etc/nsswitch.conf
egrep -i "client|server" /usr/openv/netbackup/bp.conf
cat /usr/openv/volmgr/vm.conf
cat /usr/openv/var/global/server.conf
hostname
nslookup $HOSTNAME
nslookup $HOSTNAME
bptestnetconn -v -a -s
bpclntcmd -self
bpclntcmd -self
bpclntcmd -hn $HOSTNAME
bpclntcmd -hn $HOSTNAME
bpclntcmd -pn
nbemmcmd -getemmserver
nbemmcmd -listhosts
# ...then for the IP address that you think is the NetBackup IP for the host that you are on, and yes do it twice, do:
nslookup x.x.x.x
nslookup x.x.x.x
bpclntcmd -ip x.x.x.x
bpclntcmd -ip x.x.x.x
# [end]
...so do all the above on every NetBackup "Server" (yes I know the cat of the server.conf should fail on media servers, but think about it for a sec ;)
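The "run it twice" checks above can be wrapped in a small helper that runs any command twice and flags a difference between the two outputs, which is what rotating DNS or name entries would look like. This is only a sketch: `run_twice` is a hypothetical name, not a NetBackup tool, and it assumes a POSIX-style shell.

```shell
#!/bin/sh
# run_twice CMD [ARGS...]
# Hypothetical helper (not part of NetBackup): runs the same command twice
# and reports whether the two outputs differ.
run_twice() {
    first=$("$@" 2>&1)
    second=$("$@" 2>&1)
    if [ "$first" = "$second" ]; then
        echo "SAME: $*"
    else
        echo "DIFF: $*"
        echo "--- first run"
        echo "$first"
        echo "--- second run"
        echo "$second"
        return 1
    fi
}
```

For example, `run_twice nslookup "$(hostname)"` or `run_twice bpclntcmd -self` prints a single SAME line when both runs agree, and both outputs when they do not.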
.
Finally, please can you describe how and where you are running the Java Admin console from?
1) OS version?
2) Are you using jnbSA back via X11 over ssh ?
3) If using Java Admin console on Windows, then definitely started using "Run as administrator" ?
4) On the admin console host (desktop/server) can you run the following:
...if the java admin host is Unix/Linux:
uname -a
hostname
id
cat /etc/hosts
cat /etc/resolv.conf
egrep -v "^#|^$" /etc/nsswitch.conf
ifconfig
nslookup $HOSTNAME
nslookup $HOSTNAME
nslookup x.x.x.x
nslookup x.x.x.x
...
...if the java admin host is Windows:
ver
hostname
whoami
type C:\Windows\System32\drivers\etc\hosts
ipconfig /all
nslookup %computername%
nslookup %computername%
nslookup x.x.x.x
nslookup x.x.x.x
.
And after all this there may still be routing issues to sort out.
02-21-2016 03:44 PM
02-22-2016 07:12 AM
Are you seeing any connectivity issues between the master and media servers?
Check whether communication between them looks good.
02-22-2016 11:00 AM
Hello,
Communication with the media servers (robot hosts) is working properly, but for some servers we are unable to telnet to port 13701. For those same media servers, however, the vmoprcmd output works fine.
abc1
MachineName = "abc1"
FQName = "abc1"
MachineDescription = ""
MachineNbuType = server (6)
efgh
ClusterName = "abc1"
MachineName = "efgh"
FQName = "efgh"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0xe7
MachineNbuType = master (3)
MachineState = active for tape and disk jobs (14)
NetBackupVersion = 7.7.1.0 (771000)
OperatingSystem = solaris (2)
ScanAbility = 5
abcd
ClusterName = "abc1"
MachineName = "abcd"
FQName = "abcd"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0xe7
MachineNbuType = master (3)
MachineState = active for disk jobs (12)
NetBackupVersion = 7.7.1.0 (771000)
OperatingSystem = solaris (2)
ScanAbility = 5
Command completed successfully.
nbemmcmd -getemmserver === this command takes a long time to complete.
02-22-2016 11:27 AM
Without some detail around actual master server name and media server names and any description of which server you executed those commands upon, then well, we remain blind. It's impossible to tell what the above is actually about. :(
02-22-2016 11:31 AM
oops !!!
dakcmstnetbkp
MachineName = "dakcmstnetbkp"
FQName = "dakcmstnetbkp"
MachineDescription = ""
MachineNbuType = server (6)
hbdakcmstn2
ClusterName = "dakcmstnetbkp"
MachineName = "hbdakcmstn2"
FQName = "hbdakcmstn2"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0xe7
MachineNbuType = master (3)
MachineState = active for tape and disk jobs (14)
NetBackupVersion = 7.7.1.0 (771000)
OperatingSystem = solaris (2)
ScanAbility = 5
hbdakcmstn1
ClusterName = "dakcmstnetbkp"
MachineName = "hbdakcmstn1"
FQName = "hbdakcmstn1"
GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
LocalDriveSeed = ""
MachineDescription = ""
MachineFlags = 0xe7
MachineNbuType = master (3)
MachineState = active for disk jobs (12)
NetBackupVersion = 7.7.1.0 (771000)
OperatingSystem = solaris (2)
ScanAbility = 5
Command completed successfully.
02-22-2016 11:38 AM
Now we are facing another issue with the robots. Please find the snapshot attached.
02-22-2016 12:41 PM
Please find attached the bpjava-msvc logs.
00:01:33.428 [8272] <16> session_dispatch: Request count = 0 tag = 510
00:01:33.444 [8272] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:01:33.445 [8272] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:01:33.707 [8272] <4> session_secure_lookup: Initiating SSL Accept
00:01:33.708 [8272] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:01:33.756 [8272] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:01:33.756 [8272] <4> session_secure_lookup: SSL Connection Accepted!
00:01:33.766 [8272] <16> session_dispatch: Request count = 1 tag = 118
00:01:34.031 [8272] <16> session_dispatch: Request count = 2 tag = 101
00:01:34.148 [8272] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:01:34.148 [8272] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56081
00:01:34.177 [8272] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:01:34.179 [8272] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:01:34.210 [8277] <16> isVxssActive: authentication determination failed, assume none required: (193) VxSS authentication is requested but not allowed
00:01:34.211 [8277] <2> establishAuthorization: CORBA_CONTEXT_ID_AUTH is enabled: 1
00:01:34.211 [8277] <4> establishAuthorization: NON-NBAC user cert generation.
00:01:34.223 [8277] <2> userCertLogin: Trying to generate user cert in non-NBAC host
00:01:34.224 [8277] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1601] in failed file cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:01:34.224 [8277] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:01:34.224 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1547] in failed cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:01:34.229 [8277] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:01:34.229 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:168] hostptr appears good - using it dakcmstnetbkp.HBCTXDOM.COM
00:01:34.847 [8277] <2> userCertLogin: VssExportCredential() returned [0]
00:01:34.847 [8277] <4> userCertLogin: Login PASSED and user cert [PASSED]
00:01:34.847 [8277] <2> establishAuthorization: User Certificate generated at //.vxss/credentials
00:01:43.107 [8272] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072), errno = 131 = Connection reset by peer
00:01:43.107 [8272] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer
00:01:44.111 [8270] <16> poll_listen: can't find file descriptor in polling table
00:54:11.633 [26809] <16> session_dispatch: Request count = 0 tag = 510
00:54:11.673 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:11.673 [26809] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:54:11.943 [26809] <4> session_secure_lookup: Initiating SSL Accept
00:54:11.944 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:12.065 [26809] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:54:12.065 [26809] <4> session_secure_lookup: SSL Connection Accepted!
00:54:12.081 [26809] <16> session_dispatch: Request count = 1 tag = 118
00:54:12.389 [26809] <16> session_dispatch: Request count = 2 tag = 101
00:54:12.507 [26809] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:54:12.507 [26809] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56302
00:54:12.592 [26809] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:54:12.595 [26809] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:54:12.649 [26813] <16> isVxssActive: authentication determination failed, assume none required: (193) VxSS authentication is requested but not allowed
00:54:12.650 [26813] <2> establishAuthorization: CORBA_CONTEXT_ID_AUTH is enabled: 1
00:54:12.651 [26813] <4> establishAuthorization: NON-NBAC user cert generation.
00:54:12.663 [26813] <2> userCertLogin: Trying to generate user cert in non-NBAC host
00:54:12.676 [26813] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1053] getaddrinfo() failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=0 errno=0
00:54:12.676 [26813] <8> retry_getaddrinfo: [vnet_addrinfo.c:893] retry_getaddrinfo_for_real failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=0
00:54:12.676 [26813] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1642] retry_getaddrinfo() failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:54:12.681 [26813] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:54:12.681 [26813] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:54:12.700 [26813] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1547] in failed cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:54:12.700 [26813] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:54:12.700 [26813] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:54:12.700 [26813] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM
00:54:12.700 [26813] <8> vnet_getfqdn: [vnet_fqdn.c:168] hostptr appears good - using it dakcmstnetbkp.HBCTXDOM.COM
00:54:13.486 [26813] <2> userCertLogin: VssExportCredential() returned [0]
00:54:13.486 [26813] <4> userCertLogin: Login PASSED and user cert [PASSED]
00:54:13.486 [26813] <2> establishAuthorization: User Certificate generated at //.vxss/credentials
00:54:22.660 [26809] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072), errno = 131 = Connection reset by peer
00:54:22.660 [26809] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer
00:54:23.664 [26805] <16> poll_listen: can't find file descriptor in polling table
02-22-2016 12:54 PM
On both master server cluster member nodes try this to clear out corrupt NetBackup name cache:
bpclntcmd -clear_host_cache
...then on the active master server member node try these query commands:
nbemmcmd -getemmserver
nbemmcmd -listhosts
vmoprcmd -devmon hs
Then try restarting the Java Admin Console via the active master server member node.
02-22-2016 01:01 PM
Hi,
Every command works fine, and I have already tried the suggestions above.
Have you checked the logs above? We have two cluster environments, and one of them is working fine.
02-22-2016 01:19 PM
Wondering whether we have a naming issue:
00:01:34.229 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM
The same hostname has two different suffixes.
I don't understand how they are related to each other, so I don't know if it's a problem or not.
The status <8> above might just be a red herring.
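One way to see every fully qualified form of a hostname that shows up in a log is to pull them out and de-duplicate them. The following is a minimal sketch, assuming a `grep` that supports `-o` (GNU or BSD grep; the stock Solaris `grep` may not). `list_fqdns` is a hypothetical helper name, and names are lower-cased because DNS names are case-insensitive.

```shell
#!/bin/sh
# list_fqdns SHORTNAME < logfile
# Hypothetical helper: prints each distinct dotted name in the input that
# begins with SHORTNAME, lower-cased and sorted.
list_fqdns() {
    grep -Eio "$1(\.[a-z0-9-]+)+" | tr '[:upper:]' '[:lower:]' | sort -u
}
```

Run against the bpjava-msvc log with `list_fqdns dakcmstnetbkp < logfile`; more than one line of output means the same short name is appearing under more than one domain suffix.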
.
The status <16> however, these are a problem... if we filter out just the entries for thread 26809:
00:54:11.633 [26809] <16> session_dispatch: Request count = 0 tag = 510
00:54:11.673 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:11.673 [26809] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:54:11.943 [26809] <4> session_secure_lookup: Initiating SSL Accept
00:54:11.944 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:12.065 [26809] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:54:12.065 [26809] <4> session_secure_lookup: SSL Connection Accepted!
00:54:12.081 [26809] <16> session_dispatch: Request count = 1 tag = 118
00:54:12.389 [26809] <16> session_dispatch: Request count = 2 tag = 101
00:54:12.507 [26809] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:54:12.507 [26809] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56302
00:54:12.592 [26809] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:54:12.595 [26809] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:54:22.660 [26809] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072), errno = 131 = Connection reset by peer
00:54:22.660 [26809] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer
We see a status <8> getnameinfo failed immediately before some status <16> errors.
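The filtering described above, by the bracketed ID and by the `<n>` status, can be done with a short awk wrapper. This is a sketch only; `filter_log` is a hypothetical helper name, and it assumes the log layout shown in this thread (time, `[id]`, `<status>`, message).

```shell
#!/bin/sh
# filter_log ID MIN_STATUS < logfile
# Hypothetical helper: keeps lines whose second field is [ID] and whose
# <n> status value in the third field is at least MIN_STATUS.
filter_log() {
    awk -v id="[$1]" -v min="$2" '
        $2 == id {
            sev = $3
            gsub(/[<>]/, "", sev)      # "<16>" becomes "16"
            if (sev + 0 >= min + 0) print
        }'
}
```

For example, `filter_log 26809 8 < logfile` keeps the `<8>` and `<16>` entries for ID 26809, and `filter_log 26809 16 < logfile` keeps only the `<16>` entries.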
.
Are you sure that your NetBackup server naming is robust/good?
02-22-2016 01:24 PM
This poster:
https://www.veritas.com/community/forums/admin-console-opening-backup-archive-and-restore
...who also had:
newAuthenticate: bad pam_status 3, Error in underlying service module
...found that they had been attempting to logon to the wrong server name.
02-22-2016 01:33 PM
Yeah... As far as I can tell, the naming convention is good. Backups are also running successfully through the master. However, if I want to check for an issue with the name, how should I proceed?
Thanks,
Sagar
02-22-2016 07:31 PM