Solved: Issue with Device configuration wizard after upgra...

Sagar_Kolhe · ‎02-21-2016

Hello,

I am facing an issue with device configuration after upgrading from 7.6.1.2 to 7.7.1. When we start the Device Configuration Wizard, we get the error below.

unable to retreive the list of devices hosts for Media Manager host Master Server Name.

Did anything change in 7.7 that would cause this?

We are running in a clustered environment. Please share your views on how to resolve this issue.

We have upgraded only the master server, not the media servers (robot control hosts).

Thanks,

Sagar

sdo · ‎02-22-2016

Now, I know you know NetBackup... and so I don't mean to teach you to suck eggs...

...but sometimes we can't see the wood for the trees, and yes it has happened to me more than once too...

...anyway, here's what I would do in an attempt to completely rule out naming issues...

...and yes, the commands which are done twice, need to be done twice in order to spot rotating DNS and or name entries...

...so, on each NetBackup Server node (i.e. all masters and all medias, whether clustered or not, and whether active or passive), do the following:

bpclntcmd -clear_host_cache

cat /etc/hosts

cat /etc/resolv.conf

egrep -v "^#|^$" /etc/nsswitch.conf

egrep -i "client|server" /usr/openv/netbackup/bp.conf

cat /usr/openv/volmgr/vm.conf

cat /usr/openv/var/global/server.conf

hostname

nslookup $HOSTNAME

nslookup $HOSTNAME

bptestnetconn -v -a -s

bpclntcmd -self

bpclntcmd -self

bpclntcmd -hn $HOSTNAME

bpclntcmd -hn $HOSTNAME

bpclntcmd -pn

nbemmcmd -getemmserver

nbemmcmd -listhosts

# ...then for the IP address that you think is the NetBackup IP for the host that you are on (yes, do it twice), do:

nslookup x.x.x.x

nslookup x.x.x.x

bpclntcmd -ip x.x.x.x

bpclntcmd -ip x.x.x.x

# [end]

...so do all the above on every NetBackup "Server" (yes I know the cat of the server.conf should fail on media servers, but think about it for a sec ;)
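The "run it twice and compare" idea above can be wrapped in a small helper. This is only a sketch, assuming bash is available on the host; `run_twice` is a hypothetical name, not a NetBackup tool.

```shell
#!/usr/bin/env bash
# Run a command twice and report whether the two outputs differ.
# A difference hints at rotating DNS answers or inconsistent name entries.
# run_twice is a hypothetical helper name used only for this sketch.
run_twice() {
    local first second
    first="$("$@" 2>&1)" || true    # keep the output even if the command fails
    second="$("$@" 2>&1)" || true
    if [ "$first" = "$second" ]; then
        echo "SAME: $*"
    else
        echo "DIFF: $*"
        printf -- '--- first run ---\n%s\n--- second run ---\n%s\n' "$first" "$second"
    fi
}

# Example usage on each NetBackup server:
#   run_twice nslookup "$(hostname)"
#   run_twice bpclntcmd -self
```

A "DIFF" result does not prove a fault by itself, but it shows which lookups are worth reading line by line.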

.

Finally, please can you describe how and where you are running the Java Admin console from?

1) OS version?

2) Are you using jnbSA back via X11 over ssh ?

3) If using the Java Admin console on Windows, was it definitely started using "Run as administrator"?

4) On the admin console host (desktop/server) can you run the following:

...if the java admin host is Unix/Linux:

uname -a

hostname

id

cat /etc/hosts

cat /etc/resolv.conf

egrep -v "^#|^$" /etc/nsswitch.conf

ifconfig

nslookup $HOSTNAME

nslookup $HOSTNAME

nslookup x.x.x.x

nslookup x.x.x.x

...

...if the java admin host is Windows:

ver

hostname

whoami

type C:\Windows\System32\drivers\etc\hosts

ipconfig /all

nslookup %computername%

nslookup %computername%

nslookup x.x.x.x

nslookup x.x.x.x

.

And after all this there may still be routing issues to sort out.


Marianne · ‎02-21-2016

Why do you need to run Device Config wizard after NBU upgrade? What is the output of these commands:

nbemmcmd -listhosts -verbose

nbemmcmd -getemmserver


nbutech · ‎02-22-2016

Are you seeing any connectivity issues between the master and the media servers?

Check whether communication between them looks good.

Sagar_Kolhe · ‎02-22-2016

Hello,

Communication with the media servers (robot hosts) is working properly, but for some servers we are unable to telnet to port 13701. However, vmoprcmd works fine for those same media servers.

abc1
        MachineName = "abc1"
        FQName = "abc1"
        MachineDescription = ""
        MachineNbuType = server (6)
efgh
        ClusterName = "abc1"
        MachineName = "efgh"
        FQName = "efgh"
        GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0xe7
        MachineNbuType = master (3)
        MachineState = active for tape and disk jobs (14)
        NetBackupVersion = 7.7.1.0 (771000)
        OperatingSystem = solaris (2)
        ScanAbility = 5
abcd
        ClusterName = "abc1"
        MachineName = "abcd"
        FQName = "abcd"
        GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0xe7
        MachineNbuType = master (3)
        MachineState = active for disk jobs (12)
        NetBackupVersion = 7.7.1.0 (771000)
        OperatingSystem = solaris (2)
        ScanAbility = 5
Command completed successfully.

nbemmcmd -getemmserver === this command takes a long time to complete.
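The telnet test on port 13701 mentioned above can be repeated against every media server without an interactive session. The following is a rough sketch, assuming bash (for its built-in /dev/tcp redirection) and the GNU `timeout` command are present; `check_port` and the host names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Check whether a TCP port accepts connections, using bash's /dev/tcp feature.
# Usage: check_port HOST PORT [TIMEOUT_SECONDS]
# check_port is a hypothetical helper name; GNU timeout is assumed to exist.
check_port() {
    local host="$1" port="$2" secs="${3:-5}"
    # The inner bash receives HOST as $0 and PORT as $1 and tries to open a socket.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open: $host:$port"
    else
        echo "closed or filtered: $host:$port"
        return 1
    fi
}

# Example usage (media1 and media2 are placeholder host names):
#   for h in media1 media2; do check_port "$h" 13701; done
```

A "closed or filtered" result only says that the connection attempt did not succeed from this host within the timeout; it does not distinguish a stopped daemon from a firewall rule.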

sdo · ‎02-22-2016

Without some detail around actual master server name and media server names and any description of which server you executed those commands upon, then well, we remain blind. It's impossible to tell what the above is actually about. :(

Sagar_Kolhe · ‎02-22-2016

oops !!!

dakcmstnetbkp
        MachineName = "dakcmstnetbkp"
        FQName = "dakcmstnetbkp"
        MachineDescription = ""
        MachineNbuType = server (6)
hbdakcmstn2
        ClusterName = "dakcmstnetbkp"
        MachineName = "hbdakcmstn2"
        FQName = "hbdakcmstn2"
        GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0xe7
        MachineNbuType = master (3)
        MachineState = active for tape and disk jobs (14)
        NetBackupVersion = 7.7.1.0 (771000)
        OperatingSystem = solaris (2)
        ScanAbility = 5
hbdakcmstn1
        ClusterName = "dakcmstnetbkp"
        MachineName = "hbdakcmstn1"
        FQName = "hbdakcmstn1"
        GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0xe7
        MachineNbuType = master (3)
        MachineState = active for disk jobs (12)
        NetBackupVersion = 7.7.1.0 (771000)
        OperatingSystem = solaris (2)
        ScanAbility = 5
Command completed successfully.

Sagar_Kolhe · ‎02-22-2016

Now we are facing another issue, this time with the robots. Please find the attached snapshot.

Sagar_Kolhe · ‎02-22-2016

Please find the bpjava-msvc logs below.

00:01:33.428 [8272] <16> session_dispatch: Request count = 0 tag = 510
00:01:33.444 [8272] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:01:33.445 [8272] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:01:33.707 [8272] <4> session_secure_lookup: Initiating SSL Accept
00:01:33.708 [8272] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:01:33.756 [8272] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:01:33.756 [8272] <4> session_secure_lookup: SSL Connection Accepted!
00:01:33.766 [8272] <16> session_dispatch: Request count = 1 tag = 118
00:01:34.031 [8272] <16> session_dispatch: Request count = 2 tag = 101
00:01:34.148 [8272] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:01:34.148 [8272] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56081
00:01:34.177 [8272] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:01:34.179 [8272] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:01:34.210 [8277] <16> isVxssActive: authentication determination failed, assume none required: (193) VxSS authentication is requested but not allowed
00:01:34.211 [8277] <2> establishAuthorization: CORBA_CONTEXT_ID_AUTH is enabled: 1
00:01:34.211 [8277] <4> establishAuthorization: NON-NBAC user cert generation.
00:01:34.223 [8277] <2> userCertLogin: Trying to generate user cert in non-NBAC host
00:01:34.224 [8277] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1601] in failed file cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:01:34.224 [8277] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:01:34.224 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1547] in failed cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:01:34.229 [8277] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:01:34.229 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:168] hostptr appears good - using it dakcmstnetbkp.HBCTXDOM.COM
00:01:34.847 [8277] <2> userCertLogin: VssExportCredential() returned [0]
00:01:34.847 [8277] <4> userCertLogin: Login PASSED and user cert [PASSED]
00:01:34.847 [8277] <2> establishAuthorization: User Certificate generated at //.vxss/credentials
00:01:43.107 [8272] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072), errno = 131 = Connection reset by peer
00:01:43.107 [8272] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer
00:01:44.111 [8270] <16> poll_listen: can't find file descriptor in polling table
00:54:11.633 [26809] <16> session_dispatch: Request count = 0 tag = 510
00:54:11.673 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:11.673 [26809] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:54:11.943 [26809] <4> session_secure_lookup: Initiating SSL Accept
00:54:11.944 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:12.065 [26809] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:54:12.065 [26809] <4> session_secure_lookup: SSL Connection Accepted!
00:54:12.081 [26809] <16> session_dispatch: Request count = 1 tag = 118
00:54:12.389 [26809] <16> session_dispatch: Request count = 2 tag = 101
00:54:12.507 [26809] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:54:12.507 [26809] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56302
00:54:12.592 [26809] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:54:12.595 [26809] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:54:12.649 [26813] <16> isVxssActive: authentication determination failed, assume none required: (193) VxSS authentication is requested but not allowed
00:54:12.650 [26813] <2> establishAuthorization: CORBA_CONTEXT_ID_AUTH is enabled: 1
00:54:12.651 [26813] <4> establishAuthorization: NON-NBAC user cert generation.
00:54:12.663 [26813] <2> userCertLogin: Trying to generate user cert in non-NBAC host
00:54:12.676 [26813] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1053] getaddrinfo() failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=0 errno=0
00:54:12.676 [26813] <8> retry_getaddrinfo: [vnet_addrinfo.c:893] retry_getaddrinfo_for_real failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=0
00:54:12.676 [26813] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1642] retry_getaddrinfo() failed RV=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:54:12.681 [26813] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:54:12.681 [26813] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:54:12.700 [26813] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1547] in failed cache ERR=8 NAME=dakcmstnetbkp.hdfcsec.com SVC=NULL
00:54:12.700 [26813] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1270] vnet_cached_getaddrinfo_and_update() failed 6 0x6
00:54:12.700 [26813] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:54:12.700 [26813] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM
00:54:12.700 [26813] <8> vnet_getfqdn: [vnet_fqdn.c:168] hostptr appears good - using it dakcmstnetbkp.HBCTXDOM.COM
00:54:13.486 [26813] <2> userCertLogin: VssExportCredential() returned [0]
00:54:13.486 [26813] <4> userCertLogin: Login PASSED and user cert [PASSED]
00:54:13.486 [26813] <2> establishAuthorization: User Certificate generated at //.vxss/credentials
00:54:22.660 [26809] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072), errno = 131 = Connection reset by peer
00:54:22.660 [26809] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer
00:54:23.664 [26805] <16> poll_listen: can't find file descriptor in polling table

sdo · ‎02-22-2016

On both master server cluster member nodes try this to clear out corrupt NetBackup name cache:

bpclntcmd -clear_host_cache

...then on the active master server member node try these query commands:

nbemmcmd -getemmserver

nbemmcmd -listhosts

vmoprcmd -devmon hs

Then try restarting the Java Admin Console via the active master server member node.

Sagar_Kolhe · ‎02-22-2016

Hi,

Every command works fine, and I have already tried the suggestions above.

Have you checked the logs above? We have two cluster environments, and one of them is working fine.

sdo · ‎02-22-2016

Wondering whether we have a naming issue:

00:01:34.229 [8277] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2867] vnet_cached_getaddrinfo() failed STAT=6 RV=8 NAME2=dakcmstnetbkp.hdfcsec.com
00:01:34.229 [8277] <8> vnet_getfqdn: [vnet_fqdn.c:163] vnet_cached_get_aliases returned no valid aliases dakcmstnetbkp.HBCTXDOM.COM

The same hostname has two different suffixes.

I don't understand how they are related to each other, so I don't know if it's a problem or not.

The status <8> above might just be a red herring.
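To see every fully qualified form of one short hostname that appears in a log, something like the following might help. It is a sketch only, assuming a grep that supports `-o` (GNU or BSD grep); `list_suffixes` and the log file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# List each distinct fully qualified form of a short hostname found in a log.
# Usage: list_suffixes SHORTNAME LOGFILE
# list_suffixes is a hypothetical helper name; grep -o is assumed to be supported.
list_suffixes() {
    local short="$1" log="$2"
    # Match SHORTNAME followed by a dot and the rest of the domain, ignoring case.
    grep -oiE "${short}\.[A-Za-z0-9.-]+" "$log" | sort -u
}

# Example usage (the log file name is a placeholder):
#   list_suffixes dakcmstnetbkp bpjava-msvc.log
```

More than one line of output means the same host is being referred to under more than one domain suffix, which is the situation described above.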

.

The status <16> entries, however, are a problem... if we filter out just the entries for thread 26809:

00:54:11.633 [26809] <16> session_dispatch: Request count = 0 tag = 510
00:54:11.673 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:11.673 [26809] <4> command_SECURE_CHANNEL_INIT: Using certificate [/usr/openv/var/vxss/credentials/dakcmstnetbkp] and Responding SECURE_CHANNEL_PROCEED.
00:54:11.943 [26809] <4> session_secure_lookup: Initiating SSL Accept
00:54:11.944 [26809] <2> populateCertificatePath: Certificate to be used for SSL [/usr/openv/var/vxss/credentials/dakcmstnetbkp]
00:54:12.065 [26809] <4> tls_accept: io.c.3330: SSL Channel established for fd[0]
00:54:12.065 [26809] <4> session_secure_lookup: SSL Connection Accepted!
00:54:12.081 [26809] <16> session_dispatch: Request count = 1 tag = 118
00:54:12.389 [26809] <16> session_dispatch: Request count = 2 tag = 101
00:54:12.507 [26809] <8> retry_getnameinfo: [vnet_addrinfo.c:1146] getnameinfo() failed 8 0x8
00:54:12.507 [26809] <8> vnet_cached_getnameinfo: [vnet_addrinfo.c:1917] retry_getnameinfo() failed RV=8 IPSTR=10.226.34.166 PORT=56302
00:54:12.592 [26809] <16> newAuthenticate: bad pam_status 3, Error in underlying service module
00:54:12.595 [26809] <16> command_LOGON_TO_MSERVER: putenv(BPJAVA_MASTER_IPC_STRING=) failed
00:54:22.660 [26809] <16> readCharByChar: read failed(?), sockFd = 0, count = 0 (want = 3072),  errno = 131 = Connection reset by peer
00:54:22.660 [26809] <16> session_dispatch: recv failed, sockFd = 0, errno = 131 = Connection reset by peer

We see a status <8> getnameinfo failed immediately before some status <16> errors.
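The filtering done by hand above can be reproduced with awk. This sketch assumes the line layout seen in the pasted log (time, then a number in square brackets, then a value in angle brackets); `filter_log` and the log file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print log lines for one bracketed ID, optionally limited to one <N> value.
# Assumed line layout, as in the pasted log: TIME [ID] <N> message
# Usage: filter_log ID [N] < LOGFILE
# filter_log is a hypothetical helper name used only for this sketch.
filter_log() {
    local id="$1" sev="${2:-}"
    # Field 2 is "[ID]" and field 3 is "<N>" when split on whitespace.
    awk -v id="[$id]" -v sev="${sev:+<$sev>}" \
        '$2 == id && (sev == "" || $3 == sev)'
}

# Example usage (the log file name is a placeholder):
#   filter_log 26809 < bpjava-msvc.log       # every entry for 26809
#   filter_log 26809 16 < bpjava-msvc.log    # only the <16> entries for 26809
```

Running it without the second argument first keeps the <8> lines in view, so the entry immediately before each <16> error stays visible.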

.

Are you sure that your NetBackup server naming is robust/good?

sdo · ‎02-22-2016

This poster:

https://www.veritas.com/community/forums/admin-console-opening-backup-archive-and-restore

...who also had:

newAuthenticate: bad pam_status 3, Error in underlying service module

...found that they had been attempting to logon to the wrong server name.

Sagar_Kolhe · ‎02-22-2016

Yes, as far as I can tell the naming is good, and backups are also running successfully through the master. However, if I want to check for a naming issue, how should I proceed?