
NBU7.5.0.6: nbemmcmd hangs

Ivy_Yang
Level 6

Hi,

I have a problem that has been tormenting me for several days. Thanks in advance for your patience!

We have a Linux host, lx0024nbumast (CentOS 5.9).

1) It was a remote media server and worked well, but for various reasons we now need to convert it into an NBU master server.

2) At first we simply re-installed the NBU server software, but the following error showed up in the installation log:

Populating the database tables.  This will take some time.
EMM interface initialization failed, status = 77

After the install, the command "nbemmcmd -listhosts" hung, as shown below:

[root@lx0024nbumast netbackup]# nbemmcmd  -listho
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

3) We opened a case with Symantec, and they suggested that we add “REQUIRED_INTERFACE = 10.124.2.39” to bp.conf and restart the NBU services as well as PBX, since we have multiple active NICs on the Linux NBU master. We tried that, but it didn't work.

4) Then I un-installed and re-installed NBU 7.5 again and again, with no luck.

5) Having no other choice, I decided to re-install the OS and then set the box up as an NBU master in a completely fresh OS environment.

I did this yesterday: re-installed the OS, installed NBU 7.5 with a single active NIC, upgraded to NBU 7.5.0.6, enabled all NICs, restarted the OS, and verified that everything seemed good!

6) But today the issue appeared again! Only 1 of 3 policies succeeded and the others failed. When I looked into NBU, I found that "nbemmcmd" hangs again.

Can anyone show me how to fix this annoying issue?

Thank you very much!

 


Ivy_Yang
Level 6

1) Also, when I stop the NBU services, "nbemm" can't be killed by "netbackup stop" or bp.kill_all.

 

2) This is my bp.conf:

[root@lx0024nbumast netbackup]# cat bp.conf
SERVER = lx0024nbumast.active.local
CLIENT_NAME = lx0024nbumast.active.local
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = lx0024nbumast.active.local
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /usr/openv/db/data
OPS_CENTER_SERVER_NAME = ws0034nbops01.active.tan
LIST_FS_IMAGE_HEADERS = NO
VERBOSE = 0
SERVER_CONNECT_TIMEOUT = 60
CLIENT_CONNECT_TIMEOUT = 500
CLIENT_READ_TIMEOUT = 500
BPSTART_TIMEOUT = 500
BPEND_TIMEOUT = 500
TELEMETRY_UPLOAD = YES

3) This is my /etc/hosts:

# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6


10.124.2.39     lx0024nbumast.active.local lx0024nbumast

Ivy_Yang
Level 6

I also ran these:

[root@lx0024nbumast netbackup]# /usr/openv/db/bin/dbadm
                1)  Select/Restart Database and Change Password
                2)  Database Space and Memory Management
                3)  Transaction Log Management
                4)  Database Validation Check and Rebuild
                5)  Move Database
SQL Anywhere Validation Utility Version 11.0.1.2958
VALIDATE DATABASE
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord"
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord_Details"
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord_Placeholder"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_ArchiveLog"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Backup"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Configuration"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_ControlFile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Controlfile_Schema"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Database"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Datafile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Datafile_Backups"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Init_Params"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Instance_Params_Map"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_RedoLog"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Spfile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Tablespace"
....
VALIDATE TABLE "dbo"."sync_passthrough_script" 
VALIDATE TABLE "dbo"."sync_passthrough_status"
VALIDATE TABLE "rs_systabgroup"."rs_lastcommit"           
VALIDATE TABLE "rs_systabgroup"."rs_threads"       
No errors reported
(1) Down 5 (2) Bottom (3) Up 5 (4) Top (q) Quit

[root@lx0024nbumast netbackup]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_lx0024nbumast].

[root@lx0024nbumast netbackup]# /usr/openv/netbackup/bin/nbdbms_start_stop stop
[root@lx0024nbumast netbackup]# /usr/openv/netbackup/bin/nbdbms_start_stop start

Marianne
Level 6
Partner    VIP    Accredited Certified

Best to reinstall OS with RHEL or some other supported OS.

CentOS is only supported as media server, not Master. I am surprised that Support did not point this out to you...

See NetBackup 7 Operating System (CL) :    http://www.symantec.com/docs/TECH76648

 

Ivy_Yang
Level 6

Thank you Marianne, but we currently have 9 NBU masters that all run CentOS, so I don't think this is the reason. Thank you for the reminder, though; we will look into it.

As I mentioned, this box used to be an NBU media server. I looked into the log and found the error "Machine name lx0024nbumast.active.local is not recognizable as media server, NDMP filer or cluster".

So I jumped onto another NBU master, only to find that this box still shows up in the "nbemmcmd -listhosts" output of that master. Not sure if this is the issue.

 

[root@lx0024nbumast home]# vxlogview -o 111 -t 24:00:00 | grep Error
10/16/2013 04:00:45.184 [Error] V-111-1049 EMMServer generic error = machineName can't be null or empty
10/16/2013 04:00:48.644 [Error] V-111-1125 Machine name lx0024nbumast.active.local is not recognizable as media server, NDMP filer or cluster
10/16/2013 04:00:48.651 [Error] V-111-1049 EMMServer generic error = Configuration does not exist, retval = < 2001045 >
10/16/2013 05:58:24.235 [Error] V-111-1089 Time based query failed because some records were delete

 

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Please post output of:

nbemmcmd -listhosts -verbose

nbemmcmd -getemmserver

And contents of these .conf files:

/usr/openv/db/data/vxdbms.conf 
/usr/openv/var/global/server.conf
/usr/openv/var/global/databases.conf

 

Ivy_Yang
Level 6

[root@lx0024nbumast /]# nbemmcmd -getemmserver
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.


[root@lx0024nbumast /]# nbemmcmd -listhosts -verbose
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

 

[root@lx0024nbumast ~]# cat /usr/openv/db/data/vxdbms.conf
VXDBMS_NB_SERVER = NB_lx0024nbumast
VXDBMS_NB_PORT = 13785
VXDBMS_NB_DATABASE = NBDB
VXDBMS_AZ_DATABASE = NBAZDB
VXDBMS_NB_DATA = /usr/openv/db/data
VXDBMS_NB_INDEX = /usr/openv/db/data
VXDBMS_NB_TLOG = /usr/openv/db/data
VXDBMS_NB_STAGING = /usr/openv/db/staging
VXDBMS_NB_PASSWORD = 4c9896bf030687f895c7f3090d7368bf7c89fc9fbe23411c
AZ_DB_PASSWORD = Jj8mkP3sKTo=


[root@lx0024nbumast ~]# cat /usr/openv/var/global/server.conf
 -n NB_lx0024nbumast
   -x tcpip(LocalOnly=YES;ServerPort=13785)  -gp 4096 -gd DBA -gk DBA -gl DBA -ti 0 -c 100M -ch 1024M -cl 100M -zl -os 1M -m -o /usr/openv/db//log/server.log
 -ud


[root@lx0024nbumast ~]# cat /usr/openv/var/global/databases.conf
"/usr/openv/db/data/NBDB.db" -n NBDB
"/usr/openv/db/data/NBAZDB.db" -n NBAZDB

Marianne
Level 6
Partner    VIP    Accredited Certified

I see inconsistency as far as hostnames are concerned.

bp.conf contains FQDN:

SERVER = lx0024nbumast.active.local
EMMSERVER = lx0024nbumast.active.local

EMM .conf files have all shortnames:

VXDBMS_NB_SERVER = NB_lx0024nbumast

 -n NB_lx0024nbumast

How did this happen?

Please check responses in installation log: /usr/openv/tmp/install_trace.####

It seems nbemm is not running. Please do the following:

Stop NBU. Check that all processes terminate.

Start NBU. Check processes. When nbemm stops, run 'vxlogview -o 111 -t 00:10:00' and post output.

I suspect the problem is because of inconsistent naming convention.
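
To see the names side by side, here is a rough Python sketch (not a Symantec tool) that takes the text of bp.conf and vxdbms.conf and reports whether the entries point at the same host and whether FQDN and shortname forms are mixed. The function names are made up for illustration, and stripping the "NB_" prefix from VXDBMS_NB_SERVER is an assumption based on the values posted in this thread. It only describes the naming; it does not prove that the mix is what breaks nbemm.

```python
def parse_conf(text):
    """Parse 'KEY = value' lines; repeated keys keep every value in order."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values containing '=' survive.
        key, value = line.split("=", 1)
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


def short_name(host):
    """Drop an 'NB_' prefix (assumed database-server naming) and any domain."""
    if host.startswith("NB_"):
        host = host[3:]
    return host.split(".", 1)[0].lower()


def name_report(bp_conf_text, vxdbms_conf_text):
    """Return (names, same_host, mixed_forms) for the host-related entries."""
    bp = parse_conf(bp_conf_text)
    db = parse_conf(vxdbms_conf_text)
    names = {
        # The first SERVER entry in bp.conf is the master server.
        "SERVER": bp.get("SERVER", [""])[0],
        "EMMSERVER": bp.get("EMMSERVER", [""])[0],
        "VXDBMS_NB_SERVER": db.get("VXDBMS_NB_SERVER", [""])[0],
    }
    # True when every entry reduces to the same short hostname.
    same_host = len({short_name(v) for v in names.values()}) == 1
    # True when some entries contain a domain part and others do not.
    mixed_forms = len({"." in v for v in names.values()}) > 1
    return names, same_host, mixed_forms
```

With the files posted above, this would report the same host throughout but mixed forms (FQDN in bp.conf, shortname in vxdbms.conf).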
 

Ivy_Yang
Level 6

During the last installation I saw "Database server is NB_lx0024nbumast" in the installation log,

but I didn't realize that it would cause an inconsistency issue.

Let me re-install and use short names at all the prompts.

 

Marianne
Level 6
Partner    VIP    Accredited Certified

So, shortname was specified as database server but FQDN as master server, right?

Names need to be consistent during installation. 

I prefer shortnames during installation. So, when the installation prompts to use the FQDN, I say no, then provide the shortname. It is always easy to add the FQDN as an alias later on.
We see way too many users on this forum needing to change domain name. If master was installed with FQDN, it is seen as hostname change which needs consulting.

trv
Level 4
Certified

I'm not sure you are right. We always install NBU with FQDNs, and the installer puts the shortname in the vxdbms.conf and server.conf files every time. It may be because of Sybase DB name requirements or something, for example no dots allowed.

Ivy_Yang
Level 6

shortname was specified as database server but FQDN as master server, right?  ======> Right.

But I am not sure this is the root cause, since we often use FQDNs.

Since CentOS is not supported, though, I plan to reinstall the box with Oracle Linux and install NBU again,

plus change the IP so as to cut all ties with the other NBU master (remember, I mentioned that this box was a media server).

Thanks everybody for your help! Really appreciated.

Marianne
Level 6
Partner    VIP    Accredited Certified

On a side note - was this server properly decommissioned from original master server?

Jaime_Vazquez
Level 6
Employee

Having just worked a similar support case situation, try this:

On the hanging client, run "ifconfig -a" to get ip address information of all interfaces.

Do a host name lookup of each IP and ensure they are valid. Then do an IP lookup of the host names returned from the reverse lookup. This is called the "endpoint selection" process. From what I understand, nbemm does this at startup time. If any of this is 'bad', it will fail with rc=77.

nslookup $ip_address

Using the returned hostname:  nslookup $hostname

They must match up.

What is happening is that nbemm is checking to see whether something is local to it or not. If it decides it is not local, it will try to contact the hostname, assuming it is external. If the value is invalid, the end-to-end connection is not made and the initialization fails.

In my case I had a customer who configured a VLAN with an IP address that was not set up in their '/etc/hosts' file or in DNS. The DNS response did not fail, however, but returned a bogus host name value: the IP address in reverse order plus the domain name.

"nslookup 1.2.3.4" returned "4.3.2.1.domain_name". An effort to contact server "4.3.2.1.domain_name" naturally failed. rc=77.
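
If there are many interfaces, the round trip described above can be scripted. The following is only a sketch in Python, not what nbemm does internally: the function names are invented for illustration, and it uses the system resolver (`socket.gethostbyaddr` and `socket.gethostbyname_ex`), which normally consults /etc/hosts as well as DNS, so results can differ from plain nslookup. The `reverse` and `forward` parameters exist only so the lookups can be swapped out.

```python
import re
import socket


def ipv4_addresses(ifconfig_text):
    """Extract non-loopback IPv4 addresses from `ifconfig -a` output.

    Handles both 'inet addr:10.0.0.1' (older Linux) and 'inet 10.0.0.1'.
    """
    found = re.findall(r"inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})", ifconfig_text)
    return [ip for ip in found if not ip.startswith("127.")]


def _reverse(ip):
    # Reverse lookup: IP address -> primary host name.
    return socket.gethostbyaddr(ip)[0]


def _forward(name):
    # Forward lookup: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(name)[2]


def round_trip(ip, reverse=_reverse, forward=_forward):
    """Return (hostname, ok); ok is True only if hostname maps back to ip."""
    try:
        name = reverse(ip)
    except OSError:
        # No reverse entry at all.
        return None, False
    try:
        addresses = forward(name)
    except OSError:
        # Reverse lookup gave a name that cannot be resolved (the bogus case).
        return name, False
    return name, ip in addresses
```

Feed `ipv4_addresses()` the output of "ifconfig -a" and call `round_trip()` on each address; any result whose second value is False deserves a closer look in /etc/hosts and DNS.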

 

 

 

Ivy_Yang
Level 6

still not.

So we decided to change both the hostname and the IP address to cut all ties with the original master server,

then install NBU on a supported OS instead of CentOS.

Thanks everybody.

Marianne
Level 6
Partner    VIP    Accredited Certified

Suspect Jaime's post may be very relevant - some incorrect DNS entry perhaps?

This server was also never decommissioned from original master server which could add to issues seen, not to mention unsupported OS for master server....

Curious to see if new OS installation with new hostname and IP address fixed the issue.

KBN
Level 3

Hi Marianne,

Here I am attaching ipconfig /all screenshots.

Please find the attachments.

This is my email address: kbnaidu13@gmail.com

May I have your email address, please?

 

Thanks

Naidu 

 

hostname :KBN-PC

 

 

Marianne
Level 6
Partner    VIP    Accredited Certified