Netbackup Solaris 10 Media Server issue

Nizmo
Level 3

Hi guys, I just installed a NetBackup media server.

However, I have hit a problem and can't seem to resolve it.

The media server is installed and can be seen by the master backup server through Host Properties. It also connects and shows the correct version, which is 6.5.2.

The problem is that when I run the device configuration I get "GlobalDB not Set".

Also, if I go to Media Servers, click on the new server and then click Stop/Restart Media Manager Device Daemon, it hangs and then comes back with "cannot connect". I have also seen "unknown".

I have checked bp.conf, which is fine: it lists the master server and media servers. The master backup server also has the media server listed in its bp.conf.


I have seen some unusual entries in the bpcd logs.

13:11:48.913 [26951] <2> ListenForConnection: Unexpected working directory: /root
13:11:48.914 [26951] <2> bpcd main: offset to GMT 0
13:11:48.914 [26951] <2> logconnections: BPCD ACCEPT FROM xx xx xx xx .47881 TO xx.xx.xx.xx.xx. .13724
13:11:48.914 [26951] <2> bpcd main: setup_sockopts complete
13:11:48.915 [26951] <2> bpcd peer_hostname: Connection from host xxxxx(xx.xx.xx.xx) port 47881
13:11:48.915 [26951] <2> bpcd valid_server: comparing xxxxxxx and xxxxxx
13:11:48.915 [26951] <4> bpcd valid_server: hostname comparison succeeded
13:11:48.915 [26951] <2> bpcd main: output socket port number = 1
13:11:48.929 [26951] <2> bpcd main: Duplicated vnetd socket on stderr
13:11:48.929 [26951] <2> bpcd main: <---- NetBackup 6.5 0 ------------initiated
13:11:48.929 [26951] <2> bpcd main: VERBOSE = 0
13:11:48.929 [26951] <2> bpcd main: Not using VxSS authentication with xxxxxxxxxxxxx
13:11:49.027 [26951] <2> bpcd main: BPCD_GET_PROCESS_STATUS_RQST
13:11:49.027 [26951] <2> find_processes: total malloc-ed = 4096
13:11:49.038 [26951] <2> is_it_a_keeper: This proc's zone (1) DOES NOT match my zone (0).

13:11:49.041 [26951] <2> is_it_a_keeper: This proc's zone (3) DOES NOT match my zone (0).

13:11:49.041 [26951] <2> is_it_a_keeper: This proc's zone (2) DOES NOT match my zone (0).

13:11:49.053 [26951] <2> get_unxwre_processes: 329 files considered, 327 files opened/read
13:11:49.053 [26951] <2> find_processes: progs_list_size = 4096 == total calculated size = 4096 == (total_space_used = 1261 + whats_left = 2835)
13:11:49.053 [26951] <2> find_processes: realloc down to 1261
13:11:49.053 [26951] <16> bpcd main: strlen(pProcList) = 1260
13:11:49.053 [26951] <16> bpcd main: char_count = 1261, .line_count = 11
13:11:49.134 [26951] <2> bpcd main: BPCD_DISCONNECT_RQST
13:11:49.134 [26951] <2> bpcd exit_bpcd: exit status 0 ----------->exiting
13:11:53.674 [26956] <2> ListenForConnection: Unexpected working directory: /root
13:11:53.674 [26956] <2> bpcd main: offset to GMT 0
13:11:53.674 [26956] <2> logconnections: BPCD ACCEPT FROM xxxxxxxxxxxxx.59289 TO xxxxxxxxxxxxxx.13724
13:11:53.674 [26956] <2> bpcd main: setup_sockopts complete
13:11:53.675 [26956] <2> bpcd peer_hostname: Connection from host xxxxxxxxxxxxxxx(xxxxxxxxxxxxx) port 59289
13:11:53.675 [26956] <2> bpcd valid_server: comparing xxxxxand xxxxxxxx
13:11:53.676 [26956] <2> bpcd valid_server: comparing xxxxxxxxx and xxxxxxxxxx
13:11:53.676 [26956] <2> bpcd valid_server: comparing xxxxxxxxxx and xxxxxxxxxx
13:11:53.676 [26956] <2> bpcd valid_server: comparing xxxxxxxxxxxx and xxxxxxxx
13:11:53.676 [26956] <4> bpcd valid_server: hostname comparison succeeded
13:11:53.676 [26956] <2> bpcd main: output socket port number = 1
13:11:53.681 [26956] <2> bpcd main: Duplicated vnetd socket on stderr
13:11:53.681 [26956] <2> bpcd main: <---- NetBackup 6.5 0 ------------initiated
13:11:53.681 [26956] <2> bpcd main: VERBOSE = 0
13:11:53.681 [26956] <2> bpcd main: Not using VxSS authentication with xxxxxxxxxxxxxxxx
13:11:53.681 [26956] <2> bpcd main: BPCD_GET_PROCESS_STATUS_RQST
13:11:53.681 [26956] <2> find_processes: total malloc-ed = 4096
13:11:53.692 [26956] <2> is_it_a_keeper: This proc's zone (1) DOES NOT match my zone (0).

13:11:53.695 [26956] <2> is_it_a_keeper: This proc's zone (3) DOES NOT match my zone (0).

13:11:53.695 [26956] <2> is_it_a_keeper: This proc's zone (2) DOES NOT match my zone (0).

13:11:53.699 [26956] <8> get_unxwre_processes: ignoring open error of file /proc/26957/psinfo, (2):No such file or directory
13:11:53.707 [26956] <2> get_unxwre_processes: 332 files considered, 329 files opened/read
13:11:53.707 [26956] <2> find_processes: progs_list_size = 4096 == total calculated size = 4096 == (total_space_used = 1404 + whats_left = 2692)
13:11:53.707 [26956] <2> find_processes: realloc down to 1404
13:11:53.707 [26956] <16> bpcd main: strlen(pProcList) = 1403
13:11:53.707 [26956] <16> bpcd main: char_count = 1404, .line_count = 12
13:11:53.708 [26956] <2> bpcd main: BPCD_DISCONNECT_RQST
13:11:53.708 [26956] <2> bpcd exit_bpcd: exit status 0 ----------->exiting


The process seems to be stuck in a loop and keeps going online, then offline.
The lines below worry me a little, but I'm not sure if they are enough to cause issues. We have zones configured on this server.


13:11:53.692 [26956] <2> is_it_a_keeper: This proc's zone (1) DOES NOT match my zone (0).

13:11:53.695 [26956] <2> is_it_a_keeper: This proc's zone (3) DOES NOT match my zone (0).

13:11:53.695 [26956] <2> is_it_a_keeper: This proc's zone (2) DOES NOT match my zone (0).


The tpcommand logs show something interesting as well. The CORBA error doesn't seem right, but I can't tell what's causing it.


07:50:30.349 [18020] <4> tpconfig: emmserver_name = xxxxxxxxxxxxxx
07:50:30.349 [18020] <4> tpconfig:main(): ./tpconfig
07:50:30.349 [18020] <2> mm_getnodename: cached_hostname xxxxxxxxxxxxx, cached_method 3
07:50:30.350 [18020] <2> mm_getnodename: (3) hostname xxxxxxxxxxxxx(from mm_master_config.mm_server_name)
07:50:30.350 [18020] <4> InitThisHostName: ThisHost is xxxxxxxxxx
07:50:30.350 [18020] <4> tpconfig: emmserver_port = 1556
07:50:30.359 [18020] <2> Orb::init: initializing ORB Default_CLIENT_Orb with: tpconfig -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static PBXIOP_Factory ''" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb Default_CLIENT_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:736)
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.915: host_cache_size: 200 0x000000c8
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.916: cache_time: 3600 0x00000e10
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.928: host_failed_cache_size: 40 0x00000028
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.929: cache_time: 3600 0x00000e10
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.915: host_cache_size: 200 0x000000c8
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.916: cache_time: 3600 0x00000e10
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.928: host_failed_cache_size: 40 0x00000028
07:51:00.527 [18020] <2> init_cache: vnet_hosts.c.929: cache_time: 3600 0x00000e10
07:52:15.897 [18020] <16> emmlib_initializeEx: (-) Exception! CORBA::TIMEOUT
07:52:15.897 [18020] <16> emmlib_initializeEx: (-) system exception, ID 'IDL:omg.org/CORBA/TIMEOUT:1.0'
TAO exception, minor code = 3e (timeout during send; low 7 bits of errno: 62 Timer expired), completed = NO
07:54:21.491 [18020] <4> pdconfig_get_list: Got EMM configuration information: <xxxxxxxxxxxxxxxxx>
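
A side note on reading the tpconfig trace above: the timestamps jump by roughly 30, 75 and 125 seconds between consecutive entries, and pauses like that are worth locating before guessing at causes. Below is a minimal Python sketch, not a NetBackup tool, that scans lines in this log format and reports any pause longer than a threshold. The `find_gaps` name, the regular expression and the 10-second default are assumptions made for illustration; it also assumes all lines fall within one day, since the log prefix carries no date.

```python
import re
from datetime import datetime

# Matches the debug-log prefix seen in this thread:
#   HH:MM:SS.mmm [pid] <severity> message
# An optional ".tid" after the pid is tolerated (e.g. [8140.2700]).
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)(?:\.\d+)?\] <(\d+)> (.*)$")


def find_gaps(lines, threshold_seconds=10.0):
    """Return (gap_seconds, previous_message, next_message) for each pause
    between consecutive timestamped lines that exceeds the threshold."""
    gaps = []
    prev_time = None
    prev_msg = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            # Continuation lines (such as the TAO exception text) have no timestamp.
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if prev_time is not None:
            delta = (stamp - prev_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, prev_msg, match.group(4)))
        prev_time, prev_msg = stamp, match.group(4)
    return gaps


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_gaps.py < tpcommand.log
    for seconds, before, after in find_gaps(sys.stdin):
        print(f"{seconds:8.3f}s pause after: {before[:60]}")
        print(f"          resumed with: {after[:60]}")
```

What each pause means is a separate question. Long waits are often timeouts (name lookups or connection attempts), but the log alone does not say which.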


Any help would be great, guys. Please note I have had a Symantec engineer working on this since Monday, but there is still no resolution.


Cheers
Z
1 ACCEPTED SOLUTION

Accepted Solutions

mph999
Level 6
Employee Accredited
Make sure every network interface is resolvable by the media server and every network interface on the media server is resolvable by the master.

Martin

16 REPLIES 16

netbackup_rooki
Level 4
Certified
Was it added using nbemmcmd -addhost? Do you see the media server in nbemmcmd -listhosts? Can you check whether you can telnet <hostname> 1556 (the PBX port) both from master to media server and from media server to master?
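
If telnet is not installed on one of the hosts, the same reachability test can be approximated with a few lines of Python. This is only a sketch: the function names are made up for illustration, 1556 is the PBX port mentioned above, and 13724 is included only because it is the port shown in the bpcd log in the first post. A successful TCP connect says the port is reachable, nothing more; it does not prove that name resolution or the EMM connection itself is healthy.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def check_ports(host, ports=(1556, 13724)):
    """Return a {port: reachable} mapping for the given host."""
    return {port: can_connect(host, port) for port in ports}


if __name__ == "__main__":
    import sys

    # Usage: python check_ports.py <hostname>
    # Run it on the master against the media server, then the other way round.
    target = sys.argv[1]
    for port, ok in check_ports(target).items():
        print(f"{target}:{port} {'reachable' if ok else 'NOT reachable'}")
```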

Nizmo
Level 3
Hi, initially the server added itself, but as testing has been going on I have removed it and re-added it using the -addhost command.

Telnet works both ways.

-Z

Amit_Karia
Level 6
Can you run the vmglob -get_gdbhost command and send the output?

Nizmo
Level 3

It comes back with the master backup server name, which is what I expect.





Zaka

Amit_Karia
Level 6

Have you added MEDIA_SERVER in the bp.conf of the master server? Also, let us know whether you are running device configuration only for the newly added media server or for everything.

 

Nizmo
Level 3
Amit,
The bp.conf on the master server has the correct entries. This has been checked multiple times.

The actual server connects fine through the media server list.

When running device configuration, I am only running it on the new media server, which comes back with "globaldb hostname has not been set" (MM status 4).

mph999
Level 6
Employee Accredited
Make sure every network interface is resolvable by the media server and every network interface on the media server is resolvable by the master.

Martin
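
A rough way to run that check for a list of names is sketched below in Python. It does a forward lookup for each name and then a reverse lookup for every address returned, using whatever resolver order the host is configured with (hosts file, DNS and so on). The `check_name` function and its result layout are illustrative assumptions, not part of NetBackup. Run it on both the master and the media server, passing every hostname each side uses for the other, including any backup-network interface names.

```python
import socket


def check_name(name, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Resolve name to its addresses, then resolve each address back to names.

    The forward/reverse parameters exist only so the lookups can be replaced
    in tests; by default the system resolver is used.
    """
    result = {"name": name, "addresses": [], "reverse": {}, "errors": []}
    try:
        _, _, addresses = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        result["errors"].append(f"forward lookup failed: {exc}")
        return result
    result["addresses"] = list(addresses)
    for addr in addresses:
        try:
            primary, aliases, _ = reverse(addr)
            result["reverse"][addr] = [primary] + list(aliases)
        except OSError as exc:
            result["errors"].append(f"reverse lookup of {addr} failed: {exc}")
    return result


if __name__ == "__main__":
    import sys

    # Usage: python check_names.py <name> [<name> ...]
    for hostname in sys.argv[1:]:
        report = check_name(hostname)
        print(hostname, "->", report["addresses"] or "no addresses")
        for addr, names in report["reverse"].items():
            print("   ", addr, "->", ", ".join(names))
        for err in report["errors"]:
            print("    ERROR:", err)
```

Any name that fails, or any address that maps back to an unexpected name, is worth comparing against what the other server returns for the same lookup.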

netbackup_rooki
Level 4
Certified
Does the media server name exist in vm.conf?

MM_SERVER_NAME = media server name

Nizmo
Level 3
The network interfaces seem OK. The vm.conf has the MM_SERVER_NAME entry, which is the media server itself.


I'm running out of ideas now and can't seem to get this bugger to work.


Z

Srikanth_Gubbal
Level 6
Certified
I suspect your GlobalDB is not updated properly.

Check whether you see your media server in the output of nbemmcmd -listhosts.
If yes,
then execute nbemmcmd -getemmserver
and check whether your media servers are listed along with their EMM server name (the master server).

Send me that result.

Nizmo
Level 3

Hi Srikanth,
On the media server, if I run ./nbemmcmd -listhosts or -getemmserver, it just hangs.

On the master server, -listhosts shows the media server.

 

 

Srikanth_Gubbal
Level 6
Certified
Hi,

The logs stated above are similar to the logs highlighted below. Can you check this workaround and let me know?

nbevtmgr service would fail to start and bpstulist or nbemmcmd command would hang even though nbemm daemon is running.
--------------------------------------------------------------------------------
Details:
If the customer is getting the error described in Technote 295171 but the issue persists and the nbevtmgr service still fails to start, then check the following log:

Admin log shows:

18:58:24.614 [8140.2700] <4> db_getSTUNITlist: emmserver_name = TEST01
18:58:24.614 [8140.2700] <4> db_getSTUNITlist: emmserver_port = 1556

19:07:40.572 [8140.2700] <2> TAO: TAO (8140|2700) PBXIOP connection to peer <10.2.1.10:1556> on 440
19:07:40.572 [8140.2700] <2> TAO: TAO (8140|2700) - Leader_Follower[19097104]::wait_for_event, (leader) exit reactor event loop
19:07:40.572 [8140.2700] <2> TAO: (8140|2700) PBXIOP_Connection_Handler::send_service_id error receiving ack: (errno: Function not implemented)
19:07:40.572 [8140.2700] <2> TAO: TAO (8140|2700) - TAO_Transport_cleanup_queue_i[440], cleaning up complete queue
19:07:40.572 [8140.2700] <2> TAO: TAO (8140|2700) - PBXIOP_Connector::make_connection, connection to <TEST01:1556:EMM> failed (errno: Function not implemented)
19:07:40.572 [8140.2700] <16> emmlib_initializeEx: (-) Exception! CORBA::TRANSIENT
19:07:40.572 [8140.2700] <16> emmlib_initializeEx: (-) system exception, ID 'IDL:omg.org/CORBA/TRANSIENT:1.0'

OMG minor code (2), described as '*unknown description*', completed = NO
19:07:45.558 [8140.2700] <16> db_getSTUNITlist: (-) Translating EMM_ERROR_CorbaTransient(3000001) to 25 in the NetBackup context
19:07:45.558 [8140.2700] <16> db_getSTUNITlist: error (25) initializing EMM interface.
19:07:45.558 [8140.2700] <16> bpstsinfo/update ERROR: unable to retrieve stu list from server (25)


Unified nbevtmgr log (231)
12/11/2008 21:14:17.618 [Debug] NB 51216 137 PID:7356 TID:4408 File ID:231 [No context] 1 [TAO] Error during load of "E:\Program Files\VERITAS\NetBackup\var\TaoNotifSvcTopologyCache.xml".

Once it is confirmed that it was having issue loading the TaoNotifSvc*.* file, then this file could be corrupted.

To fix this, please do the following:

1. Stop all Netbackup services
2. Stop Symantec Private Branch Exchange (PBX)
3. Move all TaoNotifSvc*.* files from ..\Netbackup\var to a different location.
4. Start Symantec Private Branch Exchange (PBX)
5. Start Netbackup daemons
6. Event Manager should start now.
7. If the issue persists, collect the nbevtmgr (OID 231) log for further investigation.

Let me know if this works.




Srikanth_Gubbal
Level 6
Certified
Hi,
If the above workaround does not work, check this one:

emm connection failure on new media server. The command "nbemmcmd -getemmserver" on a new UNIX based NetBackup 6.5 media server produces the error: "Failed to initialize EMM connection".
--------------------------------------------------------------------------------
Details:
ISSUE:
The command "nbemmcmd -getemmserver" on a new UNIX based NetBackup 6.5 media server produces the error: "Failed to initialize EMM connection"

ERROR CODE/MESSAGE:
Failed to initialize EMM connection. Verify that network access to the Enterprise Media Manager EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)

DIAGNOSIS:
- The new media server hostname has already been added to the SERVER list of the EMM server and master server
- The new media server hostname has already been added to EMM Database on the EMM server via the "nbemmcmd -addhost -machinename newhost -machinetype media ..." command. Reviewing the output from the "nbemmcmd -listhost" command confirmed the correct new media server hostname has been entered
- Running "bpclntcmd -pn" command on the new media server confirmed the master server was able to resolve the new media server's hostname and ip address correctly
- telnet to master server's PBX port 1556 also OK
- The following errors were reported in admin debug log on the new media server :
... <32> Orb::init: USE_VXSS is required, but we did not retrieve a credential from disk. Fatal exception will be thrown
... <8> Orb::init: CORBA exception: system exception, ID 'IDL:omg.org/CORBA/NO_PERMISSION:1.0'
OMG minor code (0), described as '*unknown description*', completed = NO
during orb initialization
... <16> emmlib_initializeEx: (-) Exception! CORBA::NO_PERMISSION
... <2> emmlib_initialize: (-) remote server does not authorise connection
... <16> nbemmcmd: Run time failure: Failed to initialize EMM connection. Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
However, master server was not configured to use VxSS

SOLUTION/WORKAROUND:
The problem was resolved after the following entry was added to the local /etc/hosts file on the new media server and then NetBackup was restarted:
127.0.0.1 localhost
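
To confirm quickly whether a hosts file already has that entry, something like the Python sketch below would do. The `has_localhost_entry` helper is a hypothetical name used for illustration; it only checks for an IPv4 `127.0.0.1` line that lists `localhost` among its names and ignores comments.

```python
def has_localhost_entry(hosts_text):
    """Return True if the hosts-file text maps 127.0.0.1 to the name localhost."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "127.0.0.1" and "localhost" in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        present = has_localhost_entry(handle.read())
    print("localhost entry present" if present else "localhost entry MISSING")
```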

Nizmo
Level 3
Guys, quick update: the issue was actually with the server's DNS entries. They were incorrect, hence causing the NetBackup issue.

Amended resolv.conf with the entries that the master backup server uses. The server is now configured and backing up.
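
For anyone wanting to compare the resolver settings of two servers before changing anything, here is a small Python sketch. It assumes you have copied the contents of /etc/resolv.conf from each host into two strings or files; the function names and the output layout are illustrative only.

```python
def parse_resolv_conf(text):
    """Return the nameserver addresses and search/domain names found in resolv.conf text."""
    nameservers, search = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank lines and comments
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] in ("search", "domain"):
            search.extend(fields[1:])
    return {"nameservers": nameservers, "search": search}


def diff_resolvers(master_text, media_text):
    """Return the entries that appear on only one of the two hosts."""
    master = parse_resolv_conf(master_text)
    media = parse_resolv_conf(media_text)
    return {
        key: {
            "only_on_master": sorted(set(master[key]) - set(media[key])),
            "only_on_media": sorted(set(media[key]) - set(master[key])),
        }
        for key in ("nameservers", "search")
    }
```

An empty result on both sides means the two files name the same servers and search domains. Ordering differences are not reported here, although the order of nameserver lines can matter in practice.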


Thanks for everyone's help.

Zaka

Stumpr2
Level 6

mph999 gave good information early in this thread.
"Make sure every network interface is resolvable by the media server and every network interface on the media server is resolvable by the master."

Pay close attention to suggestions offered and it may decrease your downtime. Martin was spot on.

Stumpr2
Level 6

mph999 was first to determine the solution to be resolving the server names and IP addresses