
"Unable to obtain the device list from the EMM server. Unable to connect to the EMM server (77)"

weigojmi
Level 3

Hello,

We are constantly, but seemingly at random, getting this popup message during backups, which results in jobs failing with status 13, 23, 24, etc. We have troubleshot the network side and no changes have helped. We have tried some media server timeout setting changes, to no avail. Any suggestions would be much appreciated.

Thanks,

Jamie

13 REPLIES

Sujay24
Level 4
Employee

Technical Solution: Unable to connect to the EMM server (77) after network changes

To resolve the issue:

1. Flush the cache by running:

UNIX platform: /usr/openv/netbackup/bin/bpclntcmd -clear_host_cache

Windows platform: <install dir>\Veritas\NetBackup\bin\bpclntcmd -clear_host_cache

2. Restart the NetBackup daemons on the server:

For Windows:

<install dir>\Veritas\NetBackup\bin\bpdown -v -f

Restart PBX: services.msc -> Symantec Private Branch Exchange -> Stop/Start or Restart service

<install dir>\Veritas\NetBackup\bin\bpup -v -f

For Linux/Unix:

/usr/openv/netbackup/bin/goodies/netbackup stop

/opt/VRTSpbx/bin/vxpbx_exchanged stop

/opt/VRTSpbx/bin/vxpbx_exchanged start

/usr/openv/netbackup/bin/goodies/netbackup start

Marianne
Moderator
Partner    VIP    Accredited Certified

The fact that the errors are intermittent says to me that there were no 'recent network changes', right?

Tell us more about master and media servers - same location and subnet or media server(s) at remote location?
Are you using hosts files or DNS for comms?
Are all servers (master and media) W2008, or are there other OSes as well?

Are errors seen specifically during high load?
Are OS updates up to date? (There have been specific OS-related issues with W2008 under high I/O)

What kind of network troubleshooting has been done?
What about intermittent DNS issues?
Any continuous monitoring during backup time? 
Something like a continuous ping?
What about resource monitoring on the master? (memory, cpu, network)
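
For the continuous monitoring suggested above, a minimal sketch in Python may be easier to leave running than a bare ping, because it timestamps every attempt and covers name resolution as well as the TCP connection. The function names `probe` and `monitor` are made up for this example, and the host and port to watch are assumptions you would replace with your own (PBX normally listens on 1556).

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=5.0):
    """Resolve host and open one TCP connection; return (ok, detail, seconds)."""
    start = time.monotonic()
    try:
        addr = socket.gethostbyname(host)  # name resolution (DNS or hosts file)
        with socket.create_connection((addr, port), timeout=timeout):
            pass  # connection opened and closed again
        return True, addr, time.monotonic() - start
    except OSError as exc:  # covers resolver errors, refusals and timeouts
        return False, str(exc), time.monotonic() - start


def monitor(host, port, interval=10.0, count=None, log=print):
    """Probe repeatedly, log each attempt with a timestamp, return failure count.

    count=None keeps going until interrupted.
    """
    failures = 0
    attempt = 0
    while count is None or attempt < count:
        ok, detail, elapsed = probe(host, port)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        status = "OK" if ok else "FAIL"
        log(f"{stamp} {status} {host}:{port} {detail} {elapsed:.3f}s")
        if not ok:
            failures += 1
        attempt += 1
        if count is None or attempt < count:
            time.sleep(interval)
    return failures
```

Calling something like `monitor("nbuus002", 1556)` during the backup window and redirecting the output to a file would give timestamps to compare against the failed jobs. A slow or failed line at the same time as a status 77 popup would point at name resolution or the connection rather than at NetBackup itself.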

Have you enabled any kind of logs that may indicate 'emm heartbeat' timeouts?
See this TN for logs that are needed to troubleshoot emm comms:
https://www.veritas.com/support/en_US/article.100021074

 

Marianne
Moderator
Partner    VIP    Accredited Certified

One more thing - status 23 and 24 are 100% network-related issues.

Here is a good explanation of these errors:

https://vox.veritas.com/t5/NetBackup/Error-23/m-p/738836#M201891

Has anyone EVER encountered a network team who were prepared to admit that the problem might be network-related?

Oh! And hopefully you are aware that your NBU version ran out of support about a year ago?
Not that the NBU version is causing the problems, it just means that you cannot log a support call with Veritas.
Support has tools to assist with network troubleshooting.

mph999
Level 6
Employee Accredited

Has anyone EVER encountered a network team who were prepared to admit that the problem might be network-related?

Nope, I don't think so ....

 

Hi all,

Thanks for the replies. Both NICs on the master/media server have now been changed from auto to 1000 Full on both the server and the switch side. Since failures occur with so many clients, we will check those in a future test. I also cleared the cache as advised.

Regarding the other questions:

- Master and media server are same box

- Using DNS and it appears to be working fine

- We have 2008 and 2012 in our environment

- I've noticed no load issues during the failures or as backups are running.

- OS updates are current.

- Regarding network troubleshooting: so far we have only verified whether there are any port errors, and we've tried some different teaming settings. I mentioned the current status of that above. We have changed the master timeout settings (I know, probably ill-advised). Here are the current settings:

Client connect: 3600

Client read: 3600

Backup start...: 300

Backup end...: 300

File browse: 1800

Media server connect: 30

Use OS dependent: "not checked"

 

- No continuous monitoring yet. Partly because of how intermittent this issue is, I am often looking at the console as they fail and have checked connectivity. There never seem to be any visible issues at the time.

- No logs enabled yet. Is your "heartbeat" suggestion still valid if master/media is the same box?

Marianne
Moderator
Partner    VIP    Accredited Certified

Thanks for the additional information :

.... Both NICs on the master server/media server ....

Master and media server are same box

All very important info that we did not know about.

The 2 NICs  - can we assume that they are not bonded and installed for different purposes?
Do they have IP addresses on different subnets/VLans?
And linked to different hostnames?

This is important because the processes on the master/media server also communicate with each other via TCP/IP ports, even though they are on the same server.
If the IPs for the 2 NICs are on the same subnet and linked to the same hostname, then TCP/IP is going to 'round robin' between the NICs/IPs, causing major issues, such as a request going out on one IP and the response coming back from the other IP, while the requester is expecting the response on the same IP.
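
A quick way to see whether a hostname is linked to more than one IP is to ask the resolver for every address it returns. This is only a sketch: `resolved_addresses` and `check_single_address` are hypothetical helper names, and it looks at IPv4 only.

```python
import socket


def resolved_addresses(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_single_address(hostname):
    """Return (True, addrs) when the name maps to exactly one IPv4 address."""
    addrs = resolved_addresses(hostname)
    return len(addrs) == 1, addrs
```

Run against the NBU server name on the master/media server itself, more than one address in the result would be worth following up with the commands below.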

Please confirm config on the master/media with the following commands:

ipconfig /all

bpclntcmd -self    (in ...\netbackup\bin)
nbemmcmd -listhosts -all     (in ...\netbackup\bin\admincmd)
nbemmcmd -getemmserver

The logs will be handy to trace internal comms.
We once picked up an error with NIC config on a media server via admin log on the master server (request going to one IP on the media server and response coming back from different IP). 

I also remember some time back where @RiaanBadenhorst picked up an issue with 2 NICs on a master and PBX comms... will see if I can find it. 

 **** Found it ****

https://vox.veritas.com/t5/NetBackup/PBX/m-p/418924

 

Hi, so...

The 2 NICs  - can we assume that they are not bonded and installed for different purposes?

Correct. Only 1 IP address is configured. These are HP servers using HP's network teaming software to handle how the NICs work together. This software is used on all master/media servers in our enterprise, and we do see a few of these status failures from time to time, but nothing like this. I have attached a screenshot for more info. As a test we could dissolve the team and ensure only 1 of the 2 could ever be used, but I would expect the current "NFT" config we have is effectively doing the same thing.

Do they have IP addresses on different subnets/VLans?
And linked to different hostnames?

No.

 

ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : US010099
   Primary Dns Suffix  . . . . . . . : schaeffler.com
   Node Type . . . . . . . . . . . . : Peer-Peer
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : root.hld
                                       schaeffler.com
                                       de.ina.com
                                       emea.ina.com
                                       emea.**bleep**.com
                                       emea.luk.com
                                       na.luk.com
                                       na.**bleep**.com
                                       na.ina.com
                                       sa.ina.com
                                       ap.ina.com
                                       ap.**bleep**.com
                                       ina.com
                                       luk.com
                                       **bleep**.com
                                       lat-suhl.de

Ethernet adapter Local Area Connection 5:

   Connection-specific DNS Suffix  . : schaeffler.com
   Description . . . . . . . . . . . : HP Network Team #1
   Physical Address. . . . . . . . . : 2C-44-FD-81-14-18
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::61c4:419c:fecb:91af%19(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.217.2.198(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Tuesday, January 02, 2018 1:30:04 PM
   Lease Expires . . . . . . . . . . : Monday, February 11, 2154 3:16:50 PM
   Default Gateway . . . . . . . . . : 10.217.2.254
   DHCP Server . . . . . . . . . . . : 10.217.2.205
   DHCPv6 IAID . . . . . . . . . . . : 506217725
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-20-2C-D8-B5-2C-44-FD-81-14-18
   DNS Servers . . . . . . . . . . . : 10.217.2.205
                                       10.217.2.218
                                       10.216.2.246
   Primary WINS Server . . . . . . . : 10.217.2.193
   Secondary WINS Server . . . . . . : 10.160.160.73
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Local Area Connection 4:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP Ethernet 1Gb 4-port 331FLR Adapter #4
   Physical Address. . . . . . . . . : 2C-44-FD-81-14-1B
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes

Ethernet adapter Local Area Connection 3:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP Ethernet 1Gb 4-port 331FLR Adapter #3
   Physical Address. . . . . . . . . : 2C-44-FD-81-14-1A
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter isatap.schaeffler.com:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . : schaeffler.com
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter isatap.{2A5119FB-C5EE-4556-AED2-5105BAAF1105}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter #2
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

Tunnel adapter isatap.{FB136663-96B7-4A31-87AF-1AE45EB73772}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft ISATAP Adapter #3
   Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes

 

bpclntcmd -self

gethostname() returned: nbuus002
host nbuus002: us010099.schaeffler.com at 10.217.2.198
aliases:     us010099.schaeffler.com     nbuus002     10.217.2.198
getfqdn(nbuus002) returned: us010099.schaeffler.com

 

nbemmcmd -listhosts -all     

NBEMMCMD, Version: 7.6.0.3
The following hosts were found:
server           nbuus002
master           nbuus002
virtual_machine  de012418vc.schaeffler.com
Command completed successfully.

 

nbemmcmd -getemmserver

NBEMMCMD, Version: 7.6.0.3
These hosts were found in this domain: nbuus002

Checking with the host "nbuus002"...

Server Type    Host Version        Host Name                     EMM Server
MASTER         7.6                 nbuus002                      nbuus002

Command completed successfully.

Whoops, forgot the attachment...

Is it possible AV could be a factor?  We have the recommended excludes set but in the past we had success with 636 errors when we removed it as a test.  And I'm also seeing 636 again this morning along with the others already mentioned.

Marianne
Moderator
Partner    VIP    Accredited Certified

I doubt that AV will cause these errors. Even 636 is network-related. 
It is recommended though to exclude NBU binaries/services from AV scanning.

Have you had a chance to go through @RiaanBadenhorst's post as suggested a few days ago?
Have you enabled logs as suggested?

In all honesty - I am not a network expert, but one thing I know is that IPv6 on 7.6 is not supported.
Is it possible to disable IPv6 and DHCP?

Something else that I have noticed is that the hostname and the NBU name are different:

Host Name . . . . . . . . . . . . : US010099

host nbuus002: us010099.schaeffler.com at 10.217.2.198

Do you have a hosts entry for host nbuus002?
10.217.2.198  nbuus002   us010099.schaeffler.com

If not, please do so. 
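
To double-check the entry after editing, something like the following could be used. It is a rough sketch that parses hosts-format text and maps every name to the first IP that lists it. `parse_hosts` is a made-up helper, and the sample text is just the entry suggested above. On Windows the file is normally %SystemRoot%\System32\drivers\etc\hosts.

```python
def parse_hosts(text):
    """Map each lower-cased name in hosts-format text to the first IP listing it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # setdefault keeps the first match, like a top-down hosts lookup
            mapping.setdefault(name.lower(), ip)
    return mapping


SAMPLE = """
# NetBackup master/media server
10.217.2.198  nbuus002   us010099.schaeffler.com
"""

entries = parse_hosts(SAMPLE)
print(entries.get("nbuus002"))
print(entries.get("us010099.schaeffler.com"))
```

Both names should come back with the same IP. If the short name and the FQDN map to different addresses, or one of them is missing, that is worth fixing before looking any further.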

And about status 636 - 
Adjusting KeepAlive normally solves the problem.
Ensure master and all clients have the same setting.

See: 
https://vox.veritas.com/t5/NetBackup/Error-636/td-p/770527
https://vox.veritas.com/t5/NetBackup/Having-trouble-with-636-status-code/m-p/654987#M170327
https://www.veritas.com/support/en_US/article.TECH202675

 

Hi, yes I read his thread but it didn't appear relevant since he was using multiple IP addresses...?

To be clear you are suggesting to create the ltid log, correct?  Looks like I need to create the debug folder as well if it doesn't exist?

I disabled IPv6 but, oddly enough, when I reran a bunch of jobs I almost immediately got the status 77 popup, and nothing is currently backing up. The bpbrm processes seem hung even after the services stop, and I had to kill them manually.

B:\VERITAS\NetBackup\bin>bpps
* US010099                                               1/08/18 16:25:46.764
COMMAND           PID      LOAD             TIME   MEM                  START
NbWin            6944    0.000%            3.681   29M   1/03/18 13:33:49.316
bpbrm            3964    0.000%            0.296   17M   1/05/18 19:00:03.270
bpbrm            8672    0.000%            0.140   17M   1/05/18 21:35:36.467
bpbrm            6084    0.000%            0.421   17M   1/05/18 23:30:02.187
bpbrm           10664    0.000%            0.202   17M   1/06/18 19:00:05.624
bpps             4816    0.000%            0.218  6.7M   1/08/18 16:25:45.532
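
The leftover bpbrm processes in a listing like this can also be picked out with a short script. This is only a sketch: it assumes the seven-column layout shown above and month/day/year dates, and `stale_processes` is a made-up helper name, not a NetBackup tool.

```python
from datetime import datetime, timedelta

STAMP = "%m/%d/%y %H:%M:%S.%f"  # assumed month/day/year order, as in the output above

SAMPLE = """\
* US010099                                               1/08/18 16:25:46.764
COMMAND           PID      LOAD             TIME   MEM                  START
NbWin            6944    0.000%            3.681   29M   1/03/18 13:33:49.316
bpbrm            3964    0.000%            0.296   17M   1/05/18 19:00:03.270
bpbrm            8672    0.000%            0.140   17M   1/05/18 21:35:36.467
bpbrm            6084    0.000%            0.421   17M   1/05/18 23:30:02.187
bpbrm           10664    0.000%            0.202   17M   1/06/18 19:00:05.624
bpps             4816    0.000%            0.218  6.7M   1/08/18 16:25:45.532
"""


def stale_processes(bpps_text, name, now, max_age):
    """Return (pid, start time) for each process called name older than max_age."""
    stale = []
    for line in bpps_text.splitlines():
        fields = line.split()
        # Process rows have 7 fields: command, pid, load, time, mem, date, clock.
        if len(fields) != 7 or fields[0] != name:
            continue
        started = datetime.strptime(f"{fields[5]} {fields[6]}", STAMP)
        if now - started > max_age:
            stale.append((int(fields[1]), started))
    return stale


when = datetime(2018, 1, 8, 16, 25, 46)  # time of the bpps run above
for pid, started in stale_processes(SAMPLE, "bpbrm", when, timedelta(hours=24)):
    print(pid, started)
```

With a 24-hour threshold all four bpbrm processes are flagged, since the newest one started on 1/06/18 and the listing was taken on 1/08/18.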

 

We use that nbu alias in many locations.  I have added the line to the hosts file as suggested.

Keepalive is currently at 300000 from previous troubleshooting of 636. Are you suggesting to try 900000? Also, it's only set on the media server at the moment, per that TN.

Circling back a bit... if we take this repeated error at face value, it's saying that NetBackup loses sight of the device list from the EMM server, which is itself, since one server is both master and media server? Even though the resultant job errors are "network errors", doesn't this point to some sort of internal communication/connection issue?

Marianne
Moderator
Partner    VIP    Accredited Certified

@weigojmi wrote:

Circling back a bit... if we take this repeated error at face value, it's saying that NetBackup loses sight of the device list from the EMM server, which is itself, since one server is both master and media server? Even though the resultant job errors are "network errors", doesn't this point to some sort of internal communication/connection issue?


Yes, as per my post of a couple of days ago:

....  the master/media also communicates between different processes using TCP/IP ports. Even though it's on the same server.

There is something in TCP config on the master/media server that is causing inter-process comms to get confused/lost. 

I have tried making some suggestions with my very limited networking knowledge.
If nobody else can contribute further, your only other option is to enable logging and see if that helps in any way.
ltid logging needs the log directories to be created and a VERBOSE entry in vm.conf, followed by a restart of the NBU Device Management Service.
Unified logs such as nbemm need the logging level to be increased with the vxlogcfg command. The logs then need to be read with the vxlogview command.