Forum Discussion

Alun
Level 4
4 years ago

Inter Site SLP fails but Intra Site SLP succeeds

I have two datacenters, each containing a clustered master server node, media servers and an SSO-connected tape library.

I migrated our server infrastructure from older hardware running Windows Server 2008 R2 to newer servers running Windows Server 2012 R2 (master server cluster nodes) or Windows Server 2016 (media servers). The IP addresses of the old media servers were reused for the new servers.

Since the re-platform, existing SLP duplications between the two datacenters fail.

The required media is loaded into the tape drives at each site, the server hosting the images to be duplicated queues the restore and the target server queues the backup job, but the two servers never successfully establish the communication channel required for the duplication to proceed.

If I create an SLP to duplicate from server 1 to server 2 in the same site, the duplication completes successfully.

In that case, the required media is loaded into the tape drives at that site, the server hosting the images to be duplicated queues the restore, the target server queues the backup job, the servers establish communication and the duplication completes successfully.

Can anyone explain the actual processes that initiate SLP duplication, what the process flow is and what to look for when comparing the differences between the successful intra-site and the unsuccessful inter-site duplications?
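One low-tech way to compare the two cases is to diff the Job Details text of a successful intra-site duplication against a failed inter-site one, after stripping timestamps and PIDs so that only the sequence of events differs. This is a minimal sketch, not a NetBackup tool: it assumes Job Details lines start with a "MM/DD/YYYY HH:MM:SS - " prefix and mention processes as "name (pid=NNNN)", which you should check against your own output.

```python
import difflib
import re

# Assumed Job Details line prefix, e.g. "01/02/2021 10:00:00 - ".
# Adjust the pattern if your locale or NetBackup version formats it differently.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{4} \d{1,2}:\d{2}:\d{2}(?: [AP]M)? - ")
# PIDs differ between every run, so mask them before comparing.
PID = re.compile(r"\(pid=\d+\)")


def normalise(job_details: str) -> list[str]:
    """Strip timestamps and PIDs so that only the sequence of events is compared."""
    lines = []
    for line in job_details.splitlines():
        line = TIMESTAMP.sub("", line.strip())
        line = PID.sub("(pid=N)", line)
        if line:
            lines.append(line)
    return lines


def compare_jobs(good: str, bad: str) -> list[str]:
    """Return unified-diff lines between a successful and a failed job's details."""
    return list(
        difflib.unified_diff(
            normalise(good),
            normalise(bad),
            fromfile="intra-site (ok)",
            tofile="inter-site (failed)",
            lineterm="",
        )
    )
```

Lines prefixed with "+" appear only in the failed job and lines prefixed with "-" only in the successful one; the first point where the two sequences diverge is usually the step worth chasing in the logs.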

Thanks,

Alun

  • We were never able to resolve this issue satisfactorily; instead, we chose to duplicate within each datacenter and completed all of the outstanding SLPs that way.

    Thanks for your advice and assistance, apologies for not replying sooner.

  • Hi Alun 

    Can you confirm you are looking for the differences between SLP controlled duplication within one site (which is working) and AIR replication between sites (which is not working)?

    If nothing other than the re-platform has changed, I'd first look at the firewall configuration (the local Windows firewall). How did you go about the re-platform work for both the master server cluster and the media servers (i.e. what process did you follow)?
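    To check the firewall angle quickly, you could run a TCP probe from a media server against both its same-site peer and its remote-site peer and compare the results. This is a minimal sketch, assuming the default NetBackup ports (1556 for PBX, 13724 for vnetd) have not been changed in your environment; the hostnames in the usage comment are hypothetical.

```python
import socket

# Default NetBackup ports (assumption: not customised at your site).
# 1556 = PBX, 13724 = vnetd.
DEFAULT_PORTS = (1556, 13724)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out or name resolution failed
        return False


def probe(peers: list[str], ports=DEFAULT_PORTS) -> dict[tuple[str, int], bool]:
    """Probe every peer on every port and return {(host, port): reachable}."""
    return {(host, port): port_open(host, port) for host in peers for port in ports}


# Usage (hypothetical hostnames), run on media1 at site A:
#   for (host, port), ok in probe(["media2-site-a", "media1-site-b"]).items():
#       print(host, port, "open" if ok else "blocked/closed")
```

    A port that is open to the same-site peer but blocked to the remote peer would point at a network device between the datacenters rather than the local Windows firewall.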

    Have you used the nbstlutil command to determine what the state of the SLP managed image is?

    David

    • Alun
      Level 4

      Hi David,

      I'm referring to SLP controlled duplication between media servers in the same physical location and also SLP controlled duplication between media servers in two physical locations (we don't use AIR).

      In theory, the networking is no different between the new servers in the two datacenters, and disabling the Windows firewall makes no difference to whether the SLP duplications succeed or fail.

      The new servers were built on new hardware with newer operating systems and new IP addresses. Once the old media servers were removed, the new media servers were given the old media servers' IP addresses.

      The master server cluster nodes were replaced by performing in-place OS upgrades on the old nodes, adding the two new nodes to the cluster, installing NetBackup on them, failing over between the nodes to confirm that they all worked as expected and, finally, removing the old nodes from the cluster.

      Alun

      • Marianne
        Level 6

        Alun 

        Can you please show us all text in Job Details of a failed duplication?

        This will tell us which processes and PIDs on master and media servers to troubleshoot.
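        Once you have the Job Details text, a small helper can list which processes and PIDs it mentions, so you know which log folders to open on which server. This is a sketch under the assumption that Job Details refer to processes as "name (pid=NNNN)", e.g. "started process bptm (pid=1234)"; verify that against your own job output.

```python
import re
from collections import defaultdict

# Assumed format: "<process> (pid=<number>)" somewhere on the line.
PROC_PID = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*) \(pid=(\d+)\)")


def processes_in(job_details: str) -> dict[str, set[int]]:
    """Map each process name found in the Job Details text to the PIDs it ran as."""
    found: dict[str, set[int]] = defaultdict(set)
    for name, pid in PROC_PID.findall(job_details):
        found[name].add(int(pid))
    return dict(found)
```

        Each PID can then be searched for in the matching legacy log (for example, a bptm PID in the bptm log on the media server named on the same Job Details line).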

        If you do not want to display hostnames, please replace the real names with generic ones, e.g. master, media1, media2.
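        If there are many lines to clean up, a short script can do the substitution consistently. This is a minimal sketch; the real hostnames in the test below are made up, and the generic names follow the master, media1, media2 convention suggested above.

```python
import re


def anonymise(text: str, mapping: dict[str, str]) -> str:
    """Replace each real hostname (case-insensitive, whole name) with a generic one.

    Longer names are replaced first, so an FQDN such as "media01.corp.local"
    is handled before a shorter "media01" that it contains.
    """
    for real in sorted(mapping, key=len, reverse=True):
        generic = mapping[real]
        # Do not match inside a longer hostname (letters, digits or hyphens).
        pattern = re.compile(rf"(?<![\w-]){re.escape(real)}(?![\w-])", re.IGNORECASE)
        text = pattern.sub(lambda _m, g=generic: g, text)
    return text
```

        Using a function as the replacement means generic names are inserted literally, even if they contain characters that re.sub would otherwise treat specially.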

        Ensure that the following log folders exist on the master: admin (I don't think more legacy logs are needed on the master).
        On the media servers: bpbrm, bptm, bpdm.
        Increase the logging level to 3 (level 3 is sufficient for this forum; if you log a call with Veritas Support, they will ask for level 5).
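        Creating the folders can be scripted if you have several media servers. This is a sketch that assumes the default Windows install path (C:\Program Files\Veritas\NetBackup\logs); change the root if NetBackup is installed elsewhere. It only creates the folders and does not change the logging level, which still has to be set through NetBackup's own logging configuration.

```python
from pathlib import Path

# Assumed default NetBackup install path on Windows; adjust if yours differs.
DEFAULT_LOG_ROOT = Path(r"C:\Program Files\Veritas\NetBackup\logs")

MASTER_LOGS = ("admin",)
MEDIA_LOGS = ("bpbrm", "bptm", "bpdm")


def ensure_log_dirs(names, root: Path = DEFAULT_LOG_ROOT) -> list[Path]:
    """Create any missing legacy log folders under root; return those created."""
    created = []
    for name in names:
        folder = root / name
        if not folder.is_dir():
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Usage on each media server: ensure_log_dirs(MEDIA_LOGS)
# Usage on the master:        ensure_log_dirs(MASTER_LOGS)
```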

        Depending on what we see in Job Details, we will know which logs to check.
