Master/media server communication interruption during duplication to tape

I'm having an issue in our NetBackup environment. It happens when a duplication job to tape is running on the media server and a network interruption occurs between the master and the media server. I will try to explain.

We have a master server (7.7.3) on Windows 2008 R2 in our main data center and a media server (7.7.3), also on Windows 2008 R2, in our second data center. A tape robot is directly connected to the media server. Both centers are on the same company domain and are connected by a 40 Gb LAN link.

I'm using an SLP (Storage Lifecycle Policy) to duplicate images between both centers and put a final copy on tape. The SLP has 4 stages:
1. A backup to a PureDisk volume on the master server.
2. A duplication of this image to another PureDisk volume on the media server in the second data center.
3. A rehydration job to an AdvancedDisk pool on this media server.
4. A final duplication of the image to tape.

The problem occurs when a duplication job is running on the media server as the final stage of an SLP (copy to tape from the AdvancedDisk pool) and there is a network interruption of a few seconds between the master and the media server. The duplication job fails, but the bptm process keeps running. The tape stays loaded in the drive, sitting idle and doing nothing, and the duplication job gets stuck in a loop asking for a new tape. When this happens, I have to manually return the tape to its slot and kill the bptm and bpdm processes of that job before any other duplication job can start.

My question: why is the job interrupted when this network hiccup happens? The job is running on the media server, but whenever the master loses its "view" of the processes on the media server for a few seconds, every duplication job to tape gets stuck. Is there a way to prevent this?



Hmm, I can't really see what is happening here, but if you have a communication hiccup between your master and media server, you never know the outcome. That should be addressed first.

I have several similar configurations running well without the AdvancedDisk stage between the last MSDP and tape. Depending on your amount of data, use only one stream from your MSDP to a single tape drive to keep it streaming (limit the STU to one write drive, no multiplexing). Consider SLP windows if your MSDPs are heavily loaded.

Refer to the Network Resiliency topic in the Administrator's Guide. It allows communication to survive short network outages (up to tens of seconds). It is primarily intended for client-server communication, but it may also work for server-server communication; you would need to test it.



We are working on the network issues, but we can't implement a solution as quickly as we would like.
I'm aware this is the root cause, but I was wondering if there is a workaround in NetBackup so I don't have to intervene manually when this happens.

The design of our NetBackup landscape is 5 years old. When we started using this hardware, the MSDP couldn't feed our LTO-6 drives fast enough, which is why we still use the AdvancedDisk stage between MSDP and tape. Next year the entire landscape will be renewed and redesigned.

Thanks for the info.

I will have a look at the network resiliency topic in the manual.