Due to cluster failover - Incremental backup behaving as a FULL

Surya_2 · ‎08-02-2007

Hi Guys,

This is a strange issue related to clustering. We have some WAFS servers that were recently migrated to SAN media servers.

The 4 WAFS servers are uschdcls003a, uschdcls003b, uschdcls006a and uschdcls006b (physical nodes).

These are not Active-Active or Active-Passive clusters, but they behave like clusters.

Now consider uschdcls003a and uschdcls003b.

Each server holds 4.5 TB of data.

We have 5 service groups (3p, 3q, 3r, 3s, 3t).

3p --> C, D, E, F

3q --> G, H, I, J

3r --> K, L, M, N

At any point in time, 2 service groups are active on one node and 3 service groups are active on the other.

If there is an issue with either node, its service groups fail over to the other node. At that point one node holds all the service groups, i.e. 9 TB of data.
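To make the failover scenario concrete, here is a minimal Python sketch of the placement described above. Which groups sit on which node is an assumption for illustration (only the 2/3 split is given), and `fail_over` is a made-up helper, not a cluster command.

```python
from typing import Dict, List

# Assumed starting placement: 2 groups on one node, 3 on the other
placement: Dict[str, List[str]] = {
    "uschdcls003a": ["3p", "3q"],
    "uschdcls003b": ["3r", "3s", "3t"],
}

def fail_over(placement: Dict[str, List[str]], failed: str, target: str) -> Dict[str, List[str]]:
    """Return a new placement with every group from `failed` moved to `target`."""
    result = {node: list(groups) for node, groups in placement.items()}
    result[target].extend(result[failed])
    result[failed] = []
    return result

after = fail_over(placement, "uschdcls003a", "uschdcls003b")
print(after["uschdcls003b"])  # ['3r', '3s', '3t', '3p', '3q']
print(after["uschdcls003a"])  # []
```

After the move, the surviving node carries all five groups, which is the 9 TB case.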

How backups are configured:

Each node is a SAN media server with a single tape drive.

We have created separate policies for each server, with multiplexing on the storage unit set to 4 and multistreaming allowed at the policy level.

File list: ALL_LOCAL_DRIVES

Problems:

1. Whenever a FULL job starts, 4 streams go active, and they take more than 2 days to complete the backup. In the meantime the other streams fail with error 196. [Possible solution: we could increase the backup window, but this would cause differentials to be missed for a couple of days.]

If we increase the multistreaming, the server creates 8 bpbkar processes, all of them consume CPU, and CPU utilization reaches 100% at that point.

2. When a service group fails over from one node to another and an incremental backup is scheduled for the destination node, the incremental runs as a FULL for that service group, because the data is new to that server at that point. [The first incremental backup is always a FULL: it has no previous backup to compare file changes against, so it backs up everything.]

3. While a backup is running for a node, some streams are active and some are queued. If a service group fails over to another node during the backup, the active streams fail with an error, and the streams that move from queued to active also fail, with error 71 (none of the files in the file list exist), because those drives are no longer present on the node.
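Problems 2 and 3 can be pictured with a small Python model. This is only a sketch of the behavior described above, not how NetBackup is implemented: the `history` set, the `online` table and `plan_stream` are hypothetical names, and the drive letters for 3q come from the mapping earlier in this post.

```python
from typing import Dict, Set, Tuple

def plan_stream(node: str, drive: str, requested: str,
                online: Dict[str, Set[str]],
                history: Set[Tuple[str, str]]) -> str:
    """Say what happens to one backup stream in this simplified model."""
    if drive not in online[node]:
        return "FAIL"   # drive moved away with its service group (problem 3)
    if requested == "INCR" and (node, drive) not in history:
        return "FULL"   # no earlier backup of this drive from this node (problem 2)
    return requested

# Service group 3q (G, H, I, J) has just failed over from 003a to 003b
online = {"uschdcls003a": set(), "uschdcls003b": {"G", "H", "I", "J"}}
history = {("uschdcls003a", "G")}  # G was last backed up through 003a

print(plan_stream("uschdcls003a", "G", "INCR", online, history))  # FAIL
print(plan_stream("uschdcls003b", "G", "INCR", online, history))  # FULL
```

In this picture both outcomes come from the same place: the backups are tracked per physical node, while the drives move with the service group.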

Please share your thoughts on how we can address these issues in a better way!

Surya , NetBackup, NetAPP, SOLARIS & HP-UX

NetBackup Engineer.
