Backup of file server on a cluster

snawaz3
Level 6

I have a Windows 2008 R2 cluster. The name of the cluster is NDU_FILESERV. The cluster consists of 2 nodes, NDUFPS01 and NDUFPS02. The shares that need to be backed up are on NDU_SHARES, and in my policy I have the backups pointing to NDU_SHARES. The active node in the cluster was NDUFPS02. Last night a failover occurred and the active node switched to NDUFPS01. My backups failed with the error message:

"Error bpbrm (pid=6648) socket read failed. An existing connection was forcibly closed by the remote host. (10054)"

In the Activity Monitor all my jobs showed as Incomplete. I restarted a couple of jobs and they are running. I have a few questions:

1. Why did NetBackup not back up the jobs when the cluster node changed? The policy points to the share and should not be dependent on which node is active.

2. Why are the restarted backup jobs running slowly? They are currently at 4260 KB/sec.

3. Will the restarted backup jobs pick up from where they left off, or will they start from the beginning and do a complete backup?

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions

Mark_Solutions
Level 6
Partner Accredited Certified

1. As the node went down, it will cause the job to fail in the first instance - it relies on your retry settings to pick the job back up. If the failover takes too long, it may result in a failed/incomplete status.

2. This may be down to a network issue. Other than the networks being configured differently on the two nodes, there is no obvious reason for the slowdown.

3. Your status was Incomplete, so it should resume from where it got to. However, if you also use the True Image Restore option, it could start from scratch, as the TIR information would be missing on the second node - hopefully that will not be the case, since the TIR information is taken early in the backup. The detailed status of the job will show whether this is the case.

Hope this helps


8 REPLIES


snawaz3
Level 6

Where can I configure the retry settings, and what should they be set to?

snawaz3
Level 6

Also, does it matter whether I have the policy pointing to NDU_SHARES or to NDU_FILESERV?

Mark_Solutions
Level 6
Partner Accredited Certified

The retry settings are in the Master Server's Host Properties, under Global Attributes, shown as a number of tries within a number of hours.

What you set them to is your choice, but I generally go for 3 tries in 8 hours.

As the system resolves the directories via NDU_SHARES, that should be OK.
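If you prefer the command line, the same Global Attributes can usually be read and changed with `bpconfig` on the master server. This is a sketch only - the flag names (`-tries`, `-period`) are taken from my reading of the `bpconfig` man page, so verify them against your NetBackup release before running:

```shell
# List the current global attributes, including the scheduled backup
# attempts ("tries") and the period they apply to.
bpconfig -L

# Set 3 tries within 8 hours (flag names assumed from the bpconfig
# man page - check "bpconfig -help" on your version first).
bpconfig -tries 3 -period 8
```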

snawaz3
Level 6

The retry settings were set to 3 tries in 24 hours, so it should have worked: the job started at 8:00 PM on 2/22 and failed/went incomplete at 8:17 PM on 2/22, and I checked the Activity Monitor at 7:00 AM on 2/25. I am not understanding this. I have now changed the retry to 3 tries in 8 hours. I have selected Resume on 2 jobs, since I have 2 drives, but they are running very slowly at 2217 KB/sec.

Mark_Solutions
Level 6
Partner Accredited Certified

It is worth checking the actual network usage on the client - the value shown may just be an average rather than a current rate, so it may well increase as the job runs.

Perhaps your jobs do not run so well when there are only two jobs writing to tape, due to buffering issues. Do you use multiplexing for your jobs? It keeps lots of jobs streaming to tape, which tends to make the tape drives work better without shoe-shining.

snawaz3
Level 6

Now this is a new term for me, and I would like to implement it if it will make my drives more efficient. How can I set this up?

Mark_Solutions
Level 6
Partner Accredited Certified

If you back up directly to tape, then in order to increase the number of jobs that run to each drive you can enable multiplexing (in the Storage Unit properties). This can be set as high as 32, but I usually recommend no more than 6 (so 6 jobs run at the same time).

To get this to work you also need to use the same value in your policy schedules.
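The storage unit side of this can also be done from the command line. A sketch, assuming a storage unit labelled `tape_stu` (a placeholder - substitute your own label, and confirm the `-mpx` option exists in your release with `bpsturep -help`):

```shell
# Show the current storage unit settings, including the multiplexing factor.
bpstulist -U

# Raise the multiplexing factor on a storage unit to 6 ("tape_stu" is a
# hypothetical label; -mpx assumed from the bpstuadd/bpsturep options).
bpsturep -label tape_stu -mpx 6
```

Remember the matching value still has to be set in the policy schedules (Media Multiplexing) for it to take effect.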

The buffering is controlled by buffer touch files in the netbackup\db\config directory on all media servers, where you can adjust the number and the size of the buffers NetBackup uses, to maximise performance.

These are available for both disk and tape - tape uses SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS.

If you need more guidance let me know, or Google those file names.
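Creating the touch files is just a matter of writing a single number into each. A minimal sketch - the real location is the NetBackup install's db\config directory on each media server (e.g. /usr/openv/netbackup/db/config on UNIX); CONFIG_DIR below is a stand-in so you can try it safely first, and the values are common starting points for tape drives, not tuned recommendations:

```shell
# Stand-in for <install_path>/netbackup/db/config on a media server.
CONFIG_DIR="${CONFIG_DIR:-./netbackup/db/config}"
mkdir -p "$CONFIG_DIR"

# Each touch file holds a single number. 262144 bytes (256 KB) per buffer
# and 64 buffers are frequently used starting values - test and adjust.
printf '262144\n' > "$CONFIG_DIR/SIZE_DATA_BUFFERS"
printf '64\n'     > "$CONFIG_DIR/NUMBER_DATA_BUFFERS"

# Confirm what was written.
cat "$CONFIG_DIR/SIZE_DATA_BUFFERS" "$CONFIG_DIR/NUMBER_DATA_BUFFERS"
```

The new values are picked up by subsequent backup jobs, so no restart of the running jobs is needed; the job's detailed status will report the buffer size in use.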