
Concern about backing up windows 2008 file-server cluster with NBU 7.5

Mux

Hi,

 

We have a two-node Windows 2008 file-server cluster. It has 8 shared drives, each 14 TB in size, and almost 100 TB is in use.

 

We are using a NetBackup 5220 appliance, version 2.5.2 (NetBackup version 7.5.0.5), with Accelerator enabled.

I have 9 policies: 8 policies, one per shared drive, with the cluster name as the client name, and one policy to back up the C: drive and shadow copy components of both nodes.

 

I have the below questions/issues.

 

1. If the cluster fails over during a backup, the backup fails with status code 42 (network read failed). This is a real problem because Accelerator needs a golden full backup image to work, and that first full backup takes at least 3-4 days per drive, which leaves a long window for a failover.

2. Even when a backup fails (status code 42) after backing up a couple of TB of data, none of that data is available for restore. The backup goes to deduplication disk storage, and as far as I know, if a backup to disk fails for any reason, the data is erased from storage, whereas if it goes to tape, only the catalog entries are deleted. Correct me if I am wrong.

3. How will NTFS journalling work in this environment?
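To put the 3-4 day figure from item 1 in perspective, here is a back-of-the-envelope sketch of the sustained throughput it implies. It assumes (this is not stated above) that the drive holds the full 14 TB, and it uses decimal units; `required_mb_per_s` is a hypothetical helper.

```python
# Back-of-the-envelope throughput for an initial full backup.
# Assumptions (not from the thread): the drive holds the full 14 TB,
# and units are decimal (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).

SECONDS_PER_DAY = 86_400


def required_mb_per_s(terabytes: float, days: float) -> float:
    """Average MB/s needed to back up `terabytes` of data in `days` days."""
    total_bytes = terabytes * 10**12
    return total_bytes / (days * SECONDS_PER_DAY) / 10**6


if __name__ == "__main__":
    for days in (3, 4):
        rate = required_mb_per_s(14, days)
        print(f"14 TB in {days} days -> {rate:.1f} MB/s")
```

Under those assumptions the first full runs at roughly 40-54 MB/s, so a single failover anywhere in a multi-day window is enough to fail the whole job.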


RonCaplinger

I would recommend the following:

First, enable checkpoint restart for the policies if it is not already enabled, and take checkpoints every 30 or 60 minutes. If a backup fails, it should then be able to restart near where it left off. (You should also have failed backups retried automatically at least once; this is set in the master server's Global Attributes properties.)
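The idea behind checkpoint restart can be illustrated with a small simulation. This is a conceptual model only, not NetBackup's implementation: the `BackupJob` class and its `checkpoint_every` parameter are hypothetical, and "files processed" stands in for the 30- or 60-minute interval set in the policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BackupJob:
    """Toy model of a checkpointed backup (hypothetical, not NetBackup code)."""

    files: list[str]
    checkpoint_every: int  # files between checkpoints (stand-in for minutes)
    last_checkpoint: int = 0  # index of the first file not yet committed
    written: list[str] = field(default_factory=list)

    def run(self, fail_at: int | None = None) -> bool:
        """Back up from the last checkpoint; return False if a failure
        is simulated at index `fail_at`."""
        # In this model, data written after the last checkpoint is not kept.
        del self.written[self.last_checkpoint:]
        for i in range(self.last_checkpoint, len(self.files)):
            if fail_at is not None and i == fail_at:
                return False  # e.g. a cluster failover causing status 42
            self.written.append(self.files[i])
            if (i + 1) % self.checkpoint_every == 0:
                self.last_checkpoint = i + 1  # commit progress so far
        self.last_checkpoint = len(self.files)
        return True


if __name__ == "__main__":
    job = BackupJob(files=[f"file{n}" for n in range(10)], checkpoint_every=3)
    print(job.run(fail_at=7), job.last_checkpoint)  # first attempt fails
    print(job.run(), len(job.written))  # retry resumes from the checkpoint
```

The point of the model: after a failure, at most one checkpoint interval of work is redone, instead of restarting a multi-day full from the beginning.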

Second, if your file servers can handle the I/O, check "Allow Multiple Data Streams" on the policies, and create multiple streams in the "Backup Selections List", like this:

NEW_STREAM
F:\DirectoryName\a*
F:\DirectoryName\b*
NEW_STREAM
F:\DirectoryName\c*
F:\DirectoryName\d*
...etc.
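Typing 26 letter-based entries by hand gets tedious. The hypothetical helper below generates a selections list in the same shape as the example above, dealing a* through z* into contiguous groups across a chosen number of streams. It is a sketch only: it covers only names that start with a letter, so files or folders whose names begin with a digit or another character would need extra entries, and how many streams your file server's I/O can sustain is something to test.

```python
import string


def build_selections(base: str, streams: int) -> str:
    """Build a Backup Selections list spreading a*..z* over `streams` streams.

    Hypothetical helper: each letter becomes a wildcard entry such as
    F:\\DirectoryName\\a*, and letters are dealt out in contiguous groups
    so every NEW_STREAM gets a similar number of entries. Names that do
    not start with a letter are NOT covered.
    """
    letters = string.ascii_lowercase
    if not 1 <= streams <= len(letters):
        raise ValueError("streams must be between 1 and 26")
    size, extra = divmod(len(letters), streams)
    lines: list[str] = []
    start = 0
    for s in range(streams):
        # The first `extra` streams get one additional letter each.
        end = start + size + (1 if s < extra else 0)
        lines.append("NEW_STREAM")
        lines.extend(f"{base}\\{ch}*" for ch in letters[start:end])
        start = end
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_selections(r"F:\DirectoryName", 13))
```

With `streams=13` it produces the two-letters-per-stream layout shown in the example.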

For your item #2: whether the backup goes to disk or tape, you cannot restore any of the files from a failed backup. If the backup has taken checkpoints and is restarted, though, it will retain the previously backed-up data and continue from that point, whether going to tape or disk.

For your #3: when I saw no change in backup times using the NTFS change journal, Symantec told me that you had to have almost exactly a 10% change rate on your data. Any more or any less, and you would not see any difference in backup times if you turned off the NTFS change journal. I find this an odd statement and would have expected it to be useful for a larger change rate, but that was their response.

My only observation is that this is a lot of data for only using (effectively) one file server and you may not be happy with the performance, particularly with backups.  We have 9 VM file servers, connected over a 10G connection, and they contain a lot less storage than what you are using.

Mux

Thanks, Ron, that was a wonderful explanation.

 

Thanks for the reminder about checkpoints; it never crossed my mind. I will try enabling it with a checkpoint every 60 minutes.

I think enabling automatic retries of failed jobs in Global Attributes will rerun failed jobs rather than resume incomplete ones.

 

I already have multistreaming enabled, and each policy gets more than 10 streams (working fine).

 

#2: I assume that if a job writing to tape fails, only the catalog entries are deleted, but the actual image stays on the tape. So if you re-catalog that tape, you may get some data back. (This is just an assumption.)

 

#3: The NTFS change journal is a big grey area for me, so I may disable it. :(

huanglao2002

Checkpoint restart and automatic retry of failed jobs are the best fit for your environment.

Have you considered using a 10G network for your backup environment? It may improve your backup performance and reduce backup errors.

3. How will NTFS journalling work in this environment?

NTFS journaling is used by the NetBackup Accelerator feature. It is useful when backing up a large number of files.

 

What is the NetBackup Accelerator track log?

Track log is a platform and file system independent change tracking log used by NetBackup Accelerator. Unlike file system specific change journals (e.g. Windows NTFS change journal), there are no kernel level drivers that run all the time on production clients. The track log comes into action during the backup and is populated with entries that are used by NetBackup Accelerator to intelligently identify changed files and segments within the changed files.

The size of the track log is a function of the number of files and the size of the files in the file system. The size of the track log does not increase with an increase in the data change rate in the file system.
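As a rough mental model of what the FAQ describes, the sketch below keeps one metadata entry per file and flags files whose size or modification time differ from the previous run. This is an assumption-laden illustration, not the real track-log format (the FAQ also mentions tracking changed segments within files, which this does not model); `FileMeta`, `changed_files` and `deleted_files` are hypothetical names. It does show one half of the sizing point: the log holds one entry per file no matter how many files changed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Per-file metadata kept by this toy track log (hypothetical format)."""

    size: int
    mtime: float


def changed_files(track: dict[str, FileMeta], current: dict[str, FileMeta]) -> list[str]:
    """Paths that are new or whose size/mtime differ from the previous run."""
    return sorted(path for path, meta in current.items() if track.get(path) != meta)


def deleted_files(track: dict[str, FileMeta], current: dict[str, FileMeta]) -> list[str]:
    """Paths recorded in the previous run that no longer exist."""
    return sorted(path for path in track if path not in current)


if __name__ == "__main__":
    previous = {"a.doc": FileMeta(100, 1.0), "b.xls": FileMeta(200, 2.0), "old.txt": FileMeta(5, 0.5)}
    now = {"a.doc": FileMeta(100, 1.0), "b.xls": FileMeta(250, 3.0), "c.pdf": FileMeta(50, 4.0)}
    print("back up:", changed_files(previous, now))
    print("mark deleted:", deleted_files(previous, now))
```

After each run, the current metadata would replace the previous entries, so the log tracks the file count rather than the change rate.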

 

Frequently Asked Questions on NetBackup Accelerator

https://www-secure.symantec.com/connect/blogs/frequently-asked-questions-netbackup-accelerator

Mux

A Symantec engineer told me that we may need to disable the change journal, since it is maintained separately on each node and may not work for a cluster.

 

I am doing some test runs; hopefully that will do the trick.

 

StefanosM

The engineer is right.

Do not use the change journal with clusters.

And you have to move the Accelerator track log to a shared drive so both NetBackup clients can use the same track log.
Of course, you will use Accelerator only for the cluster service and not for the node backups.

If you do not move the track log, Accelerator will not work when the service fails over to the other node.

 

Mux

I have reconfigured and moved the track logs to one of the shared drives.

Testing is still going on; let's see how it goes.

Mux

Thanks to everyone for your suggestions. I think I have sorted it out.

This is what I did:

  • 1 policy per shared drive
  • 1 policy to back up the C: drive and system state
  • NTFS change journal not used
  • Multistreaming enabled, with more than 8 streams per job
  • NetBackup Accelerator enabled on all policies
  • Before the first backup, the default track log location, C:\Program Files\Veritas\NetBackup\track, was moved to one of the shared drives, and a soft link was created at the original path C:\Program Files\Veritas\NetBackup\track, so that when NetBackup looks for the track log, it is redirected to the shared drive.
  • Set the policies to take a checkpoint every hour.
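The track-log relocation in the list above was done by hand; on Windows the usual built-in tool is `mklink /D <link> <target>` from an elevated command prompt. The Python sketch below does the same move-then-link with `shutil.move` and `os.symlink`. The `relocate_with_symlink` helper and the example folder names are hypothetical; `os.symlink` on Windows needs administrator rights (or Developer Mode), and the move should only be done while no backup is running. Because C: is local to each node, the link presumably has to exist on both nodes so that whichever node owns the cluster service finds the same track log.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(original: Path, new_location: Path) -> None:
    """Move directory `original` to `new_location`, then leave a directory
    symlink at `original` that points to the new location (hypothetical helper)."""
    if original.is_symlink():
        raise ValueError(f"{original} is already a symlink")
    if new_location.exists():
        raise FileExistsError(f"{new_location} already exists")
    new_location.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(original), str(new_location))
    # os.symlink(target, link): the link is created at `original`.
    os.symlink(new_location, original, target_is_directory=True)


if __name__ == "__main__":
    # Demo in a temporary directory; real paths would be something like
    # C:\Program Files\Veritas\NetBackup\track and a folder on a shared drive.
    with tempfile.TemporaryDirectory() as tmp:
        track = Path(tmp, "NetBackup", "track")
        track.mkdir(parents=True)
        (track / "example.log").write_text("data")
        shared = Path(tmp, "shared", "NBU_track")
        relocate_with_symlink(track, shared)
        print(track.is_symlink(), (track / "example.log").read_text())
```

Anything that opens a file under the old path now reads it from the new location, which is the behavior the soft link relies on.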

 

I have tested the backups over multiple runs, and it works great.

For one drive, the first run took 50 hours to complete, whereas the second run finished in less than 3 hours.
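Those timings translate into a simple speedup figure. Because the second run took less than 3 hours, using 3 hours gives a lower bound; `speedup` is just an illustrative helper.

```python
def speedup(first_hours: float, later_hours: float) -> float:
    """How many times faster a later run was than the first run."""
    return first_hours / later_hours


if __name__ == "__main__":
    first_run, second_run = 50, 3  # hours; the second run took "less than 3 hours"
    # Using the 3-hour upper limit gives a lower bound on the improvement.
    print(f"speedup of at least {speedup(first_run, second_run):.1f}x")
    print(f"time saved: at least {1 - second_run / first_run:.0%}")
```

So the accelerated run was at least about 16.7 times faster, saving at least 94% of the backup window.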

 

StefanosM
Partner    VIP    Accredited Certified

Have you tried moving the service to the other node and rerunning the backup?

Mux

Yes, I did, and it worked as expected.

One more thing I noticed:

During the first run, I had the NTFS change journal enabled (I only noticed it hours after starting the job), and I disabled the change journal while the backup was still running.

Once the first run completed, I failed over the cluster and ran the backup a second time.

 

Accelerator worked as expected, and the change journal didn't cause any issues.