06-12-2013 03:21 AM
Hi,
We have a two-node Windows 2008 file-server cluster. It has 8 shared drives, each 14 TB in size, and almost 100 TB of that is utilized.
We are using a NetBackup 5220 appliance, version 2.5.2 (NBU version 7.5.0.5), with Accelerator enabled.
So I have 9 policies: 8, one per shared drive, where the client name is the cluster name, and one policy to back up the C: drive and Shadow Copy Components of both nodes.
I have the below questions/issues.
1. If the cluster fails over during a backup, the backup fails with status code 42: network read failed. This is a problem because, for Accelerator to work, we need a golden full backup image, and that backup takes at least 3-4 days per drive.
2. Even if the backup fails (status code 42) after backing up a couple of TB of data, that data is not accessible/available for restore. The backup goes to the deduplication storage disk, and as far as I know, if backups go to disk and the job fails for any reason, the data is erased from the storage. But if it goes to tape, only the catalog entries are deleted. Correct me if I am wrong.
3. How will NTFS journalling work in this environment?
06-12-2013 11:08 AM
I would recommend the following:
First, enable checkpoint restarts for the policies if not already done. Take checkpoints every 30 or 60 minutes. If the backup fails, the backup should be able to restart near where it left off. (You should also have backups auto-retrying at least once, which is set in the master server's Global Attributes properties.)
Second, if your file servers can handle the I/O, check "Allow Multiple Data Streams" on the policies, and create multiple streams in the "Backup Selections List", like this:
NEW_STREAM
F:\DirectoryName\a*
F:\DirectoryName\b*
NEW_STREAM
F:\DirectoryName\c*
F:\DirectoryName\d*
...etc.
For your item #2, if the backup fails whether going to disk or tape, you cannot restore any of the files from the failed backup. If the backup has taken checkpoints and is restarted, though, it will retain the previously backed up data and continue from that point, whether going to tape or disk.
For your #3, Symantec's response to me, when I did not notice any change in backup times using the NTFS journal, was that you had to have almost exactly a 10% change rate on your data; any more or any less and you would not see any difference in backup times if you turned off the NTFS journal. I find this to be an odd statement and would have expected it to be useful for a larger change rate, but that was their response.
My only observation is that this is a lot of data for what is effectively one file server, and you may not be happy with the performance, particularly with backups. We have 9 VM file servers, connected over a 10G connection, and they contain a lot less storage than what you are using.
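For reference, checkpoint restart can also be switched on from the command line; a sketch, assuming `bpplinfo` on the master server, a hypothetical policy named FileServer-F, and the `-chkpt`/`-chkpt_intrvl` flags as documented for NBU 7.x (verify against the commands reference for your release):

```shell
# Enable checkpoint restart on the policy and take a checkpoint every
# 60 minutes: -chkpt 1 turns the feature on, -chkpt_intrvl sets the
# interval in minutes. "FileServer-F" is a hypothetical policy name.
bpplinfo FileServer-F -modify -chkpt 1 -chkpt_intrvl 60

# Read the policy attributes back to confirm the change took effect.
bpplinfo FileServer-F -L | grep -i checkpoint
```

The same settings are available in the policy attributes in the Administration Console if you prefer the GUI.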
06-13-2013 05:05 AM
Thanks Ron, that was a wonderful explanation.
Thanks for reminding me about checkpoints; it never crossed my mind. I will try enabling it to take a checkpoint every 60 minutes.
I think enabling auto-retries of failed jobs in Global Attributes will restart failed jobs rather than resume incomplete ones.
I already have multistreaming enabled and each policy gets more than 10 streams (working fine).
#2 I assume that if a job writing to tape fails, only the catalog entries are deleted but the actual image still stays on the tape, so if you re-catalog (import) that tape, you may get some data back. (This is just an assumption.)
#3 The NTFS journal looks like a big grey area for me, so I may disable it. :(
06-13-2013 05:37 AM
Checkpoint restarts and failed-job retries are best for your environment.
Are you considering a 10G network for your backup environment? It may improve your backup performance and reduce backup errors.
3. How will NTFS journalling work in this environment?
# NTFS journalling is used by the NetBackup Accelerator function; it is useful for backing up large numbers of files.
What is the NetBackup Accelerator track log?
The track log is a platform- and file-system-independent change tracking log used by NetBackup Accelerator. Unlike file-system-specific change journals (e.g. the Windows NTFS change journal), there are no kernel-level drivers that run all the time on production clients. The track log comes into action during the backup and is populated with entries that NetBackup Accelerator uses to intelligently identify changed files and changed segments within those files.
The size of the track log is a function of the number of files and the size of the files in the file system; it does not increase with the data change rate in the file system.
Frequently Asked Questions on NetBackup Accelerator
https://www-secure.symantec.com/connect/blogs/frequently-asked-questions-netbackup-accelerator
06-14-2013 06:32 AM
A Symantec engineer told me that we may need to disable the change journal, as it is independent per node and may not work for a cluster.
I am doing some test runs, and hopefully it will do the trick.
06-14-2013 06:41 AM
The engineer is right.
Do not use the change journal with clusters.
And you have to move the Accelerator track log to a shared drive so both NetBackup clients can use the same track log.
Of course, you will use Accelerator only for the clustered service and not for the node backups.
If you do not move the track log, Accelerator will not work when the service fails over to the other node.
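A minimal sketch of that relocation, assuming the track log location is controlled by a client-side TRACK_LOG_DIR setting pushed with `bpsetconfig` from the master (verify the exact keyword and syntax against the Accelerator documentation for your release), and assuming S:\ is a hypothetical shared cluster drive and the client name is the cluster's virtual name:

```shell
# Point the clustered client's Accelerator track log at a directory on a
# shared cluster drive, so whichever node owns the service after a
# failover finds the same track log.
# "S:\NetBackupTrackLog", the TRACK_LOG_DIR keyword, and the client name
# are assumptions - confirm them against your NBU 7.5 documentation.
echo TRACK_LOG_DIR = S:\NetBackupTrackLog | bpsetconfig -h cluster-virtual-name

# Read the client configuration back to verify the setting was applied.
bpgetconfig -M cluster-virtual-name | grep -i track
```

After moving the log, the next backup per drive will be slower while the track log is rebuilt in the new location.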
06-14-2013 06:55 AM
I have reconfigured and moved the track logs to one of the shared drives.
Testing is still going on; let's see how it goes.
06-17-2013 01:15 PM
Thanks to everyone for your suggestions. I think I have sorted it out.
This is what I did:
I tested the backups with multiple runs and it works great.
For one drive, the first run took 50 hours to complete, whereas the second run completed in less than 3 hours.
06-17-2013 03:04 PM
Have you tried moving the service to the other node and re-running the backup?
06-18-2013 01:41 PM
Yes I did, and it worked as expected.
One more thing I noted:
During the first run, I had the NTFS change journal enabled (I noticed it only hours after starting the job), and while the backup was still running, I disabled the change journal.
Once the first run was completed, I failed over the cluster and ran it a second time.
The Accelerator worked fine as expected, and the change journal didn't cause any issues.