Hi there
I am currently modelling a proposed backup environment for a new project. Everything is greenfield, so there is no working setup to base the solution on. The test environment is also very new and untested, so the problems could lie in the underlying environment.
Basics
NBU 6.5.4
Platform Windows 2003 R2 SP2 x64
Master server is clustered using VCS
- site 1 local 2 node cluster
- site 2 remote single node cluster
- site 3 remote single node cluster
Each site also has a single media server. Each media server has some SAN-based disk presented, which is configured as AdvancedDisk disk pools and storage units.
The sites are connected over an emulated WAN, with latency and other characteristics chosen to resemble the likely production environment (MPLS cloud).
Description
My initial work was to test SLPs (Storage Lifecycle Policies). The plan is that backups will take place at one of the 3 sites and the data will then be replicated to the other 2 sites using SLP duplication.
Basic testing was done using non-SLP policies, proving that backup jobs would run to each of the media servers from each of the other servers (so all the basic configuration needed for visibility and use of the disks and media servers is OK).
The test SLPs seem to work initially and then complete very unpredictably. On review, significant numbers of status 800 errors are seen, reporting either "media server missing" or "disk volume is down". The errors seem intermittent: a job that previously failed may work when it is rerun.
The disk log report shows significant numbers of entries like these:
Volume xxxxxx H:\ marked down
Volume xxxxxx H:\ marked up
Sometimes the volume is down for a few seconds, sometimes for a few minutes.
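To put numbers on how long the volume stays down each time, the down/up entries could be paired up with a small script. This is only a minimal sketch: it assumes the report has been exported to text and that each line starts with a timestamp in MM/DD/YYYY HH:MM:SS form. That timestamp layout and the sample lines are assumptions for illustration, not real log output, and `down_intervals` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed line layout: "<MM/DD/YYYY HH:MM:SS> Volume <name> <mount> marked down|up".
# The timestamp prefix is an assumption; adjust the pattern to the real export.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"Volume (?P<vol>.+?) marked (?P<state>down|up)\s*$"
)


def down_intervals(lines):
    """Return (volume, went_down, came_up, seconds_down) for each down/up pair."""
    down_since = {}
    intervals = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore lines that are not volume state changes
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
        vol = m["vol"]
        if m["state"] == "down":
            # Keep the first "down" if several are logged in a row.
            down_since.setdefault(vol, ts)
        elif vol in down_since:
            start = down_since.pop(vol)
            intervals.append((vol, start, ts, (ts - start).total_seconds()))
    return intervals


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout.
    sample = [
        r"01/01/2010 02:00:05 Volume xxxxxx H:\ marked down",
        r"01/01/2010 02:00:17 Volume xxxxxx H:\ marked up",
        r"01/01/2010 02:10:00 Volume xxxxxx H:\ marked down",
        r"01/01/2010 02:13:30 Volume xxxxxx H:\ marked up",
    ]
    for vol, start, end, secs in down_intervals(sample):
        print(f"{vol}: down at {start}, up at {end}, {secs:.0f}s")
```

Lining the resulting intervals up against the SLP duplication job times might show whether the volume only flaps while duplications are crossing the WAN.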
I don't seem to have issues with normal disk target backups, only when using SLPs, but that could just be luck!
Question
How can I track down what the problem is?
I am thinking it could be timeouts across the WAN, but how can I prove this? And if it is, what can I do to prevent it?
I am limited in what logs etc. can be posted, but if you need any other info let me know and I will see what I can do.
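One rough way to gather evidence for or against the WAN theory might be to time repeated TCP connections from one site to a media server at another, and see whether slow or failed connects coincide with the "marked down" entries. The sketch below is generic Python and not a NetBackup tool. The hostname "media-site2" is a placeholder, and port 13782 is used on the assumption that bpcd is listening on its usual port. It only measures connection setup, so it cannot show application-level timeouts by itself.

```python
import socket
import statistics
import time


def connect_times(host, port, attempts=20, timeout=5.0, pause=0.0):
    """Time TCP connects to host:port; None marks a failed or timed-out attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.perf_counter() - start)
        except OSError:
            # Covers refused connections, unreachable hosts and timeouts.
            results.append(None)
        time.sleep(pause)
    return results


def summarise(results):
    """Summarise attempts, failures and the median/max connect time in seconds."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "median_s": statistics.median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }


if __name__ == "__main__":
    # Placeholder target: replace with a real media server name and port.
    print(summarise(connect_times("media-site2", 13782, attempts=50, pause=1.0)))
```

Running this in a loop while an SLP duplication is in progress, and again while the WAN is idle, would give two sets of figures to compare.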
Thanks in advance
Alex