RESOLVED - network name no longer available and error 0x40 and 0xa00084f4

Andy_Pyne
Level 3
Hi,

I'm just scouring the forums to find all the threads where people have had this issue.

I tried everything to resolve this issue and finally succeeded a short while ago by changing my disk controller and disks. I was using 250GB and 400GB SATA 7200RPM disks on a 3Ware 8506-12 controller.

I found that I couldn't back up more than one or two BE9.0, BE9.1, or BE10.0 servers to the disk repository simultaneously without failure, yet BE8.6 servers were quite happy to run up to 30 backup jobs concurrently to the same devices without issue (apart from the latency associated with such a massive amount of disk I/O).

It appears that BE9.0, BE9.1, and BE10.0 all have a latency issue whereby the job fails when there is a minuscule delay in writing to disk. This is likely only a couple of milliseconds, as it doesn't even appear in the NT/2000/2003 event log.
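The hypothesised failure mode can be sketched as a write path with a hard per-write deadline. This is a minimal Python illustration of the idea only, not Backup Exec's actual internals; the function name and threshold are hypothetical:

```python
import time

def write_with_deadline(write_fn, chunk, deadline_ms):
    """Perform one write; treat it as a job failure if it exceeds the deadline.

    If the engine's tolerance (deadline_ms) is only a few milliseconds,
    a brief stall on a busy SATA array would abort the job without the
    delay ever being long enough to register in the OS event log.
    """
    start = time.monotonic()
    write_fn(chunk)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > deadline_ms:
        raise TimeoutError(
            f"write took {elapsed_ms:.1f} ms (limit {deadline_ms} ms)"
        )
```

Under that model, moving to faster 10K SCSI disks would help simply by keeping every individual write under the deadline.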

Veritas have been no help whatsoever in resolving this issue, and it was only by performing various tests at my expense (in both time and money) that I was able to resolve it myself.

My solution was to use a SCSI controller (Compaq 5200) with 300GB HP 10K disks. I can now run tens of jobs concurrently without failure.

This solution has been in place for 5 weeks at the time of writing (since the start of May 2005).

I hope that helps.

If not, please feel free to mail me.

Andy Pyne


For more info on the aggro I've experienced, see my previous post http://forums.veritas.com/discussions/thread.jspa?forumID=101&threadID=41420&messageID=4351282


Hi everyone.

I get this issue too. I'm running a mixture of BE v8.6 and BE v9.1 at the moment. They all back up to one disk repository with more than enough disk space. My BEv8.6 servers seem to be OK, but my BE9.1 backups consistently fail - though not at consistent byte counts or job percentages. One thing that is consistent: I always receive error 0xa00084f4, and the dialogue box explains that an unknown error occurred and that the network name is unavailable. Veritas have so far been unable to help.

I've had this problem for ages and have even tried BE SP2 - that doesn't seem to make a difference. I've seen other threads explaining that this still happens in BEv10. I'll list my findings below in the vain hope that they will either help someone else overcome their issues or (more selfishly) provide enough information for someone to offer me a solution (or at least a suggestion!).

My situation is as follows. I have a mixture of Windows NT4.0, Windows 2000, and Windows 2003 Servers with local copies of Backup Exec v8.6 through to 9.1 installed on them. I do not use remote agents.

All servers (approximately 20 BEv8.6 and 7 BEv9.1) perform a backup to disk across a local switched 100Mbps LAN (with a few 1Gb ports available to use).

The volume of data varies, but approximately 2TB of data is moved to disk each week before being staged to tape.
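As a back-of-envelope check on those numbers (a quick Python calculation, assuming the full 100Mbps line rate and ignoring protocol overhead), moving ~2TB needs a large slice of the week:

```python
# Time to move ~2TB over a 100Mbps LAN at full line rate.
data_bytes = 2 * 10**12          # ~2TB per week (from the figures above)
link_bps = 100 * 10**6           # 100Mbps link
hours = data_bytes * 8 / link_bps / 3600
print(f"{hours:.0f} hours")      # roughly 44 hours of continuous transfer
```

In practice overhead and contention make it worse, which is why the gigabit interfaces matter for this setup.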

The current disk repository is an Evesham NAS3120. The spec is as follows:

2.4GHz CPU
1GB RAM
Promise array controller for 2x PATA disks (OS)
3Ware 8506-12 controller
12x SATA disks - 9x 250GB, 3x 400GB
3x arrays: 5x 250GB RAID5, 4x 250GB RAID5, 3x 400GB RAID5
1x gigabit PCI LAN adapter with 2x 1Gb LAN interfaces
1x 100Mb LAN adapter with 2x 100Mb interfaces (on-board)
OS = Windows 2003 Server

Initially we started with local installs of Backup Exec v8.x on each of the Servers and a local tape drive on each Server. Since we had already invested in Backup Exec Licences it seemed sensible to retain our existing Backup Exec software infrastructure in moving slowly to a centralised backup to disk model.

Initially our backup-to-disk repository was nothing more than a beefed-up desktop into which I'd put 4x 250GB PATA drives and an appropriate controller. The solution grew until ~20 servers were backing up to disk at pretty much the same time, causing a huge amount of disk I/O and network utilisation for the repository to handle, and we soon upgraded to the solution listed above.

At the time, we were using only BEv8.6. Generally speaking the backups ran pretty well.

As we use Lotus Domino as our mail solution, we were forced to upgrade several servers to Backup Exec v9.x. This was necessitated by the Domino team upgrading from Domino v5.x to 6.x. Domino 6.x isn't supported properly in BEv8.6, and my testing showed that using transaction logging in Domino v6.x with BEv8.6 caused inconsistencies when performing point-in-time restores.

After upgrading to BEv9.x, that's where the problems started....

It seems that when several jobs run at the same time, the BEv9.x jobs unexpectedly fail.

I have 3x Domino servers with ~1TB of data between them. I've split the backups so that individual jobs range from 30GB to ~190GB.
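That kind of job splitting can be sketched as a simple first-fit-decreasing packing. This is a hypothetical helper for illustration, not a Backup Exec feature, and the per-database sizes in the example are made up:

```python
def split_jobs(db_sizes_gb, max_job_gb=190):
    """First-fit decreasing: pack database sizes (GB) into backup jobs,
    each no larger than max_job_gb."""
    jobs = []
    for size in sorted(db_sizes_gb, reverse=True):
        for job in jobs:
            if sum(job) + size <= max_job_gb:
                job.append(size)
                break
        else:
            jobs.append([size])  # no existing job has room; start a new one
    return jobs

# Hypothetical per-database sizes (GB) for one server's share of the data
print(split_jobs([120, 90, 60, 45, 30, 150]))
# -> [[150, 30], [120, 60], [90, 45]]
```

Keeping each job under a ceiling limits how much work is lost when a single job fails and has to be rerun.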

The data is being written to each of the 3x arrays listed above to minimise the disk I/O on any particular volume.

I usually use just one of the gigabit interfaces for all network I/O - and yes, I realise this is a single point of failure! ;o)
I've tried using the 100Mb LAN adapters in fault tolerance, load balancing, and bandwidth aggregation modes.
The gigabit LAN adapter yields the best results: performance is better, and reliability for BE9.x backups is much the same (POOR!).

I've noticed a few CPU and LAN spikes on the disk repository when starting multiple BEv9.x backups. This doesn't seem to happen with BEv8.x backups.

I have been able to improve the reliability of the backup-to-disk repository (I'll call it a B2DR from here on) quite substantially by REMOVING the pagefile entirely. I noticed that when the B2DR was under heavy CPU utilisation, paging to disk suffered and caused even more disk I/O. This was the cause of some of my most catastrophic failures!!

I upgraded the B2DR by swapping 3x 250GB disks for 3x 400GB disks. As mentioned, I have 3x volumes - partly to spread the disk I/O away from any one volume, and partly through necessity: the 3Ware controller only supports a maximum of 2TB of raw disk per array, although with 12x ports you can have up to 24TB on the card.
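The array layout above can be checked against that 2TB-raw-per-array limit with a quick calculation (RAID5 usable capacity is one disk's worth less than raw, since one disk's worth holds parity; figures are from the spec listed earlier):

```python
def raid5_usable_gb(n_disks, disk_gb):
    # RAID5 usable capacity = (n - 1) * disk size; one disk's worth of parity
    return (n_disks - 1) * disk_gb

# The three arrays from the NAS3120 spec: (disk count, disk size in GB)
arrays = [(5, 250), (4, 250), (3, 400)]
for n, gb in arrays:
    raw = n * gb
    assert raw <= 2000  # 3Ware 8506 limit: ~2TB raw per array
    print(f"{n}x {gb}GB: {raw}GB raw, {raid5_usable_gb(n, gb)}GB usable")
```

Each array sits comfortably under the 2TB raw ceiling, which is why the 400GB disks had to go in an array of only three.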

I've also tried changing the 'Maximum size for Backup-to-Disk' setting to values between 1GB and 100GB. The jobs still fail.
In Backup Exec v8.6 there was an option to change the amount of data to cache before writing to disk - this is unavailable in BEv9.x.

On the B2DR I've disabled most services which are not required, and have altered the disk caching options. I find that enabling disk caching seems to improve reliability rather than reduce it!

Since the disk volumes are RAID5, I do not perform verifies on the data. The partitions are NTFS.

BEv9.x backups use the Domino agent. I have found that once a job has failed, deleting the backup-to-disk file, stopping and restarting the services, and even running chkdsk on the B2DR and rebooting it does not significantly improve the situation. What does improve it is rebooting the host server and ensuring that only one BEv9.x job runs at any one time. This, however, is not entirely practical!!
I'm not sure whether running the job without the Domino agent would help, but it's not something I can do, as I'd lose the translog backup ability (meaning backups would be full each day instead of incremental) and would have to take the server down for each backup.

I've also looked in the event logs on the host server and the B2DR, but there's usually nothing of consequence there. Occasionally I get a few disk messages saying there was latency in the disk write - but these are infrequent and don't necessarily tie in with backup failures.

I'm currently performing 2x BEv9.1 backups to disk, and network utilisation peaks between around 9% and 13%. The commit charge is 200MB, and CPU time is around the 50% mark, occasionally peaking a further 7-10% higher.

Usually the only thing the B2DR is doing is receiving backups (disk writes). Once all the backups are safely on disk, I then run the backup to tape - to two locally attached SCSI tape libraries (1x AIT3, 1x AIT4) on a Compaq 532 array controller.

After firmware revisions, software patches, and OS patches are applied I still have no improvement.

So if anyone can suggest something I've missed, please contact me. I'm pretty confident that I've tried a fairly exhaustive list of things, none of which has entirely resolved the situation. If I think of something I've missed from the description above, I'll append this post.

I'd appreciate any advice anyone can give me - and equally, if there's anything I can help someone with, I'd be more than happy to give it a go (subject to resource and time constraints).

Cheers,

Andy

Amruta_Purandar
Level 6
Hello,

Moving your thread to 9.1 Forum.