09-24-2010 01:36 PM
I am testing off-host backups against volumes on EqualLogic arrays. I am experiencing a problem where jobs sometimes keep running for up to an hour after all of the data has already been backed up. For example, when backing up a 20 GB volume, the job shows 20 GB backed up yet continues running. The job rate continues to increase, but no additional data is written and no more activity takes place against the volume/snapshot.
This doesn't seem to happen every time, but I would say 8 times out of 10.
I am running R2 with all the latest patches. The test backup server is configured with 3 NICs using EqualLogic MPIO, so I am pulling data across 3 cards. Not that it should matter; I am writing only to disk right now.
09-24-2010 05:58 PM
FYI, MPIO means nothing if you only have one job configured. The trick to getting MPIO to work, and work well, is to create multiple smaller jobs that run concurrently to disk. This way the round-robin effect happens and the other links are leveraged. Where most people fail is that they have 2+ links but only one job, and BE runs jobs serially, so multiple connections are never opened.
That said, I don't know the solution to your issue. I'd open a case with Symantec.
09-24-2010 07:34 PM
It may be due to the "Verify After Backup Completes" option...
Check it...
09-26-2010 12:49 PM
I don't think so. On EqualLogic arrays the multipathing is working correctly. If I run the job with a single NIC against 127 GB of test data, the backup takes approximately 1 hour and 20 minutes at around 3300 MB/min. Using two NICs with MPIO, the job takes about 35-40 minutes, averaging 5800-6000 MB/min; with three NICs and MPIO it ran at about 7100 MB/min and took 26 minutes. That tells me MPIO is working as expected, and the SAN network graphs in SANhq and the port utilization reports show the increase in traffic as multiple SAN ports were utilized for the different runs.
I understand what you are saying, and for traditional jobs this is the case; it is that way with the MD3000i units that we use for our regular backup environment.
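As a rough sanity check on the figures above, here is a small sketch that compares the reported rates with the reported elapsed times. It assumes 1 GB = 1024 MB and uses the midpoints of the quoted ranges (37.5 minutes, 5900 MB/min); the function names are made up for illustration and are not part of BE or any EqualLogic tool.

```python
# Sanity check: does (data size / reported rate) match the reported job time?
# Assumption: 1 GB = 1024 MB. Ranges from the post are replaced by midpoints.

DATA_GB = 127  # size of the test data set quoted in the post


def expected_minutes(data_gb: float, rate_mb_per_min: float) -> float:
    """Minutes needed to move data_gb at a steady rate_mb_per_min."""
    return data_gb * 1024 / rate_mb_per_min


def effective_rate(data_gb: float, elapsed_min: float) -> float:
    """Average MB/min implied by moving data_gb in elapsed_min minutes."""
    return data_gb * 1024 / elapsed_min


# (label, reported rate in MB/min, reported elapsed time in minutes)
runs = [
    ("1 NIC", 3300, 80),
    ("2 NICs", 5900, 37.5),
    ("3 NICs", 7100, 26),
]

for label, rate, elapsed in runs:
    print(
        f"{label}: rate implies {expected_minutes(DATA_GB, rate):.1f} min, "
        f"reported {elapsed} min "
        f"(elapsed time implies {effective_rate(DATA_GB, elapsed):.0f} MB/min)"
    )
```

Under those assumptions, the reported rates imply shorter transfer times than the reported job durations in all three runs. The post does not say what accounts for the difference, so treat this only as a way to frame the numbers, not as a diagnosis.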
09-26-2010 12:52 PM
The verify option is unselected. It almost seems like the verify job is trying to run and thinks it's running, but is in fact not running. You would think that would about account for the stretch of time during which the job continues to run but no data is transferred. Likewise, no network traffic is generated after the backup portion finishes. I don't recall off the top of my head, but I'm pretty sure there is *some* network traffic during a verify job, as BE verifies contents against the SAN-based snapshot. In our case, however, there is no traffic and no real I/O to account for over an hour of idle time, during which the transfer rate continues to increase but nothing is happening.
09-28-2010 12:26 AM