08-08-2012 01:58 PM
We have been using Backup Exec 2010 R3 for almost a year, and almost as soon as we bought the product it has caused nothing but trouble for us. The core issue is that the deduplication storage does not perform. We still have 12 support cases open, and we are still working with Dev and the backline support team to resolve them.
The problem is that backups to either the CASO or the MMS server, each using its own local deduplication storage, just sit in a queued state, sometimes for hours. We were one of the first to apply Hotfix 191248 in July 2012, but as far as we can see it made no difference.
We have around 17 concurrent threads configured for each deduplication folder, but currently only 3 jobs are active on one folder and 1 on the other. Server load is low: 11% CPU, 30% memory used, and a disk queue length of 0.1. To keep our backups working at all we have fallen back to plain disk-based backups, which experience no issues. The server was specced with assistance from Symantec, but because dedup is not working correctly we are relying on disk backups and running out of disk space.
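For anyone wanting to check the same counters on their own media server, the figures above (CPU, memory, disk queue length) can be sampled with the built-in `typeperf` tool. A minimal sketch, assuming the standard English-locale counter paths; the sample count is illustrative:

```python
import subprocess

# Standard Windows performance counter paths (English locale assumed)
COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\% Committed Bytes In Use",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
]

def typeperf_command(counters, samples=5):
    """Build a typeperf command that takes `samples` readings of each counter."""
    return ["typeperf", *counters, "-sc", str(samples)]

if __name__ == "__main__":
    cmd = typeperf_command(COUNTERS)
    print(" ".join(cmd))  # on the media server, execute via subprocess.run(cmd)
```

A sustained Avg. Disk Queue Length near 0.1, as above, suggests the disks are not the bottleneck.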
Symantec have run a number of disk checks, which identified bad media in the dedup folder, but each check takes 24 hours to complete. Their backline team have been helpful, but there is no solution in sight.
We have also rebuilt the server twice, with the same issues.
We have been a Backup Exec customer for 12 years, but the response from the Dev team has been terrible. I have even emailed the CEO.
Our case details are below:
OPEN CASES [12] (changes or new in red)
Priority 1 (as the backups take so long and are unreliable):
416-022-872 Backup job on Dedup folder fails with error "A hardware error occurred"
Status: Initial investigation suggested this issue is occurring due to corrupt OST media in Dedup folder
STSINV utility was run and moved the corrupt OST media into the retired media set
The first set of weekly backups, and all daily backups since, ran successfully (except a few jobs that failed because of other errors)
Second set of weekly backups ran successfully
Issue no longer seen
Issue most likely related to the 2TB backup job / duplicate job which runs for days.
Symantec enabled detailed debugging for further analysis
Backup and duplicate jobs ran successfully with debugging enabled
Issue hasn’t been seen in the last few days
POA: Monitor the situation, see if the issue reappears, and then decide whether re-enabling debugging is possible
416-375-655 Duplicate job is prompting to import an OST media which was last used in the backup job and is now offline
Status: Backup job was successful and used a lot of OST media, as it covers over 1.6 TB of data
No verify was run
Duplicate job starts to run and then prompts to import OST media xxx
This OST media, used in the backup job, is now OFFLINE, so there is no way to bring it back online or import it as prompted
Symantec enabled detailed debugging for further analysis
Backup and duplicate jobs ran successfully with debugging enabled
Issue hasn’t been seen in the last few days
POA: Monitor the situation, see if the issue reappears, and then decide whether re-enabling debugging is possible
416-400-068 Backup to dedup folder fails with: 0xe00084ec - A backup storage read/write error has occurred.
Status: Issue most likely related to the 2TB backup job / duplicate job which runs for days.
Backup and duplicate jobs ran successfully with debugging enabled
A different job failed with this error
Logs collected
Issue hasn’t been seen in the last few days
POA: Symantec to analyse collected logs
Priority 2:
416-022-858 Exchange backup on Dedup folder fails with error "-1022 There is a disk IO error"
Status: Exchange Incremental backups are failing with all sorts of errors and are slow
Full backup is successful and takes 4 hours
Incremental backups take up to 18 hours or fail
Customer decided, starting today, to use Differential instead of Incremental backups
Weekly Full backup ran successfully
Differential backup is reporting the same error again
3 of 9 storage groups (SGs) affected
Not always the same SGs are affected, but every failing job has 1-3 SGs failing with that error
Since we moved the backup job's run time, no further errors
Full backup has now started to report this error
When the same job runs at a different (less busy) time it is successful
Issue seen again this week on all daily backups
The last few days the Exchange backup has run successfully
POA: Monitor the backups and see if the issue reappears
Priority 4:
416-340-993 Exchange duplicate job is failing with error:V-79-57344-33329 - Library - cleaning media was mounted
Status: The Exchange backup had never been successful
Now that the backup succeeds, the duplicate fails with the above error
Other jobs report the same error
Applied Technote: http://www.symantec.com/docs/TECH51468
Issue still seen
Duplicate Jobs from Server 2 to Server 3 work, issue affects duplicate jobs from Server 3 to Server 2
Applied the technote again on both servers
Issue hasn’t been seen in the last few days
POA: Monitoring the situation
Priority 5:
416-341-021 Backup / duplicate jobs to / from dedup folder goes into "device queued" status
Status: Backup / duplicate jobs to / from the dedup folder go into "device queued" status
Jobs already running to the same device keep running
All new jobs just sit queued until the "Device queued" job is cancelled, after which the other jobs start to run
Issue not seen in the last two days
Issue happened again
Issue hasn’t been seen all week
POA: Symantec to keep monitoring and if issue reappears we need to gather new set of logs
Priority 7:
416-345-707 Application event alert on BKUPEXEC: SQL Server has encountered 1 occurrence(s) of cachestore flush for the 'Bound Trees' cachestore
Status: Alert was reported a few times in the last few days
SQL Studio Express installed
Issue as described in: http://www.symantec.com/docs/TECH129073
Technote applied
No alerts seen the last few days
POA: Symantec and customer to monitor situation and confirm issue is solved
Priority 8:
416-345-165 Beserver crash
Status: This crash happened the first time at the weekend of Feb 3rd
Crash has not been seen since
Another beserver crash happened, but it was not identical to the previous one
No crash seen since SP2 was applied
POA: Symantec to keep monitoring the situation
Priority 9:
416-345-115 Media alert: "Do you want to overwrite imported media: OSTxxx" holds up all other backups until it is acknowledged
Status: This alert stopped all backups from continuing until it was acknowledged
Reason it wasn't automatically cancelled identified: the alert response timeout was set to 3 days; this has now been changed to 1 minute
Workaround applied: unchecked the "prompt for imported media" alert
POA: Symantec to clarify:
Why are we asked to overwrite imported media when it is OST media?
Priority 10:
416-263-851 Media alert on DEDUP folder: "Media is unrecognized. The media in drive '%drive%' is written in an unrecognized format. Erase the media to make it usable"
Status: Alert happened 35 times
These 35 alerts were reported against 3 OST media
Alerts happen again and again about the same 3 OST media, which have already been retired
Customer has recreated all jobs (today, Friday Feb 3rd)
Alerts happened again over the weekend, same 3 OST media
POA: Workaround applied by suppressing the prompt for this alert
Symantec to research this issue
Priority 11:
415-876-929 When adding Remote Agent for Direct Access (RADA) servers, the Backup Exec console freezes
Status: The moment the customer adds around 10 or more RADA servers, the above issue is seen
Currently customer is not using RADA because of the issue reported in this case.
Symantec explained the underlying issue and workaround:
Workaround is to recycle all beremote services on the remote servers that use client-side deduplication
Beta patch available
Customer is not too keen to test a BETA patch
POA: This issue is currently not the highest priority
Customer would prefer an official hotfix rather than just a BETA patch
Customer would like to see test results from other customers before applying a BETA patch to his setup
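For anyone else hitting the RADA console freeze, the workaround Symantec described amounts to restarting (recycling) the beremote service on each remote server that uses client-side dedup. A minimal sketch of how that could be scripted, assuming the Windows service name `BackupExecAgentAccelerator` for the remote agent (verify the actual name with `sc query` on your own servers); the server list is illustrative and the default is a dry run that only prints the commands:

```python
import subprocess

# Hypothetical list of remote servers using client-side deduplication
SERVERS = ["SRV-FILE01", "SRV-SQL01"]

# Assumed service name for the Backup Exec Remote Agent (beremote);
# confirm with `sc query` on the remote host before running for real.
SERVICE = "BackupExecAgentAccelerator"

def recycle_commands(servers, service):
    """Build the sc stop/start command lines for each remote server."""
    cmds = []
    for srv in servers:
        cmds.append(["sc", rf"\\{srv}", "stop", service])
        cmds.append(["sc", rf"\\{srv}", "start", service])
    return cmds

def recycle(servers, service, dry_run=True):
    """Print (dry run) or execute the stop/start commands in order."""
    for cmd in recycle_commands(servers, service):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    recycle(SERVERS, SERVICE)  # dry run: prints the commands only
```

Running `sc` against `\\server` requires admin rights on the remote machines; a stop followed immediately by a start may also need a short wait between them depending on how quickly the service shuts down.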
Priority 12:
415-873-137 CAS server status shows "Discovering Devices" when restarting services after adding remote servers for RADA
Status: The moment the customer adds around 10 or more RADA servers, the above issue is seen
Currently customer is not using RADA because of the issue reported in this case.
Symantec explained the underlying issue and workaround:
Workaround is to recycle all beremote services on the remote servers that use client-side deduplication
Beta patch available
Customer is not too keen to test a BETA patch
POA: This issue is currently not the highest priority
Customer would prefer an official hotfix rather than just a BETA patch
Customer would like to see test results from other customers before applying a BETA patch to his setup
CLOSED CASES [3]:
416-346-317 Backup job fails and causes the media to go "end marker unreadable".
Status: Only one duplicate job affected
The OST media at the opt-dedup target device ends up in that state
Issue identified and probably caused by the BE 11d agent still running on this one system
Workaround / solution applied: a separate policy both stops the 11d services and performs share-level backups followed by duplicates
Case closed March 2nd
416-263-556 Monthly backup are failing with: "0xe0009444 - The requested source duplicate backup sets catalog record could not be found"
Status: Monthly backups did fail with the above error
Issue caused by the fact that we were still trying to duplicate data older than one month
Symantec adjusted the selection lists for these jobs
Customer has recreated all jobs (today Friday Feb 3rd)
Case closed on Feb 15th
416-264-317 Duplicate jobs to tape library is failing with: 0xe00084ed - A hardware error occurred
Status: Some duplicate jobs did fail with the above error
Issue is clearly with the tape library
Backups are working fine again
Case closed on Feb 15th