06-14-2012 03:33 AM
Hi and hope you can help.
First of all some details:
Master server:
Now the problem is the DSSU does not clear down when the disk hits the high water mark. I need to go and manually identify the images that have been duplicated to tape and then expire them from disk.
It also doesn't seem to report a "disk full condition" when it should. For example, the disk was 100% full when I checked it first thing this morning, but it only reported a "disk full condition" a minute or so after I manually expired some of the images.
I have just taken over from someone so I have no idea how long this has been going on.
I have googled everything I can think of but can find no reference to anything even similar.
Has anyone experienced anything like this or does anyone have any ideas as to the cause/solution?
Thanks in advance.
06-14-2012 03:44 AM
Can you verify that staging to tape is successful? How often are staging jobs scheduled?
Cleanup can only happen when images have been successfully staged or when image expiration is reached.
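To make the rule above concrete, here is a minimal sketch of the eligibility check. This is not NetBackup code: the `DiskImage` record and its fields are hypothetical names used only for illustration, and a real check would read the image catalog instead.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DiskImage:
    # Hypothetical record for illustration; not a NetBackup structure.
    backup_id: str
    staged_to_tape: bool   # True once a tape copy is confirmed in the catalog
    expires_at: datetime   # expiration time of the disk copy


def cleanup_candidates(images, now):
    """Return backup IDs whose disk copy may be removed.

    An image qualifies if it has been staged successfully
    or if its disk copy has reached its expiration time.
    """
    return [
        img.backup_id
        for img in images
        if img.staged_to_tape or img.expires_at <= now
    ]
```

An image that is neither staged nor expired never qualifies, which is why failed staging jobs leave the disk full.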
See this TN for DSSU cleanup behaviour: http://www.symantec.com/docs/TECH66149
06-14-2012 03:48 AM
Yes, before manually expiring images I check the catalog and make sure the images that I'm expiring are on tape.
Staging is scheduled once every day.
06-14-2012 04:20 AM
The main problem with a DSSU is usually this cleanup process. Some images may not get duplicated properly, for various reasons, and are left there to fill up your storage. You then have to clean them up manually, which is quite frustrating.
Check out SLP and Vault duplication to see if they suit you better. Note that SLP does not support BasicDisk and Vault requires an additional license.
06-14-2012 04:21 AM
Please show us output of bpstulist -label <storage_unit_label> -L
Please also confirm that the DSU location is a dedicated volume used for disk backup only and not shared with anything else (e.g. a software repository).
Is once a day sufficient to duplicate all disk backups before the next backup window? We normally recommend scheduling staging multiple times per day (e.g. every 2 hours).
Ensure bptm, bpdm and admin log folders exist on the media server to verify duplication and troubleshoot cleanup.
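A quick way to check for the log folders is sketched below. The log root is passed in as a parameter because it depends on the install; a value such as /usr/openv/netbackup/logs on UNIX is an assumption to verify against your own media server.

```python
from pathlib import Path

# Legacy log folders named in the post above.
REQUIRED_LOG_DIRS = ("bptm", "bpdm", "admin")


def missing_log_dirs(log_root, names=REQUIRED_LOG_DIRS):
    """Return the names of required log folders that do not exist under log_root."""
    root = Path(log_root)
    return [name for name in names if not (root / name).is_dir()]
```

Any folder it reports as missing has to be created before that process will write a log.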
06-14-2012 04:33 AM
If this turns out to be more than a config issue (unlikely) or just a misunderstanding ...
These would be the logs required to start with (there could be others, depends what is found)
vxlogcfg -a -p 51216 -o 226 -s DebugLevel=4 -s DiagnosticLevel=6 (nbstserv / master)
06-14-2012 04:50 AM
06-14-2012 04:58 AM
06-14-2012 05:10 AM
Sounds like your servers may be struggling and need some tuning or patching.
For the Windows ones you can try these:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
add a DWORD – TcpTimedWaitDelay - Decimal Value of 30
add a DWORD – MaxUserPort – Decimal Value 65534
HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\
add or edit DWORD - PoolUsageMaximum - Decimal value of 40
add or edit DWORD - PagedPoolSize Hex value of FFFFFFFF (this is 8 x F)
Also check your desktop heap settings - I have seen this cause the Corba errors:
HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems
There is a Windows key with a long string - see what values you have for the part reading similar to: Windows SharedSection=1024,12288,512
If it ends in 512 like the above then change it to 1024
All of the above apply to Master and Media Servers and all need a reboot.
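For the SharedSection change, a small sketch of the string edit is shown here. It only transforms the text of the value: the function name is made up for illustration, and reading or writing the registry itself is left out on purpose, so export the key first and apply the result by hand.

```python
import re


def bump_shared_section(value):
    """Change the third SharedSection number from 512 to 1024.

    Any other third value, and the rest of the string, is left unchanged.
    """
    def bump(match):
        parts = match.group(1).split(",")
        if len(parts) == 3 and parts[2] == "512":
            parts[2] = "1024"
        return "SharedSection=" + ",".join(parts)

    return re.sub(r"SharedSection=([\d,]+)", bump, value)
```

Passing in the fragment quoted above, `Windows SharedSection=1024,12288,512`, gives back the same text with the last number set to 1024.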
See how it goes after the reboot - though you should go to at least 6.5.6 anyway for now to ensure you have the bug fixes applied
Hope this helps
06-14-2012 05:20 AM
A similar but not quite exact error is listed in internal TN TECH69705
E.g.
12:43:23.739 [1928.6180] <16> emmlib_ImageQueryFetchOneSeq: (0) CORBA call threw exception. <system exception, ID 'IDL:omg.org/CORBA/OBJECT_NOT_EXIST:1.0'TAO exception, minor code = 0 (unknown location; unspecified errno), completed = NO>
12:43:23.739 [1928.6180] <16> emmlib_ImageQueryFetchOneSeq: (0) fetchImages failed, emmError = 3000004, nbError = 0
12:43:23.739 [1928.6180] <16> emmlib_ImageQueryFetch: (0) emmlib_ImageQueryFetch failed, emmError = 3000004, nbError = 0
12:43:23.739 [1928.6180] <16> volume_cleanup: (-) Translating EMM_ERROR_CorbaException(3000004) to 25 in the NetBackup context
12:43:23.739 [1928.6180] <2> volume_cleanup: emmlib_ImageQueryFetch failed 25
This was fixed with eTrack 1542212
Also fixed in 6.5.6 and 7.0.1 and 7.1 onwards
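To see whether your own logs show the same pattern, a rough sketch like the one below can pull the emmError codes out of lines in the format quoted above. The regular expressions are inferred from that excerpt only and may need adjusting for other log formats.

```python
import re

# Layout inferred from the excerpt: time [pid.tid] <severity> function: message
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\.(\d+)\] <(\d+)> (\w+): (.*)$"
)
EMM_RE = re.compile(r"emmError = (\d+)")


def emm_errors(lines):
    """Return (timestamp, function, emmError) tuples for lines reporting an emmError."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected layout
        code = EMM_RE.search(match.group(6))
        if code:
            found.append((match.group(1), match.group(5), int(code.group(1))))
    return found
```

Run against the excerpt above, it should pick out the two lines that report emmError = 3000004.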
Martin
06-14-2012 05:23 AM
This is also along the same lines ...
http://www.symantec.com/docs/TECH74989
Following on from Mark's point, let's consider tuning/performance.
Between when this was working and now, when it fails, has there been a change in the amount of work the server has to do?
Martin