Forum Discussion

shashi_pratap
13 years ago

Difference in "%Full" of 5020 disk pools in spite of replication.

Hi,

I have two NetBackup 5020 appliances with 32 TB of storage space each, one at the production site and the other at the DR site.

These 32 TB disk pools are presented as PureDisk storage units through a 5220 appliance media server at each site.

We use Optimized Duplication here: after every backup job, the SLP triggers a duplication job from production to DR and vice versa.

On 7th Sep '12, we had a scheduled reboot of the 5220 appliance media servers as well as the 5020 appliance storage pools.

Now all the jobs are completing successfully, including the SLPs, but there is a stark difference between the respective 5020 disk pool utilizations,

i.e. the "%Full" for the 5020 disk pool at production is 23.38 %,

                       whereas for the 5020 disk pool at DR it is 62.54 %.

Is this a matter of real concern? Prior to the shutdown, the "%Full" for both disk pools was approximately the same, at around 50%.

Please suggest.
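
(For reference, something along the lines of the following, run on the master server and assuming the standard admincmd path, should report the same usage figures from the command line:)

/usr/openv/netbackup/bin/admincmd/nbdevquery -listdp -stype PureDisk -U   # disk pools and their use%
/usr/openv/netbackup/bin/admincmd/nbdevquery -listdv -stype PureDisk -U   # individual disk volumes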

 

  • I would check that the DR appliance is processing everything correctly, as it may not be clearing its data down.

    In the CLISH (hate that term!) go to Admin, then type su and enter the password (P@ssw0rd?).

    cd to /opt/pdcr/bin/

    And then use the following to check the state of the appliance (a consolidated run-through is sketched at the end of this reply):

    To check the disk pool status:   crcontrol --dsstat

    To check when the last successful CRQP (queue processing) run was:   crcontrol --queueinfo

    This should be at midnight last night or midday today (depending on when you run the command - it runs automatically twice a day).

    If it shows an old date, check whether CRQP is already in progress:   crcontrol --processqueueinfo

    If it is not running and the last run was not within the last 12 hours, run the following to start CRQP manually:   crcontrol --processqueue

    Queue processing clears the queues and then does the data removal, so if it has not been running since the reboot the disk will eventually fill up.

    There is also a bug in the 1.4.1.1 version of the appliance that doesn't add the cleanup job (pddodataremoval) to the crontab, so run crontab -l to check that it is actually scheduled to run.

    There is a technote somewhere on how to add it manually if this is the case.

    Hope this all helps
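
    To pull that together, a rough consolidated run-through from the appliance's root shell (CLISH > Admin > su, as above) would look something like this - the grep at the end is just a convenience for spotting the cleanup entry:

    cd /opt/pdcr/bin

    ./crcontrol --dsstat            # disk pool / data store status and usage
    ./crcontrol --queueinfo         # date of the last successful CRQP (queue processing) run
    ./crcontrol --processqueueinfo  # whether queue processing is currently in progress

    # Only if nothing is in progress and the last run is older than ~12 hours:
    ./crcontrol --processqueue      # start queue processing manually

    # On 1.4.1.1, check that the cleanup job is actually scheduled:
    crontab -l | grep pddodataremoval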

  • Hi Mark,

    Welcome back!

    Thank you so much for your reply; I was waiting for it all this while.

    This issue got fixed when I ran the dsstat command from the CLISH (I hate it too) menu;

    it seems that all the junk data got cleared and it looks pretty OK now.

    However, a new issue has cropped up all of a sudden, where my SLP duplication job is failing consistently for 2 clients in particular with error code 174.

    Below is the link to the post on Symantec Connect where I have put up this issue for your notice.

    https://www-secure.symantec.com/connect/forums/slp-duplication-failing-error-code-174

    Will be really glad if you can take a look at it.