Disk space (MSDP) discrepancies between master and media server appliance

elanmbx
Level 6

So I have a discrepancy on one of my media server appliances (5220) that is causing me some headaches, since it is reporting "disk full" during my backup window and failing some backups and duplications. The strange thing is that the media server itself reports the MSDP as only 89% full, while the Java console and OpsCenter show this pool as 96% full:

[Screenshot attachment: Screen3.jpg]

densyma02p.Storage> Show Partition
- [Info] Performing sanity check on disks and partitions... (5 mins approx)
-----------------------------------------------------------------------
Partition     | Total      | Available  | Used       | %Used | Status
-----------------------------------------------------------------------
AdvancedDisk  |   11.21 TB |    4.93 TB |    6.28 TB |    57 | Optimal
Configuration |      25 GB |   18.31 GB |    6.69 GB |    27 | Optimal
MSDP          |      63 TB |    7.22 TB |   55.78 TB |    89 | Optimal
MSDP Catalog  |       1 TB |  1010.2 GB |   13.78 GB |     2 | Optimal
Unallocated   |  251.98 GB |       -    |       -    |    -  | -      

[Screenshot attachment: Screen1.jpg]

[Screenshot attachment: Screen2.jpg]

I'm half-inclined to restart the services on the master... but I wanted to know if anyone else has run into this issue?

9 REPLIES

elanmbx
Level 6

A bit of environment info:

Solaris master - 7.6.1

5220 Media Servers - 2.6.1

All systems were upgraded to 7.6.1/2.6.1 within the last week.

Andrew_Madsen
Level 6
Partner

You have a couple of possibilities here. I have run into this in the past when a large database backup pre-checks the storage available on the target disk. Remember, the system does not know how much the data stream will deduplicate; if a backup needs 10 TB and the target reports only 7 TB available, the job will fail.

Now, for the discrepancies, run:

/usr/openv/pdde/pdcr/bin/crcontrol --dsstat

and see what the system reports. My suspicion is that the media server is reporting the raw available space to the CLISH and GUI, while reporting the deduplication pool's (compressed) view of available space to the master server.


elanmbx
Level 6

So here is the dsstat:

densyma02p:/msdp/data/dp1/pdvol/history # /usr/openv/pdde/pdcr/bin/crcontrol --dsstat

************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  63.0T  58.1T  55.9T   2.2T  97%

Number of containers             : 350838
Average container size           : 166807262 bytes (159.08MB)
Space allocated for containers   : 58522326196922 bytes (53.23TB)
Reserved space                   : 5458315213824 bytes (4.96TB)
Reserved space percentage        : 7.9%

This is obviously where it is getting its "shut it down!" information.  Seems like it is reserving an *awful* lot of space in the pool...

For reference - I have two other identical 5220s and each of them reserves only 4%:

densyma03p:/msdp/data/dp1/pdvol/log/convert # /usr/openv/pdde/pdcr/bin/crcontrol --dsstat

************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  62.9T  60.4T  49.4T  10.9T  82%

Number of containers             : 407267
Average container size           : 131225549 bytes (125.15MB)
Space allocated for containers   : 53443835834010 bytes (48.61TB)
Reserved space                   : 2771037736960 bytes (2.52TB)
Reserved space percentage        : 4.0%
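
For what it's worth, the two sets of numbers appear to reconcile if dsstat sizes the pool as Raw minus Reserved and computes Use% against that, while Show Partition reports usage against the raw filesystem. A rough awk sketch using the figures from the first appliance above (an interpretation of the numbers, not official MSDP internals):

awk 'BEGIN {
  raw = 63.0; used = 55.9; reserved = 4.96  # TB, taken from the dsstat output above
  size = raw - reserved                     # 58.04 TB, close to the 58.1T "Size" column
  printf "dsstat Use%%         : %.0f%%\n", 100 * used / size  # ~96%, what NetBackup acts on
  printf "Show Partition Use%% : %.0f%%\n", 100 * used / raw   # ~89%, what the CLISH displays
}'

In other words, the 89% and the 96-97% describe the same pool against two different denominators, and the dsstat figure appears to be the one the master server and OpsCenter act on.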

elanmbx
Level 6

This continues to be a significant issue.  It appears that the MSDP is not allowing space to be reclaimed.  I have moved policies off this appliance since the MSDP is currently 100% full.  I continue to expire images yet get NOTHING back.

Support has been working on it for over a week now with no results.

Will update when progress is made.

GHamilton
Level 3
Partner Accredited Certified

Elanmbx - Are you using that MSDP as a target for replication? I had a scenario where a backlog of SLPs continued to copy data to a target disk pool that was full; as we manually expired images (and ran garbage collection - if you are not doing so already...), the disk pool would fill right back up.

If that is not the case, please update us when you have a resolution.
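
A note on the garbage collection point above: in MSDP, space from expired images only comes back after the content router's transaction queue is processed. A hedged sketch of how to check and kick that from the appliance shell (crcontrol options can differ between NetBackup versions, so verify against your release):

# Show the pending transaction queue
/usr/openv/pdde/pdcr/bin/crcontrol --queueinfo

# Check whether queue processing is currently running
/usr/openv/pdde/pdcr/bin/crcontrol --processqueueinfo

# Start queue processing manually
/usr/openv/pdde/pdcr/bin/crcontrol --processqueue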

elanmbx
Level 6

Interesting - we *did* have replications running to this MSDP from a remote site.  I turned those off a little while ago - but they would have been running to this appliance during this issue.  I wonder if there is a bunch of replicated data orphaned on the MSDP pool...

GHamilton
Level 3
Partner Accredited Certified

SLPs (if not manually cancelled) will run until they are completed. Have you seen any replication failures or a backlog of replication jobs in your Activity Monitor?
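
For checking the replication backlog from the master server, something along these lines should work with nbstlutil (treat the exact options as a sketch since they vary a little between NetBackup versions; EXAMPLE_SLP is just a placeholder name):

# List SLP-managed images whose copies (replications/duplications) are still incomplete
/usr/openv/netbackup/bin/admincmd/nbstlutil stlilist -image_incomplete -U

# Cancel the pending copies for a specific lifecycle if it has to be stopped
/usr/openv/netbackup/bin/admincmd/nbstlutil cancel -lifecycle EXAMPLE_SLP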

elanmbx
Level 6

I manually cancelled all of the SLPs from my remote appliance that was replicating to this one.  At this point I have the vast majority of the images expired from this appliance and it will NOT release any space in the MSDP - it is still over 94% full as reported by NetBackup.

I have a case and am working with support to get to the bottom of this...

elanmbx
Level 6

Root cause: a bug in the 7.6.1 install caused a VxFS storage checkpoint to be left behind after the upgrade. This checkpoint gradually grew until the MSDP filesystem was out of space. Discovered using the command:

fsckptadm list /msdp/data/dp1/pdvol

Removed with:

fsckptadm remove msdp_ckpt /msdp/data/dp1/pdvol

Restarted services and my MSDP pool was back in business.

This affected *2* of my 5220 appliances - both had to be repaired with the above procedure.

Support Case #08311159 just in case anyone needs to reference this issue with Support.
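
Pulled together as one sequence, the cleanup looks roughly like this (the fsckptadm commands are exactly the ones above; the df checks are an addition to confirm the space actually comes back, and the service restart method depends on the appliance version):

# List VxFS storage checkpoints on the MSDP volume; a stale upgrade
# checkpoint such as msdp_ckpt is the symptom described above
fsckptadm list /msdp/data/dp1/pdvol

# Note current filesystem usage for comparison
df -h /msdp/data/dp1/pdvol

# Remove the stale checkpoint, then re-check usage
fsckptadm remove msdp_ckpt /msdp/data/dp1/pdvol
df -h /msdp/data/dp1/pdvol

# Finally, restart the NetBackup services so the MSDP pool picks up the reclaimed space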