02-08-2015 09:34 AM
So I have a discrepancy on one of my media server appliances (5220) which is causing me some headaches: it is reporting "disk full" during my backup window and failing some backups and duplications. The strange thing is that the media server itself shows the MSDP partition as only 89% full, while the Java console and OpsCenter show this pool as 96% full:
densyma02p.Storage> Show Partition
- [Info] Performing sanity check on disks and partitions... (5 mins approx)
-----------------------------------------------------------------------
Partition     | Total     | Available | Used     | %Used | Status
-----------------------------------------------------------------------
AdvancedDisk  | 11.21 TB  | 4.93 TB   | 6.28 TB  | 57    | Optimal
Configuration | 25 GB     | 18.31 GB  | 6.69 GB  | 27    | Optimal
MSDP          | 63 TB     | 7.22 TB   | 55.78 TB | 89    | Optimal
MSDP Catalog  | 1 TB      | 1010.2 GB | 13.78 GB | 2     | Optimal
Unallocated   | 251.98 GB | -         | -        | -     | -
I'm half-inclined to restart the services on the master... but wanted to know if anyone has run into this issue?
03-04-2015 03:22 PM
Root-cause: Bug in 7.6.1 install that caused a VxFS storage checkpoint to be left behind after the upgrade. This checkpoint gradually grew until the MSDP filesystem was out of space. Discovered using the command:
fsckptadm list /msdp/data/dp1/pdvol
Removed with:
fsckptadm remove msdp_ckpt /msdp/data/dp1/pdvol
Restarted services and my MSDP pool was back in business.
This affected *2* of my 5220 appliances - both had to be repaired with the above procedure.
Support Case #08311159 just in case anyone needs to reference this issue with Support.
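For anyone scripting the check across several appliances, here is a minimal Python sketch of the two commands above. The mount point and checkpoint name are the ones from this post and may differ on your system; the `list_cmd`, `remove_cmd` and `run` helpers are hypothetical names made up for illustration. It defaults to a dry run that only prints the commands, and it does not restart any services.

```python
import subprocess

# Values taken from the post above; adjust for your own appliance.
MOUNT = "/msdp/data/dp1/pdvol"
CHECKPOINT = "msdp_ckpt"


def list_cmd(mount=MOUNT):
    """Build the command that lists storage checkpoints on a mount point."""
    return ["fsckptadm", "list", mount]


def remove_cmd(name=CHECKPOINT, mount=MOUNT):
    """Build the command that removes one named checkpoint."""
    return ["fsckptadm", "remove", name, mount]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    run(list_cmd())    # inspect the list output before removing anything
    run(remove_cmd())  # dry run by default; pass dry_run=False to execute
```

Check the list output by hand and confirm with Support before running the remove step for real.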
02-08-2015 09:40 AM
A bit of environment info:
Solaris master - 7.6.1
5220 Media Servers - 2.6.1
All systems were recently upgraded to 7.6.1/2.6.1 (within the last week)
02-09-2015 04:21 AM
You have a couple of possibilities here. I have run into this in the past with a large database backup, where the job pre-checks the storage available on the target disk. Remember that the system does not know how well the data stream will deduplicate, so if a backup needs 10 TB and only 7 TB is reported as available, it will fail.
Now for the discrepancies, run
/usr/openv/pdde/pdcr/bin/crcontrol --dsstat
and see what the system then reports. I suspect the media server reports what is available to the CLISH and GUI, but reports what is available and still needs compression to the master server.
02-09-2015 08:11 AM
So here is the dsstat:
densyma02p:/msdp/data/dp1/pdvol/history # /usr/openv/pdde/pdcr/bin/crcontrol --dsstat
************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  63.0T  58.1T  55.9T  2.2T   97%
Number of containers             : 350838
Average container size           : 166807262 bytes (159.08MB)
Space allocated for containers   : 58522326196922 bytes (53.23TB)
Reserved space                   : 5458315213824 bytes (4.96TB)
Reserved space percentage        : 7.9%
This is obviously where it is getting its "shut it down!" information. Seems like it is reserving an *awful* lot of space in the pool...
For reference - I have two other identical 5220s and each of them only reserves 4%:
densyma03p:/msdp/data/dp1/pdvol/log/convert # /usr/openv/pdde/pdcr/bin/crcontrol --dsstat
************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  62.9T  60.4T  49.4T  10.9T  82%
Number of containers             : 407267
Average container size           : 131225549 bytes (125.15MB)
Space allocated for containers   : 53443835834010 bytes (48.61TB)
Reserved space                   : 2771037736960 bytes (2.52TB)
Reserved space percentage        : 4.0%
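To make the numbers easier to compare, here is a rough Python sketch of the arithmetic. It assumes (my reading of the figures, not anything documented) that Show Partition divides used space by the full 63 TB, while dsstat divides by "Size", which looks like Raw minus the reserved space. The inputs are the rounded values printed above, so the results only approximate what the tools report.

```python
TIB = 2 ** 40  # dsstat's "TB" figures line up with binary terabytes


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# densyma02p: the same used space measured against two different totals.
partition_used_pct = pct(55.78, 63.0)  # Show Partition: Used / Total
dsstat_used_pct = pct(55.9, 58.1)      # dsstat: Used / Size

# Reserved space as a share of raw capacity, for both appliances.
reserved_02p_pct = pct(5458315213824 / TIB, 63.0)
reserved_03p_pct = pct(2771037736960 / TIB, 62.9)

print(f"densyma02p Show Partition used: {partition_used_pct:.1f}%")
print(f"densyma02p dsstat used:         {dsstat_used_pct:.1f}%")
print(f"densyma02p reserved:            {reserved_02p_pct:.1f}%")
print(f"densyma03p reserved:            {reserved_03p_pct:.1f}%")
```

If that reading is right, most of the gap between 89% and the 96-97% figures comes from the reserved space being excluded from the dsstat total, which is why the larger reservation on densyma02p matters.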
02-18-2015 08:49 AM
This continues to be a significant issue. It appears that the MSDP is not allowing space to be reclaimed. I have moved policies off this appliance since the MSDP is currently 100% full. I continue to expire images yet get NOTHING back.
Support has been working on it for over a week now with no results.
Will update when progress is made.
02-20-2015 08:11 AM
Elanmbx - Are you using that MSDP as a target for replication? I had a scenario where a backlog of SLPs continued to copy data to a target disk pool that was full, and as we manually expired images (and ran garbage collection, if you are not already doing so) the disk pool would fill right back up.
If that is not the case, please update us when you have a resolution.
02-20-2015 03:08 PM
Interesting - we *did* have replications running to this MSDP from a remote site. I turned those off a little while ago - but they would have been running to this appliance during this issue. I wonder if there is a bunch of replicated data orphaned on the MSDP pool...
02-23-2015 06:45 AM
SLPs (if not manually cancelled) will run until they are completed. Have you seen any replication failures or a backlog of replication jobs in your activity monitor?
02-25-2015 06:20 AM
I manually cancelled all of the SLPs from my remote appliance that was replicating to this one. At this point I have the vast majority of the images expired from this appliance and it will NOT release any space in the MSDP - it is still over 94% full as reported by NetBackup.
I have a case and am working with support to get to the bottom of this...