5220 PureDisk Volume Down

DecKy · ‎03-25-2013

Hi,

I have two 5220 Master/Media appliances running 2.5.1 with external storage units. The PureDisk Volume goes down on one or the other of the appliances every few weeks. It usually takes multiple reboots or a full shutdown to get it back up, but it's hit and miss, and I have no definitive process for bringing it back online. How do I troubleshoot this? What logs should I be looking at? Is there a correct way of bringing a PureDisk Volume back up?

Thanks,

Dec

Mark_Solutions · ‎03-25-2013

Lots of questions there, and a little like "how long is a piece of string"! How to stop it and how to bring it back up both depend on why it goes down.

The first places to look are spoold.log and storaged.log for clues.

There are a few issues on 2.5.1. For example, if you do VMware backups and your LUNs have multiple paths, it can cause a memory leak that makes the appliance run out of memory (fixed in 2.5.2). There is also a possible bug in spoold. An EEB is available, but we need to see what is happening in spoold.log first.

If you can post those two logs on here for a date when the volume has gone down, we may be able to assist, but I would recommend raising a case with support as there are a lot of variables here.

Hope this helps

DecKy · ‎03-25-2013

I've attached the spoold.log and storaged.log. This is the initial alert:

Alert Raised on: 23 March 2013 06:50
Alert Policy: Disk Storage Unit - DOWN

Is there a fix in 2.5.1 for the memory leak with multiple paths ?

Many Thanks,

Dec

Mark_Solutions · ‎03-25-2013

For the memory leak in 2.5.1, as long as it is just a Media Server and not a Master Server, you can do the following.

To see if you have leaked semaphores, run this single-line command (copy and paste it into PuTTY!):

for semid in `ipcs -s | awk '/^0x/ {if ($1=="0x00000000") print $2}'`; do ipcs -s -i $semid; done | egrep -v "semaphore|uid|mode|nsems|otime|ctime|semnum" | awk '{if ($5 != "0" && $5 != "") print $5}' | uniq | xargs ps -p

If this shows orphaned processes, you can add this to the crontab (crontab -e). Again it is just one line, so copy and paste it from Notepad:

*/15 * * * * /usr/bin/ipcs -s | grep 0x00000000 | awk '{print $2}' | while read sem; do /usr/bin/ipcrm -s $sem; done

Don't do this on a Master appliance, as it takes EMM down!

I will have a look at the logs and get back to you.
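The first stage of the check above can be tried on its own before anything is scheduled in cron. The following is only a sketch: `zero_key_semids` is a hypothetical helper name, and it assumes the usual Linux `ipcs -s` column layout (key in the first column, semid in the second). It only lists IDs and removes nothing. Note that a key of 0x00000000 just means the array was created as private, so a listed ID is a candidate to inspect, not proof of a leak.

```shell
#!/bin/sh
# Hypothetical helper: list the IDs of semaphore arrays whose key is
# 0x00000000, reading `ipcs -s` style output on stdin.
# Assumes column 1 is the key and column 2 is the semid.
# Read-only: nothing is removed.
zero_key_semids() {
    awk '$1 == "0x00000000" { print $2 }'
}

# Usage on a live system (not run here):
#   ipcs -s | zero_key_semids
#   ipcs -s | zero_key_semids | wc -l    # how many candidates there are
```

Running the count a few times over a day gives a rough idea of whether the number is growing, which is the pattern a leak would be expected to show.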

Mark_Solutions · ‎03-25-2013

Just one other thing: multiple paths for VMware LUNs are not actually supported at the moment. I'm not sure if that has changed in 2.5.2.

From the spoold log, there are lots of these:

March 23 06:51:18 INFO [47685560046816]: Storage Manager: initializing
March 23 06:51:18 INFO [47685560046816]: Database Manager: initializing
March 23 06:51:18 INFO [47685560046816]: Database Manager: initialization complete
March 23 06:51:18 INFO [47685560046816]: Database Manager: closing storage database connection
March 23 06:51:18 INFO [47685560046816]: Database Manager: shutdown

It is worth raising a case and asking them if ET2962020 applies to your appliance (new spoold binary).

The storaged log also has lots like this:

March 25 15:28:17 INFO [1082194240]: fakeFPCheck: DO fd00355bc39b497510379a4eb9ab2f49 is corrupt

so it is worth asking them to check this out, as you may need some maintenance doing as well. This could be causing spoold to crash during queue processing / rebasing.

Log a call and see if they can get you sorted out.
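To put numbers on "lots of these" before raising the case, the two patterns quoted above can be summarised with a couple of small filters. This is a rough sketch, not a support tool: the function names are hypothetical, and the field positions are assumed purely from the log excerpts in this thread (month and day as the first two fields, the fingerprint as the word after "DO").

```shell
#!/bin/sh
# Hypothetical helper: count "Storage Manager: initializing" lines per
# date, reading spoold.log style input on stdin.
# Assumes the first two fields are month and day, as in the excerpts.
count_spoold_inits() {
    awk '/Storage Manager: initializing/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Hypothetical helper: list the unique fingerprints reported as corrupt
# by fakeFPCheck, reading storaged.log style input on stdin.
corrupt_dos() {
    awk '/fakeFPCheck: DO .* is corrupt/ {
             for (i = 1; i < NF; i++) if ($i == "DO") print $(i + 1)
         }' | sort -u
}

# Usage (log paths depend on your system; not run here):
#   count_spoold_inits < spoold.log
#   corrupt_dos < storaged.log | wc -l
```

A high initialization count on the day the volume went down, together with the list of corrupt fingerprints, is the sort of summary that is easy to paste into a support case alongside the full logs.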

wan2see · ‎03-31-2013

Just want to point out that this is not just an appliance issue. We have 3 PD pools, all attached and using 7.5x, and we encounter the same challenges. NBU support has no answer, and we also have to reboot the masters multiple times. As stated above, spoold and spad are the culprits.

Mark_Solutions · ‎04-02-2013

The original EEB was for MSDP and was converted to an rpm for me when I identified this issue on an appliance. Support should be able to provide you with both for your system to put it right.

Tahir_Maqbool · ‎04-10-2013

I hope NBU support can help you. Make sure that the logs mentioned above are checked by support, and ask the assigned TSE to forward them to back-line support for verification before applying any EEB.

I have the same sort of issue and I am still in touch with NBU Support to resolve it, as one of my appliances is useless these days.

-matt- · ‎04-23-2013

Puredisk Volume has been marked down

Known issue with the following workaround (resolved in 7.5.0.5)
Stop Netbackup
Start Netbackup
When spoold is accepting connections, run the following to check, then turn off image rebasing:

cat /disk/log/spoold/spoold.log |grep rebase

/usr/openv/pdde/pdcr/bin/crcontrol --rebasestate
      Image rebasing: ON
      Rebasing busy: Yes

/usr/openv/pdde/pdcr/bin/crcontrol --rebaseoff (takes a few minutes to complete)
      Data store conversion turned off

http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=TE...
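The check-then-disable steps above can be wrapped so that `--rebaseoff` is only issued when rebasing is actually reported as on. This is a sketch under assumptions: the function names are hypothetical, only the `Image rebasing: ON` text shown above is matched, and any other output is treated as "not on" rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if `crcontrol --rebasestate`
# output on stdin reports image rebasing as ON.
rebasing_is_on() {
    grep -q 'Image rebasing: ON'
}

# Hypothetical wrapper around the two crcontrol calls from the post.
# Path as given in the post; adjust if your install differs.
rebase_off_if_on() {
    crcontrol=/usr/openv/pdde/pdcr/bin/crcontrol
    if "$crcontrol" --rebasestate | rebasing_is_on; then
        "$crcontrol" --rebaseoff    # may take a few minutes to complete
    else
        echo "Image rebasing not reported as ON; nothing changed"
    fi
}

# Usage, once spoold is accepting connections (not run here):
#   rebase_off_if_on
```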

Mark_Solutions · ‎04-23-2013

-matt-

This is the same issue I mentioned, covered by the EEB that is available for 7.5.0.4, though that tech note gives a different ET number to the original. Maybe it has been renamed for the appliance version now.

Good spot though

DecKy · ‎04-23-2013

The EEB didn't work for me; I still had the problem after it was installed. I'll use the above workaround if it happens again. Hopefully it won't.

Mark_Solutions · ‎04-24-2013

OK, I'm surprised at that, as I assume it is the 2.5.2 spoold file. But yes, turn off rebasing for now, although it will affect performance eventually, as rebasing is basically de-dupe defrag.
