PureDisk 6.5.0.1 - dsid 81 was denied - No slots available anymore

bakitup
Level 4
Once we go over a certain number of clients and PDDO media servers when our NetBackup fulls run, we start getting messages in Events like "Current Process count exceeds the Notice threshold value (current value: 845, allowed value: 700)".  Once this number gets above 850 we start getting PureDisk backup and PDDO failures with the message "No more Slots available" in PD Events.  Has anyone seen this error before?  We have 18 PDDO media servers and about 50 PureDisk clients.  Spreading out when they all start has helped some, but we still have problems.

It is a 3-node PureDisk setup: one SPA/CR and two more CRs.

Thanks

Jeff
4 REPLIES

PureD
Level 4
Well, you could try increasing the default "Info" threshold value for MaxProcess within the web UI (click the Configuration tab, then expand PureDisk ContentRouter, Default ValueSet for PureDisk ContentRouter, Sections, Monitor, and edit MaxProcess; the default value is ALL OS: 500 600 700 0 1, change it to 600 700 800 0 1).

Good move on your part spreading out the backup schedules, but you might also try limiting the number of streams your PureDisk client agents are sending, and check what the 'Max concurrent jobs' value is on the NBU storage units related to PDDO. If Max concurrent jobs is some obscenely high value (I ran into problems with this set to 50 in my busy environment, with max jobs per client at 99!), try reducing it to, say, 12 for a night or a weekend to see if the problem becomes less severe or even disappears. You should also set pd.conf on your NBU media servers to use compression.
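
For example, something along these lines, assuming the usual ost-plugins location for pd.conf and the COMPRESSION keyword (verify the path and keyword against your version's docs; the storage unit label below is just a placeholder):

# On each NBU media server doing PDDO, enable compression in the OST plug-in config.
# In /usr/openv/lib/ost-plugins/pd.conf (typical default path), uncomment or add:
COMPRESSION = 1

# Lowering 'Max concurrent jobs' on a PDDO storage unit can also be done from the command line;
# the -mj flag is from memory, so confirm it against bpsturep's usage output first.
/usr/openv/netbackup/bin/admincmd/bpsturep -label <your_pddo_stu> -mj 12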

You might also look at other performance factors, such as the throughput of your PD client backups and your NBU backups. If it's bad, then more and more streams are going to pile up and run concurrently, so try tweaking things. The PD 6.5.1 upgrade includes a tcp_tune.sh script that helps on the PDOS side, and there is also some network performance tuning material in the PureDisk best practices guide (TN319449).
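
I don't have tcp_tune.sh in front of me, but PDOS-side network tuning generally comes down to sysctl-level TCP buffer settings along these lines; the values here are only illustrative, so take the real ones from the script or from TN319449:

# Illustrative Linux TCP buffer tuning only; not the actual contents of tcp_tune.sh
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
# Add the same settings to /etc/sysctl.conf so they persist across reboots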

bakitup
Level 4
I have changed MaxProcess within the web UI, and all that does is change the info/warning thresholds; it does not actually increase the number of processes the nodes will support.  I also reduced the number of concurrent jobs for the PDDO storage unit from 10 to 6 when I spread out the backup jobs, which again has helped the immediate problem but seems to me like something we should not have to do.  The strange thing is that support seems to have never seen this error before, and we do not have that large a PD environment; it was implemented in February.  As mentioned, we have a 3-node PD setup and about 50 PD clients with 15 PDDO storage units.  We got through again this weekend with no more out-of-slots errors, but our max process count gets up to 800, and usually when I see it at 850 or higher is when the "no more slots" messages begin to appear and backups start failing.

BTW, our lab is on 6.5.1 and we still get the high process numbers there.  I can drive it up to 600 with only 2 PD clients and one PDDO storage unit with 10 jobs running.

Thanks

Jeff

bakitup
Level 4
There is a parameter on each Content Router that can be adjusted; the default is 128.  In /opt/pdconfigure/etc/default/pdcr, the line
OPTIONS="-q" was changed to OPTIONS="-q -m 512"

Then stop and start the PureDisk services.

This corrected my issues, but what I am still in the process of getting from support is how to tune for future growth.
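
For anyone else hitting this, the change looks roughly like the following as shell steps on each CR node; the service restart command is deliberately left vague because it differs between releases, so use whatever start/stop method your PureDisk version documents:

# Back up the content router defaults file, then add the -m option
cp /opt/pdconfigure/etc/default/pdcr /opt/pdconfigure/etc/default/pdcr.bak
sed -i 's/OPTIONS="-q"/OPTIONS="-q -m 512"/' /opt/pdconfigure/etc/default/pdcr

# Verify the edit took
grep OPTIONS /opt/pdconfigure/etc/default/pdcr

# Stop and start the PureDisk services so the content router re-reads its OPTIONS
# (use your release's documented service control method here)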

Kerry_LeRoy
Level 3
We have a 6-node PD environment: 1 SPA/MBE/MBS and 4 CR/MBEs, with a hot spare.  We have 8 PDDO media servers pushing data from 2 NetBackup environments.  We have been seeing the process messages for quite a while, but over the last 3 weeks we have had 3 nights where the PD pool became unreachable to both NetBackup master servers at the same time.  During the most recent event, I did see the slot error that was mentioned here.

Initially we believed this to be network related, because the PD nodes were inadvertently all active on the same switch and we saw discards on the trunk.  The environment was properly split between two switches.  During the last event, we did not see network issues.

Here are some of the log messages:
sts_read_image failed: error 2060017 system call failed
image read failed: error 2060017: system call failed
cannot read image from disk, Invalid argument
sts_close_handle failed: 2060017 system call failed
cannot write image to disk, media close failed with status 2060017
sts_get_image_prop failed: error 2060001: one or more invalid arguments
sts_close_handle failed: 2060001 one or more invalid arguments

I'm guessing that you have not seen the above errors yet?
I am definitely going to look at the parameters mentioned here.
Thanks