02-07-2014 10:48 AM
I got a page about this:
hrSWRunStatus bpstsinfodsproxy:Unix process bpstsinfo -DPSPROXY is either NOT running or has breached the thresholds of minimum count 1 and maximum count 1. Active Instance Count : 0
Does anybody know what it means, or how to resolve it?
Thanks, Ken
02-07-2014 11:11 AM
When was the last time you restarted NetBackup services? This sounds like bpstsinfo has hit a condition that it's breaking on. It is not a documented issue that I could find internally. If you're still getting that after a service restart, you might want to open a support case.
02-07-2014 11:35 AM
I did halt and restart the NBU daemons. It has been since 1/31 that they were restarted (about 1 week). I have opened a support case, but haven't received a response from support yet. I really don't want to have to halt and restart NBU weekly. Thanks for your response!
02-07-2014 12:26 PM
Greets ket4kl,
A little bit of background info pls:
- where are you seeing the error? An SNMP page in (sic) OpsCenter?
- what storage are you using?
- are your backups successful and has storage been going up and down at all?
- can you look at admin logs on master server pls and report any findings on dpsproxy and errors?
- run a full errors report from the GUI or CLI and note the times of any failures and whether they are media or storage related: bperror -all -U
- lastly (I know the answer is 'nothing changed'): what has changed in your environment recently? ;)
* Background: I had a situation with DPS (the polling service), which, if memory serves, had/has a timeout of about a minute (I think this was under 7.1.x). It basically tells NBU (via bpstsinfo) the status of the disk; if no reply is received, the disks go down. This was 'remediated' by increasing the timeout. However, the RCA wasn't established, unfortunately, as I had too much else to do. This kind of problem usually relates to dodgy networking to/from storage and hiccups in other things like FC cards or interconnections. It's a tough one to diagnose, but it's worth the effort in the long run.
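To make the mechanism above concrete, here is a minimal sketch in Python of a status poll with a reply timeout. It is not NetBackup code: `poll_disk_status`, `probe`, and the 60-second default are hypothetical names and values, chosen only to mirror the "about a minute" timeout described above.

```python
import concurrent.futures
import time


def poll_disk_status(probe, timeout_s=60.0):
    """Run probe() and return "UP" if it replies in time, else "DOWN".

    probe is a hypothetical callable standing in for the real status query;
    timeout_s mirrors the roughly one-minute timeout mentioned above.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(probe)
    try:
        future.result(timeout=timeout_s)
        return "UP"
    except concurrent.futures.TimeoutError:
        # No reply within the window: the poller treats the disk as down,
        # even if the storage is healthy and merely slow to answer.
        return "DOWN"
    except Exception:
        # The probe itself failed, which also counts as "no usable reply".
        return "DOWN"
    finally:
        # Do not block on a probe that is still running.
        pool.shutdown(wait=False)


def slow_probe():
    """Hypothetical probe that answers, but only after a short delay."""
    time.sleep(0.3)
    return "ok"
```

With `slow_probe`, a call such as `poll_disk_status(slow_probe, timeout_s=0.05)` reports the disk as down, while a larger `timeout_s` reports it as up. That is the sense in which a longer timeout can make the symptom go away without explaining why the reply was slow.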
Good luck,
- Billy
02-07-2014 01:40 PM
Just got off the phone with support. We did run an error report from the Java console, no errors reported.
Where are the admin logs you talk about?
I did run the bperror -s WARNING | ERROR | CRITICAL -U command on the master server. There were some warnings (I understood them) and no errors or critical messages.
What timeout value would I increase?
And, I did run a bpexpdate command earlier this morning to change the retention on some backups and when the backup data wasn't immediately deleted, I went back into the Admin Console and did a manual delete.
And, for this issue, I did stop and restart the NBU processes on the NBU servers.
I think I am at a point to just sit back and wait until Monday comes.
02-07-2014 03:51 PM
Greets,
Admin logs are found with all the other legacy logs, by default in /usr/openv/netbackup/logs/admin. Almost all commands from the /usr/openv/netbackup/bin/admincmd directory log here, including bpstsinfo, the process you reported in the error.
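If it helps, here is a rough Python sketch for pulling the relevant lines out of that directory. It is only a plain substring search over whatever files are there; `find_log_lines` and its `needle` parameter are names made up for this example, and it assumes nothing about the log file names or line format.

```python
from pathlib import Path


def find_log_lines(log_dir, needle="bpstsinfo"):
    """Return (file name, line number, text) for every line containing needle.

    log_dir would be the admin log directory named above; needle is a plain
    substring match, not a NetBackup-specific query.
    """
    matches = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Logs may contain odd bytes, so decode leniently instead of failing.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    matches.append((path.name, number, line.rstrip("\n")))
    return matches
```

For example, `find_log_lines("/usr/openv/netbackup/logs/admin")` lists every line that mentions bpstsinfo, and passing `needle="DPSPROXY"` narrows it to the proxy process named in the page. If the directory does not exist, the call raises `FileNotFoundError`.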
Considering you've formally raised a case with SYMC, I'm hesitant to go into detail about the timeout values (too many cooks spoil the broth?). Raising them might very well end up hiding the error, but it's not truly the answer to the problem :( I suggest doing a deep-dive on your environment, and I'm sure SYMC will help you work through the interconnects on your network. If your disks and backups are not being affected, I'd recommend treading cautiously and not simply adding timeouts until you're happy: some timeouts will work well, but the downside is that they might otherwise hide high-profile problems as/when they occur.
Good luck,
- Billy