We are encountering an issue on NetBackup, specifically with nbpem (Policy Execution Manager). Scheduled backup jobs do not run, and manually triggered backup jobs do not appear in the Activity Monitor either. We tried to cycle the services (stop/start), but nbpem and the services that depend on it do not stop at all.
Everything was working fine until an intermittent connection problem arose, which lasted about 4-5 hours. During that period, storage (Disk and TLD) was down, and both the Java console and backup jobs had problems: the Java console was very slow to respond and backup jobs were failing. We tried to cycle the services (stop/start). On startup, nbwmc did not start, so we brought it up with the bpup -e nbwmc command, which completed successfully. However, when we checked the Java console afterwards, some TLDs were up and some were not. The scheduled jobs that were due did not start at all, so we tested manual backups, with the same result. We found that nbpem was not responding and tried to cycle the services again, but this time nbpem would not stop, and neither would its dependent services (the stop keeps looping).
Looking forward to any comments or suggestions on this, as we currently have no successful backups.
Thanks and Regards,
You said nbpem and the other services are not stopping even when you try to stop them. Have you tried killing these processes using taskkill or Task Manager?
Please set the nbpem and bprd logs on the master (verbose = 3) and reproduce the issue.
If I read your post correctly, it all seems to be related to connection issues:
"All is working fine until an intermittent connection arise, it last for like 4-5 hours. On that time frame period, Storage (Disk and TLD are down),"
Could these be connection issues with media servers?
I have seen this happen when Client Connect Timeout values are excessive and one or more media servers are down or inaccessible due to network issues.
When processes like nbpem and nbrb need to evaluate resources, they need to connect to each media server.
Let's say there is a Client Connect Timeout of 3600 (1 hour) and Media1 and Media3 are down or inaccessible:
The master tries to connect to Media1. It waits 1 hour for a response before timing out and moving on to Media2.
Media2 responds, and the master moves on to Media3, where it again waits 1 hour to time out before moving on to the next media server.
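To put numbers on the example above, here is a small back-of-the-envelope sketch in Python. It only models the sequential waiting described in this post; the function name and the assumption that a reachable server answers instantly are mine, not anything taken from NetBackup itself.

```python
# Rough model: the master contacts media servers one after another, and
# every unreachable server costs one full connect timeout.
# Assumption (not from NetBackup): a reachable server answers in response_s.

def total_wait_seconds(reachable, timeout_s, response_s=0):
    """Return the total time spent on one sequential pass over all servers.

    reachable  -- dict of server name -> True if it responds, False if down
    timeout_s  -- connect timeout paid for each unreachable server
    response_s -- assumed time for a reachable server to answer
    """
    return sum(response_s if up else timeout_s for up in reachable.values())


# The scenario from the post: Media1 and Media3 down, Media2 up.
servers = {"Media1": False, "Media2": True, "Media3": False}

slow = total_wait_seconds(servers, timeout_s=3600)  # 1-hour timeout
fast = total_wait_seconds(servers, timeout_s=300)   # 5-minute timeout

print(f"3600 s timeout: {slow / 3600:.1f} hours blocked")
print(f"300 s timeout: {fast / 60:.0f} minutes blocked")
```

With two unreachable media servers, the one-hour timeout blocks a single pass for about two hours, while a 300-second timeout would block it for about ten minutes.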
When you are experiencing these issues, simulate a connection request to each media server with a command like:
It is best to mark a media server down when you know there are connection issues, and to reduce the Client Connect Timeout to the default of 5 minutes.
If my assumptions are wrong, then you will need to dig into the logs to troubleshoot.
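To find out quickly which media servers are unreachable and are candidates for being marked down, a generic TCP probe with a short timeout can help. This is only a minimal sketch and not a NetBackup command: the host names are hypothetical placeholders, and port 1556 (PBX) is my assumption about which port is worth testing in your environment. A successful TCP connect shows only that the port is reachable, not that the NetBackup daemons behind it are healthy.

```python
import socket


def probe(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def probe_all(hosts, port, timeout_s=5.0):
    """Probe each host in turn and return a dict of host -> reachable flag."""
    return {host: probe(host, port, timeout_s) for host in hosts}


# Example usage (hypothetical host names, assumed port):
#   results = probe_all(["media1.example.com", "media2.example.com"], 1556)
#   for host, ok in results.items():
#       print(host, "reachable" if ok else "NOT reachable")
```

Because each probe gives up after a few seconds, a pass over all media servers finishes quickly even when several of them are down.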