04-23-2016 10:17 PM
Hi dear all,
Two days ago our master server became overloaded: the bpjobd process was consuming a large share of the server's resources. After restarting the NBU services I saw that the bpjobd log had grown abnormally:
-rw-rw-rw- 1 root root 34967134 Apr 20 23:59 log.042016
-rw-rw-rw- 1 root root 53303748 Apr 21 23:59 log.042116
-rw-rw-rw- 1 root root 35243017306 Apr 22 23:59 log.042216
-rw-rw-rw- 1 root root 31203173 Apr 23 23:59 log.042316
-rw-rw-rw- 1 root root 19902730 Apr 24 09:42 log.042416
The log contains about 370 million lines of "<16> <tid:47717857593888> bpjobd: socket() failed: Bad file descriptor (9)".
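For anyone who wants to reproduce the count, here is a minimal sketch. The helper name count_socket_errors is only illustrative, and the file name in the usage comment is taken from the listing above; run it from the directory that holds these bpjobd log files.

```shell
# Count the lines in one log file that contain the repeated bpjobd error.
# count_socket_errors is a hypothetical helper name, not an NBU command.
count_socket_errors() {
    # -F matches the pattern as a fixed string, -c prints the number of matching lines
    grep -cF 'socket() failed: Bad file descriptor' "$1"
}

# Usage (file name from the listing above):
#   count_socket_errors log.042216
```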
What are the troubleshooting steps in this case?
Thanks,
04-24-2016 01:35 AM
I think I found the root cause: there are many reservation conflict errors in the server's messages log.
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:33:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.006 (device 46)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:27:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.011 (device 51)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:17:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.021 (device 61)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:17:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.030 (device 70)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:10:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.037 (device 77)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:7:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.039 (device 79)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:5:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.041 (device 81)
Apr 22 17:04:18 mstr-nbkp-srv kernel: st 4:0:33:0: reservation conflict
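To see which drives are affected and how often, the avrd lines can be tallied per drive. This is a rough sketch that assumes the syslog format shown above, where the drive name is the third field from the end of the line. The helper name conflicts_per_drive is illustrative, and /var/log/messages in the usage comment is an assumption about where this host writes its messages log.

```shell
# Tally avrd "Reservation Conflict" messages per drive name.
# conflicts_per_drive is a hypothetical helper name.
conflicts_per_drive() {
    # Only the avrd lines match (the kernel lines use lowercase "reservation conflict").
    # $(NF-2) is the drive name, e.g. HP.ULTRIUM6-SCSI.006
    awk '/Reservation Conflict status from/ { n[$(NF-2)]++ }
         END { for (d in n) print n[d], d }' "$1" | sort -k2
}

# Usage (log path is an assumption):
#   conflicts_per_drive /var/log/messages
```

Each output line is a count followed by the drive name, sorted by drive name.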
Please assist
Thanks
05-03-2016 03:12 AM
That is a SCSI reservation conflict message. Is there another host that could have been trying to use the tape drives at the same time?
Take a look at these
https://www.veritas.com/support/en_US/article.000044932
https://www.veritas.com/support/en_US/article.000043839
05-03-2016 08:06 PM
When you say the server was overloaded, it could be running out of open file descriptors. Run "ulimit -n" or "ulimit -a" to check the open file descriptor limit. For a master or media server it should be 8192 or higher. If you already have 8192 configured and that is still not enough when all jobs start, you can raise it to suit your needs, for example to 16000.
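A quick way to compare the current soft limit against that figure, as a sketch: check_nofile is a made-up helper name and the default threshold is simply the 8192 mentioned above. Run it as the user that starts the NBU daemons, since ulimit only reports the limit of the current shell.

```shell
# Compare this shell's soft open-files limit with a wanted minimum.
# check_nofile is a hypothetical helper name; 8192 is the default threshold.
check_nofile() {
    want=${1:-8192}
    have=$(ulimit -n)   # soft limit of the current shell
    if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
        echo "ok: soft limit $have >= $want"
    else
        echo "low: soft limit $have < $want"
        return 1
    fi
}

# Usage:
#   check_nofile          # compare against 8192
#   check_nofile 16000    # compare against a higher target
```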
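Modify the file descriptor limit by running "ulimit -n 16000".
Note: The above command has only a temporary effect. It applies to the current shell and the processes started from it, and the limit is back to its previous value after a reboot. To make it persistent on Linux, the per-user nofile limit is normally set in /etc/security/limits.conf, while /etc/sysctl.conf holds the system-wide fs.file-max setting. Restart the NBU services or reboot for the change to take effect.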
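A daemon that is already running keeps the limit it was started with, so it can help to look at the process itself rather than at the shell. The sketch below assumes a Linux host with /proc mounted. fd_usage is an illustrative helper name, and reading another user's /proc entries generally needs root.

```shell
# Print "<open descriptors> <soft limit>" for one process ID (Linux /proc only).
# fd_usage is a hypothetical helper name.
fd_usage() {
    pid=$1
    # Each entry under /proc/<pid>/fd is one open descriptor
    open=$(ls "/proc/$pid/fd" | wc -l)
    # The fourth field of the "Max open files" row is the soft limit
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $limit"
}

# Usage (pgrep -o picks the oldest matching process):
#   fd_usage "$(pgrep -o bpjobd)"
```

If the first number is close to the second while jobs are running, the process is near its descriptor limit.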