Bad file descriptor

shahriar_sadm
Level 6

Hi all,

Two days ago our master server became overloaded, with the bpjobd process consuming a large share of server resources. After restarting the NBU services, I saw that the bpjobd log had grown abnormally on Apr 22:

-rw-rw-rw- 1 root root    34967134 Apr 20 23:59 log.042016
-rw-rw-rw- 1 root root    53303748 Apr 21 23:59 log.042116
-rw-rw-rw- 1 root root 35243017306 Apr 22 23:59 log.042216
-rw-rw-rw- 1 root root    31203173 Apr 23 23:59 log.042316
-rw-rw-rw- 1 root root    19902730 Apr 24 09:42 log.042416

The log contains about 370 million lines of "<16> <tid:47717857593888> bpjobd: socket() failed: Bad file descriptor (9)".

What are the troubleshooting steps in this case?
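For reference, a count like this can be reproduced with grep. This is only a minimal sketch: the log directory /usr/openv/netbackup/logs/bpjobd is an assumption (the usual legacy log location on UNIX/Linux), and count_socket_errors is a helper name made up for illustration.

```shell
# Count lines in a log file that contain the bpjobd socket() failure.
# -F treats the pattern as a fixed string, so the parentheses are literal.
count_socket_errors() {
    grep -cF 'socket() failed: Bad file descriptor' "$1"
}

# Assumed log location; adjust to your installation.
LOG=/usr/openv/netbackup/logs/bpjobd/log.042216
if [ -r "$LOG" ]; then
    count_socket_errors "$LOG"
fi
```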

Thanks,

 

 

 

3 REPLIES

shahriar_sadm
Level 6

I think I found the root cause: there are many reservation conflict errors in the server messages log.

 

Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:33:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.006 (device 46)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:27:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.011 (device 51)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 4:0:17:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.021 (device 61)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:17:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.030 (device 70)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:10:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.037 (device 77)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:7:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.039 (device 79)
Apr 22 17:04:03 mstr-nbkp-srv kernel: st 2:0:5:0: reservation conflict
Apr 22 17:04:03 mstr-nbkp-srv avrd[6895]: Reservation Conflict status from HP.ULTRIUM6-SCSI.041 (device 81)
Apr 22 17:04:18 mstr-nbkp-srv kernel: st 4:0:33:0: reservation conflict

Please assist

Thanks

 

vjuhola
Level 4
Partner

That is a SCSI reservation conflict message. Is there another host that could have been trying to use the tape drives at the same time?

Take a look at these

https://www.veritas.com/support/en_US/article.000044932

https://www.veritas.com/support/en_US/article.000043839
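To see which drives are affected and how often, the avrd messages can be summarized per drive. A rough sketch, assuming the system log is /var/log/messages and that the avrd lines look like the ones posted above; summarize_conflicts is just an illustrative helper name.

```shell
# Print "count drive-name" for each drive that avrd reported a
# reservation conflict on, most frequent first.
summarize_conflicts() {
    grep -F 'Reservation Conflict status from' "$1" |
        sed 's/.*Reservation Conflict status from \([^ ]*\).*/\1/' |
        sort | uniq -c | sort -rn
}

# /var/log/messages is an assumed path for the system log.
MSGLOG=/var/log/messages
if [ -r "$MSGLOG" ]; then
    summarize_conflicts "$MSGLOG"
fi
```

If the same few drives dominate the list, that may help narrow down which other host is holding the reservation.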

PatS729
Level 5

When you say the server was overloaded, it could be running out of open file descriptors. Run "ulimit -n" or "ulimit -a" to check the open file descriptor limit. For a master/media server it should be 8192 or higher. If you already have 8192 configured and it is still not enough when all jobs start, you can raise it as needed, for example to 16000.

Modify the file descriptor limit by running "ulimit -n 16000".

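Note: The command above has only a temporary effect: it applies to the current shell and the processes started from it, and the previous value returns after a reboot. To make the change persistent, modify /etc/sysctl.conf and reboot for it to take effect. (On many Linux systems the per-user "nofile" limit is set in /etc/security/limits.conf, while /etc/sysctl.conf holds the system-wide fs.file-max ceiling, so check which one applies to your setup.)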
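To check whether bpjobd is actually close to its limit, its open descriptors can be compared with the limit of the running process. A minimal sketch for Linux, assuming /proc is mounted and the commands are run as root; open_fd_count and max_open_files are illustrative helper names, not NetBackup commands.

```shell
# Print the number of file descriptors a process currently has open.
open_fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print the soft "Max open files" limit that applies to a running process.
max_open_files() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# Compare usage against the limit for every bpjobd process found.
for pid in $(pgrep -x bpjobd); do
    echo "bpjobd pid $pid: $(open_fd_count "$pid") open, limit $(max_open_files "$pid")"
done
```

The limit shown here is the one the daemon was started with, which can differ from what "ulimit -n" reports in your login shell.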