Solved: I changed NET_BUFFER_SZ and DATA_BUFFER_SZ and got ec=13s on some backups

LAToro · ‎09-03-2010

While I wait for access to a few clients that failed last night, I figured I'd throw this out there in case there is a quick explanation...

The environment is running NB 6.5.6 on both clients and media servers. We have two media servers (one Windows, one Linux) backing up Windows and RH Linux clients.
The storage media is LTO4.

In an effort to improve performance, I increased NET_BUFFER_SZ from 65536 to 262144 and DATA_BUFFER_SZ from 131072 to 262144. I just realized the two higher numbers differ... perhaps I fat-fingered the numbers when I made the change, but this is what I see in the BPTM log:

Before:
23:07:46.556 [23746] <2> io_set_recvbuf: receive network buffer is 131072 bytes
23:07:49.634 [23856] <2> io_set_recvbuf: setting receive network buffer to 65536 bytes
23:07:49.634 [23856] <2> io_set_recvbuf: receive network buffer is 131072 bytes
23:07:50.634 [23882] <2> io_set_recvbuf: setting receive network buffer to 65536 bytes
23:07:50.634 [23882] <2> io_set_recvbuf: receive network buffer is 131072 bytes
23:07:50.646 [23880] <2> io_set_recvbuf: setting receive network buffer to 65536 bytes

After:
17:03:15.992 [9475] <2> io_set_recvbuf: setting receive network buffer to 262144 bytes
17:03:15.992 [9475] <2> io_set_recvbuf: receive network buffer is 262142 bytes
17:03:41.647 [9520] <2> io_set_recvbuf: setting receive network buffer to 262144 bytes
17:03:41.647 [9520] <2> io_set_recvbuf: receive network buffer is 262142 bytes
17:04:15.462 [9602] <2> io_set_recvbuf: setting receive network buffer to 262144 bytes
17:04:15.462 [9602] <2> io_set_recvbuf: receive network buffer is 262142 bytes
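
To make the requested vs. reported sizes in excerpts like the ones above easier to compare, here is a minimal Python sketch that pairs each "setting ... to" line with the following "... is" line for the same PID. The regular expression is an assumption based only on the lines quoted in this post, not a documented bptm log format, and the function name is hypothetical.

```python
import re

# Matches the two io_set_recvbuf message shapes seen in the excerpt above.
# This pattern is an assumption drawn from those lines only.
LINE_RE = re.compile(
    r"\[(?P<pid>\d+)\] <\d+> io_set_recvbuf: "
    r"(?P<kind>setting receive network buffer to|receive network buffer is) "
    r"(?P<size>\d+) bytes"
)

def pair_recvbuf_sizes(lines):
    """Return (pid, requested, effective) for each PID that logged both messages."""
    requested = {}
    pairs = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        pid, size = int(m["pid"]), int(m["size"])
        if m["kind"].startswith("setting"):
            requested[pid] = size          # remember what was asked for
        elif pid in requested:
            pairs.append((pid, requested.pop(pid), size))
    return pairs

sample = [
    "17:03:15.992 [9475] <2> io_set_recvbuf: setting receive network buffer to 262144 bytes",
    "17:03:15.992 [9475] <2> io_set_recvbuf: receive network buffer is 262142 bytes",
    "23:07:49.634 [23856] <2> io_set_recvbuf: setting receive network buffer to 65536 bytes",
    "23:07:49.634 [23856] <2> io_set_recvbuf: receive network buffer is 131072 bytes",
]

for pid, req, eff in pair_recvbuf_sizes(sample):
    print(f"pid {pid}: requested {req}, effective {eff}, delta {eff - req}")
```

A "reported" line with no preceding "setting" line for the same PID (like the first line of the Before excerpt) is skipped rather than guessed at.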

For some reason several, but not all, Linux clients failed with RC=13:
Sep 2, 2010 6:13:30 PM - begin writing
Sep 2, 2010 6:23:48 PM - Error bpbrm (pid=21985) socket read failed: errno = 62 - Timer expired
Sep 2, 2010 6:25:02 PM - end writing; write time: 0:11:32
file read failed (13)

I reverted the NET_BUFFER_SZ to 65536, and the jobs reran successfully.

Any thoughts?

Will_Restore · ‎09-03-2010

Generally, performance is better when the value in the NET_BUFFER_SZ file on the client matches the value in the NET_BUFFER_SZ file on the media server.
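
As a rough illustration of that check, the sketch below compares a media server's NET_BUFFER_SZ value against the values collected from clients and lists the ones that differ. It assumes each file holds a single integer (bytes), as the values quoted in this thread suggest; the host names are hypothetical, and how you gather the file contents from each machine is left out.

```python
def read_buffer_size(text):
    """Parse the contents of a NET_BUFFER_SZ-style file: one integer, in bytes."""
    return int(text.strip())

def find_mismatches(media_server_size, client_sizes):
    """Return {host: size} for clients whose value differs from the media server's."""
    return {
        host: size
        for host, size in client_sizes.items()
        if size != media_server_size
    }

# Hypothetical hosts; the sizes are the two values mentioned in this thread.
media = read_buffer_size("262144\n")
clients = {
    "client-a": read_buffer_size("65536\n"),
    "client-b": read_buffer_size("262144\n"),
}

print(find_mismatches(media, clients))
```

A mismatch flagged this way is only a candidate for review; it does not by itself prove that a given client will fail.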

LAToro · ‎09-07-2010

I was able to check the NET_BUFFER_SZ setting on the clients that failed, and they were set to 65536, so it would seem that was the issue. However, I am puzzled as to why other clients did not fail.
