Forum Discussion

AdamChangepoint
6 years ago

Various clients fail backup with (24) socket write failed and (42) network read failed

I'm having an issue with various clients throwing the errors above.

I've gone through a number of troubleshooting steps, more than I can remember at this point, changing timeout values and other settings on the master server to try to resolve the issue.

That being said, I can solve the issue by rebooting the system.  After a reboot the system will back up without error and without any settings being changed on the system.

The problem is that the issue starts to recur within a week or so, requiring another reboot of the system.

If a one-time reboot were all that was required to fix the issue, that would be fine, but weekly reboots are not something I can have occurring.

The errors are the same across systems:

11/18/2018 16:53:31 - Critical bpbrm (pid=296) from client : FTL - socket write failed
11/18/2018 16:53:31 - Error bptm (pid=5848) socket operation failed - 10054 (at ../child.c.1276)
11/18/2018 16:53:31 - Error bptm (pid=5848) unable to perform read from client socket, connection may have been broken
11/18/2018 16:53:31 - Info bptm (pid=3896) EXITING with status 42 <----------
11/18/2018 16:53:31 - Error bpbrm (pid=296) could not send server status message

or

11/18/2018 04:40:08 - Critical bpbrm (pid=6188) from client : FTL - socket write failed
11/18/2018 04:40:09 - Error bptm (pid=5636) socket operation failed - 10054 (at ../child.c.1276)
11/18/2018 04:40:09 - Error bptm (pid=5636) unable to perform read from client socket, connection may have been broken
11/18/2018 04:40:09 - Error bpbrm (pid=6188) could not send server status message
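For anyone triaging similar job details, here is a minimal sketch of pulling the timestamp, process, pid and Winsock code out of lines like the ones above. The function name `parse_line` and the small code table are illustrative assumptions, not part of NetBackup; the line layout is inferred only from the excerpts in this post. The one hard fact it relies on is that Winsock error 10054 is WSAECONNRESET, meaning the connection was reset by the other end.

```python
import re

# A few Winsock codes that can appear after "socket operation failed - ".
# Only 10054 occurs in the log excerpts above; the others are listed for context.
WINSOCK_CODES = {
    10053: "WSAECONNABORTED (connection aborted by local software)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
}

# Layout inferred from the excerpts: "MM/DD/YYYY HH:MM:SS - Severity process (pid=N) message"
LINE_RE = re.compile(
    r"^(?P<when>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<severity>\w+) (?P<process>\w+) \(pid=(?P<pid>\d+)\) (?P<message>.*)$"
)
SOCKET_ERR_RE = re.compile(r"socket operation failed - (\d+)")


def parse_line(line):
    """Return a dict for one job-detail line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["pid"] = int(entry["pid"])
    err = SOCKET_ERR_RE.search(entry["message"])
    # winsock is the numeric code if the message carries one, else None
    entry["winsock"] = int(err.group(1)) if err else None
    return entry


if __name__ == "__main__":
    sample = (
        "11/18/2018 16:53:31 - Error bptm (pid=5848) "
        "socket operation failed - 10054 (at ../child.c.1276)"
    )
    entry = parse_line(sample)
    print(entry["process"], entry["pid"], WINSOCK_CODES.get(entry["winsock"]))
```

Running this over the full job details from several failed clients makes it easy to check whether every failure carries the same code, which here would point at the connection being reset rather than timing out.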

Looking for next steps on how to run this issue down.

Thanks

Adam Becker

11 Replies

  • I am struggling to make sense of the OS and NBU versions that you have selected.

    W2012 with 7.1.x and earlier? 
    Not possible as support for W2012 started much later.

    Status 24 is never an NBU issue, so a good understanding of the environment is crucial, especially of the problematic clients.
    There are technotes for W2003 clients, but it is best if you give us the correct info.

    Herewith an extract from an excellent post:

    I describe the 23/24/25 status codes as follows:

    RC=23:  Server A sent an IP packet to valid server B and is waiting for a response packet.  It fails to get the response packet within the TIMEOUT window and raises rc=23.

    RC=25:  Server A tried to send an IP packet to invalid server B.  No connection is made, so Server A sets rc=25.

    RC=24: Server A sends a packet to server B and gets a response within the TIMEOUT window, but then something drops the connection between them.

    I make an analogy of this communication environment using phone calls:

    The person on Phone A calls phone number B, the call connects, and they leave a voice mail asking to be called back. They wait for a call back that does not come, and after a specified time they quit. RC=23.

    Person A calls the number of what he thinks is a valid Phone B.  The call does not go through and he hears the message "The number you have dialed is not a working number". RC=25.

    Person A calls Person B and the call is picked up, but the line connection somehow gets dropped unexpectedly while communication is in progress.  RC=24.

    All of these are communication errors of some kind.

    For RC=25, the source server may have the wrong target server name in its environment or an invalid/wrong IP address for the target server.

    For RC=23, A can talk to B but B cannot talk to A. B may be seeing a source server it does not recognize, or it may be using the wrong IP address to respond to.  Possibly bad host name to IP resolution.

    RC 24: The toughest of the bunch. A and B know each other correctly. They just can't keep the call going.
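The three cases quoted above can be loosely illustrated with ordinary Python socket exceptions. This is only an analogy under stated assumptions: the function `classify` and the mapping are hypothetical, they follow the wording of the quoted description, and they say nothing about how NetBackup itself decides which status to raise.

```python
import socket


def classify(exc):
    """Map a socket-level exception to the RC it most resembles in the quoted description."""
    # Name does not resolve, or nothing is listening: the call never connects.
    if isinstance(exc, (socket.gaierror, ConnectionRefusedError)):
        return "RC=25"
    # Connected, but no answer arrived within the timeout window.
    if isinstance(exc, socket.timeout):
        return "RC=23"
    # Connected and talking, then the connection was dropped mid-stream.
    if isinstance(exc, (ConnectionResetError, ConnectionAbortedError, BrokenPipeError)):
        return "RC=24"
    return None


if __name__ == "__main__":
    for exc in (ConnectionRefusedError(), socket.timeout(), ConnectionResetError()):
        print(type(exc).__name__, "->", classify(exc))
```

The Winsock 10054 in the original post surfaces in Python as `ConnectionResetError`, which is why it lands in the RC=24 bucket: both ends found each other, and something tore the session down afterwards.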

    You may also want to go through this post by mph999 :
    https://vox.veritas.com/t5/NetBackup/Backup-job-fails-with-different-status-codes-13-24-and-42/m-p/460060#M102743

    Unfortunately most of the Symantec URLs are no longer working...

    • AdamChangepoint
      Level 3

      Thanks for that info, and sorry for sending you down the wrong path :/

      I made a mistake and have now updated my original message. My client and master server version is actually 7.7.3, not 7.1.x.

      My bad :(

      • Marianne
        Level 6

        What about OS and NBU version(s) on problematic clients? 

        How many clients are affected? The same or different clients each time?

        Does this happen only during peak backup times?
        If so, have you tried staggering the backup schedule times?

        The second post that I referred to lists quite a lot of possible reasons for network issues during the backup window.

  • Do you mean that rebooting the master/media server solves the problem, and then backups start failing again after a while?

    Do you have any anti-virus software configured on your master, or any other software (Bit9 or Carbon Black) that could be causing these issues?

    • AdamChangepoint
      Level 3

      No, I mean rebooting the clients.  I could live with rebooting the master server.