12-07-2015 01:23 PM
Hi,
I'm trying to restore a backup taken on one Unix (Ubuntu) server to another Ubuntu server. On multiple occasions I get the error "cannot write data to socket, 10054", followed by "media manager for backup id xxx exited with status 24: socket write failed".
I've been looking at the bpbrm log file on the server and found the section below. I'm not sure whether it's normal.
14:34:43.770 [25708.26436] <2> bpbrm read_media_msg: read from media manager: MEDIA READY
14:34:43.770 [25708.26436] <2> bpbrm signal_bpbrm_child: sending Media Ready to bpbrm child 20536
14:34:45.314 [20536.24204] <2> bpbrm mm_sig: received ready signal from media manager
14:39:44.010 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 1
14:39:44.134 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
14:44:45.060 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 2
14:44:45.185 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
14:49:46.049 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 3
14:49:46.111 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
14:54:47.037 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 4
14:54:47.099 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
14:59:48.025 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 5
14:59:48.150 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:04:49.045 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 6
15:04:49.107 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:09:50.049 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 7
15:09:50.174 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:14:51.037 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 8
15:14:51.099 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:19:52.010 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 9
15:19:52.135 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:24:53.061 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 10
15:24:53.185 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:29:54.049 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 11
15:29:54.174 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:34:55.053 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 12
15:34:55.178 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:39:56.057 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 13
15:39:56.182 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
15:42:04.056 [25708.26436] <2> bpbrm read_media_msg: read from media manager: MEDIA NOT READY
15:42:04.056 [25708.26436] <2> bpbrm signal_bpbrm_child: sending Media Ready to bpbrm child 20536
15:42:04.056 [25708.26436] <2> bpbrm read_media_msg: read from media manager: EXIT licqc06_1447567228 24
15:42:04.056 [25708.26436] <2> bpbrm process_media_msg: media manager for backup id licqc06_1447567228 exited with status 24: socket write failed
15:42:04.056 [25708.26436] <2> bpbrm kill_bpbrm_child: terminating bpbrm child 20536 jobid=4295267
15:42:04.056 [25708.26436] <2> bpbrm signal_bpbrm_child: sending Terminate to bpbrm child 20536
15:42:06.474 [20536.24204] <2> bpbrm mm_sig: received not ready signal from media manager
15:42:06.474 [20536.24204] <2> bpbrm check_for_terminate: unexpected terminate
15:42:06.474 [20536.24204] <2> bpbrm kill_child_process_Ex: start
15:42:06.474 [20536.24204] <2> job_monitoring_exex: ACK disconnect
15:42:06.474 [20536.24204] <2> job_disconnect: Disconnected
15:42:06.474 [25708.26436] <2> bpbrm brm_child_done: child done, status 150
15:42:06.474 [25708.26436] <2> bpbrm brm_child_done: bpbrm child 20536 terminated by bpbrm parent
15:42:06.474 [25708.26436] <2> bpbrm send_status_to_parent: EXIT licqc06_1447567228 24 sent to parent process for jobid = 4295267.
15:42:06.536 [25708.26436] <2> bpbrm read_parent_msg: read from parent TERMINATE
15:42:06.536 [25708.26436] <2> bpbrm tell_mm: sending media manager msg: TERMINATE
15:42:06.536 [25708.26436] <2> job_monitoring_exex: ACK disconnect
15:42:06.536 [25708.26436] <2> job_disconnect: Disconnected
What I'm wondering is whether all the ACK_KEEP_ALIVE entries are normal; it looks like it's timing out.
Thanks
Philip
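For anyone reading a similar bpbrm log, here is a minimal sketch (not a NetBackup tool) that measures the spacing between the KEEP_ALIVE messages. The function name keep_alive_gaps is made up for illustration, and the sample lines are copied from the excerpt above. It only shows how regular the messages are; it does not say what caused the status 24.

```python
from datetime import datetime

# Four lines copied from the bpbrm excerpt in the post above.
SAMPLE = """\
14:39:44.010 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 1
14:39:44.134 [25708.26436] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
14:44:45.060 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 2
14:49:46.049 [25708.26436] <2> bpbrm send_parent_msg: KEEP_ALIVE 3
"""


def keep_alive_gaps(log_text):
    """Return the seconds between consecutive 'send_parent_msg: KEEP_ALIVE' lines.

    Assumes every line starts with an HH:MM:SS.mmm timestamp and that the
    excerpt does not cross midnight.
    """
    stamps = []
    for line in log_text.splitlines():
        if "send_parent_msg: KEEP_ALIVE" in line:
            stamps.append(datetime.strptime(line.split()[0], "%H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(keep_alive_gaps(SAMPLE))
```

In the excerpt the keep-alives are sent roughly every five minutes and each one is acknowledged within a fraction of a second, right up to the MEDIA NOT READY message at 15:42:04.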
12-08-2015 12:40 AM
Are there any firewalls between the master/media server and the client?
The bpbkar log on the media server and the tar log on the client are also worth looking at.
12-08-2015 01:16 AM
Do you have all relevant logging enabled?
On media server: bpbrm and bptm
On destination client: tar (If file-level restore)
Level 3 should be sufficient for now.
Please tell us more about the restore - is anything actually written to the client before it fails?
Are you restoring large files (that could maybe cause the timeout while bpbrm and bptm are waiting for acknowledgement from client)?
What is Client Read Timeout set to on the media server?
Please copy full logs to .txt files (e.g. bptm.txt) and upload as File attachments.
12-08-2015 01:56 AM
Since this is a Linux variant, I think it could be the local OS firewall. If there is a firewall, it is worth checking for dropped packets or connections.
I would start by going through the connectivity troubleshooting steps, to be sure that name resolution works and all required ports are open between the client and the NetBackup server(s).
I have found that the bpcd log often contains indications of the problem with issues like this.
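A rough sketch of that kind of check, to be run from the media server against the client (and the other way round). It is not a replacement for NetBackup's own connectivity tests. The port numbers are the usual NetBackup defaults (PBX 1556, vnetd 13724, bpcd 13782) and are an assumption about this environment; the helper names resolves, port_open and report are made up for illustration.

```python
import socket

# Assumed default NetBackup ports; confirm them for your own installation.
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def resolves(hostname):
    """Return the IPv4 address the name resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def report(host):
    """Return {"address": ..., "ports": {name: bool}} for one host."""
    return {
        "address": resolves(host),
        "ports": {name: port_open(host, port) for name, port in NBU_PORTS.items()},
    }
```

Calling report("ligqc05") from the media server, for example, would show whether the destination client name resolves and which of the assumed ports accept a connection. A successful connect only proves the port is reachable at that moment; it says nothing about packets being dropped later in a long transfer.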
12-08-2015 06:42 AM
Since I don't have access to the Linux client, I've asked someone to create the tar log directory and, if it already exists, to send me the logs. I've also asked him whether anything was written to the destination server.
The bpbkar, bpbrm and bptm logs on the media server already exist; I've attached them to this message. The restore began around 14:30. The source client is licqc06, the destination is ligqc05.
The client read timeout is set to 1800 sec.
12-08-2015 06:44 AM
We really need the tar log from the client; we need to follow the entire process flow.
bpbkar on a media server is not involved in client restore.
Please remember to increase logging level to 3.
Client as well.
It seems the logging level is still at 0:
(VERBOSE = 0)
12-08-2015 06:54 AM
OP log looks a lot like this one: https://www.veritas.com/community/forums/restore-fails-when-using-change-dest-works-ok-if-no-change-dest
Check permissions at the destination.
Maybe you aren't allowed to write there.
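One way to test that on the destination client is to try creating a file in the target directory. This is only a sketch: can_write is a made-up helper, and it has to run as the same user the restore process runs as (assumed here to be root, which is typical for the NetBackup client on UNIX) to mean anything.

```python
import os
import tempfile


def can_write(directory):
    """Return True if the current user can create a file in directory."""
    if not os.path.isdir(directory):
        return False
    try:
        # Create (and automatically delete) a real file instead of trusting
        # the mode bits, so read-only mounts and ACLs are caught as well.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False
```

Call it with the restore destination, e.g. can_write("/path/to/restore/target") where the path is a placeholder for the real directory.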
12-08-2015 07:53 AM
The Linux guy found a tar log; here's the content.
10:22:52 (4312399.001) INF - TAR STARTED 27445
10:22:52 (4312399.001) **LOCALE ERROR** locale <en_CA.UTF-8> not found in file </usr/openv/msg/.conf>
10:22:52 (4312399.001) Setting network receive buffer size to 32032 bytes
10:23:29 (4312399.001) Write interrupted by SIGPIPE.
10:23:29 (4312399.001) INF - TAR EXITING WITH STATUS = 40
10:23:29 (4312399.001) INF - TAR RESTORED 0 OF 0 FILES SUCCESSFULLY
10:23:29 (4312399.001) INF - TAR KEPT 0 EXISTING FILES
10:23:29 (4312399.001) INF - TAR PARTIALLY RESTORED 0 FILES
10:30:18 (4312402.001) INF - TAR STARTED 31253
10:30:18 (4312402.001) **LOCALE ERROR** locale <en_CA.UTF-8> not found in file </usr/openv/msg/.conf>
10:30:18 (4312402.001) Setting network receive buffer size to 32032 bytes
10:30:43 (4312402.001) Write interrupted by SIGPIPE.
10:30:43 (4312402.001) INF - TAR EXITING WITH STATUS = 40
10:30:43 (4312402.001) INF - TAR RESTORED 0 OF 0 FILES SUCCESSFULLY
10:30:43 (4312402.001) INF - TAR KEPT 0 EXISTING FILES
10:30:43 (4312402.001) INF - TAR PARTIALLY RESTORED 0 FILES
How do I change the verbose level on a Linux client? Is it the same command (nbsetconfig TAR_VERBOSE = 3)?
Thanks
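To compare attempts in a tar log like the one above, a small sketch such as the following can pull out the exit status and how long each attempt ran before it stopped. It treats the number in parentheses before the dot as an attempt identifier, which is an assumption about the log format; summarize is a made-up name.

```python
import re
from datetime import datetime

# Lines copied from the tar log in the post above.
SAMPLE = """\
10:22:52 (4312399.001) INF - TAR STARTED 27445
10:22:52 (4312399.001) Setting network receive buffer size to 32032 bytes
10:23:29 (4312399.001) Write interrupted by SIGPIPE.
10:23:29 (4312399.001) INF - TAR EXITING WITH STATUS = 40
10:30:18 (4312402.001) INF - TAR STARTED 31253
10:30:43 (4312402.001) Write interrupted by SIGPIPE.
10:30:43 (4312402.001) INF - TAR EXITING WITH STATUS = 40
"""

LINE = re.compile(r"^(\d\d:\d\d:\d\d) \((\d+)\.\d+\) (.*)$")


def summarize(log_text):
    """Return {attempt_id: {"start", "status", "sigpipe", "seconds"}}."""
    attempts = {}
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        entry = attempts.setdefault(
            match.group(2),
            {"start": None, "status": None, "sigpipe": False, "seconds": None},
        )
        message = match.group(3)
        if "TAR STARTED" in message:
            entry["start"] = stamp
        elif "SIGPIPE" in message:
            entry["sigpipe"] = True
        elif "TAR EXITING WITH STATUS" in message:
            entry["status"] = int(message.rsplit("=", 1)[1])
            if entry["start"] is not None:
                entry["seconds"] = (stamp - entry["start"]).total_seconds()
    return attempts


if __name__ == "__main__":
    for attempt, info in summarize(SAMPLE).items():
        print(attempt, info["status"], info["sigpipe"], info["seconds"])
```

For the two attempts in the log, both end with status 40 after a SIGPIPE, 37 and 25 seconds after tar started.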
12-08-2015 08:04 AM
OK, I found how to change the logging level on Linux.
12-08-2015 12:08 PM
I can confirm that there's no firewall installed on that Linux box.
Also, nothing was written to the destination server.
12-08-2015 12:28 PM
12-08-2015 12:38 PM
Yes I know Marianne. I was waiting to receive the tapes.
I'll be running the restore tonight and I'll be able to upload all the logs. (Might be only tomorrow though)
Thanks
12-08-2015 01:00 PM
I bet there is firewall software installed on the client; it may not have any rules, but it will be there.
Get a root admin to run:
iptables -L
Then post the output here.
12-08-2015 01:25 PM
The weird thing is that some restores work and some don't. I'll ask an admin to run the command.
12-08-2015 01:28 PM
Phil, are you restoring them all to the same place? It could be a permissions issue on the directories/filesystems you are restoring to.
12-08-2015 01:31 PM
Yes they're all going into the same folder. (parent folder)
/srv/mail/mysite.com/Restaure/xxx
The xxx part is the only part changing.
12-08-2015 02:02 PM
VERBOSE 5 logs required as mentioned above - all of them needed.
12-09-2015 06:53 AM
Hi revarooo, here's the result of the iptables -L command.
~$ iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain SUBNET_CHECK (0 references)
target prot opt source destination
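As a quick way to read output like this, the sketch below counts the rules per chain and checks that every built-in chain has policy ACCEPT. The function names are made up for illustration. It only looks at the text it is given, and plain iptables -L lists the filter table only, so an empty result here does not rule out rules in other tables or another firewall layer.

```python
# Output copied from the post above.
SAMPLE = """\
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain SUBNET_CHECK (0 references)
target prot opt source destination
"""


def parse_chains(iptables_output):
    """Parse `iptables -L` text into {chain: {"policy": str or None, "rules": int}}."""
    chains = {}
    current = None
    for line in iptables_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Chain "):
            parts = line.split()
            current = parts[1]
            # User-defined chains show "(N references)" and have no policy.
            policy = parts[3].rstrip(")") if "(policy" in line else None
            chains[current] = {"policy": policy, "rules": 0}
        elif line.startswith("target"):
            continue  # column header
        elif current is not None:
            chains[current]["rules"] += 1
    return chains


def is_wide_open(chains):
    """True if no chain has rules and every policy shown is ACCEPT."""
    return all(c["rules"] == 0 and c["policy"] in (None, "ACCEPT")
               for c in chains.values())


if __name__ == "__main__":
    print(is_wide_open(parse_chains(SAMPLE)))
```

For the output posted here, all three built-in chains are ACCEPT with no rules, and the user-defined SUBNET_CHECK chain is empty and unreferenced.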
12-09-2015 07:07 AM
12-09-2015 07:36 AM
Is this error consistent? If you restore a file from /tmp, do you still get the same error?