status 24(Socket write failed) issue while restore

James-Lee
Level 3


Hi everyone.

I tried to run a restore, but it failed with status code 24 while restoring the data. The backup environment and detailed logs are below; please suggest how to resolve this issue.

Master server(backupserver) : Solaris 10, NBU-6.5.6
Restore Source Client(cmserver) : HP-UX 11.11 PA-RISC, NBU-6.5.6
Restore Destination Client(cmsvr) : HP-UX 11.31 IA64, NBU-6.5.6
Restore Type : file restore

NET_BUFFER_SZ is 262144 (master, client),
CLIENT_READ_TIMEOUT = 7200, CLIENT_CONNECT_TIMEOUT = 7200 (master, client)


### Detail status log
2014. 7. 15 5:52:12 PM - begin Restore
2014. 7. 15 5:52:21 PM - number of images required: 1
2014. 7. 15 5:52:21 PM - media needed: 0134L5
2014. 7. 15 5:52:25 PM - restoring from image cmserver_1405396953
2014. 7. 15 5:52:27 PM - connecting
2014. 7. 15 5:52:28 PM - connected; connect time: 0:00:00
2014. 7. 15 5:52:39 PM - started process bptm (pid=21676)
2014. 7. 15 5:52:39 PM - mounting 0134L5
2014. 7. 15 5:52:37 PM - requesting resource 0134L5
2014. 7. 15 5:52:37 PM - granted resource  0134L5
2014. 7. 15 5:52:37 PM - granted resource  IBM.ULT3580-TD5.006
2014. 7. 15 5:53:06 PM - mounted 0134L5; mount time: 0:00:27
2014. 7. 15 5:53:07 PM - positioning 0134L5 to file 147
2014. 7. 15 5:54:34 PM - Warning bprd (pid=21626) Restore must be resumed prior to first image expiration on Fri Aug 15 13:02:33 2014
2014. 7. 15 5:54:35 PM - end Restore; elapsed time 0:02:23
2014. 7. 15 5:54:35 PM - positioned 0134L5; position time: 0:01:28
2014. 7. 15 5:54:35 PM - begin reading
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) The following files/folders were not restored:
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) UTF - /backup/
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) UTF - /backup/ChangeFlow.20101228.tar
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) UTF - /backup/ChangeFlow.20110114.tar
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) UTF - /backup/WEB-INF.20110114.tar
2014. 7. 15 5:54:36 PM - Error bptm (pid=21689) UTF - /backup/WEB-INF.20110502.tar
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) UTF - /backup/changeflow.20130219.tar
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) UTF - /backup/classes.tar
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) UTF - /backup/web-inf.20130219.tar
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) UTF - /backup/workflow.20110114.tar
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) UTF - /backup/classes/
2014. 7. 15 5:54:37 PM - Error bptm (pid=21689) more than 10 files were not restored, remaining ones are shown in the progress log.
socket write failed  (24)
######################################################

### Destination Client bptestbpcd test
[/usr/openv/netbackup/logs]# bptestbpcd -client cmsvr -verbose
1 1 1
x.x.1.254:44842 -> x.x.1.129:13724
x.x.1.254:44844 -> x.x.1.129:13724
PEER_NAME = backupserver
HOST_NAME = cmsvr
CLIENT_NAME = cmsvr
VERSION = 0x06540000
PLATFORM = hpia64
PATCH_VERSION = 6.5.6.0
SERVER_PATCH_VERSION = -1.-1.-1.-1
MASTER_SERVER = backupserver
EMM_SERVER = backupserver
x.x.1.254:44847 -> x.x.1.129:13724


### Master server restore log(/usr/openv/netbackup/logs/user_ops/root/logs/jbp-22132405414329366564000000411-xwaGoR.log)
Restore started 07/15/2014 17:52:10

17:52:21 (104874.xxx) Restore job id 104874 will require 1 image.
17:52:21 (104874.xxx) Media id 0134L5 is needed for the restore.

17:52:25 (104874.001) Restoring from image created Tue Jul 15 13:02:33 2014
17:52:27 (104874.001) INF - If Media id 0134L5 is not in a robotic library administrative interaction may be required to satisfy this mount request.
17:52:39 (104874.001) INF - Waiting for mount of media id 0134L5 on server backupserver for reading.
17:53:06 (104874.001) INF - Waiting for positioning of media id 0134L5 on server backupserver for reading.
17:54:35 (104874.001) Status of restore from image created Tue Jul 15 13:02:33 2014 = socket write failed

17:54:35 (104874.001) INF - Beginning restore from server backupserver to client cmsvr.
17:54:35 (104874.xxx) INF - Status = socket write failed.

17:54:36 (104874.001) The following files/folders were not restored:
17:54:36 (104874.001) UTF - /backup/
17:54:36 (104874.001) UTF - /backup/ChangeFlow.20101228.tar
17:54:36 (104874.001) UTF - /backup/ChangeFlow.20110114.tar
...
...

### Master bpbrm log
...
17:52:28.276 [21692] <2> bpbrm start_bpcd_stat: DATA_SOCK from bpcr = 10
17:52:28.276 [21692] <2> bpbrm start_bpcd_stat: NAME_SOCK from bpcr = 11
17:52:28.276 [21692] <2> bpbrm handle_restore: calling bpcr_get_socket_rqst3.
17:52:28.277 [21692] <2> bpbrm handle_restore: forking tar
17:52:28.277 [21692] <2> bpbrm handle_restore: restore command = /usr/openv/netbackup/bin/tar tar -x -v -Y -p -P -I 1405414330 -U 0 -E /usr/openv/netbackup/.rename.21692 -j -k -Q -J clnt_lc_messages=C -J clnt_lc_time=C -J clnt_lc_ctype=C -J clnt_lc_collate=C -J clnt_lc_numeric=C -J restoreid=104874.001 -J job_total=1 -J client=cmsvr -J requesting_client=backupserver -J browse_client=cmserver -J backup_time=1405396953 -L /usr/openv/netbackup/logs/user_ops/root/logs/jbp-22132405414329366564000000411-xwaGoR.log -f -
17:52:28.277 [21692] <2> bpbrm handle_restore: received bpcd success message
17:52:28.396 [21692] <2> bpbrm handle_restore: read tar start message from cmsvr
17:54:34.463 [21672] <2> bpbrm read_parent_msg: read from parent STOP RESTORE cmserver_1405396953
17:54:34.463 [21672] <2> bpbrm kill_bpbrm_child: killing bpbrm child 21692.
17:54:34.464 [21692] <2> bpbrm check_for_terminate: process killed by signal 1
17:54:34.494 [21672] <2> bpbrm brm_sigcld: SIGCLD caught by bpbrm
17:54:34.495 [21672] <2> bpbrm brm_sigcld: bpbrm child 21692 exit_status = 150, signal_status = 0
17:54:34.495 [21672] <2> bpbrm brm_sigcld: child 21692 exited with status 150: termination requested by administrator
17:54:34.495 [21672] <2> bpbrm send_status_to_parent: bpbrm child is done, but the media manager child is not.
17:54:34.495 [21672] <2> bpbrm tell_mm: sending media manager msg: STOP RESTORE cmserver_1405396953
17:54:35.995 [21672] <2> bpbrm read_media_msg: read from media manager: CURRENT POSITION 0134L5 147
17:54:35.995 [21672] <2> bpbrm send_parent_msg: CURRENT POSITION 0134L5 147
17:54:57.997 [21672] <2> bpbrm read_media_msg: read from media manager: EXIT cmserver_1405396953 150
17:54:57.998 [21672] <2> bpbrm process_media_msg: media manager for backup id cmserver_1405396953 exited with status 150: termination requested by administrator
...

### Master bptm log - attached.

### Master bpcd log - attached.
...

### client tar log
17:52:26 (104874.001) INF - TAR STARTED
17:52:26 (104874.001) **LOCALE ERROR** locale <ko_KR.eucKR> not found in file </usr/openv/msg/.conf>
17:52:26 (104874.001) Setting network receive buffer size to 262144 bytes
17:54:55 (104874.001) INF - TAR EXITING WITH STATUS = 6
17:54:55 (104874.001) INF - TAR RESTORED 0 OF 0 FILES SUCCESSFULLY
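
As a rough sanity check on the timeout settings listed earlier, the sketch below compares how long the client tar process ran (from TAR STARTED to TAR EXITING in the log above) with CLIENT_READ_TIMEOUT = 7200. The helper name `elapsed_seconds` is hypothetical, and the sketch assumes both timestamps fall on the same day. On these numbers the restore failed long before the timeout could expire, which suggests, without proving, that the timeout values are not what triggered the failure.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS stamps, assuming the same calendar day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Value taken from the configuration quoted in this post.
CLIENT_READ_TIMEOUT = 7200

# Timestamps taken from the client tar log above.
ran_for = elapsed_seconds("17:52:26", "17:54:55")

print(f"tar ran for {ran_for} s; timeout is {CLIENT_READ_TIMEOUT} s")
print("failed before timeout:", ran_for < CLIENT_READ_TIMEOUT)
```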


RiaanBadenhorst
Level 6
Partner    VIP    Accredited Certified

Possibly a communications problem.

From the bpcd log:

17:54:35.431 [21881] <2> get_short: (2) premature end of file (byte 1)
17:54:35.431 [21881] <16> bpcd main: token read: -9
17:54:35.431 [21881] <4> bpcd main:    errno = 5 - I/O error
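
For anyone scanning larger debug logs for lines like these, here is a minimal Python sketch. It assumes the line layout seen in this thread (`HH:MM:SS.mmm [pid] <level> message`) and treats a bracketed level of 16 or higher as an error; that threshold is my reading of the log format, not something confirmed in this thread. `parse_line` and `errors_only` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple, Optional

# Layout assumed from the log excerpts in this thread.
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] "
    r"<(?P<level>\d+)> "
    r"(?P<message>.*)$"
)


class LogLine(NamedTuple):
    time: str
    pid: int
    level: int
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Return the parsed fields, or None if the line does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogLine(m["time"], int(m["pid"]), int(m["level"]), m["message"])


def errors_only(lines: List[str], min_level: int = 16) -> List[LogLine]:
    """Keep only parsed lines whose level is at least min_level (assumed error threshold)."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p.level >= min_level]


sample = [
    "17:54:35.431 [21881] <2> get_short: (2) premature end of file (byte 1)",
    "17:54:35.431 [21881] <16> bpcd main: token read: -9",
    "17:54:35.431 [21881] <4> bpcd main:    errno = 5 - I/O error",
]

for entry in errors_only(sample):
    print(entry.time, entry.pid, entry.message)
```

Lines that do not match the assumed layout (for example the `...` elisions in the posts above) are skipped rather than raising an error.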

Marianne
Level 6
Partner    VIP    Accredited Certified

We need the client's bpcd log, not the master's.

Please add VERBOSE = 5 to the client's bp.conf before trying the restore again.

Post the verbose logs from the client as file attachments.
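
If it helps, here is a minimal Python sketch of that bp.conf change. It works on the file's text, not on the file itself, so it can be tried safely first. The function name `set_bp_conf_entry` is hypothetical, and the sample contents are only illustrative, built from values mentioned in this thread. It is meant for single-valued entries such as VERBOSE; do not use it for entries that may legitimately appear more than once.

```python
def set_bp_conf_entry(text: str, key: str, value: str) -> str:
    """Set a single-valued 'KEY = value' entry, replacing it if present, appending if not."""
    new_line = f"{key} = {value}"
    out = []
    replaced = False
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if "=" in line and name == key:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any later duplicate of this single-valued key is dropped.
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Illustrative bp.conf contents (an assumption, not the poster's actual file).
original = (
    "SERVER = backupserver\n"
    "CLIENT_NAME = cmsvr\n"
    "CLIENT_READ_TIMEOUT = 7200\n"
)

updated = set_bp_conf_entry(original, "VERBOSE", "5")
print(updated)
```

On UNIX clients the file is normally /usr/openv/netbackup/bp.conf; keep a copy of the original before writing any change back.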

James-Lee
Level 3

I've attached the log file (VERBOSE = 5).

Thank you.

James-Lee
Level 3

Hi Riaan.

Thanks for the reply.

I guess that is the problem but I don't know the exact cause.

James-Lee
Level 3

Hi Marianne.

Thank you for the quick answer.

I've attached the client bpcd log file as you advised.

And I've attached the other log files.

RiaanBadenhorst
Level 6
Partner    VIP    Accredited Certified

bpcd stops logging here. Maybe the process was killed?

17:52:26.631 [9504] <2> bpcd main: child_args[45] = disallow_server_file_writes=0
17:52:26.631 [9504] <2> bpcd main: Before execvp of command

James-Lee
Level 3

But I didn't do anything to stop the restore manually.

And there is no load on the network, because this is a development server.

Thank you Riaan.

Marianne
Level 6
Partner    VIP    Accredited Certified

There is nothing in the logs that 'jumps out'. 
I was hoping that verbose tar log would help.

Maybe we should take a step back - 
Are normal backups on this (destination) client working?
Have you tried to restore anything backed up from the client? 
Anything different as far as filesystems on source and destination clients are concerned?

Are OS patches up to date on this client?
We have in the past seen that OS patches have resolved network issues on HP-UX clients.

Also wondering if the **LOCALE ERROR** may be causing an issue. 
Can you perhaps compare OS locale settings with other working clients?

*** EDIT ***

Seems the Locale Error is 'not serious': http://www.symantec.com/docs/TECH19946