cancel
Showing results for 
Search instead for 
Did you mean: 

I am getting same error 2826 while restoring

RajkumarBaniwal
Level 4
Certified

Hi

Reference: https://www-secure.symantec.com/connect/forums/mailbox-restore-fails-invalid-error-number2826

I am getting same error 2826 while restoring on Linux client .It is dispalying the error

4/5/2012 12:24:37 PM - Error bptm(pid=13460) cannot write data to socket, Broken pipe     
4/5/2012 12:24:37 PM - Error bptm(pid=13460) The following files/folders were not restored:      
4/5/2012 12:24:37 PM - Error bptm(pid=13460) UTF - /dcs/arch01/ARC/ERI/Mar_2012/MSEMR03201203130957551503.dat.gz         
4/5/2012 12:24:38 PM - Info bpbrm(pid=13451) media manager for backup id gnnmed25_1333259894 exited with status 24: socket write failed
4/5/2012 12:24:38 PM - Info bpbrm(pid=13451) terminating bpbrm child 13463 jobid=2014589       
4/5/2012 12:24:38 PM - restored image gnnmed25_1333259894 - (socket write failed(24)); restore time 01:18:28
4/5/2012 12:24:39 PM - end Restore; elapsed time: 01:19:12
invalid error number(2826).

 

please help on the same

 

Rajkumar Baniwal

backup support 

 

***** Moved to new discussion by MvdB ******

Regards,
Rajkumar Baniwal
1 ACCEPTED SOLUTION

Accepted Solutions

RajkumarBaniwal
Level 4
Certified

It is resolved after changing the same client as SAN media server and restoring through the same media server

Regards,
Rajkumar Baniwal

View solution in original post

8 REPLIES 8

Marianne
Level 6
Partner    VIP    Accredited Certified

You have probably noticed in the discussion where you originally posted that we have asked for additional info but never received anything.
The 'real' error in your instance seems to be status 24.
Please share all relevant info for your environment:

OS on Master server and media server
OS on Client (Which Linux 'flavour' and version)
NBU version (and patch level) on master, media server and client

Any firewall between master and media server or between media server and client?
We can see that the restore failed after 1 hour. We sometimes see that firewall timeouts will terminate connection. Was any data actually written to client?

Please share all text in Activity Monitor details as well as the info in 'View Status' tab of the restore GUI.

Also post following logs as attachments (if they do not exist, please create them and retry the restore):

On master: bprd (restart NBU after creating this log)
On media server: bpbrm and bptm
On client : bpcd and tar

Please rename logs to match process name and post them as attachments.

RajkumarBaniwal
Level 4
Certified

PFB

OS on Master server and media server           SUN Solaris OS 5.10
OS on Client (Which Linux 'flavour' and version)    Linux 2.6.18-164.el5
NBU version (and patch level) on master, media server and client   7.1.0.3

Any firewall between master and media server or between media server and client?   No
Was any data actually written to client?  No

Alttached all logs fo reference.

 

Thanks and regards

Rajkumar Baniwal

Regards,
Rajkumar Baniwal

mph999
Level 6
Employee Accredited

I'm a bit confused,

The restore starts at around 11:06 (from job details we can match the PID)

From media server bpbrm log

11:06:10.726 [13451] <2> logparams: -restore -mpx_restore -S gnnbkp01 -mt 2 -to 32767 -mud 180 -mediasvr gnnbkp01 -masterversion 710000 

 

From client tar log the first line is 11.06
 
11:06:13 (2014589.001) INF - TAR STARTED 15158
 
But bpcd log from client, the first line is 
 
17:58:07.449 [13799] <2> setup_debug_log: switched debug log file for bpcd
 
... unless I have make a mistake, the bpcd log is not covering the time of the restore.
 
Anyhow looking at what we have ...
 
bptm seems to start reading the tape, suspect it could be an mpx 'backup' as it reads for some time ...
 
11:08:12.148 [13456] <2> send_brm_msg: CURRENT POSITION UN0125 3
11:08:12.148 [13456] <2> process_brm_msg: no pending message from bpbrm
11:08:13.148 [13456] <2> read_brm_msg: CONTINUE RESTORE
11:08:13.148 [13456] <2> send_brm_msg: MEDIA READY
11:08:13.148 [13456] <2> mpx_read_data: begin reading data from media id UN0125, file num 3
11:08:13.148 [13456] <2> mpx_read_data: Total Kbytes transferred 0
11:08:13.153 [13456] <2> read_position: read position on drive index 4, is LBA 429134
11:28:13.097 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 141796352  141796352 (mpxrestore.c.1715)
11:28:13.097 [13456] <2> mpx_read_data: Total Kbytes transferred 141796352
11:48:13.104 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 136298496  136298496 (mpxrestore.c.1715)
11:48:13.104 [13456] <2> mpx_read_data: Total Kbytes transferred 278094848
12:08:13.130 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 129108992  129108992 (mpxrestore.c.1715)
12:08:13.130 [13456] <2> mpx_read_data: Total Kbytes transferred 407203840
12:16:28.434 [13460] <2> filter_image: [13456] Min_records before frag switch is 90000, locate available = 0
12:24:37.873 [13460] <2> get_next_file: [13456] received string: (EXIT STATUS 0)
12:24:37.921 [13456] <2> Media_dispatch_signal: calling child_wait (mpxrestore.c:2881) delay 0 seconds
12:24:37.921 [13456] <2> child_wait: SIGCHLD: exit=24, signo=0 core=no, pid=13460
 
 
We see the last line is child_wait: SIGCHLD: exit=24
 
There is a child bptm process involved PID [13460], this is what returns the status 24.

 

11:06:13.219 [13460] <2> read_config_file: using 262144 value from /usr/openv/netbackup/NET_BUFFER_SZ
11:06:13.219 [13460] <2> io_set_sendbuf: setting send network buffer to 262144 bytes
11:06:13.220 [13460] <2> vnet_pbxConnect: pbxConnectEx Succeeded
11:06:13.227 [13460] <2> do_pbx_service: ../../libvlibs/vnet_connect.c.1776: 0: via PBX: VNETD CONNECT FROM 172.30.27.1.54741 TO 172.30.27.1.1556 fd = 8
11:06:13.227 [13460] <2> vnet_vnetd_daemon_socket: ../../libvlibs/vnet_vnetd.c.383: 0: VN_REQUEST_DAEMON_SOCKET: 2 0x00000002
11:06:13.227 [13460] <2> vnet_vnetd_daemon_socket: ../../libvlibs/vnet_vnetd.c.397: 0: ipc_string: /tmp/vnet-13376333604170772171000000188-kmaaiA
11:06:13.295 [13460] <2> connect_bprd_socket: [13456] data socket to bprd is connected port = 0,  fd = 8, DATAPORT=0, duping to stdin
12:16:28.434 [13460] <2> filter_image: [13456] Min_records before frag switch is 90000, locate available = 0
12:24:37.344 [13460] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.6669: 0: fopen() failed: 2 0x00000002
12:24:37.344 [13460] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.6670: 0: fopen() failed: /usr/openv/var/host_cache/1f7/260a25f7+bpjobd,1,4,2,2,0+gnnbkp01.txt
12:24:37.349 [13460] <2> vnet_pbxConnect: pbxConnectEx Succeeded
12:24:37.349 [13460] <2> job_connect: SO_KEEPALIVE set on socket 2 for client gnnbkp01
12:24:37.349 [13460] <2> logconnections: BPJOBD CONNECT FROM 172.30.27.1.61865 TO 172.30.27.1.1556 fd = 2
12:24:37.349 [13460] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:24:37.353 [13460] <2> job_connect: Connected to the host gnnbkp01 contype 10 jobid <2014589> socket <2>
12:24:37.353 [13460] <2> job_connect: Connected on port 61865
12:24:37.421 [13460] <2> job_monitoring_exex: ACK disconnect
12:24:37.422 [13460] <2> job_disconnect: Disconnected
12:24:37.422 [13460] <16> write_to_out: cannot write data to socket, Broken pipe
 
Think we need the bpcd log ...

Omar_Villa
Level 6
Employee

Have you check if the iptables (firewall) service is on? most of the socket errors I see in linux are because iptables are filtering big blocks by default, the rest of your applications can work perfectly but because netbackup needs big blocks is the one that most of the time fails.

try: service iptables status (if is on)

try: service iptables stop (run a backup)

 

Regards.

RajkumarBaniwal
Level 4
Certified

Firewall is blocked. but we are getting same error

Regards,
Rajkumar Baniwal

Marianne
Level 6
Partner    VIP    Accredited Certified

6 April:
Any firewall between master and media server or between media server and client?   No
12 April:
Firewall is blocked.

Nothing is going to work with blocked firewall.

mph999
Level 6
Employee Accredited

 

Gone through the logs again, doesn't really tell us anything to be honest, but since I spent time doing it, I figured I would post it up.  Perhaps this will help show one way of going through logs, and a method of finding where the problem is.  In this part, there are x4 PIDs inivolved across two logs (bpbrm and bptm) - as these two processes are on the same machine, it's easy to justy look at the time - which every processes shows the status 24 first, it the one having (but not necessarily causing) the issue.
 
 
There are two bptm and two bpbrm processes - parent/ child
 
bptm
Parent PID [13456] 
Child PID  [13460] 
 
bpbrm
 
Parent PID [13451]
Child PID  [13463]
 
Looking in the logs to see where the status 24 occurs first (is it bptm of bpbrm)
 
From bprd
 
12:24:38.144 [29918] <2> read_bpbrm_msg: (pid 13451) EXIT gnnmed25_1333259894 24
 
 
From bpbrm PID 13463 - think we can discount this - the 24 appears in bpbrm PID 13451
 
11:06:13.218 [13463] <2> bpbrm handle_restore: forking tar
11:06:13.218 [13463] <2> bpbrm handle_restore: restore command = /usr/openv/netbackup/bin/tar tar -x -v -Y -p -P -I 1333592417 -U 0 -E /usr/openv/netbackup/.rename.13463 -j -Q -J clnt_lc_messages=C -J clnt_lc_time=C -J clnt_lc_ctype=C -J clnt_lc_collate=C -J clnt_lc_numeric=C -J restoreid=2014589.001 -J job_total=1 -J client=gnnmed25 -J requesting_client=gnnbkp01 -J browse_client=gnnmed25 -J backup_time=1333259894 -L /tmp/logmedERI -f -
11:06:13.218 [13463] <2> bpbrm handle_restore: received bpcd success message
11:06:13.301 [13463] <2> bpbrm handle_restore: read tar start message from gnnmed25
11:08:13.196 [13463] <2> bpbrm mm_sig: received ready signal from media manager
12:24:37.986 [13463] <2> bpbrm mm_sig: received not ready signal from media manager
12:24:38.037 [13463] <2> bpbrm check_for_terminate: process killed by signal 1
12:24:38.049 [13463] <2> BpConfigHolder::~BpConfigHolder: +++ ENTERING +++ : obj = 1005bed78 (../BpConf.cpp:40)
12:24:38.049 [13463] <2> BpConfigHolder::free: +++ ENTERING +++ : obj = 1005bed78 (../BpConf.cpp:60)
12:24:38.049 [13463] <2> BpConfigHolder::free: --- EXITING --- : obj = 1005bed78 (../BpConf.cpp:60)
12:24:38.049 [13463] <2> BpConfigHolder::~BpConfigHolder: --- EXITING --- : obj = 1005bed78 (../BpConf.cpp:40)
12:24:38.051 [13463] <2> NBIORInterceptor::destroy: Destroying Interceptor: EMMlib_Orb_EMMlib_Orb(../NBIORInterceptor.cpp:60)
12:24:38.053 [13463] <2> InsecureEndpointSelectorFactory::~InsecureEndpointSelectorFactory: +++ ENTERING +++ : obj = 1005d7970 (../Endpoint_Selector.cpp:1864)
12:24:38.053 [13463] <2> InsecureEndpointSelectorFactory::~InsecureEndpointSelectorFactory: --- EXITING --- : obj = 1005d7970 (../Endpoint_Selector.cpp:1864)
12:24:38.053 [13463] <2> EndpointSelectorFactory::~EndpointSelectorFactory: +++ ENTERING +++ : obj = 1005d7870 (../Endpoint_Selector.cpp:1769)
12:24:38.053 [13463] <2> EndpointSelectorFactory::~EndpointSelectorFactory: --- EXITING --- : obj = 1005d7870 (../Endpoint_Selector.cpp:1769)
 
 
From bpbrm PID 13451
 
12:23:28.005 [13451] <2> bpbrm send_parent_msg: KEEP_ALIVE 15
12:23:28.055 [13451] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
12:24:37.986 [13451] <2> bpbrm read_media_msg: read from media manager: MEDIA NOT READY
12:24:37.986 [13451] <2> bpbrm signal_bpbrm_child: sending SIGUSR1 to bpbrm child 13463
* 12:24:38.037 [13451] <2> bpbrm read_media_msg: read from media manager: EXIT gnnmed25_1333259894 24
12:24:38.037 [13451] <2> bpbrm process_media_msg: media manager for backup id gnnmed25_1333259894 exited with status 24: socket write failed
12:24:38.037 [13451] <2> bpbrm kill_bpbrm_child: terminating bpbrm child 13463 jobid=2014589
12:24:38.071 [13451] <2> bpbrm brm_sigcld: SIGCLD caught by bpbrm
12:24:38.071 [13451] <2> bpbrm brm_sigcld: bpbrm child 13463 exit_status = 150, signal_status = 0
12:24:38.071 [13451] <2> bpbrm brm_sigcld: child 13463 exited with status 150: termination requested by administrator
12:24:38.071 [13451] <2> bpbrm send_status_to_parent: EXIT gnnmed25_1333259894 24 sent to parent process.
12:24:38.171 [13451] <2> bpbrm read_parent_msg: read from parent TERMINATE
12:24:38.171 [13451] <2> bpbrm tell_mm: sending media manager msg: TERMINATE
 
 
From bptm PID 13460
 
* 12:24:37.422 [13460] <16> write_to_out: cannot write data to socket, Broken pipe
 
 
From bptm PID 13456
 
11:08:12.148 [13456] <4> mpx_issue_begin_msg: begin reading backup id gnnmed25_1333259894 (restore), copy 1, fragment 2, from media id UN0125 on drive HP.ULTRIUM4-SCSI.002 (index 4)
11:08:12.148 [13456] <2> send_brm_msg: CURRENT POSITION UN0125 3
11:08:12.148 [13456] <2> process_brm_msg: no pending message from bpbrm
11:08:13.148 [13456] <2> read_brm_msg: CONTINUE RESTORE
11:08:13.148 [13456] <2> send_brm_msg: MEDIA READY
11:08:13.148 [13456] <2> mpx_read_data: begin reading data from media id UN0125, file num 3
11:08:13.148 [13456] <2> mpx_read_data: Total Kbytes transferred 0
11:08:13.153 [13456] <2> read_position: read position on drive index 4, is LBA 429134
11:28:13.097 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 141796352  141796352 (mpxrestore.c.1715)
11:28:13.097 [13456] <2> mpx_read_data: Total Kbytes transferred 141796352
11:48:13.104 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 136298496  136298496 (mpxrestore.c.1715)
11:48:13.104 [13456] <2> mpx_read_data: Total Kbytes transferred 278094848
12:08:13.130 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 129108992  129108992 (mpxrestore.c.1715)
12:08:13.130 [13456] <2> mpx_read_data: Total Kbytes transferred 407203840
12:16:28.434 [13460] <2> filter_image: [13456] Min_records before frag switch is 90000, locate available = 0
12:24:37.873 [13460] <2> get_next_file: [13456] received string: (EXIT STATUS 0)
12:24:37.921 [13456] <2> Media_dispatch_signal: calling child_wait (mpxrestore.c:2881) delay 0 seconds
* 12:24:37.921 [13456] <2> child_wait: SIGCHLD: exit=24, signo=0 core=no, pid=13460
12:24:37.921 [13456] <2> notify: executing - /usr/openv/netbackup/bin/restore_notify bptm gnnmed25_1333259894 restore
12:24:37.950 [13456] <4> report_throughput: VBRT 1 13456 0 1 HP.ULTRIUM4-SCSI.002 UN0125 0 1 0 54067200  54067200 (mpxrestore.c.2996)
12:24:37.950 [13456] <2> send_brm_msg: MEDIA NOT READY
12:24:37.950 [13456] <2> send_brm_msg: EXIT gnnmed25_1333259894 24
 
 
So it looks like bptm [13460] gets the status 24 ...  [13460] is the child bptm process that connects to the client.
 
 
Looking at this log then ...
 
12:16:28.434 [13460] <2> filter_image: [13456] Min_records before frag switch is 90000, locate available = 0
12:24:37.344 [13460] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.6669: 0: fopen() failed: 2 0x00000002
12:24:37.344 [13460] <2> file_to_addrinfo: ../../libvlibs/vnet_addrinfo.c.6670: 0: fopen() failed: /usr/openv/var/host_cache/1f7/260a25f7+bpjobd,1,4,2,2,0+gnnbkp01.txt
12:24:37.349 [13460] <2> vnet_pbxConnect: pbxConnectEx Succeeded
12:24:37.349 [13460] <2> job_connect: SO_KEEPALIVE set on socket 2 for client gnnbkp01
12:24:37.349 [13460] <2> logconnections: BPJOBD CONNECT FROM 172.30.27.1.61865 TO 172.30.27.1.1556 fd = 2
12:24:37.349 [13460] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:24:37.353 [13460] <2> job_connect: Connected to the host gnnbkp01 contype 10 jobid <2014589> socket <2>
12:24:37.353 [13460] <2> job_connect: Connected on port 61865
12:24:37.421 [13460] <2> job_monitoring_exex: ACK disconnect
12:24:37.422 [13460] <2> job_disconnect: Disconnected
12:24:37.422 [13460] <16> write_to_out: cannot write data to socket, Broken pipe
 
 
... 
 
So it seems that bptm child process cannot write to the client.
 

RajkumarBaniwal
Level 4
Certified

It is resolved after changing the same client as SAN media server and restoring through the same media server

Regards,
Rajkumar Baniwal