Hello (my 1st post :)
Since last week we have had a strange issue: a small (<40 GB) Solaris backup keeps failing.
Config / environment: Media server + client: the same Solaris SPARC host (sol10)
Master server: Windows 2008
Storage node: Sepaton-based hardware
Reference time: 12:00:32
BPTM messages (I think at the end of the job, after sending roughly 80% of the data to the Sepaton storage node)
(bptm sending last data to sepaton ....)
11:39:58.056 [19855] <2> 6590726:bptm:19855:uxvenusc052: 1457347198.56163 :: SEPOST: stspi_write_image :: 1057 :: length: 262144, offset: 3670528
11:39:58.057 [19855] <2> 6590726:bptm:19855:uxvenusc052: 1457347198.57526 :: SEPOST: stspi_write_image :: 1091 :: return status: 0
11:44:31.895 [8821] <2> SetMaxDataLimit: maximum data size: current=-3 max=-3
11:44:31.896 [8821] <2> initialize: fd values STDOUTSOCK=4 STDERRSOCK=5
11:44:31.901 [8821] <2> bptm: INITIATING (VERBOSE = 5): -rptdrv -jobid -1450998735 -jm
11:44:31.901 [8821] <2> bptm: PORT_STATUS = 0x00000000
11:44:31.903 [8821] <2> main: Sending [EXIT STATUS 0] to NBJM
11:44:31.903 [8821] <2> bptm: EXITING with status 0 <----------
11:54:31.834 [17702] <2> SetMaxDataLimit: maximum data size: current=-3 max=-3
11:54:31.834 [17702] <2> initialize: fd values STDOUTSOCK=4 STDERRSOCK=5
11:54:31.839 [17702] <2> bptm: INITIATING (VERBOSE = 5): -rptdrv -jobid -1450998787 -jm
11:54:31.840 [17702] <2> bptm: PORT_STATUS = 0x00000000
11:54:31.840 [17702] <2> main: Sending [EXIT STATUS 0] to NBJM
11:54:31.841 [17702] <2> bptm: EXITING with status 0 <----------
(!!!! Then, for every FULL backup, always the same error, and this since last week)
12:00:32.705 [19855] <2> get_exactly_n_bytes_or_eof_abs: read from socket failed: Connection timed out (145)
12:00:32.706 [19855] <2> set_job_details: Tfile (6590726): LOG 1457348432 16 bptm 19855 system call failed - Connection timed out (at bptm.c.27404)
12:00:32.706 [19855] <2> send_job_file: job ID 6590726, ftype = 3 msg len = 89, msg = LOG 1457348432 16 bptm 19855 system call failed - Connection timed out (at bptm.c.27404)
12:00:32.706 [19855] <2> ConnectionCache::connectAndCache: Acquiring new connection for host hictbrumzbu010, query type 1
12:00:32.710 [19855] <2> vnet_pbxConnect: pbxConnectEx Succeeded
12:00:32.710 [19855] <2> logconnections: BPDBM CONNECT FROM 10.25.13.2.49269 TO 10.35.10.38.1556 fd = 11
12:00:32.767 [19855] <2> db_end: Need to collect reply
12:00:32.783 [19855] <16> write_data_tir: system call failed - Connection timed out (at bptm.c.27404)
12:00:32.783 [19855] <2> 6590726:bptm:19855:uxvenusc052: 1457348432.783616 :: SEPOST: stspi_get_image_prop_v10 :: 358 :: image_name: uxvenusc052_1457345992_C1_TIR, server_name: 10.15.10.167
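Side note on the SEPOST lines: the number after the host name looks like a Unix epoch stamp written as seconds.microseconds, with the microseconds NOT zero-padded (1457347198.56163 sits on the 11:39:58.056 line, so ".56163" would be 56163 microseconds, not 0.56163 s). That is only my inference from the few lines above, not something I found documented. A small Python sketch to decode the stamps under that assumption (the function name is my own):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_sepost_stamp(stamp: str) -> datetime:
    """Decode 'seconds.microseconds' (microseconds unpadded) into a UTC datetime.

    ASSUMPTION: the part after the dot is an integer microsecond count,
    inferred from the log excerpt, not from any vendor documentation.
    """
    seconds, _, micros = stamp.partition(".")
    return EPOCH + timedelta(seconds=int(seconds), microseconds=int(micros or 0))

# Stamps copied from the three SEPOST lines in the bptm excerpt
for stamp in ("1457347198.56163", "1457347198.57526", "1457348432.783616"):
    print(stamp, "->", decode_sepost_stamp(stamp).isoformat())
```

Decoded this way the stamps come out one hour behind the wall-clock times at the start of each bptm line, which would fit a host running at UTC+1. A plain float() parse of the same strings would put the first two stamps about half a second too late.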
BPBRM Messages
11:37:38.742 [19839] <2> bpbrm wait_for_child: start
12:00:45.877 [19839] <2> bpbrm wait_for_child: child exit_status = 23 signal_status = 0
12:00:45.877 [19839] <2> bpbrm kill_child_process: start
12:00:45.877 [19839] <2> bpbrm Exit: attempting to send mail to root on uxvenusc052
Bpbkar Messages (nothing special)
...
11:37:38.439 [19851] <2> bpbkar delete_old_files_recur: INF - checking files in directory /usr/openv/netbackup/logs/user_ops/root/jobs for prefix = jbp and older than 3 days
11:37:38.439 [19851] <4> bpbkar Exit: INF - bpbkar exit normal
11:37:38.439 [19851] <4> bpbkar Exit: INF - EXIT STATUS 0: the requested operation was successfully completed
11:37:38.439 [19851] <4> bpbkar Exit: INF - setenv FINISHED=
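To see the order of events across the three logs, I put the key lines on one clock. This is just a Python sketch over timestamps copied by hand from the excerpts in this post; it does not read the real log files, and "last write shown" only means the last write I pasted, there may be later ones in the full bptm log.

```python
from datetime import datetime

# (log, wall-clock time, event), copied by hand from the excerpts in this post
EVENTS = [
    ("bptm",   "11:39:58.057", "stspi_write_image return status: 0 (last write shown)"),
    ("bptm",   "12:00:32.705", "read from socket failed: Connection timed out (145)"),
    ("bpbrm",  "11:37:38.742", "wait_for_child: start"),
    ("bpbrm",  "12:00:45.877", "child exit_status = 23"),
    ("bpbkar", "11:37:38.439", "EXIT STATUS 0"),
]

def parse(hms):
    """Parse an HH:MM:SS.mmm log time (same day assumed for all events)."""
    return datetime.strptime(hms, "%H:%M:%S.%f")

def timeline(events):
    """Sort events by time and attach the seconds elapsed since the previous one."""
    rows = []
    previous = None
    for log, hms, text in sorted(events, key=lambda e: parse(e[1])):
        now = parse(hms)
        delta = 0.0 if previous is None else (now - previous).total_seconds()
        rows.append((hms, log, round(delta, 3), text))
        previous = now
    return rows

for hms, log, delta, text in timeline(EVENTS):
    print(f"{hms}  {log:<6}  +{delta:9.3f}s  {text}")
```

Going only by the lines pasted here, bptm [19855] is silent for a bit more than 20 minutes between the last Sepaton write and the socket timeout, and bpbrm reports the child exit about 13 seconds after that.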
The Veritas troubleshooting course describes the NetBackup processes and the data flow as follows:
-a- bpbkar sends data to the bptm child process; bptm stores it in SHARED MEMORY segments.
    Because client and media server are the same host, bpbkar writes directly to the shared memory segments.
-b- bptm writes the shared memory segments to the allocated storage media (the Sepaton, block by block).
-c- bptm connects to the bpdbm process on the master server and updates an image header in the image database. This happens for each fragment.
-d- bpbkar sends backup metadata to bpbrm after the data has been sent to bptm.
-e- Finally, bpbrm sends the metadata to bpdbm (master) -> image catalog update.
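To make steps -a- and -b- concrete for myself, here is a toy Python model of a producer filling a fixed number of buffers and a consumer draining them. It is NOT NetBackup code: the queue only stands in for the shared memory segments, and all names and sizes (BLOCKS, BUFFERS, run_pipeline) are made up for the illustration.

```python
import queue
import threading

BLOCKS = 8        # hypothetical number of data blocks in the backup
BUFFERS = 4       # hypothetical number of shared buffers
DONE = object()   # sentinel: the producer has no more data

def run_pipeline():
    ring = queue.Queue(maxsize=BUFFERS)  # stand-in for the shared memory segments
    written = []                         # stand-in for blocks stored on the storage unit

    def producer():                      # plays the role of bpbkar (step -a-)
        for block in range(BLOCKS):
            ring.put(block)              # waits while all buffers are full
        ring.put(DONE)                   # the producer is finished from here on

    def consumer():                      # plays the role of bptm (step -b-)
        while True:
            block = ring.get()           # waits while all buffers are empty
            if block is DONE:
                break
            written.append(block)        # "write the block to the storage unit"

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written

print(run_pipeline())
```

The point of the toy is only that the producer can finish cleanly while the consumer still has work to do, so a clean bpbkar exit does not by itself say anything about what bptm hits afterwards.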
My problem is to pinpoint the cause:
either a bptm issue with the Sepaton,
or a bptm shared memory segments issue (see -c- above) !!!!!
(bpbkar already has a 'FINISHED' state, bpbrm is waiting for bptm input, and bptm is sending data to the Sepaton and
metadata to bpdbm),
or .. something else.
And what is the meaning of "get_exactly_n_bytes_or_eof_abs: read from socket failed: Connection timed out (145)"?
"Normally this is linked to a network issue."
Also, the output of vxlogview on the master server did not help me (not a single message referencing the job).
(Command used: vxlogview.cmd -d all -X "jobid=6590726")
And ..... INCREMENTAL BACKUPS are working fine.
------ Kind regards