cancel
Showing results forΒ 
Search instead forΒ 
Did you mean:Β 

Shadow Copy Components backup failing error code 13

Dave__G
Level 3

Hi

We have a Windows 2008 R2 production server which keeps failing its system state backup, specifically Shadow Copy Components.

(Master server v 7.7.1, 2 x media servers one of which is a Unix netbackup device running 7.6.0.1 - failed job is on this server)

Detailed status:-

27-Jun-2018 10:12:57 - Info nbjm (pid=3328) starting backup job (jobid=1441795) for client stas56, policy All_Drives_Sys_State, schedule Daily_Weekday
27-Jun-2018 10:12:58 - estimated 22992926 kbytes needed
27-Jun-2018 10:12:58 - Info nbjm (pid=3328) started backup (backupid=stas56_1530090777) job for client stas56, policy All_Drives_Sys_State, schedule Daily_Weekday on storage unit stu_disk_stas170
27-Jun-2018 10:12:59 - started process bpbrm (pid=133962)
27-Jun-2018 10:13:00 - Info bpbrm (pid=133962) stas56 is the host to backup data from
27-Jun-2018 10:13:00 - Info bpbrm (pid=133962) reading file list for client
27-Jun-2018 10:13:01 - Info bpbrm (pid=133962) accelerator enabled
27-Jun-2018 10:13:36 - connecting
27-Jun-2018 10:13:38 - Info bpbrm (pid=133962) starting bpbkar on client
27-Jun-2018 10:13:38 - connected; connect time: 0:00:00
27-Jun-2018 10:13:39 - Info bpbkar (pid=5664) Backup started
27-Jun-2018 10:13:39 - Info bpbrm (pid=133962) bptm pid: 133972
27-Jun-2018 10:13:39 - Info bpbkar (pid=5664) change time comparison:<enabled>
27-Jun-2018 10:13:39 - Info bpbkar (pid=5664) accelerator enabled backup, archive bit processing:<disabled>
27-Jun-2018 10:13:39 - Info bptm (pid=133972) start
27-Jun-2018 10:13:39 - Info bptm (pid=133972) using 262144 data buffer size
27-Jun-2018 10:13:39 - Info bptm (pid=133972) using 30 data buffers
27-Jun-2018 10:13:41 - Info bptm (pid=133972) start backup
27-Jun-2018 10:13:43 - Info bpbkar (pid=5664) not using change journal data for <System State:\>: not supported for non-local volumes / file systems
27-Jun-2018 10:13:43 - begin writing
27-Jun-2018 10:13:43 - Info bptm (pid=133972) backup child process is pid 133989
27-Jun-2018 10:13:43 - Info bpbkar (pid=5664) not using change journal data for <_BACKUP_SPECIAL_OBJECTS AFTER System State:>: not supported for non-local volumes / file systems
27-Jun-2018 11:26:26 - Info bpbkar (pid=5664) 5000 entries sent to bpdbm
27-Jun-2018 11:27:31 - Info bpbkar (pid=5664) 10000 entries sent to bpdbm
27-Jun-2018 11:28:32 - Info bpbkar (pid=5664) 15000 entries sent to bpdbm
27-Jun-2018 11:30:37 - Info bpbkar (pid=5664) accelerator sent 20420571136 bytes out of 20420571136 bytes to server, optimization 0.0%
27-Jun-2018 11:35:38 - Error bpbrm (pid=133962) socket read failed: errno = 62 - Timer expired
27-Jun-2018 11:35:40 - Error bptm (pid=133972) media manager terminated by parent process
27-Jun-2018 11:36:17 - Info stas170 (pid=133972) StorageServer=PureDisk:stas170; Report=PDDO Stats for (stas170): scanned: 19945355 KB, CR sent: 82093 KB, CR sent over FC: 0 KB, dedup: 99.6%, cache disabled
27-Jun-2018 11:37:17 - Info bpbkar (pid=5664) done. status: 13: file read failed
27-Jun-2018 11:37:17 - end writing; write time: 1:23:34
file read failed (13)

Taken from bpbkar log:-

10:39:29.427 [8352.6456] <4> dos_backup::tfs_scannext: INF - no more path list entries
10:39:29.427 [8352.6456] <4> tar_backup::backup_endarg_state: INF - finish error count: 0
10:39:29.427 [8352.6456] <4> tar_backup::backup_endarg_state: INF - finish fatal count: 0
10:39:29.443 [8352.6456] <2> tar_base::backup_finish: TAR - backup: 10157 files
10:39:29.443 [8352.6456] <2> tar_base::backup_finish: TAR - backup: file data: 994518369 bytes 12 gigabytes
10:39:29.443 [8352.6456] <2> tar_base::backup_finish: TAR - backup: image data: 1037027328 bytes 12 gigabytes
10:39:29.443 [8352.6456] <2> tar_base::backup_finish: TAR - backup: elapsed time: 1517 secs 9177277 bps
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - number of file directives found: 1
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - number of files sent to server: 10157 out of 10157
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - image size: 13921929216 bytes
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - adjustment size: -13710906880 bytes
10:39:29.443 [8352.6456] <4> tar_backup::backup_done_state: INF - stream size: 211022336 bytes
10:39:29.443 [8352.6456] <2> tar_base::V_vTarMsgW: JBD - accelerator sent 211022336 bytes out of 13921929216 bytes to server, optimization 98.5%
10:39:29.443 [8352.8552] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
10:39:29.443 [8352.6456] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
10:39:29.443 [8352.6456] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed
10:39:43.639 [8352.6456] <4> dos_backup::fscp_fini: INF - backup status:<0>
10:39:43.639 [8352.6456] <4> dos_backup::fscp_fini: INF - backup succeeded, renaming current track log
10:39:43.654 [8352.6456] <4> dos_backup::_change_journal_finalize: INF - finalizing change journal usage for file directive:<C:\> with backup status:<0>
10:39:43.654 [8352.6456] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
10:39:43.654 [8352.6456] <4> OVStopCmd: INF - EXIT - status = 0
10:39:43.654 [8352.6456] <2> tar_base::V_Close: closing...
10:39:43.654 [8352.6456] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
10:39:43.779 [8352.6456] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): enter - InitFlags:0x00000101
10:39:43.779 [8352.6456] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): ubs specifics: 0x001d0000
10:39:43.919 [8352.6456] <4> OVShutdown: INF - Finished process
10:39:43.919 [8352.6456] <4> WinMain: INF - Exiting C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe
11:25:02.594 [5664.8644] <4> dos_backup::tfs_findfirst: INF - unable to find first file: 'System State:\Background Intelligent Transfer Service\'
11:25:04.622 [5664.8644] <4> dos_backup::fscp_change_detection(): INF - file changed: modify time changed for <System State:\Windows Management Instrumentation>, previous:<1529740205>, current:<1530095104>
11:25:09.364 [5664.8644] <4> dos_backup::tfs_scannext: INF - no more path list entries
11:25:09.364 [5664.8644] <4> tar_backup::backup_endarg_state: INF - finish error count: 0
11:25:09.364 [5664.8644] <4> tar_backup::backup_endarg_state: INF - finish fatal count: 0
11:25:09.364 [5664.8644] <4> tar_backup_tfi::backup_arg_state: INF - remember start error count: 0
11:25:09.364 [5664.8644] <4> tar_backup_tfi::backup_arg_state: INF - remember start fatal count: 0
11:25:09.364 [5664.8644] <4> dos_backup::tfs_scanstart: INF - Backup Special Objects request AFTER System State:
11:25:09.364 [5664.8644] <4> dos_backup::tfs_scanstart: INF - starting scan for file directive:<_BACKUP_SPECIAL_OBJECTS AFTER System State:>
11:25:09.364 [5664.8644] <4> dos_backup::tfs_scanstart: INF - starting scan using name:<System State:\>
11:25:09.364 [5664.8644] <4> dos_backup::tfs_scanstart: INF - in 'backup' mode
11:25:09.364 [5664.8644] <4> dos_backup::fscp_init(): calling ct_cat_open() : previous backup ids:<stas56_1528513322,stas56_1530075231,stas56_1529975625,stas56_1529918754,stas56_1529890216,stas56_1529822041,stas56_1529736134,stas56_1529631665,stas56_1529546208,stas56_1529459184,stas56_1529372819,stas56_1529285258,stas56_1529204677,stas56_1529140537,stas56_1529028992,stas56_1528948148,stas56_1528856995,stas56_1528780407,stas56_1528688665,stas56_1528624698,stas56_1528513222,stas56_1529736083,stas56_1529631215,stas56_1529543053,stas56_1529456617,stas56_1529370175,stas56_1529283782,stas56_1529204624,stas56_1529137409,stas56_1529028721,stas56_1528947996,stas56_1528856935,stas56_1528780507,stas56_1528688312,stas56_1528624732>, backup id:<stas56_1530090777>, track journal name:<_BACKUP_SPECIAL_OBJECTS AFTER System State:>, schedule type:<INCR>
11:25:09.395 [5664.8644] <4> dos_backup::tfs_getinfo: INF - current object (System State:) is a parent of previous (object (System State:\COM+ Class Registration Database\COM+ REGDB)
11:25:09.395 [5664.8644] <4> dos_backup::fscp_change_detection(): INF - file changed: modify time changed for <System State:\_SharedHardlinkData_>, previous:<1529740208>, current:<1530095109>
11:30:36.044 [5664.8644] <2> ct_cat_find_last_entry_by_name: close previous track journal as read next entry failed
11:30:37.651 [5664.8644] <4> dos_backup::tfs_scannext: INF - no more path list entries
11:30:37.651 [5664.8644] <4> tar_backup::backup_endarg_state: INF - finish error count: 0
11:30:37.651 [5664.8644] <4> tar_backup::backup_endarg_state: INF - finish fatal count: 0
11:30:37.651 [5664.8644] <2> tar_base::backup_finish: TAR - backup: 18561 files
11:30:37.651 [5664.8644] <2> tar_base::backup_finish: TAR - backup: file data: 32176571 bytes 15 gigabytes
11:30:37.651 [5664.8644] <2> tar_base::backup_finish: TAR - backup: image data: 19476480 bytes 19 gigabytes
11:30:37.651 [5664.8644] <2> tar_base::backup_finish: TAR - backup: elapsed time: 3714 secs 5498269 bps
11:30:37.651 [5664.8644] <4> ov_log::OVLoop: Timestamp
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - number of file directives found: 4
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - number of files sent to server: 18561 out of 18561
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - image size: 20420571136 bytes
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - adjustment size: 0 bytes
11:30:37.651 [5664.8644] <4> tar_backup::backup_done_state: INF - stream size: 20420571136 bytes
11:30:37.651 [5664.8644] <2> tar_base::V_vTarMsgW: JBD - accelerator sent 20420571136 bytes out of 20420571136 bytes to server, optimization 0.0%
11:30:37.651 [5664.2104] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
11:30:37.651 [5664.8644] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
11:37:18.321 [5664.8644] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed
11:37:18.337 [5664.8644] <4> dos_backup::fscp_fini: INF - backup status:<6>
11:37:18.337 [5664.8644] <4> dos_backup::fscp_fini: INF - backup failed, deleting current track log
11:37:18.337 [5664.8644] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
11:37:18.337 [5664.8644] <4> OVStopCmd: INF - EXIT - status = 0
11:37:18.337 [5664.8644] <2> tar_base::V_Close: closing...
11:37:18.337 [5664.8644] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
11:37:18.337 [5664.8644] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): enter - InitFlags:0x00000101
11:37:18.337 [5664.8644] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): ubs specifics: 0x001d0000
11:37:18.352 [5664.8644] <16> dtcp_read: TCP - failure: recv socket (680) (TCP 10053: Software caused connection abort)
11:37:18.352 [5664.8644] <16> dtcp_read: TCP - failure: recv socket (640) (TCP 10053: Software caused connection abort)
11:37:18.352 [5664.8644] <4> OVShutdown: INF - Finished process
11:37:18.352 [5664.8644] <4> WinMain: INF - Exiting C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe

 

The TCP error above lead me to some support pages that suggested that the CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT may need increasing.

These values on the master server are 500 s and 3600 s respectively. I increased the value for CLIENT_READ_TIMEOUT on the client from300 s to 3600 s , job fails with same error.

Server has 10Gb card, Client 1 GB card, I've tested bandwidth using iperf at 1000MB/s.

All netbackup logs are enabled and available on request.

Any pointers as to how I can resolve this issue? Many thanks 

 

 

1 REPLY 1

Lowell_Palecek
Level 6
Employee

What version is NetBackup on the Windows client?

When you increased the client read timeout, did the time gap just increase? It looks like a handshake mis-communication between bpbkar32 on the client and bpbrm on the media server.

Here bpbkar32 reported that it was done sending data. Then there's nothing until the timout:

27-Jun-2018 11:30:37 - Info bpbkar (pid=5664) accelerator sent 20420571136 bytes out of 20420571136 bytes to server, optimization 0.0%
27-Jun-2018 11:35:38 - Error bpbrm (pid=133962) socket read failed: errno = 62 - Timer expired

In your bpbkar log, bpbkar32 stopped the keepalive thread at 11:30, hence the timeout at 10:35 in bpbrm. Then you have nothing in the log until 10:37. This makes it seem that bpbkar32 was waiting for some sort of ack from bpbrm or bptm, and interpreted the timeout as the ack:

11:30:37.651 [5664.8644] <2> tar_base::V_vTarMsgW: JBD - accelerator sent 20420571136 bytes out of 20420571136 bytes to server, optimization 0.0%
11:30:37.651 [5664.2104] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
11:30:37.651 [5664.8644] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
11:37:18.321 [5664.8644] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed

My guess: your media server version (bprbm, bptm) is older than your client version (bpbkar32). You can't do that.