Backups were failing with 84 & 87 error intermittently - Destination Storage is Pure Disk Pool

hariharan24
Level 4
Certified

Backups are failing intermittently with status 84 and 87. The destination storage is a PureDisk disk pool.

The same backup then succeeds on the next attempt under a different job ID.

Error Code: 84

***************************************************************************************************************************************
12/25/2014 00:10:07 - Info bpbrm (pid=13539) INF - Client read timeout = 1800
12/25/2014 00:10:07 - connecting
12/25/2014 00:10:08 - Info bpbrm (pid=13539) accepted connection from client
12/25/2014 00:10:08 - Info dbclient (pid=4951) Backup started
12/25/2014 00:10:08 - Info bpbrm (pid=13539) bptm pid: 13540
12/25/2014 00:10:08 - Info bptm (pid=13540) start
12/25/2014 00:10:08 - Info bptm (pid=13540) using 524288 data buffer size
12/25/2014 00:10:08 - Info bptm (pid=13540) setting receive network buffer to 524288 bytes
12/25/2014 00:10:08 - Info bptm (pid=13540) using 256 data buffers
12/25/2014 00:10:08 - connected; connect time: 0:00:00
12/25/2014 00:10:10 - Info bptm (pid=13540) start backup
12/25/2014 00:12:17 - Info bptm (pid=13540) backup child process is pid 16293
12/25/2014 00:12:17 - begin writing
12/25/2014 00:12:18 - Info dbclient (pid=4951) dbclient(pid=4951) wrote first buffer(size=262144)
12/25/2014 00:13:02 - Critical bptm (pid=13540) Storage Server Error: (Storage server: PureDisk:) mtstrm_write_segment: Fatal error occured in Multi-Threaded Agent: Cr_ErrnoException in void mtstrm::server::Cr_Send::ThreadSend(Pd_Thread<mtstrm::server::Cr_Send>*): CRSegmentBatchStreamPutComp V-454-95
12/25/2014 00:13:02 - Critical bptm (pid=13540) image write failed: error 2060001: one or more invalid arguments
12/25/2014 00:13:04 - Critical bptm (pid=13540) Storage Server Error: (Storage server: PureDisk:) mtstrm_close_write_channel: Fatal error occured in Multi-Threaded Agent: Cr_ErrnoException in void mtstrm::server::Cr_Send::ThreadSend(Pd_Thread<mtstrm::server::Cr_Send>*): CRSegmentBatchStreamPutComp V-454-96
12/25/2014 00:13:04 - Critical bptm (pid=13540) sts_close_handle failed: 2060001 one or more invalid arguments
12/25/2014 00:13:16 - Error bptm (pid=13540) cannot write image to disk, Invalid argument
12/25/2014 00:13:16 - Info bptm (pid=13540) EXITING with status 84 <----------
12/25/2014 00:13:16 - Info dbclient (pid=4951) done. status: 6
12/25/2014 00:13:16 - Info (pid=13540) StorageServer=PureDisk:; Report=PDDO Stats (multi-threaded stream used) for (): scanned: 49667 KB, CR sent: 0 KB, CR sent over FC: 0 KB, dedup: 100.0%, cache disabled
12/25/2014 00:13:17 - Info dbclient (pid=4951) done. status: 84: media write error
12/25/2014 00:13:17 - end writing; write time: 0:01:00
media write error  (84)

***************************************************************************************************************************************

Error Code: 87

***************************************************************************************************************************************

12/25/2014 00:12:23 - Info bpbrm (pid=7499) INF - Client read timeout = 1800
12/25/2014 00:12:23 - Info bpbrm (pid=7499) accepted connection from client
12/25/2014 00:12:24 - Info dbclient (pid=736) Backup started
12/25/2014 00:12:24 - connected; connect time: 0:00:00
12/25/2014 00:12:24 - Info bpbrm (pid=7499) bptm pid: 7500
12/25/2014 00:12:24 - Info bptm (pid=7500) start
12/25/2014 00:12:24 - Info bptm (pid=7500) using 524288 data buffer size
12/25/2014 00:12:24 - Info bptm (pid=7500) using 256 data buffers
12/25/2014 00:12:26 - Info bptm (pid=7500) start backup
12/25/2014 00:17:02 - Info bptm (pid=7500) backup child process is pid 17130
12/25/2014 00:17:02 - begin writing
12/25/2014 00:17:03 - Info dbclient (pid=736) dbclient(pid=736) wrote first buffer(size=65536)
12/25/2014 00:17:04 - Info dbclient (pid=736) done. status: 0
12/25/2014 00:18:56 - Info bptm (pid=7500) waited for full buffer 1 times, delayed 71 times
12/25/2014 00:18:59 - Critical bptm (pid=7500) Storage Server Error: (Storage server: PureDisk:) mtstrm_close_write_channel: Fatal error occured in Multi-Threaded Agent: Close Write Channel command failed: Cr_ErrnoException:  dref model has not been initialized yet V-454-96
12/25/2014 00:19:00 - Critical bptm (pid=7500) sts_close_handle failed: 2060017 system call failed
12/25/2014 00:19:00 - Error bptm (pid=7500) cannot write image to disk, media close failed with status 2060017 
12/25/2014 00:19:05 - Info bptm (pid=7500) EXITING with status 87 <----------
12/25/2014 00:19:05 - Info (pid=7500) StorageServer=PureDisk:; Report=PDDO Stats (multi-threaded stream used) for (): scanned: 4835 KB, CR sent: 0 KB, CR sent over FC: 0 KB, dedup: 100.0%, cache disabled
12/25/2014 00:20:13 - Info dbclient (pid=736) done. status: 87: media close error
12/25/2014 00:20:13 - end writing; write time: 0:03:11
media close error  (87)

***************************************************************************************************************************************
I have also attached the error section of the bptm log from the media server.

Client read and connect timeouts for the respective media servers: 1800 sec

Is this due to a DPS or polling timeout issue, or to some other issue?

 

Accepted Solutions

hariharan24
Level 4
Certified

The issue has been sorted out. It turned out to be a host name resolution error.

The PD media servers were unable to resolve the clustered PD nodes. We found the errors below in the PD agent logs on all the PD media servers.

December 25 08:51:52 ERR [139912246351616]: 31: CRRouteLookupFirst: internal error: found no route for 01a37b7be5065d3a712195475a90e628!
December 25 08:51:52 ERR [139912246351616]: 31: CRDCRefMode: Lookup route failed!
December 25 08:52:46 WARNING [139912246351616]: -1: addDo: The fingerprint cache limit of 20971520 has been reached for this session. DO 0a1daf473a7d43ad484760f47b10c4d7 will not be cached.
December 25 08:52:49 ERR [139912246351616]: 4: _CR_GwInitSuppressible: pd_getaddrinfo failed for host  using port 10082: Name or service not known
December 25 08:52:49 ERR [139912246351616]: 84: Failed to load route at line 4 in /usr/openv/pdde/pdag/var/rt/200_.recommended: could not initialize gateway: unable to resolve hostname
December 25 08:52:49 ERR [139912246351616]: 84: Router initialization failed: Failed to load route at line 4 in /usr/openv/pdde/pdag/var/rt/200_.recommended: could not initialize gateway: unable to resolve hostname
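For anyone checking their own PD agent logs for the same symptom, here is a minimal Python sketch that filters log lines for the resolution and routing messages quoted above. The marker strings are copied from this excerpt only; other versions may word them differently, and the function name `find_resolution_errors` is just an illustrative helper, not part of any NetBackup tool.

```python
# Substrings taken from the PD agent log excerpt in this post.
# Assumption: these are enough to spot the problem; they are not an
# official or complete list of PureDisk error messages.
MARKERS = (
    "pd_getaddrinfo failed",
    "unable to resolve hostname",
    "Lookup route failed",
)


def find_resolution_errors(lines):
    """Return the log lines that contain any of the known markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in MARKERS)
    ]
```

It can be fed an open file object, for example `find_resolution_errors(open(path))`, where `path` points at whichever PD agent log you are inspecting.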

After we added host entries for all the PD nodes on all the PD media servers, the resolution error was fixed and the backups are running fine.
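To confirm the fix from a media server, something like the following Python sketch can be used to check that each PD node name resolves. It only calls the standard `socket.getaddrinfo`; port 10082 is taken from the log above, and the function name `check_resolution` and any host names you pass in are placeholders for your own environment.

```python
import socket


def check_resolution(hosts, port=10082):
    """Map each host name to its sorted list of resolved addresses.

    A value of None means the name could not be resolved from this machine.
    Port 10082 is the port shown in the PD agent log; adjust if yours differs.
    """
    results = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string for both IPv4 and IPv6.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[host] = None
    return results
```

Run it on every PD media server with the full list of PD node names; any entry that comes back as `None` still needs a host entry or a DNS fix.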

We are checking further with the DNS team regarding the name server issue.

Thanks

 


Replies

GulzarShaikhAUS
Level 6
Partner Accredited Certified

Hi,

Is it an appliance? What is the netbackup version?

hariharan24
Level 4
Certified

These are PureDisk clustered nodes (5+1) managed by VCS, not an appliance.

NBU version on master & media servers - 7.6.0.2

Mujeeb-Arrosoft
Level 3
Partner Accredited
Check the /usr/openv/netbackup/logs/bptm log folder.

Examine the Application Event Log for NetBackup errors. This combination of errors can be caused by a faulty SCSI card. If that is the case, replace the faulty SCSI card.

INT_RND
Level 6
Employee Accredited

Have you upgraded all the clients to match the patch level of the master and media servers?

All clients should also be running 7.6.0.2
