
Intermittent error status 84, 83, 87 and 2074

Hi everyone,

We've been experiencing intermittent status 84, 83, 87 and 2074 failures on our VMware backups lately. We started by checking the MSDP status, but it is always up and never goes down. We've followed some workarounds, namely:

  • Adding the TCP keep-alive value in the registry (we had also encountered status 636 before, which this eliminated).
  • Adding the touch files DPS_PROXYDEFAULTSENDTMO, DPS_PROXYDEFAULTRECVTMO and DPS_PROXYNOEXPIRE, based on the technotes.
  • Increasing client timeouts.
  • Checking the logs, where we saw errors such as "connection reset by peer" and "db_getClient=227".
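For reference, here is roughly how we applied the first two workarounds on the media server. The keep-alive value, timeout values and install path below are examples only; adjust them for your environment:

```shell
:: Lower the TCP keep-alive interval (value in milliseconds; 5 minutes is an example)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v KeepAliveTime /t REG_DWORD /d 300000 /f

:: Create the DPS proxy touch files under the NetBackup db\config directory
:: (assumes the default install path; adjust if NetBackup is installed elsewhere)
echo 3600 > "C:\Program Files\Veritas\NetBackup\db\config\DPS_PROXYDEFAULTSENDTMO"
echo 3600 > "C:\Program Files\Veritas\NetBackup\db\config\DPS_PROXYDEFAULTRECVTMO"
type nul > "C:\Program Files\Veritas\NetBackup\db\config\DPS_PROXYNOEXPIRE"
```

A reboot (or at least a NetBackup service restart) is needed for the registry change to take effect.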

Information:

  • NBU Version (All servers): 7.6.1.2
  • OS (All servers): Windows Server 2012 R2 Standard

Below is the detailed status of one of the jobs.

10/20/2015 2:12:18 PM - Info bpbrm(pid=4260) 132_147_160_082_ZMUNCLMSTST_PCPPI_COM is the host to backup data from     
10/20/2015 2:12:18 PM - Info bpbrm(pid=4260) reading file list for client        
10/20/2015 2:12:18 PM - Info bpbrm(pid=4260) accelerator enabled           
10/20/2015 2:12:19 PM - Info bpbrm(pid=4260) There is no complete backup image match with track journal, a regular full backup will be performed.
10/20/2015 2:12:19 PM - Info bpbrm(pid=4260) starting bpbkar32 on client         
10/20/2015 2:12:20 PM - Info bpbkar32(pid=6508) Backup started           
10/20/2015 2:12:21 PM - Info bpbkar32(pid=6508) accelerator enabled backup, archive bit processing:<disabled>       
10/20/2015 2:12:23 PM - Info bptm(pid=4196) start            
10/20/2015 2:12:25 PM - Info bptm(pid=4196) using 262144 data buffer size        
10/20/2015 2:12:25 PM - Info bptm(pid=4196) setting receive network buffer to 1049600 bytes      
10/20/2015 2:12:25 PM - Info bptm(pid=4196) using 30 data buffers         
10/20/2015 2:12:31 PM - Info bptm(pid=4196) start backup           
10/20/2015 2:12:43 PM - Info bptm(pid=4196) backup child process is pid 4620.4632       
10/20/2015 2:12:43 PM - Info bptm(pid=4620) start            
10/20/2015 2:13:02 PM - Info bpbkar32(pid=6508) 0 entries sent to bpdbm        
10/20/2015 2:13:13 PM - Info bpbkar32(pid=6508) 95000 entries sent to bpdbm        
10/20/2015 2:13:21 PM - Info bpbkar32(pid=6508) 146631 entries sent to bpdbm        
10/20/2015 2:13:21 PM - Info bpbkar32(pid=6508) 146632 entries sent to bpdbm        
10/20/2015 2:13:39 PM - Info nbjm(pid=4736) starting backup job (jobid=51853) for client 132_147_160_082_ZMUNCLMSTST_PCPPI_COM, policy zmunCLMStst_vmware_backup, schedule Monthly_Full_Backup  
10/20/2015 2:13:39 PM - estimated 0 Kbytes needed
10/20/2015 2:13:39 PM - Info nbjm(pid=4736) started backup (backupid=132_147_160_082_ZMUNCLMSTST_PCPPI_COM_1445375619) job for client 132_147_160_082_ZMUNCLMSTST_PCPPI_COM, policy zmunCLMStst_vmware_backup, schedule Monthly_Full_Backup on storage unit znbumedia1_dedupe_disk_pool-stu using backup host znbumaster1
10/20/2015 2:13:40 PM - started process bpbrm (4260)
10/20/2015 2:13:42 PM - connecting
10/20/2015 2:13:43 PM - connected; connect time: 0:00:01
10/20/2015 2:13:47 PM - Info bpbkar32(pid=6508) 192623 entries sent to bpdbm        
10/20/2015 2:13:47 PM - Info bpbkar32(pid=6508) INF - Transport Type =  nbd      
10/20/2015 2:14:07 PM - begin writing
10/20/2015 2:14:15 PM - Critical bptm(pid=4196) Storage Server Error: (Storage server: PureDisk:znbumedia1) pdvfs_io_pwrite: Failed to write to spoold on storage server (invalid argument). Ensure storage server services are running and operational. V-454-10
10/20/2015 2:14:15 PM - Critical bptm(pid=4196) image write failed: error 2060001: one or more invalid arguments   
10/20/2015 2:14:48 PM - Critical bptm(pid=4196) Storage Server Error: (Storage server: PureDisk:znbumedia1) _pdvfs_fcache_flushfile_norm_cleanup: Failed to write to spoold on storage server (invalid stream descriptor). Ensure storage server services are running and operational. V-454-8
10/20/2015 2:14:49 PM - Critical bptm(pid=4196) sts_close_handle failed: 2060017 system call failed       
10/20/2015 2:15:01 PM - Error bptm(pid=4196) cannot write image to disk, Invalid argument      
10/20/2015 2:15:01 PM - Info bptm(pid=4196) EXITING with status 84 <----------        
10/20/2015 2:15:01 PM - Info znbumedia1(pid=4196) StorageServer=PureDisk:znbumedia1; Report=PDDO Stats for (znbumedia1): scanned: 361864 KB, CR sent: 14430 KB, CR sent over FC: 0 KB, dedup: 96.0%, cache disabled
10/20/2015 2:15:05 PM - Info bpbkar32(pid=0) done. status: 84: media write error       
10/20/2015 2:16:28 PM - end writing; write time: 0:02:21
media write error (84)

 

Hoping you could help us with this.

Thanks,

Mariz

18 Replies


Please attach verbose bptm log


Hi Riaan,

I'm not sure I can attach it here, as the file is around 9GB. Is there another way to upload it, or any info from the log you would like me to find? Thanks!


Hi,

 

Just post some of it so we can see what happened around 2:10 - 2:15 PM.

Part 1 of bptm log 

Part 2 of bptm log 



Hi,

 

Your log has a 5-hour gap in it. Maybe something is going wrong on your media server?

 

02:12:15.995 [1836.4272] <2> 51839:bptm:1836:znbumedia1: [DEBUG] PDSTS: write_multi_buf: buf(0000000002270000) buflen(262144) bytes(0) map_imglength(94720) imh_cur_filesize(1708032) actual_size(1708032)

07:22:15.012 [4300.1456] <2> 51843:bptm:4300:znbumedia1: [DEBUG] PDSTS: pi_write_image_v7: entry


Check Event Viewer logs on the media server to see if anything was logged in this period.
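One way to pull the System and Application events on the media server around the failure window is with PowerShell; the time range below is an example matching the job shown above:

```shell
# List System/Application events on the media server around the failure window
Get-WinEvent -FilterHashtable @{LogName='System','Application';
    StartTime=(Get-Date '2015-10-20 14:00'); EndTime=(Get-Date '2015-10-20 14:30')} |
  Sort-Object TimeCreated |
  Format-Table TimeCreated, ProviderName, LevelDisplayName, Message -AutoSize
```

Disk, NIC or service-crash events in that window would point to a host-side problem rather than NetBackup itself.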


Hi Riaan,

We had a disk failure last month that was fixed, but once it was fixed we started receiving status 84, 83, 636 and 2074 on backups. Last week we had the disk replaced, but we are still receiving these errors. I've already increased client read timeouts and added the touch files; I'm not sure what the cause of the delay is, though...


Hi Marianne,

 

We haven't received any errors in the Event Viewer since we had the disk replaced (Thursday last week). However, I'm seeing these DNS errors in the bptm logs. Please see attachment. Thanks.


Any chance you can post logs as .txt files?

.docx needs to be downloaded before it can be opened. 
.txt files can be viewed online.


I think you've lost some data; there are a lot of errors such as this: "failed (2:No such file or directory)"


Hi Marianne,

Please see attached .txt file. Thanks.


Hi Riaan,

I'm not sure how to check, though. This has been looked into by several support engineers already, and none of them has gotten back to me with the results yet. I'm not sure if they were able to capture the bptm logs before, as they were too big, too...


There are also network errors

 

07:54:50.038 [4300.1456] <8> 51843:bptm:4300:znbumedia1: [WARN] PDSTS: impl_set_imh_image_prop: PdvfsUnlink(/znbumedia1#1/2/zmunmbx01.pcppi.com/zmunmbx01_vmware_backup/zmunmbx01.pcppi.com_1445350631_C1_F2.info) failed (2:No such file or directory)

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: write_buf: pd_write(0 812384256) failed (10054:An existing connection was forcibly closed by the remote host. ) for img

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: write_multi_buf: write_buf() failed (2060019:error occurred on network socket) with imh_status (0x1) for actual size

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: write_special_buf: write_buf failed: (2060019:error occurred on network socket). bytes: 0 status: 0x1

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: write_unknown_len_buf: write_special_buf() failed, byteswritten:<0>, status:<0x1> (2060019:error occurred on network socket)

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: impl_write_image: write_unknown_len_buf() failed (2060019:error occurred on network socket)

07:55:24.288 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: pi_write_image_v7: impl_write_image() failed (2060019:error occurred on network socket)

07:55:24.476 [4300.1456] <32> do_write_image: sts_write_image failed: byteswritten = 0, len = 262144

07:55:24.476 [4300.1456] <32> do_write_image: sts_write_image failed: error 2060019 (812384256 262144 0)

07:55:24.491 [4300.1456] <2> set_job_details: Tfile (51843): LOG 1445352924 32 bptm 4300 image write failed: error 2060019: error occurred on network socket

07:55:24.491 [4300.1456] <2> send_job_file: job ID 51843, ftype = 3 msg len = 96, msg = LOG 1445352924 32 bptm 4300 image write failed: error 2060019: error occurred on network socket

07:55:24.648 [4300.1456] <32> write_data: image write failed: error 2060019:

07:55:29.215 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: close_image: PdvfsClose(/znbumedia1#1/2/zmunmbx01.pcppi.com/zmunmbx01_vmware_backup/zmunmbx01.pcppi.com_1445350631_C1_F2.img) failed (5:Input/output error) for image fd

07:55:31.106 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: impl_close_image: close_image() failed (2060017:system call failed)

07:55:31.653 [4300.1456] <2> 51843:bptm:4300:znbumedia1: [DEBUG] PDSTS: impl_close_image: exit (2060017:system call failed) 0x4

07:55:31.653 [4300.1456] <16> 51843:bptm:4300:znbumedia1: [ERROR] PDSTS: pi_close_image_v7: impl_close_image() failed (2060017:system call failed)

07:55:31.653 [4300.1456] <2> 51843:bptm:4300:znbumedia1: [DEBUG] PDSTS: pi_close_image_v7: exit (2060017:system call failed)

07:55:31.793 [4300.1456] <2> set_job_details: Tfile (51843): LOG 1445352931 32 bptm 4300 sts_close_handle failed: 2060017 system call failed


Hi Riaan,

 

Currently, we have 2 media servers. The other one backs up fine, but this one seems unstable. On the network side, could you suggest what actions we can take? We tried a continuous ping and the connection is stable.

 

Thanks!


Hi,

 

Sorry, I should have said a networking-related error, not necessarily the network itself. Software could be causing the ports to be closed, etc.

 

As you've indicated that you had a hardware failure on this server, you should maybe think about reinstalling it after you duplicate the images you have on it.

 

Have you tried restores or duplication of any images from this pool since the failure?
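A plain ping won't show TCP-level drops, so it may also be worth checking that the MSDP service ports stay reachable from the backup host over time. A rough PowerShell sketch (the port numbers assume the default MSDP spad/spoold ports; verify them for your NetBackup version):

```shell
# Repeatedly test TCP connectivity to the MSDP services on znbumedia1
# (10102 = spad, 10082 = spoold by default; verify for your NetBackup version)
while ($true) {
    foreach ($port in 10102, 10082) {
        $r = Test-NetConnection -ComputerName znbumedia1 -Port $port -WarningAction SilentlyContinue
        "{0}  port {1}  TcpTestSucceeded={2}" -f (Get-Date -Format s), $port, $r.TcpTestSucceeded
    }
    Start-Sleep -Seconds 60
}
```

If these intermittently fail while ping stays clean, that would point at something (firewall, antivirus, NIC offload settings) interfering with established TCP connections rather than basic reachability.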


Hi Riaan,

 

We only did an MSDP reconfiguration for this. We have done replication of images, and restores of some VMs to the DR site, though...
