Exchange 2010 DAG backup failures

David_V_NL
Level 3

Hello all,

For some time now we have been getting intermittent failures in our Exchange 2010 DAG backup.
The failures occur only in the full backup (at the weekend); the incremental backups during the week run without problems.

In the NetBackup Activity Monitor we see the error: socket write failed (24)

Job details:
13-Sep-14 9:13:33 PM - begin writing
13-Sep-14 11:29:18 PM - Critical bpbrm(pid=26883) from client dag.xx.xx: FTL - socket write failed    
13-Sep-14 11:29:20 PM - Error bptm(pid=27781) media manager terminated by parent process      
13-Sep-14 11:44:58 PM - Info bpbkar(pid=14560) done. status: 24: socket write failed      
13-Sep-14 11:44:58 PM - end writing; write time: 2:31:25
socket write failed(24)

When looking in the application log on the Exchange client, we see two errors at the time of the failure:

Application Log
13-Sep-14 11:29:18 PM
eventid 401
Instance 1: The physical consistency check has completed, but one or more errors were detected. The consistency check has terminated with error code of -106 (0xffffff96).
eventid 403
Instance 1: The physical consistency check successfully validated 4191658 out of 12526160 pages of database '\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy15\MB014\MB014.edb'. Because some database pages were either not validated or failed validation, the consistency check has been considered unsuccessful.
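To put the two events in perspective, here is a minimal Python sketch that redoes the arithmetic on the figures quoted above. The 32 KiB page size is an assumption (it is the Exchange 2010 database page size), and the helper names are made up for illustration. It shows that -106 and 0xffffff96 are the same value, and that the check stopped about a third of the way through the database.

```python
# Figures copied from event IDs 401 and 403 above.
ERROR_CODE = -106
PAGES_VALIDATED = 4_191_658
PAGES_TOTAL = 12_526_160

# Assumption: 32 KiB database pages (the Exchange 2010 page size).
PAGE_SIZE_BYTES = 32 * 1024


def as_unsigned32(code: int) -> str:
    """Render a signed 32-bit error code the way the event log shows it in hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count to GiB for the assumed page size."""
    return pages * page_size / 2**30


fraction_validated = PAGES_VALIDATED / PAGES_TOTAL

print(f"error code       : {ERROR_CODE} = {as_unsigned32(ERROR_CODE)}")
print(f"pages validated  : {fraction_validated:.1%} of the database")
print(f"validated so far : {pages_to_gib(PAGES_VALIDATED):.1f} GiB")
print(f"database size    : {pages_to_gib(PAGES_TOTAL):.1f} GiB")
```

Under that page-size assumption the total works out to a little over 380 GiB, which is roughly in line with the database size of about 400 GB mentioned later in the thread.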

NetBackup logging on the client

In the bpbkar log on the Exchange client we see:
21:13:39.549 [8160.16316] <4> V_Snapshot::V_Snapshot_ExcludeRemoteFiles: INF -   Excluding /\\?/Volume{4390bc2e-a934-11e2-8296-005056ac2864}/pagefile.sys
23:29:18.402 [14560.14712] <16> tar_tfi::processException:

An Exception of type [SocketWriteException] has occured at:

  Module: @(#) $Source: src/ncf/tfi/lib/TransporterRemote.cpp,v $ $Revision: 1.55 $ , Function: TransporterRemote::write[2](), Line: 338

  Module: @(#) $Source: src/ncf/tfi/lib/Packer.cpp,v $ $Revision: 1.91.94.2 $ , Function: Packer::getBuffer(), Line: 653

  Module: tar_tfi::getBuffer, Function: D:\NB\NB_7.6.0.3\src\cl\clientpc\util\tar_tfi.cpp, Line: 311

  Local Address: [::]:0

  Remote Address: [::]:0

  OS Error: 10060 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

)

  Expected bytes: 524288

23:29:18.433 [14560.14712] <2> tar_base::V_vTarMsgW: FTL - socket write failed
23:29:18.433 [14560.14712] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
23:29:18.433 [14560.14712] <4> tar_backup::backup_done_state: INF -     number of file directives found: 5
23:29:18.433 [14560.12468] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
23:29:18.448 [14560.14712] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
23:29:18.464 [14560.14712] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 24: socket write failed
23:29:18.464 [14560.14712] <4> tar_backup::backup_done_state: INF - Not waiting for server status
23:29:18.464 [14560.14712] <2> ov_log::V_GlobalLog: ERR - endChksgfilesCCheck:ErrTerm() failed with error code -106.
23:29:18.464 [14560.14712] <2> exchange_shadowcopy_access::V_CloseForRead(): ERR - consistency check failed for 'Microsoft Information Store:\MB014\'
23:29:18.464 [14560.14712] <2> tar_base::V_vTarMsgW: WRN - Exchange Validation for 'Microsoft Information Store:\MB014\' failed.  Please refer to the backup and application event logs for more details.
23:29:18.464 [14560.14712] <2> ov_log::V_GlobalLog: ERR - endChksgfilesCCheck:ErrTerm() failed with error code -1029.
23:29:18.480 [14560.14712] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
23:29:18.480 [14560.14712] <4> ov_log::OVLoop: Timestamp
23:29:18.480 [14560.14712] <4> OVStopCmd: INF - EXIT - status = 0
23:29:18.495 [14560.14712] <2> tar_base::V_Close: closing...
23:29:18.495 [14560.14712] <4> dos_backup::tfs_reset: INF - Snapshot deletion start
23:29:18.604 [14560.14712] <2> ov_log::V_GlobalLog: INF - BEDS_Term(): enter - InitFlags:0x00000001
23:31:18.803 [14560.14712] <4> OVShutdown: INF - Finished process
23:31:18.803 [14560.14712] <4> WinMain: INF - Exiting C:\Program Files\Veritas\NetBackup\bin\bpbkar32.exe
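When a bpbkar log is long, it can help to pull out only the lines that matter. The sketch below is one way to do that in Python, assuming every entry has the shape shown above: a timestamp, [pid.tid], a severity in angle brackets, then the message. Treating 16 and higher as "error" is an assumption about the severity scale, and the names LogEntry, parse and problems are made up for this example. Note that in the log above the ERR and FTL lines are written at severity 2, so the filter also matches on the message text.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class LogEntry(NamedTuple):
    time: str
    pid: int
    tid: int
    severity: int
    message: str


# Matches e.g. "23:29:18.433 [14560.14712] <2> tar_base::V_vTarMsgW: FTL - ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\.(\d+)\] <(\d+)> (.*)$")


def parse(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per line that matches; continuation lines are skipped."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield LogEntry(m[1], int(m[2]), int(m[3]), int(m[4]), m[5])


def problems(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Keep entries with severity >= 16 (assumed 'error') or an ERR/FTL marker."""
    return [
        e for e in entries
        if e.severity >= 16 or " ERR - " in e.message or " FTL - " in e.message
    ]


# A few lines copied from the log above.
SAMPLE = """\
23:29:18.402 [14560.14712] <16> tar_tfi::processException:
  Expected bytes: 524288
23:29:18.433 [14560.14712] <2> tar_base::V_vTarMsgW: FTL - socket write failed
23:29:18.433 [14560.14712] <4> tar_backup::backup_done_state: INF -     number of file directives found: 5
23:29:18.464 [14560.14712] <2> ov_log::V_GlobalLog: ERR - endChksgfilesCCheck:ErrTerm() failed with error code -106.
23:29:18.464 [14560.14712] <2> ov_log::V_GlobalLog: ERR - endChksgfilesCCheck:ErrTerm() failed with error code -1029.
"""

flagged = problems(parse(SAMPLE.splitlines()))
for entry in flagged:
    print(entry.time, f"<{entry.severity}>", entry.message)
```

To run it against a real log, pass the open file object to parse() instead of SAMPLE.splitlines().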

Symantec Tech Note

We found:
http://www.symantec.com/business/support/index?page=content&id=TECH136986
and we set the shadow copy limit to "No limit". We set this only on the disks holding the database (both the active and the passive copy).
We did this three weeks ago.
The backups ran fine for two weeks and it looked as if this had solved the problem.
But no... this weekend the problem was back.

Every time we had the failure, we reran the backup of the failed databases, and the rerun always completed successfully.

Our environment:

  • Master: Windows 2008 R2, NetBackup version 7.6.0.3
  • Media servers: Windows 2008 R2, NetBackup version 7.6.0.3
  • The policy is set to back up the passive copy and, if that is not available, the active copy
  • Snapshot method: VSS
  • Exchange 2010 DAG, NetBackup version 7.6.0.3
  • It is a 4-node DAG
  • The database has an active and a passive copy
  • We back up only the database

Any help on where to look to resolve this problem would be much appreciated.

David

1 ACCEPTED SOLUTION

Accepted Solutions

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

Hello,

 

NetBackup runs a consistency check on the database (as you can see in the application log). What you need to investigate is why it is failing. What is wrong with the databases?

Alternatively, you can disable this check in the properties of the Exchange client:

Perform consistency check before backup with Microsoft Volume Shadow Copy Service (VSS)


8 REPLIES 8

Nicolai
Moderator
Partner    VIP   

Did the physical drive run out of space while the backup was running?

Have you tried diverting the VSS snapshot to a different disk drive from the original one?

See the "test 1" step in this tech note:

http://www.symantec.com/docs/TECH47808

Marianne
Level 6
Partner    VIP    Accredited Certified

Get your Exchange Admin to investigate this:

23:29:18.464 [14560.14712] <2> exchange_shadowcopy_access::V_CloseForRead(): ERR - consistency check failed for 'Microsoft Information Store:\MB014\'
23:29:18.464 [14560.14712] <2> tar_base::V_vTarMsgW: WRN - Exchange Validation for 'Microsoft Information Store:\MB014\' failed.  Please refer to the backup and application event logs for more details.
23:29:18.464 [14560.14712] <2> ov_log::V_GlobalLog: ERR - endChksgfilesCCheck:ErrTerm() failed with error code -1029.

NBU is reporting the error. 
Not causing it.

 

David_V_NL
Level 3

Hello Nicolai,

 

We didn't see any evidence of space running out in the Windows event log.

In other cases where we had trouble with VSS snapshots, there was a clear message in the event log about snapshots running out of space. Here we didn't see anything like that.

The disk holding the database is 2 TB in size, with approx. 1.5 TB free.

The database (without logs) is about 400 GB.

If the VSS snapshot is on the same disk, there should be enough space for a complete copy.

David

David_V_NL
Level 3

Hello Marianne,

Our Exchange admin is (at this moment) unable to find anything wrong with the (original) database. The VSS copy has already been deleted, so we cannot investigate it anymore.

The Exchange event logs don't reveal any information beyond the fact that it fails.

The backup runs OK on a rerun. That is the strange part.

I will ask the Exchange admin whether he can investigate further and whether more logging can be turned on.

 

David

 

David_V_NL
Level 3

Hello Riaan,

It is only the consistency check of the copy that is failing. The original database seems to be OK.

Is disabling the consistency check a good idea?

 

David

RiaanBadenhorst
Moderator
Partner    VIP    Accredited Certified

That would still cause issues. If the copy is not consistent, that could also prevent your logs from being truncated. I would try to figure out why it's not consistent; after all, you'd want to use it if the active copy fails.

David_V_NL
Level 3

Hello all,

It seems that our Exchange environment was, and still is, experiencing performance and stability issues.

We are looking into that; it may have caused the failures.

David