06-12-2012 09:12 AM
We have two Exchange clusters running CCR replication. On one of them, backups of eight mailbox stores are failing, and nothing is being logged in the event log on Exchange itself. When the failure occurs I am seeing errors like the following in the NetBackup logs. Right now we are okay, but these stores will rapidly fill our log drive if we cannot capture a full backup.
09:48:16.433 [16344.8960] <2> bpcd exit_bpcd: exit status 0 ----------->exiting
09:48:16.526 [16492.14496] <2> process_requests: BPCD_READ_HOST_CONFIG_RQST
09:48:16.916 [16492.14496] <2> process_requests: BPCD_DISCONNECT_RQST
09:48:16.916 [16492.14496] <2> bpcd exit_bpcd: exit status 0 ----------->exiting
09:48:17.088 [4692.4696] <2> vnet_pbxAcceptSocket: Accepted sock[492] from 150.228.21.26:3181
09:48:17.088 [4692.4696] <2> bpcd main: accept sock = 492
09:48:17.166 [12492.11244] <2> setup_debug_log: switched debug log file for bpcd
09:48:17.166 [12492.11244] <2> bpcd main: VERBOSE = 0
09:48:17.166 [12492.11244] <2> logparams: C:\Program Files\VERITAS\NetBackup\bin\bpcd.exe -standalone
09:48:17.166 [12492.11244] <2> process_requests: offset to GMT 21600
09:48:17.166 [12492.11244] <2> logconnections: BPCD ACCEPT FROM 150.228.21.26.3181 TO 10.228.12.1.1556 fd = 492
09:48:17.166 [12492.11244] <2> process_requests: setup_sockopts complete
09:48:17.182 [12492.11244] <2> bpcd peer_hostname: Connection from host stptape2ms.adminsys.mrll.com (150.228.21.26) port 3181
09:48:17.182 [12492.11244] <2> bpcd valid_server: comparing master and stptape2ms.adminsys.mrll.com
09:48:19.490 [12492.11244] <2> hosts_equal: gethostbyname failed for master: The requested name is valid, but no data of the requested type was found.
(0)
09:48:19.490 [12492.11244] <2> bpcd valid_server: comparing media2 and stptape2ms.adminsys.mrll.com
09:48:21.799 [12492.11244] <2> hosts_equal: gethostbyname failed for media2: The requested name is valid, but no data of the requested type was found.
(0)
09:48:21.799 [12492.11244] <2> bpcd valid_server: comparing media3 and stptape2ms.adminsys.mrll.com
09:48:24.108 [12492.11244] <2> hosts_equal: gethostbyname failed for media3: The requested name is valid, but no data of the requested type was found.
(0)
09:48:24.108 [12492.11244] <2> bpcd valid_server: comparing media4 and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> hosts_equal: gethostbyname failed for media4: The requested name is valid, but no data of the requested type was found.
(0)
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing osiris.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing osiris.stp.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing bertape2ms.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing lax1pbe02.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stp2pbe01.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stp2pbe01.stp.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stptape1ms.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stptape1ms.stp.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stptape2ms.bck.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <2> bpcd valid_server: comparing stptape2ms.stp.mrll.com and stptape2ms.adminsys.mrll.com
09:48:26.417 [12492.11244] <4> bpcd valid_server: hostname comparison succeeded
09:48:26.417 [12492.11244] <2> process_requests: output socket port number = 1
09:48:26.448 [12492.11244] <2> process_requests: Duplicated vnetd socket on stderr
09:48:26.448 [12492.11244] <2> process_requests: <---- NetBackup 7.0 0 ------------initiated
09:48:26.448 [12492.11244] <2> process_requests: VERBOSE = 0
09:48:26.448 [12492.11244] <2> process_requests: Not using VxSS authentication with stptape2ms.adminsys.mrll.com
09:48:27.150 [12492.11244] <2> process_requests: BPCD_GET_VERSION_RQST
09:48:27.353 [12492.11244] <2> process_requests: BPCD_GET_VERSION_RQST
09:48:27.556 [12492.11244] <2> process_requests: BPCD_READ_USERS_CONFIG_RQST
09:48:27.758 [12492.11244] <2> process_requests: BPCD_GET_VERSION_RQST
09:48:27.961 [12492.11244] <2> process_requests: BPCD_READ_HOST_CONFIG_RQST
09:48:28.164 [12492.11244] <2> process_requests: BPCD_DISCONNECT_RQST
09:48:28.164 [12492.11244] <2> bpcd exit_bpcd: exit status 0 ----------->exiting
09:49:32.249 [4692.4696] <2> vnet_pbxAcceptSocket: Accepted sock[492] from 150.228.21.26:3323
09:49:32.249 [4692.4696] <2> bpcd main: accept sock = 492
06-12-2012 09:26 AM
So, has anything changed recently (active node, DNS, patches, etc.)?
Have you checked that the NetBackup client service is running on all nodes and that replication is up to date?
The early errors indicate name resolution or network problems, so check all of those things.
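If you want to check the lookups yourself outside of NetBackup: the hosts_equal failures in your log come from gethostbyname calls that bpcd makes while walking the client's server list (the error text looks like Winsock's "valid name, no data of the requested type" result, i.e. the name half-resolves). A rough Python sketch of that loop, where the server names are placeholders for whatever is in your own list:

```python
import socket

def check_resolution(names):
    """Try a forward lookup on each name, roughly what bpcd's
    hosts_equal does, and report which ones fail to resolve."""
    results = {}
    for name in names:
        try:
            # socket.gethostbyname wraps the same gethostbyname()
            # call that the bpcd log shows failing
            results[name] = socket.gethostbyname(name)
        except socket.gaierror as err:
            results[name] = "FAILED: %s" % err
    return results

if __name__ == "__main__":
    # "master" and "media2" are placeholders from the log above;
    # substitute the entries from your client's server list.
    for name, result in sorted(check_resolution(["master", "media2", "localhost"]).items()):
        print("%-10s -> %s" % (name, result))
```

Any name that comes back FAILED here is a name bpcd will stall on for a couple of seconds per entry, which matches the ~2.3 second gaps between the valid_server lines in your log.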
06-12-2012 09:41 AM
The active nodes have not changed, and I am not aware of any DNS changes. There were some patches installed on the 2nd, which is when the issues started on these particular stores. These patches were installed on all four of the mailbox servers, and the stores having issues are not the only stores on the cluster.
Get-StorageGroupCopyStatus indicates all stores are healthy. We have also checked the network cards to ensure they are not experiencing errors. I have included the information on the last backups of the problematic stores.
StorageGroupName : Ar-Az
LatestFullBackupTime : 6/5/2012 1:29:11 PM
LatestIncrementalBackupTime :
SnapshotBackup : True

StorageGroupName : Ba-Bo
LatestFullBackupTime : 6/1/2012 6:10:28 AM
LatestIncrementalBackupTime : 6/2/2012 10:01:05 AM
SnapshotBackup : True

StorageGroupName : Bp-Bz
LatestFullBackupTime : 6/1/2012 6:13:31 AM
LatestIncrementalBackupTime : 6/6/2012 8:03:28 PM
SnapshotBackup : True

StorageGroupName : Ca-Ck
LatestFullBackupTime : 6/5/2012 1:28:46 PM
LatestIncrementalBackupTime :
SnapshotBackup : True

StorageGroupName : Cl-Cz
LatestFullBackupTime : 5/29/2012 12:03:39 PM
LatestIncrementalBackupTime : 6/2/2012 6:25:41 AM
SnapshotBackup : True

StorageGroupName : Dj-Dz
LatestFullBackupTime : 5/29/2012 6:42:48 AM
LatestIncrementalBackupTime : 6/2/2012 6:49:59 AM
SnapshotBackup : True
06-12-2012 10:03 AM
06-12-2012 10:47 AM
Here are the bpcd and bpbkar logs. The storage team is the only group that has access to the job details; my knowledge is more on the Exchange side. When I got them engaged on the issue, they said the following. I had suspected it had something to do with the client on this particular box.
"The list of Master/Media servers for both nodes in EXHA1 contained several bogus entries, including one in the 1st position which is reserved for the Master server.
I have removed these bogus entries and I’m going to try one of the backups again. I am not certain, but the discrepancy in server lists could have been a contributing factor."
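As I understand it from them, the list they cleaned up is the client's server list, which is exactly what the valid_server loop in the bpcd log above is walking entry by entry, and the first entry must be the master server. Roughly, with hypothetical host names, a healthy list would look like:

```
SERVER = masterserver.example.com
SERVER = mediaserver1.example.com
SERVER = mediaserver2.example.com
CLIENT_NAME = exchnode1.example.com
```

The "master", "media2", etc. names in our log that failed to resolve were apparently the bogus entries they removed.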
After these changes we are 8% done with a backup on one of the stores, so we will see how that goes. Before, they were failing instantly.
06-13-2012 01:40 AM
The bpcd log is huge! I'm not even going to try to open it.
This does imply something was going mad on this client, which could have been caused by your bogus entries.
Update us after the backups finish, and then see if it starts to create smaller bpcd logs, indicating that things have settled a little.