We are running a NetBackup Enterprise 6.0 MP6 environment that backs up servers of various flavours and also runs NDMP backups of our IBM (NetApp) filer.
Over the past few months, one of our NDMP policies has started completing with status 1 once or twice a week. The job details for the affected job show this error:
Error ndmpagent(pid=xxxx) NDMP backup failed, path = /vol/xxxx
The failing path is usually the same one, but occasionally a different path fails, with no apparent pattern.
I logged a call with the third-party support company that looks after our NAS device, and they suggested the volume may lack the space needed to create the snapshot. After we cleared some space we had a short run of successful backups, but then the problem returned. I am no longer convinced that space is the issue: last night a path failed that appears to have free space both on the volume as a whole and in the space set aside for snapshots.
I am not sure what to try next or where to look for answers. Below are some log entries I retrieved by running vxlogview -i ndmpagent -d all for the period in which the path failed:
1/25/2010 21:50:25.411 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 3 V-134-133 [XmServerControl::MesssageReceive] [369] Received 15 (GET_KBYTES)
1/25/2010 21:50:25.411 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 3 V-134-134 [XmServerControl::MesssageReceive] [369] Replying 15 161758548, error = 0
1/25/2010 21:50:26.786 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateRecordNumber] Records 2591103 KbytesThisPath 163239489 KbytesThisFragment 163239489
1/25/2010 21:50:26.786 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-188 [NdmpFhManager::UpdateCounts] received 4096 nodes, 557056 total nodes
1/25/2010 21:50:48.207 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Tape write failed.
1/25/2010 21:50:48.426 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: DUMP IS ABORTED
1/25/2010 21:50:49.848 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Deleting "/vol/vol_xdrive/../snapshot_for_backup.836" snapshot.
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-33 [NdmpBackupManager::NotifyDataHalted] received DATA_HALTED reason = 4 (NDMP_DATA_HALT_CONNECT_ERROR)
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::DoHalt] halt reason = THIS OPERATION FAILED
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::SetState] state change from ACTIVE to HALTING
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 Connection or IO Error.
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-35 [NdmpBackupManager::NotifyMoverHalted] received MOVER_HALTED reason = 3 (NDMP_MOVER_HALT_INTERNAL_ERROR)
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 MoveletOutput: Internal Error.
1/25/2010 21:50:52.816 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateRecordNumber] Records 2607993 KbytesThisPath 164303559 KbytesThisFragment 164303559
1/25/2010 21:50:52.816 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-188 [NdmpFhManager::UpdateCounts] received 4096 nodes, 561152 total nodes
1/25/2010 21:50:52.816 [Application] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] [Error] V-134-32 NDMP backup failed, path = /vol/vol_xdrive
1/25/2010 21:50:53.691 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::SetState] state change from HALTING to IDLE
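In case it helps anyone working through a longer capture, the failure-related entries can be separated from the routine progress messages with something like the rough Python sketch below. It assumes the vxlogview output has the same layout as the lines above (date, time, bracketed severity, then the rest). The function name failure_events and the keyword list are my own choices for illustration, not anything provided by NetBackup.

```python
import re

# Leading fields of a vxlogview line as they appear in the excerpt above:
# date, time with milliseconds, [Severity], then the remainder of the line.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4}) (\d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (.*)$"
)

# Substrings that mark the failure sequence in the excerpt above.
# This list is an assumption; extend it for other captures.
KEYWORDS = ("NDMP_LOG_NORMAL", "DATA_HALTED", "MOVER_HALTED", "[Error]")


def failure_events(lines):
    """Return (timestamp, severity, message) for lines containing a keyword."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line
        date, time, severity, rest = match.groups()
        if any(keyword in rest for keyword in KEYWORDS):
            # Drop the originator/PID/TID fields that precede "[No context]".
            message = rest.split("[No context]", 1)[-1].strip()
            events.append((f"{date} {time}", severity, message))
    return events


# Four lines copied from the excerpt above; the first is a routine progress entry.
sample = [
    "1/25/2010 21:50:26.786 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-188 [NdmpFhManager::UpdateCounts] received 4096 nodes, 557056 total nodes",
    "1/25/2010 21:50:48.207 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Tape write failed.",
    "1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-33 [NdmpBackupManager::NotifyDataHalted] received DATA_HALTED reason = 4 (NDMP_DATA_HALT_CONNECT_ERROR)",
    "1/25/2010 21:50:52.816 [Application] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] [Error] V-134-32 NDMP backup failed, path = /vol/vol_xdrive",
]

for timestamp, severity, message in failure_events(sample):
    print(timestamp, severity, message)
```

Run against the full output, this keeps the messages relayed from the filer (the NDMP_LOG_NORMAL lines), the two halt notifications and the final error, in time order, which makes it easier to see that "Tape write failed" is the first sign of trouble in this capture.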
Can anyone suggest what the problem may be from the above or help with what steps I should take next in troubleshooting this issue?