
Intermittent NDMP backup failures

Hallucyn8
Level 2
We are running a NetBackup Enterprise 6.0 MP6 environment that backs up several servers of various flavours and also runs NDMP backups of our IBM (NetApp) filer.

Over the past few months, one of our NDMP policies has started completing with a status of 1 roughly once or twice a week. The job details for the affected job show this error:

Error ndmpagent(pid=xxxx) NDMP backup failed, path = /vol/xxxx

The failing path is usually the same one, but occasionally a different path fails, with no apparent pattern.

I logged a call with the third-party support company that looks after our NAS device, and they suggested the volume might lack the space needed to create the snapshot. After clearing some space we had a short run of successful backups, but then the problem returned. I am no longer convinced that space is the issue: last night a path failed that appears to have free space both on the volume as a whole and in the space set aside for snapshots.

I am not sure what to try next or where to look for answers. Below are some log entries for the period when the path failed, retrieved by running: vxlogview -i ndmpagent -d all

1/25/2010 21:50:25.411 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 3 V-134-133 [XmServerControl::MesssageReceive] [369] Received 15 (GET_KBYTES) 
1/25/2010 21:50:25.411 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 3 V-134-134 [XmServerControl::MesssageReceive] [369] Replying 15 161758548, error = 0
1/25/2010 21:50:26.786 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateRecordNumber] Records 2591103 KbytesThisPath 163239489 KbytesThisFragment 163239489
1/25/2010 21:50:26.786 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-188 [NdmpFhManager::UpdateCounts] received 4096 nodes, 557056 total nodes
1/25/2010 21:50:48.207 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Tape write failed.
1/25/2010 21:50:48.426 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: DUMP IS ABORTED
1/25/2010 21:50:49.848 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Deleting "/vol/vol_xdrive/../snapshot_for_backup.836" snapshot.
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-33 [NdmpBackupManager::NotifyDataHalted] received DATA_HALTED reason = 4 (NDMP_DATA_HALT_CONNECT_ERROR)
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::DoHalt] halt reason = THIS OPERATION FAILED
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::SetState] state change from ACTIVE to HALTING
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 Connection or IO Error.
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-35 [NdmpBackupManager::NotifyMoverHalted] received MOVER_HALTED reason = 3 (NDMP_MOVER_HALT_INTERNAL_ERROR)
1/25/2010 21:50:52.598 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 MoveletOutput: Internal Error.
1/25/2010 21:50:52.816 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-24 [NdmpBackupManager::UpdateRecordNumber] Records 2607993 KbytesThisPath 164303559 KbytesThisFragment 164303559
1/25/2010 21:50:52.816 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 4 V-134-188 [NdmpFhManager::UpdateCounts] received 4096 nodes, 561152 total nodes
1/25/2010 21:50:52.816 [Application] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] [Error] V-134-32 NDMP backup failed, path = /vol/vol_xdrive
1/25/2010 21:50:53.691 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 [NdmpManager::SetState] state change from HALTING to IDLE
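
When comparing several failed runs, it can help to reduce an excerpt like the one above to just the failure-related events. The following is a minimal sketch, assuming the log lines have exactly the layout shown in this post. The names `LINE_RE`, `HALT_RE`, `FAIL_MARKERS`, `parse_line` and `summarize` are made up for illustration and are not part of NetBackup.

```python
import re

# Layout assumed from the excerpt above:
# date time [Severity] NB 51216 ndmpagent 134 PID:n TID:n File ID:n [context] message
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<severity>\w+)\] .*?PID:(?P<pid>\d+) TID:(?P<tid>\d+) "
    r"File ID:\d+ \[[^\]]*\] (?P<message>.*)$"
)

# Halt notifications as they appear in the excerpt, e.g.
# "received DATA_HALTED reason = 4 (NDMP_DATA_HALT_CONNECT_ERROR)"
HALT_RE = re.compile(r"received (DATA|MOVER)_HALTED reason = (\d+) \((\w+)\)")

# Message fragments copied from the excerpt that indicate a failure.
FAIL_MARKERS = (
    "Tape write failed",
    "DUMP IS ABORTED",
    "Connection or IO Error",
    "Internal Error",
    "NDMP backup failed",
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Return (time, kind, detail) tuples for halt notifications and failure messages."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        message = record["message"]
        halt = HALT_RE.search(message)
        if halt:
            events.append((record["time"], halt.group(1) + "_HALTED", halt.group(3)))
            continue
        for marker in FAIL_MARKERS:
            if marker in message:
                events.append((record["time"], "message", marker))
                break
    return events


# Four lines taken verbatim from the excerpt above.
SAMPLE = """\
1/25/2010 21:50:48.207 [Debug] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 1 [CtnLogMsgCB] NDMP_LOG_NORMAL 0 DUMP: Tape write failed.
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-33 [NdmpBackupManager::NotifyDataHalted] received DATA_HALTED reason = 4 (NDMP_DATA_HALT_CONNECT_ERROR)
1/25/2010 21:50:52.598 [Diagnostic] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] 2 V-134-35 [NdmpBackupManager::NotifyMoverHalted] received MOVER_HALTED reason = 3 (NDMP_MOVER_HALT_INTERNAL_ERROR)
1/25/2010 21:50:52.816 [Application] NB 51216 ndmpagent 134 PID:5380 TID:496 File ID:134 [No context] [Error] V-134-32 NDMP backup failed, path = /vol/vol_xdrive
"""

if __name__ == "__main__":
    for event in summarize(SAMPLE.splitlines()):
        print(event)
```

Running the same summary over the logs from each failed night would show whether every failure begins with "Tape write failed" followed by the same pair of halt reasons, or whether the sequence varies.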


Can anyone suggest what the problem may be from the above or help with what steps I should take next in troubleshooting this issue?
2 REPLIES

lu
Level 6
If you have a firewall between NBU and your filer, you can try creating the file /usr/openv/netbackup/db/config/ndmp.cfg and putting the following keyword in it: NDMP_MOVER_CLIENT_DISABLE
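
A minimal sketch of that change, assuming the keyword simply needs to appear on a line of its own in ndmp.cfg. The helper name `ensure_keyword` is hypothetical. The path below is the UNIX one quoted above; on a Windows server the file would sit under the NetBackup install directory instead, so confirm the location before using it.

```python
from pathlib import Path

# Keyword quoted in the reply above.
KEYWORD = "NDMP_MOVER_CLIENT_DISABLE"


def ensure_keyword(cfg_path, keyword=KEYWORD):
    """Append keyword to the config file unless it is already present.

    Returns True if the file was changed, False if the keyword was already there.
    The parent directory must already exist; it is not created here.
    """
    path = Path(cfg_path)
    text = path.read_text() if path.exists() else ""
    if keyword in (line.strip() for line in text.splitlines()):
        return False
    with path.open("a") as handle:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(keyword + "\n")
    return True


# Example call (UNIX path from the reply; adjust for your installation):
# ensure_keyword("/usr/openv/netbackup/db/config/ndmp.cfg")
```

The check before appending makes the helper safe to run more than once without duplicating the keyword.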

Hallucyn8
Level 2
Thanks for the suggestion; however, it is not relevant here as there is no firewall between NBU and the filer.

I should also mention that NBU is running on Windows Server 2003 in our environment.

A bit more background, which may or may not prompt further suggestions: the problems seem to coincide with some data moves on the storage, which caused data sizes to vary quite a lot, and with more paths being added to the policy that is causing the problem. However, no data moves have happened for a couple of weeks, and the last failure was last night.

Any more suggestions would be greatly appreciated.