NDMP Backup failure- HELP

Yashwanth
Level 3
Hello,
We have a problem with our NDMP backups. The jobs are failing with status code 99. The same jobs were running fine earlier. No changes have been made which would cause this failure. Below is the detailed status.
03/07/2010 16:28:21 - estimated 775551555 kbytes needed
03/07/2010 16:28:22 - started process bpbrm (5936)
03/07/2010 16:28:23 - connecting
03/07/2010 16:28:23 - connected; connect time: 00:00:00
03/07/2010 16:28:26 - mounting T3MO07
03/07/2010 16:29:07 - mounted; mount time: 00:00:41
03/07/2010 16:29:08 - positioning T3MO07 to file 5
03/07/2010 16:29:17 - positioned T3MO07; position time: 00:00:09
03/07/2010 16:29:17 - begin writing
03/07/2010 16:40:56 - Error ndmpagent(pid=412) NDMP backup failed, path = /vol/cc02fdvol/.snapshot/hourly.0/      
03/07/2010 16:41:32 - end writing; write time: 00:12:15
NDMP backup failure(99)

NetBackup master server: 6.5.4
NetApp filer version: 7.3.1
The master server acts as the media server for NDMP backups.
Could anyone please help resolve this issue?

Regards,
Yash
1 ACCEPTED SOLUTION

Accepted Solutions

Andy_Welburn
Level 6
e.g. in /vol/vol0/etc/log/backup

That may indicate an issue with the actual dump from the filer's point of view.

***EDIT***
Altho' having looked at one of our failed jobs it might not (there was actually more info in Job Details!) - this was caused by dodgy tape:
dmp Thu May 27 19:24:20 BST /vol/vol1/qtree(1) Tape_close (ndmp)
dmp Thu May 27 19:24:20 BST /vol/vol1/qtree(1) Abort (152382 MB)

Could you maybe by-pass NB & try a dump straight off the filer? That would at least indicate whether the issue was with NB or not.
i.e. similar to the "Troubleshooting" section in this T/N:
NDMP backup fails with Status Code 99 - DUMP: could not create "backup" snapshot : No space left on device.
http://seer.entsupport.symantec.com/docs/321198.htm
(altho' yours shouldn't be a space issue as you are taking a backup of a snapshot directly in the path)


25 REPLIES

Andy_Welburn
Level 6

Is it possible that someone has deleted the snapshot to create space?

Anyone amended the path in the policy & introduced a typo?

Any more info from the logs/reports e.g. problems report, all log entries or the backup log (etc/log/backup) on the filer?

Yashwanth
Level 3
Hi Andy,
Thanks for your quick reply.
No one has deleted the snapshot. I have checked with my other colleagues. These backups are failing every day.
I am the only one who manages the backups and I can confirm that no changes have been made to the policy.
All log entries say "backup of client san1 exited with status 99 (NDMP backup failure)".
The NDMPD log on the filer says:

NDMP message type: NDMP_CONNECT_CLOSE
Jun 29 14:45:11 GMT [ndmpd:83]: NDMP message replysequence: 7
Jun 29 14:45:11 GMT [ndmpd:83]: Message Header:
Jun 29 14:45:11 GMT [ndmpd:83]: Sequence 0
Jun 29 14:45:11 GMT [ndmpd:83]: Timestamp 0
Jun 29 14:45:11 GMT [ndmpd:83]: Msgtype 1
Jun 29 14:45:11 GMT [ndmpd:83]: Method 2306
Jun 29 14:45:11 GMT [ndmpd:83]: ReplySequence 7
Jun 29 14:45:11 GMT [ndmpd:83]: Error NDMP_NO_ERR
Jun 29 14:45:11 GMT [ndmpd:83]: Cleaning up connection
Jun 29 14:45:11 GMT [ndmpd:83]: Error sending notify shutdown message
Jun 29 14:45:11 GMT [ndmpd:83]: Ndmpd session closed successfully
Jun 29 14:45:11 GMT [ndmpd:83]: Calling NdmpServer.kill
Jun 29 14:45:13 GMT [ndmpd:84]: Created an NDMP server connection
Jun 29 14:45:13 GMT [ndmpd:84]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Jun 29 14:45:13 GMT [ndmpd:84]: Message Header:
Jun 29 14:45:13 GMT [ndmpd:84]: Sequence 1
Jun 29 14:45:13 GMT [ndmpd:84]: Timestamp 1277819113
Jun 29 14:45:13 GMT [ndmpd:84]:

Marianne
Level 6
Partner    VIP    Accredited Certified

Can you find logs on the filer that correspond with the backup failure date/time?
03/07/2010 16:40:56 

The NDMP log that you posted is a completely different date and time:
Jun 29 14:45:13
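
One way to line the two logs up is to filter the filer's ndmpd log down to the lines stamped close to the failure time in the job details. The following is only a sketch: the function name and the `filer_offset_hours` parameter are hypothetical, the job-details date is assumed to be day-first (dd/mm/yyyy), and because the ndmpd lines carry no year, the caller has to supply one. If the filer and the master server log in different time zones, the offset needs to be set accordingly.

```python
from datetime import datetime, timedelta

def ndmpd_lines_near(log_lines, failure, year, window_minutes=30, filer_offset_hours=0):
    """Return ndmpd log lines stamped within window_minutes of the failure time.

    failure: job-details timestamp, 'dd/mm/yyyy HH:MM:SS' (day-first is an assumption).
    year: ndmpd lines such as 'Jul 05 17:19:47 GMT ...' have no year, so pass it in.
    filer_offset_hours: hours to add to filer time to match master-server time
                        (hypothetical knob; 0 means both clocks share a time zone).
    """
    target = datetime.strptime(failure, "%d/%m/%Y %H:%M:%S")
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            # Month name, day and time are the first three fields of each line.
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped ndmpd line
        stamp += timedelta(hours=filer_offset_hours)
        if abs(stamp - target) <= window:
            hits.append(line)
    return hits
```

Called with the failure time above, e.g. `ndmpd_lines_near(lines, "03/07/2010 16:40:56", 2010)`, it would drop the Jun 29 entries and keep only lines from the half hour either side of the failure.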

demo4119
Level 6
Partner Accredited Certified

Please make sure the backup selection path is correct.

Andy_Welburn
Level 6
Can you create (not copy existing) a new policy to test backup of, say, /vol/cc02fdvol ?

Nothing of note in the 'reports'?

***EDIT***
A couple of T/Ns that may give other avenues of investigation:

STATUS CODE: 99 "NDMP backup failure" occurs when backing up a NDMP client.
http://seer.entsupport.symantec.com/docs/275330.htm

DOCUMENTATION: How to troubleshoot NDMP Backups failures when status code 99 (NDMP backup failure) is reported.
http://seer.entsupport.symantec.com/docs/295366.htm

Yashwanth
Level 3
Hi Marianne,
Here are the corresponding logs: the detailed job status from the master server and the ndmpd log from the filer.
NDMPD log from filer:
Jul 05 17:19:47 GMT [ndmpd:182]: NDMP message replysequence: 3
Jul 05 17:19:47 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:47 GMT [ndmpd:182]: Sequence 0
Jul 05 17:19:47 GMT [ndmpd:182]: Timestamp 0
Jul 05 17:19:47 GMT [ndmpd:182]: Msgtype 1
Jul 05 17:19:47 GMT [ndmpd:182]: Method 259
Jul 05 17:19:47 GMT [ndmpd:182]: ReplySequence 3
Jul 05 17:19:47 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:47 GMT [ndmpd:182]: Auth type: 2
Jul 05 17:19:47 GMT [ndmpd:182]: Error code: NDMP_NO_ERR
Jul 05 17:19:47 GMT [ndmpd:182]: Challenge: [B@31c6568
Jul 05 17:19:47 GMT [ndmpd:182]: NDMP message type: NDMP_CONNECT_CLIENT_AUTH
Jul 05 17:19:47 GMT [ndmpd:182]: NDMP message replysequence: 4
Jul 05 17:19:47 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:47 GMT [ndmpd:182]: Sequence 0
Jul 05 17:19:47 GMT [ndmpd:182]: Timestamp 0
Jul 05 17:19:47 GMT [ndmpd:182]: Msgtype 1
Jul 05 17:19:47 GMT [ndmpd:182]: Method 2305
Jul 05 17:19:47 GMT [ndmpd:182]: ReplySequence 4
Jul 05 17:19:47 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:47 GMT [ndmpd:182]: Auth Type: 2
Jul 05 17:19:47 GMT [ndmpd:182]: Error code: NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message type: NDMP_CONFIG_GET_HOST_INFO_V3
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message replysequence: 5
Jul 05 17:19:48 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:48 GMT [ndmpd:182]: Sequence 0
Jul 05 17:19:48 GMT [ndmpd:182]: Timestamp 0
Jul 05 17:19:48 GMT [ndmpd:182]: Msgtype 1
Jul 05 17:19:48 GMT [ndmpd:182]: Method 256
Jul 05 17:19:48 GMT [ndmpd:182]: ReplySequence 5
Jul 05 17:19:48 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: Error code: NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message type: NDMP_TAPE_OPEN
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message replysequence: 6
Jul 05 17:19:48 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:48 GMT [ndmpd:182]: Sequence 0
Jul 05 17:19:48 GMT [ndmpd:182]: Timestamp 0
Jul 05 17:19:48 GMT [ndmpd:182]: Msgtype 1
Jul 05 17:19:48 GMT [ndmpd:182]: Method 768
Jul 05 17:19:48 GMT [ndmpd:182]: ReplySequence 6
Jul 05 17:19:48 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: Error code: NDMP_NO_TAPE_LOADED_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: Device name: nrst0a
Jul 05 17:19:48 GMT [ndmpd:182]: Mode: 0
Jul 05 17:19:48 GMT [ndmpd:182]: IOException: Device cannot be opened. Device may have no tape.
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message type: NDMP_CONNECT_CLOSE
Jul 05 17:19:48 GMT [ndmpd:182]: NDMP message replysequence: 7
Jul 05 17:19:48 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:48 GMT [ndmpd:182]: Sequence 0
Jul 05 17:19:48 GMT [ndmpd:182]: Timestamp 0
Jul 05 17:19:48 GMT [ndmpd:182]: Msgtype 1
Jul 05 17:19:48 GMT [ndmpd:182]: Method 2306
Jul 05 17:19:48 GMT [ndmpd:182]: ReplySequence 7
Jul 05 17:19:48 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: Cleaning up connection
Jul 05 17:19:48 GMT [ndmpd:182]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Jul 05 17:19:48 GMT [ndmpd:182]: Message Header:
Jul 05 17:19:48 GMT [ndmpd:182]: Sequence 8
Jul 05 17:19:48 GMT [ndmpd:182]: Timestamp 1278346788
Jul 05 17:19:48 GMT [ndmpd:182]: Msgtype 0
Jul 05 17:19:48 GMT [ndmpd:182]: Method 1282
Jul 05 17:19:48 GMT [ndmpd:182]: ReplySequence 0
Jul 05 17:19:48 GMT [ndmpd:182]: Error NDMP_NO_ERR
Jul 05 17:19:48 GMT [ndmpd:182]: Reason: 1
Jul 05 17:19:48 GMT [ndmpd:182]: version: 4
Jul 05 17:19:48 GMT [ndmpd:182]: Text: Connection shutdown
Jul 05 17:19:48 GMT [ndmpd:182]: Ndmpd session closed successfully
Jul 05 17:19:48 GMT [ndmpd:182]: Calling NdmpServer.kill
Jul 05 17:20:05 GMT [ndmpd:183]: Created an NDMP server connection
Jul 05 17:20:05 GMT [ndmpd:183]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Jul 05 17:20:05 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:05 GMT [ndmpd:183]: Sequence 1
Jul 05 17:20:05 GMT [ndmpd:183]: Timestamp 1278346805
Jul 05 17:20:05 GMT [ndmpd:183]: Msgtype 0
Jul 05 17:20:05 GMT [ndmpd:183]: Method 1282
Jul 05 17:20:05 GMT [ndmpd:183]: ReplySequence 0
Jul 05 17:20:05 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Reason: 0
Jul 05 17:20:05 GMT [ndmpd:183]: version: 4
Jul 05 17:20:05 GMT [ndmpd:183]: Text:
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message type: NDMP_CONNECT_OPEN
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message replysequence: 1
Jul 05 17:20:05 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:05 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:05 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:05 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:05 GMT [ndmpd:183]: Method 2304
Jul 05 17:20:05 GMT [ndmpd:183]: ReplySequence 1
Jul 05 17:20:05 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Request version: 4
Jul 05 17:20:05 GMT [ndmpd:183]: Error code: NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message type: NDMP_CONFIG_GET_SERVER_INFO
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message replysequence: 2
Jul 05 17:20:05 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:05 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:05 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:05 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:05 GMT [ndmpd:183]: Method 264
Jul 05 17:20:05 GMT [ndmpd:183]: ReplySequence 2
Jul 05 17:20:05 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Error code: NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Vendor:
Jul 05 17:20:05 GMT [ndmpd:183]: Product:
Jul 05 17:20:05 GMT [ndmpd:183]: Revision:
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message type: NDMP_CONFIG_GET_AUTH_TYPE_ATTR
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message replysequence: 3
Jul 05 17:20:05 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:05 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:05 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:05 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:05 GMT [ndmpd:183]: Method 259
Jul 05 17:20:05 GMT [ndmpd:183]: ReplySequence 3
Jul 05 17:20:05 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Auth type: 2
Jul 05 17:20:05 GMT [ndmpd:183]: Error code: NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Challenge: [B@3224b0c
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message type: NDMP_CONNECT_CLIENT_AUTH
Jul 05 17:20:05 GMT [ndmpd:183]: NDMP message replysequence: 4
Jul 05 17:20:05 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:05 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:05 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:05 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:05 GMT [ndmpd:183]: Method 2305
Jul 05 17:20:05 GMT [ndmpd:183]: ReplySequence 4
Jul 05 17:20:05 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:05 GMT [ndmpd:183]: Auth Type: 2
Jul 05 17:20:05 GMT [ndmpd:183]: Error code: NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message type: NDMP_CONFIG_GET_HOST_INFO_V3
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message replysequence: 5
Jul 05 17:20:06 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:06 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:06 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:06 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:06 GMT [ndmpd:183]: Method 256
Jul 05 17:20:06 GMT [ndmpd:183]: ReplySequence 5
Jul 05 17:20:06 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: Error code: NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message type: NDMP_TAPE_OPEN
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message replysequence: 6
Jul 05 17:20:06 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:06 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:06 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:06 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:06 GMT [ndmpd:183]: Method 768
Jul 05 17:20:06 GMT [ndmpd:183]: ReplySequence 6
Jul 05 17:20:06 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: Error code: NDMP_NO_TAPE_LOADED_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: Device name: nrst0a
Jul 05 17:20:06 GMT [ndmpd:183]: Mode: 0
Jul 05 17:20:06 GMT [ndmpd:183]: IOException: Device cannot be opened. Device may have no tape.
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message type: NDMP_CONNECT_CLOSE
Jul 05 17:20:06 GMT [ndmpd:183]: NDMP message replysequence: 7
Jul 05 17:20:06 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:06 GMT [ndmpd:183]: Sequence 0
Jul 05 17:20:06 GMT [ndmpd:183]: Timestamp 0
Jul 05 17:20:06 GMT [ndmpd:183]: Msgtype 1
Jul 05 17:20:06 GMT [ndmpd:183]: Method 2306
Jul 05 17:20:06 GMT [ndmpd:183]: ReplySequence 7
Jul 05 17:20:06 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: Cleaning up connection
Jul 05 17:20:06 GMT [ndmpd:183]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Jul 05 17:20:06 GMT [ndmpd:183]: Message Header:
Jul 05 17:20:06 GMT [ndmpd:183]: Sequence 8
Jul 05 17:20:06 GMT [ndmpd:183]: Timestamp 1278346806
Jul 05 17:20:06 GMT [ndmpd:183]: Msgtype 0
Jul 05 17:20:06 GMT [ndmpd:183]: Method 1282
Jul 05 17:20:06 GMT [ndmpd:183]: ReplySequence 0
Jul 05 17:20:06 GMT [ndmpd:183]: Error NDMP_NO_ERR
Jul 05 17:20:06 GMT [ndmpd:183]: Reason: 1
Jul 05 17:20:06 GMT [ndmpd:183]: version: 4
Jul 05 17:20:06 GMT [ndmpd:183]: Text: Connection shutdown
Jul 05 17:20:06 GMT [ndmpd:183]: Ndmpd session closed successfully
Jul 05 17:20:06 GMT [ndmpd:183]: Calling NdmpServer.kill


Detailed job status from Master server:

05/07/2010 17:26:41 - requesting resource SAN2
05/07/2010 17:26:41 - requesting resource shr1nb01.NBU_CLIENT.MAXJOBS.san1
05/07/2010 17:26:41 - requesting resource shr1nb01.NBU_POLICY.MAXJOBS.SAN1-CC02FD
05/07/2010 17:26:41 - granted resource shr1nb01.NBU_CLIENT.MAXJOBS.san1
05/07/2010 17:26:41 - granted resource shr1nb01.NBU_POLICY.MAXJOBS.SAN1-CC02FD
05/07/2010 17:26:41 - granted resource T1W1MO
05/07/2010 17:26:41 - granted resource IBM.ULTRIUM-TD3.001
05/07/2010 17:26:41 - granted resource SAN2
05/07/2010 17:26:42 - estimated 0 kbytes needed
05/07/2010 17:26:43 - started process bpbrm (5320)
05/07/2010 17:26:43 - connecting
05/07/2010 17:26:43 - connected; connect time: 00:00:00
05/07/2010 17:26:47 - mounting T1W1MO
05/07/2010 17:27:23 - mounted; mount time: 00:00:36
05/07/2010 17:27:23 - positioning T1W1MO to file 1
05/07/2010 17:27:26 - positioned T1W1MO; position time: 00:00:03
05/07/2010 17:27:26 - begin writing
05/07/2010 17:44:06 - Error ndmpagent(pid=4364) NDMP backup failed, path = /vol/cc02fdvol/.snapshot/hourly.0/      
05/07/2010 17:44:34 - end writing; write time: 00:17:08
NDMP backup failure(99)


Demo,
The path is correct and I can confirm that.

Andy, I have tried creating a new policy as you mentioned but no joy. I will go through the T/Ns you have posted and update you.

I have raised a ticket with Symantec and they are still working on this issue. It is almost a month and a half since this problem first appeared. :(

Regards,
Yash
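
A log this long is easier to read if only the replies with a non-OK error code are pulled out, grouped by ndmpd session number. The sketch below assumes the line layout shown in the post above (`[ndmpd:<session>]:` followed by either `NDMP message type: ...` or `Error code: ...`); the function name is hypothetical. It only summarises what the log already says and does not by itself show that any of these errors caused the status 99.

```python
import re
from collections import defaultdict

# Captures the session number and the text after '[ndmpd:NNN]:'.
LINE = re.compile(r"\[ndmpd:(\d+)\]:\s*(.*)")

def ndmp_errors_by_session(log_lines):
    """Map session id -> list of (message type, error code) for non-OK error codes."""
    current_type = {}            # last 'NDMP message type' seen per session
    errors = defaultdict(list)
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        session, text = m.group(1), m.group(2)
        if text.startswith("NDMP message type:"):
            current_type[session] = text.split(":", 1)[1].strip()
        elif text.startswith("Error code:"):
            code = text.split(":", 1)[1].strip()
            if code != "NDMP_NO_ERR":
                errors[session].append((current_type.get(session, "unknown"), code))
    return dict(errors)
```

Run against the ndmpd log posted above, this would report `NDMP_NO_TAPE_LOADED_ERR` on `NDMP_TAPE_OPEN` for sessions 182 and 183 and nothing else.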

Yashwanth
Level 3

Sorry, I think I missed the bit below. Here it is.

05/07/2010 17:00:02 - requesting resource SAN2
05/07/2010 17:00:02 - requesting resource shr1nb01.NBU_CLIENT.MAXJOBS.san1
05/07/2010 17:00:02 - requesting resource shr1nb01.NBU_POLICY.MAXJOBS.SAN1-CC02FD
05/07/2010 17:00:03 - granted resource shr1nb01.NBU_CLIENT.MAXJOBS.san1
05/07/2010 17:00:03 - granted resource shr1nb01.NBU_POLICY.MAXJOBS.SAN1-CC02FD
05/07/2010 17:00:03 - granted resource T1W1MO
05/07/2010 17:00:03 - granted resource IBM.ULTRIUM-TD3.001
05/07/2010 17:00:03 - granted resource SAN2
05/07/2010 17:00:03 - estimated 0 kbytes needed
05/07/2010 17:00:03 - begin Parent Job
05/07/2010 17:00:03 - begin Stream Discovery, Start Notify Script
05/07/2010 17:00:04 - started process RUNCMD (4804)
05/07/2010 17:00:05 - ended process 0 (4804)
Status 0
05/07/2010 17:00:05 - end Stream Discovery, Start Notify Script; elapsed time: 00:00:02
05/07/2010 17:00:05 - begin Stream Discovery, Stream Discovery
Status 0
05/07/2010 17:00:05 - end Stream Discovery, Stream Discovery; elapsed time: 00:00:00
05/07/2010 17:00:05 - begin Stream Discovery, Policy Execution Manager Preprocessed
Status 99
05/07/2010 17:44:34 - end Stream Discovery, Policy Execution Manager Preprocessed; elapsed time: 00:44:29
05/07/2010 17:44:34 - begin Stream Discovery, Stop On Error
Status 0
05/07/2010 17:44:34 - end Stream Discovery, Stop On Error; elapsed time: 00:00:00
05/07/2010 17:44:34 - begin Stream Discovery, End Notify Script
05/07/2010 17:44:36 - started process RUNCMD (5460)
05/07/2010 17:44:36 - ended process 0 (5460)
Status 0
05/07/2010 17:44:36 - end Stream Discovery, End Notify Script; elapsed time: 00:00:02
Status 99
05/07/2010 17:44:36 - end Parent Job; elapsed time: 00:44:33
NDMP backup failure(99)

Andy_Welburn
Level 6
Have you any other NDMP saves that work to this filer?

Rick_Brown
Level 4
I've seen this issue once the data volume grew past 1.55 TB. Does the backup fail consistently, or at different points throughout the backup?

Nefarious
Level 4
There is an issue w/ large TIR files and NB 7.0. I was given an EEB after a support call to address an issue I was having with restoring and duplicating a backup image of a 2TB NetApp volume. The patch resolved the issue and the support staff told me this EEB would be included in the 7.0.1 release.

Have you tried backing up your NDMP volumes without specifying a snapshot? In all of my NDMP policies (all NetApp 7.x based) I'm going directly to the volume and letting NDMP create and manage the snapshot. When I have a backup running (set to back up /vol/vol0) my snap list output is:

  %/used       %/total  date          name
----------  ----------  ------------  --------
  0% ( 0%)    0% ( 0%)  Jul 06 15:40  snapshot_for_backup.2 (busy,backup[0],dump)
  0% ( 0%)    0% ( 0%)  Jul 06 12:00  hourly.0      
  0% ( 0%)    0% ( 0%)  Jul 06 08:01  hourly.1      
  0% ( 0%)    0% ( 0%)  Jul 06 00:00  nightly.0      

--Craig

Yashwanth
Level 3
Andy / Rick,
Thanks for your comments.
We are backing up the snapshots of the vfilers. We have 9 vfilers which we are backing up using NDMP. Backups are failing almost every day, but I have seen some of the vfilers (4 or 5, not always the same ones) getting backed up successfully over the weekend or when there is less network activity. We have also checked the network components and switch ports, and everything looks fine. I have also changed the NIC on the master server and updated it with the latest drivers. Still no joy.
The backups work fine if the file history is set to no using the directive "SET HIST =N". This was suggested by a Symantec engineer for testing purposes. But if this is set, we cannot restore a single file or folder.
I logged a call with NetApp as well, but they think it is a NetBackup issue.
Is there anything I need to check on the filer?

David_McMullin
Level 6
I had this same issue - never really resolved it, I think it may be memory related on my filer.

If I set max jobs to 1 ( so it backed up one directory path at a time ) I can run them fine.

Interestingly, this happens for me only on full backups, not incrementals.

So - in the policy attributes tab, set "Limit jobs per policy:" to 1, try it and see if it works.

Also, setting HIST = N resolved my issue, but then you cannot restore a single file - it was worth it for me to extend the backup to keep that restore ability.

Andy_Welburn
Level 6

But did you have a look in the etc/log/dump (EDIT: should be etc/log/backup) log on the filer for any errors relating to the backup of the path in question?

Looking at your job details output again it looked like it failed very quickly after "begin writing"
i.e.
05/07/2010 17:27:26 - begin writing
05/07/2010 17:44:06 - Error ndmpagent(pid=4364) NDMP backup failed, path = /vol/cc02fdvol/.snapshot/hourly.0/

& this initial period is used to generate a list of files that need to be backed up (inode map) then writing the directory structure prior to dumping the actual file data on tape. These early stages can take quite some time for large volumes so you never know, there may be something in there?

Did you try dumping to tape outside of NB as I suggested earlier?
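
To put a number on how long a job ran before failing, the gap between "begin writing" and the first "Error" line in the job details can be computed directly. This is a minimal sketch, assuming the `dd/mm/yyyy HH:MM:SS - message` layout seen in the job details in this thread; the function name is hypothetical.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"  # day-first format, assumed from the job details above

def seconds_until_error(job_lines):
    """Seconds between 'begin writing' and the first later 'Error' line, or None."""
    began = None
    for line in job_lines:
        stamp_text, sep, message = line.partition(" - ")
        if not sep:
            continue  # lines such as 'Status 0' carry no timestamp
        try:
            stamp = datetime.strptime(stamp_text.strip(), FMT)
        except ValueError:
            continue
        if message.startswith("begin writing"):
            began = stamp
        elif message.startswith("Error") and began is not None:
            return int((stamp - began).total_seconds())
    return None
```

For the two lines quoted above this gives 1000 seconds (16 minutes 40 seconds). Comparing that figure across several failed jobs may help show whether they die at a similar point, such as during the early inode-map and directory phases, or at varying points.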

Yashwanth
Level 3
Thanks David,
I will try this for tonight's backup and see how that goes. What exactly does this attribute do? Would the backups take longer to finish?


Yash

Andy_Welburn
Level 6

As a matter of interest how many streams do you run at once? Noticed from the output from your parent process that there was stream discovery so that was going to be one of my next questions along with how your policy was set up with respect to NEW_STREAM attributes.

Nefarious
Level 4

NDMP is a low-priority process in ONTAP. We've never had much luck running more than 1 stream at a time on our NDMP filers. One stream would get the resources it was allowed and the other(s) would essentially flounder around and eventually fail.

Yashwanth
Level 3
Hi Andy,
I did not find the etc/log/dump file. If this is the file created after doing the volume dump by-passing NB, I have not done this yet.
I will try this and update with the logs.
The backups sometimes fail immediately, but sometimes they do write data to tape and fail after an hour or two.

Thanks to all of you for the quick responses.

Yash

Andy_Welburn
Level 6

For us it is on /vol/vol0/etc/log/dump (EDIT: should be etc/log/backup) (/vol/vol0 being what I would compare to a UNIX root filesystem) - we have this (& many other) volumes NFS mounted on a Solaris box (actually our Master) to allow for interrogation of such files.

***EDIT***
Should be there for any of your NB instigated backups - well I presume it should as it's there for us!