NetBackup 6.5.4 and NetApp ONTAP 7.2.6.1 NDMP backup problems
Hi,
We're using NetBackup 6.5.4 on a Solaris 10 host (master and media server) to back up our NetApp FAS 3140 with ONTAP 7.2.6.1 to a Quantum PX 506 LTO tape library. Both the media server and the NetApp appliance are connected to the tape library via Fibre Channel. Two drives are assigned to the NetApp appliance.
For the past few days we have been having problems with incremental backups of one of the volumes (/vol/home) on the NetApp filer. Incremental backups of the other volumes run fine. /vol/home, the volume causing the problems, is our biggest volume. It contains about 30 million files and 1.6 TB of data (1.2 TB with NetApp block deduplication):
adnnfs03> df -h home
Filesystem               total       used      avail capacity  Mounted on
/vol/home/              1440GB     1238GB      201GB      86%  /vol/home/
/vol/home/.snapshot      360GB      253GB      106GB      70%  /vol/home/.snapshot
adnnfs03> df -i home
Filesystem               iused      ifree  %iused  Mounted on
/vol/home/            30047941    4952040     86%  /vol/home/
adnnfs03> df -hs home
Filesystem                used      saved  %saved
/vol/home/              1239GB      426GB     26%
Are there any known limitations or problems with this configuration and that amount of data? We haven't changed our setup in the last few months, and backing up the smaller volumes works fine. The backup of /vol/home stalls for hours and then aborts with this error message:
03/15/2010 09:32:31 - requesting resource adnnfs03-ndmp
03/15/2010 09:32:31 - requesting resource adnbackup.NBU_CLIENT.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - requesting resource adnbackup.NBU_POLICY.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource adnbackup.NBU_CLIENT.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource adnbackup.NBU_POLICY.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource LM2727
03/15/2010 09:32:31 - granted resource HP.ULTRIUM3-SCSI.003
03/15/2010 09:32:31 - granted resource adnnfs03-ndmp
03/15/2010 09:32:32 - estimated 1905263392 kbytes needed
03/15/2010 09:32:32 - started process bpbrm (pid=29229)
03/15/2010 09:32:32 - connecting
03/15/2010 09:32:32 - connected; connect time: 0:00:00
03/15/2010 09:32:35 - mounting LM2727
03/15/2010 09:33:27 - mounted LM2727; mount time: 0:00:52
03/15/2010 09:33:31 - positioning LM2727 to file 39
03/15/2010 09:33:47 - positioned LM2727; position time: 0:00:16
03/15/2010 09:33:47 - begin writing
03/15/2010 17:34:18 - Error ndmpagent (pid=29234) aborting operation - no mover progress
03/15/2010 17:34:18 - Error ndmpagent (pid=29234) NDMP backup failed, path = /vol/home
03/15/2010 17:35:13 - end writing; write time: 8:01:26
NDMP backup failure (99)
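As a quick check of how long the job sat before the abort, the time between "begin writing" and the "no mover progress" error can be worked out from the two timestamps in the log above. This is only a minimal sketch in plain shell with awk; the function name to_seconds is made up for this example, and it assumes both timestamps fall on the same day, as they do here:

```shell
# Convert HH:MM:SS to seconds since midnight.
# awk treats "09" as decimal 9, so leading zeros are harmless.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

start=$(to_seconds 09:33:47)   # "begin writing"
abort=$(to_seconds 17:34:18)   # "aborting operation - no mover progress"
elapsed=$((abort - start))

# prints: 8h 00m 31s
printf '%dh %02dm %02ds\n' $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60))
```

So the job was aborted almost exactly 8 hours after writing began.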
During that time the NDMP debug log file of the NetApp filer shows these messages:
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Error code: NDMP_NO_TAPE_LOADED_ERR
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Device name: nrst0a
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Mode: 0
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: IOException: Device cannot be opened. Device may have no tape.
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: NDMP message type: NDMP_CONNECT_CLOSE
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: NDMP message replysequence: 7
I doubt that this message is true. There is a tape in the drive. Backing up other volumes works and a second drive is unused. In the meantime we begged our users to clean up their home directories and reduce the number of files and data size, hoping that this will help.
Thanks in advance for any hint.
Best regards,
Bernd
Hi,
I got a solution from Symantec support. By default, the NDMP progress timeout is 8 hours. Starting with patch levels 6.0MP7 and 6.5.4, a higher timeout can be defined. I increased it to 24 hours (1440 minutes):
echo 1440 > /usr/openv/netbackup/db/config/NDMP_PROGRESS_TIMEOUT
http://support.veritas.com/docs/249241
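For anyone who prefers to think in hours, the echo command above can be wrapped in a small helper that does the conversion to minutes and reads the value back. This is just a sketch, not something from Symantec: the function name and the optional second argument (an alternative directory, handy for trying it out without touching the real config) are my own assumptions. Only the file path and the minutes unit come from the command above.

```shell
# Write an NDMP progress timeout, given in hours, to the config file as minutes.
# $1: timeout in hours
# $2: config directory (hypothetical override; defaults to the path used above)
set_ndmp_progress_timeout() {
    hours=$1
    config_dir=${2:-/usr/openv/netbackup/db/config}
    minutes=$((hours * 60))
    echo "$minutes" > "$config_dir/NDMP_PROGRESS_TIMEOUT" || return 1
    # Read the value back to confirm what was actually stored.
    cat "$config_dir/NDMP_PROGRESS_TIMEOUT"
}

# Example, run as root on the NetBackup server:
#   set_ndmp_progress_timeout 24
```

With 24 as the argument this writes 1440, the same value as the echo command.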
Best regards,
Bernd