03-16-2010 02:45 AM
Hi,
We're using NetBackup 6.5.4 on a Solaris 10 host (master and media server) to back up our NetApp FAS3140 running ONTAP 7.2.6.1 to a Quantum PX506 LTO tape library. Both the media server and the NetApp appliance are connected to the tape library via Fibre Channel. Two drives are assigned to the NetApp appliance.
For a few days now we've been having problems with incremental backups of one of the volumes (/vol/home) on the NetApp filer. Incremental backups of the other volumes run fine. /vol/home, the volume causing the problems, is our biggest: it contains about 30 million files and 1.6 TB of data (1.2 TB with NetApp block deduplication):
adnnfs03> df -h home
Filesystem               total       used      avail capacity  Mounted on
/vol/home/              1440GB     1238GB      201GB      86%  /vol/home/
/vol/home/.snapshot      360GB      253GB      106GB      70%  /vol/home/.snapshot
adnnfs03> df -i home
Filesystem               iused      ifree  %iused  Mounted on
/vol/home/            30047941    4952040     86%  /vol/home/
adnnfs03> df -hs home
Filesystem                used      saved  %saved
/vol/home/              1239GB      426GB     26%
Are there any known limitations or problems with this combination and that amount of data? We haven't changed our setup in the last few months, and backing up the smaller volumes works fine. The backup of /vol/home stalls for hours and then aborts with this error message:
03/15/2010 09:32:31 - requesting resource adnnfs03-ndmp
03/15/2010 09:32:31 - requesting resource adnbackup.NBU_CLIENT.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - requesting resource adnbackup.NBU_POLICY.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource adnbackup.NBU_CLIENT.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource adnbackup.NBU_POLICY.MAXJOBS.adnnfs03
03/15/2010 09:32:31 - granted resource LM2727
03/15/2010 09:32:31 - granted resource HP.ULTRIUM3-SCSI.003
03/15/2010 09:32:31 - granted resource adnnfs03-ndmp
03/15/2010 09:32:32 - estimated 1905263392 kbytes needed
03/15/2010 09:32:32 - started process bpbrm (pid=29229)
03/15/2010 09:32:32 - connecting
03/15/2010 09:32:32 - connected; connect time: 0:00:00
03/15/2010 09:32:35 - mounting LM2727
03/15/2010 09:33:27 - mounted LM2727; mount time: 0:00:52
03/15/2010 09:33:31 - positioning LM2727 to file 39
03/15/2010 09:33:47 - positioned LM2727; position time: 0:00:16
03/15/2010 09:33:47 - begin writing
03/15/2010 17:34:18 - Error ndmpagent (pid=29234) aborting operation - no mover progress
03/15/2010 17:34:18 - Error ndmpagent (pid=29234) NDMP backup failed, path = /vol/home
03/15/2010 17:35:13 - end writing; write time: 8:01:26
NDMP backup failure (99)
During that time the NDMP debug logfile of the Netapp filer shows these messages:
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Error code: NDMP_NO_TAPE_LOADED_ERR
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Device name: nrst0a
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: Mode: 0
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: IOException: Device cannot be opened. Device may have no tape.
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: NDMP message type: NDMP_CONNECT_CLOSE
Mar 15 23:59:46 GMT+01:00 [ndmpd:35]: NDMP message replysequence: 7
I doubt this message is accurate: there is a tape in the drive, backups of other volumes work, and a second drive is unused. In the meantime we have asked our users to clean up their home directories and reduce the number of files and the data size, hoping that this will help.
Thanks in advance for any hint.
Best regards,
Bernd
Solved!
03-24-2010 01:42 AM
Hi,
I got a solution from Symantec support. By default the NDMP progress timeout is 8 hours. Starting with patch levels 6.0MP7 and 6.5.4, a higher timeout can be defined. I increased it to 24 hours (1440 minutes):
echo 1440 > /usr/openv/netbackup/db/config/NDMP_PROGRESS_TIMEOUT
http://support.veritas.com/docs/249241
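For anyone scripting this across servers, here is a small sketch of the same fix. This is my own wrapper, not from the Symantec document; the only facts taken from it are the touch-file name, that the value is in minutes, and the default install path, so adjust the directory if your NetBackup lives elsewhere:

```shell
#!/bin/sh
# Hypothetical helper: write the NDMP progress timeout touch file.
# NetBackup reads the value (in minutes) from
#   <install>/netbackup/db/config/NDMP_PROGRESS_TIMEOUT
set_ndmp_progress_timeout() {
    conf_dir="$1"     # e.g. /usr/openv/netbackup/db/config
    minutes="$2"      # 1440 = 24 hours
    mkdir -p "$conf_dir"
    echo "$minutes" > "$conf_dir/NDMP_PROGRESS_TIMEOUT"
}

# On the master server (default install path assumed):
# set_ndmp_progress_timeout /usr/openv/netbackup/db/config 1440
# cat /usr/openv/netbackup/db/config/NDMP_PROGRESS_TIMEOUT
```

The file-based override takes effect for subsequent NDMP jobs; no daemon restart was needed in my case.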
Best regards,
Bernd
03-16-2010 03:13 AM
Nothing in NDMP logs between 09:30 and 17:35?
The log you've posted is from 23:59 and seems correct: the NBU job failed at 17:35 due to "no mover progress" and would have unloaded the tape within 5 minutes of the failure.
03-16-2010 03:53 AM
Hi Marianne,
Yes, there is an NDMP log on the NetApp, but it just repeats the following sequence. It looks normal to me, except for the "IOException: Device cannot be opened. Device may have no tape." line. The messages appear for both drives (nrst0a and nrst1a).
According to [1], error code 99 in this case most likely means "An incremental backup is run, but no files have changed since the last backup." But files have changed. I assume that the incremental backup process takes too long to find all the files that have changed and aborts after 8 hours. The last full backup ran on 8 March and finished fine. Our backup operator says that the incremental backups fail sometimes. I have to check whether a full backup works now.
Best regards,
Bernd
[1] http://seer.entsupport.symantec.com/docs/275330.htm
Netapp NDMP debug log:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Created an NDMP server connection
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 1268670843
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 1282
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Reason: 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: version: 4
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Text:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONNECT_OPEN
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 2304
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Request version: 4
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONFIG_GET_SERVER_INFO
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 2
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 264
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 2
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Vendor:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Product:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Revision:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONFIG_GET_AUTH_TYPE_ATTR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 3
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 259
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 3
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Auth type: 2
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Challenge: [B@fac8d3da
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONNECT_CLIENT_AUTH
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 4
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 2305
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 4
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Auth Type: 2
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONFIG_GET_HOST_INFO_V3
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 5
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Method 256
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: ReplySequence 5
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:03 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_TAPE_OPEN
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 6
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Method 768
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: ReplySequence 6
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_TAPE_LOADED_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Device name: nrst0a
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Mode: 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: IOException: Device cannot be opened. Device may have no tape.
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: NDMP message type: NDMP_CONNECT_CLOSE
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: NDMP message replysequence: 7
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Sequence 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Timestamp 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Msgtype 1
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Method 2306
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: ReplySequence 7
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Cleaning up connection
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Message NDMP_NOTIFY_CONNECTION_STATUS sent
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Message Header:
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Sequence 8
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Timestamp 1268670844
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Msgtype 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Method 1282
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: ReplySequence 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Error NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Reason: 1
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: version: 4
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Text: Connection shutdown
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Ndmpd session closed successfully
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Calling NdmpServer.kill
Mar 15 17:34:04 GMT+01:00 [ndmpd:129]: Error code: NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: NDMP message type: NDMP_DATA_GET_STATE_V2
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: NDMP message replysequence: 1840
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Message Header:
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Sequence 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Timestamp 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Msgtype 1
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Method 1024
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: ReplySequence 1840
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Error NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Error code: NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: NDMP message type: NDMP_MOVER_GET_STATE
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: NDMP message replysequence: 1841
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Message Header:
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Sequence 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Timestamp 0
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Msgtype 1
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Method 2560
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: ReplySequence 1841
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Error NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Error code: NDMP_NO_ERR
Mar 15 17:34:04 GMT+01:00 [ndmpd:113]: Mover state: NDMP_MOVER_STATE_ACTIVE
03-16-2010 07:05 AM
OK. Full backups are not affected; at least, a full backup of /vol/home has now been running for 5.5 hours and has written 105 GB to tape. Only the incremental fails.
03-16-2010 10:10 PM
03/15/2010 09:32:35 - mounting LM2727
03/15/2010 09:33:27 - mounted LM2727; mount time: 0:00:52
03-17-2010 12:30 AM
03/16/2010 11:34:48 - estimated 0 kbytes needed
03/16/2010 11:34:49 - started process bpbrm (pid=21050)
03/16/2010 11:34:49 - connecting
03/16/2010 11:34:49 - connected; connect time: 0:00:00
03/16/2010 11:34:52 - mounting MD7569
03/16/2010 11:36:01 - mounted MD7569; mount time: 0:01:09
03/16/2010 11:36:05 - positioning MD7569 to file 1
03/16/2010 11:36:08 - positioned MD7569; position time: 0:00:03
03/16/2010 11:36:08 - begin writing
03/16/2010 17:05:00 - current media MD7569 complete, requesting next media Any
03/16/2010 17:05:01 - granted resource MD7578
03/16/2010 17:05:01 - granted resource HP.ULTRIUM3-SCSI.002
03/16/2010 17:05:01 - granted resource adnnfs03-ndmp
03/16/2010 17:05:02 - mounting MD7578
03/16/2010 17:05:53 - mounted MD7578; mount time: 0:00:51
03/16/2010 17:05:55 - positioning MD7578 to file 1
03/16/2010 17:05:58 - positioned MD7578; position time: 0:00:03
03/16/2010 17:05:58 - begin writing
03/16/2010 19:53:17 - current media MD7578 complete, requesting next media Any
03/16/2010 19:53:18 - granted resource MD7567
03/16/2010 19:53:18 - granted resource HP.ULTRIUM3-SCSI.003
03/16/2010 19:53:18 - granted resource adnnfs03-ndmp
03/16/2010 19:53:19 - mounting MD7567
03/16/2010 19:54:10 - mounted MD7567; mount time: 0:00:51
03/16/2010 19:54:13 - positioning MD7567 to file 1
03/16/2010 19:54:16 - positioned MD7567; position time: 0:00:03
03/16/2010 19:54:16 - begin writing
03/16/2010 23:08:08 - end writing; write time: 3:13:52
the requested operation was successfully completed (0)
03-17-2010 09:34 AM
adnnfs03> df -h /vol/home
Filesystem               total       used      avail capacity  Mounted on
/vol/home/              1440GB     1171GB      268GB      81%  /vol/home/
/vol/home/.snapshot      360GB      341GB       18GB      95%  /vol/home/.snapshot
adnnfs03> df -i /vol/home
Filesystem               iused      ifree  %iused  Mounted on
/vol/home/            27748657    7251324     79%  /vol/home/

The incremental backup now ran fine:
03/17/2010 08:07:01 - estimated 0 kbytes needed
03/17/2010 08:07:02 - started process bpbrm (pid=6977)
03/17/2010 08:07:02 - connecting
03/17/2010 08:07:02 - connected; connect time: 0:00:00
03/17/2010 08:07:05 - mounting LM2727
03/17/2010 08:08:02 - mounted LM2727; mount time: 0:00:57
03/17/2010 08:08:06 - positioning LM2727 to file 45
03/17/2010 08:09:51 - positioned LM2727; position time: 0:01:45
03/17/2010 08:09:51 - begin writing
03/17/2010 11:17:35 - end writing; write time: 3:07:44
the requested operation was successfully completed (0)

So there must be a bug when a NetApp volume contains more than 30 million inodes and is backed up with NDMP. Is there a known limitation with NetBackup 6.5.4 and Data ONTAP 7.2.6.1? Restoring files to that volume also fails:
03/17/2010 08:16:49 - begin Restore
03/17/2010 09:49:43 - media needed: KVB802
03/17/2010 09:49:43 - media needed: KZ8690
03/17/2010 09:49:43 - media needed: KV6769
03/17/2010 09:49:43 - restoring from image adnnfs03_1265416147
03/17/2010 09:49:44 - connecting
03/17/2010 09:49:44 - connected; connect time: 0:00:00
03/17/2010 09:49:44 - started process bptm (pid=13279)
03/17/2010 09:49:47 - started process bptm (pid=13279)
03/17/2010 09:49:47 - mounting KZ8690
03/17/2010 09:49:45 - requesting resource KZ8690
03/17/2010 09:49:45 - granted resource KZ8690
03/17/2010 09:49:45 - granted resource HP.ULTRIUM3-SCSI.003
03/17/2010 09:50:44 - mounted KZ8690; mount time: 0:00:57
03/17/2010 09:50:44 - positioning KZ8690 to file 3
03/17/2010 09:51:29 - positioned KZ8690; position time: 0:00:45
03/17/2010 09:51:29 - begin reading
03/17/2010 10:51:29 - Error bpbrm (pid=13266) socket read failed: errno = 62 - Timer expired
03/17/2010 11:27:35 - Error bptm (pid=13279) media manager terminated by parent process
03/17/2010 11:27:35 - Error ndmpagent (pid=13278) NDMP restore failed from path /vol/home
03/17/2010 11:27:43 - Error bpbrm (pid=13266) client restore EXIT STATUS 13: file read failed
03/17/2010 11:27:43 - restored from image adnnfs03_1265416147; restore time: 1:38:00
03/17/2010 11:27:43 - end Restore; elapsed time 3:10:54
the restore failed to recover the requested files (5)

Restoring a backup from /vol/home to another volume with fewer used inodes on the same NetApp filer is being tested.

Best regards,
Bernd
03-17-2010 10:41 AM
Hi,
Can you turn on logging on your netapp and post the results here?
I've never had to turn it on, but according to the documentation the command is:
options backup.log.enable [on | off]
03-18-2010 01:49 AM
Hi Matt,
The master and media server are the same host and there is no firewall between them. The backup server and the NetApp filer are both directly attached to the tape library via Fibre Channel. The backup log is enabled by default. Here's the content of the backup log from the failed incremental backup on 15 March:
dmp Mon Mar 15 09:58:54 CET /vol/home/(0) Start (Level 1, NDMP)
dmp Mon Mar 15 09:58:54 CET /vol/home/(0) Options (b=63, u)
dmp Mon Mar 15 09:58:54 CET /vol/home/(0) Snapshot (snapshot_for_backup.6050, Mon Mar 15 09:58:53 CET)
dmp Mon Mar 15 09:58:54 CET /vol/home/(0) Base_dump (Level 0, Sun Mar 7 03:15:51 CET)
dmp Mon Mar 15 09:58:57 CET /vol/home/(0) Tape_open (ndmp)
dmp Mon Mar 15 09:58:57 CET /vol/home/(0) Phase_change (I)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Tape_close (ndmp)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Abort (0 MB)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (reg inodes: 15870475 other inodes: 831105 dirs: 3850221 nt dirs: 1364 nt inodes: 1324 acls: 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 3: directories dumped: 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 3: wafl directory blocks read: 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 3 throughput (MB sec): read 0 write 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Percent of phase3 time spent for: reading inos 0% dumping ino 0%)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Percent of phase3 dump time spent for: convert-wafl-dirs 0% lev0-ra 0%)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 3 averages (usec): wafl load buf time 0 level 0 ra time 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 4: inodes dumped: 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 4: wafl data blocks read: 0)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Percent of phase4 time spent for: reading inos 0% dumping inos 0%)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Percent of phase4 dump time spent for: wafl read iovec: 0% lev0-ra 0%)
dmp Mon Mar 15 17:59:25 CET /vol/home/(0) Log_msg (Phase 4 averages (usec): wafl read iovec time 0 level 0 ra time 0)
The log of the successful full backup from 16 March:
dmp Tue Mar 16 12:01:28 CET /vol/home/(0) Start (Level 0, NDMP)
dmp Tue Mar 16 12:01:28 CET /vol/home/(0) Options (b=63, u)
dmp Tue Mar 16 12:01:28 CET /vol/home/(0) Snapshot (snapshot_for_backup.6227, Tue Mar 16 12:01:18 CET)
dmp Tue Mar 16 12:01:33 CET /vol/home/(0) Tape_open (ndmp)
dmp Tue Mar 16 12:01:33 CET /vol/home/(0) Phase_change (I)
dmp Tue Mar 16 12:13:56 CET /vol/home/(0) Phase_change (II)
dmp Tue Mar 16 12:14:12 CET /vol/home/(0) Phase_change (III)
dmp Tue Mar 16 14:39:48 CET /vol/home/(0) Phase_change (IV)
dmp Tue Mar 16 22:12:28 CET /vol/home/(0) Phase_change (V)
dmp Tue Mar 16 22:12:30 CET /vol/home/(0) Tape_close (ndmp)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) End (1602458 MB)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (reg inodes: 22542131 other inodes: 891108 dirs: 5013545 nt dirs: 1368 nt inodes: 1330 acls: 0)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3: directories dumped: 5014914)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3: wafl directory blocks read: 5204184)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3: average wafl directory blocks per inode: 1)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3: average tape blocks per inode: 2)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3 throughput (MB sec): read 3 write 1)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Percent of phase3 time spent for: reading inos 0% dumping ino 99%)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Percent of phase3 dump time spent for: convert-wafl-dirs 94% lev0-ra 1%)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 3 averages (usec): wafl load buf time 14 level 0 ra time 12)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4: inodes dumped: 23434569)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4: wafl data blocks read: 415358068)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4: average wafl data blocks per inode: 17)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4: average tape data blocks per inode: 69)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4 throughput (MB sec): read 61 write 59)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Percent of phase4 time spent for: reading inos 0% dumping inos 98%)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Percent of phase4 dump time spent for: wafl read iovec: 19% lev0-ra 2%)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Phase 4 averages (usec): wafl read iovec time 112 level 0 ra time 25)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Tape write times (msec): average: 0 max: 69030)
dmp Tue Mar 16 22:12:31 CET /vol/home/(0) Log_msg (Tape changes: 1)
The times on the NetApp filer and the backup server differ by 24 minutes because the NTP daemon wasn't running. We also sometimes have problems restoring files onto that volume. I'm currently testing this iteratively to find out under which conditions it happens. A restore takes many hours before it finds the requested file.
Best regards,
Bernd
03-18-2010 04:56 AM
Hi Andy,
In the backup policy we have /vol/home (without a trailing slash) together with 19 other volumes, so it's a volume dump without directory traversal. We don't have qtrees in the user home volume, which holds about 200 users. That policy ran fine for about a year with full and incremental backups. We then upgraded to NetBackup 6.5.4 because of a backup problem caused by Unicode file names.
In the meantime the number of inodes has dropped from 30 million to 27 million. An incremental backup ran yesterday. A restore from a full backup with fewer than 30 million inodes also completed. I'm currently testing restores from a full backup from some weeks back with more than 30 million inodes; that one failed previously.
30 million is an odd number, not close to the limit of any common data type; I would understand it if we had crossed 2^31 inodes. It looks more like a timing issue: calculating the differences takes longer than a certain timeout. The backup and media server is this model:
[root@adnbackup ~]$ prtdiag | more
System Configuration:  Sun Microsystems  sun4us Fujitsu Siemens PRIMEPOWER450 2x SPARC64 V
System clock frequency: 220 MHz
Memory size: 4096 Megabytes

========================= CPUs =========================
                   Run    Ecache   CPU    CPU
Brd  CPU  Module   MHz     MB     Impl.   Mask
---  ---  ------  -----  ------  ------  ----
 0    0     0      1869    3.0      5     5.1
 0    1     1      1869    3.0      5     5.1
Not the most powerful machine by today's standards, but it has run fine for years.
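As a rough sanity check on the timing hypothesis, here is my own back-of-envelope estimate. It assumes (my assumption, not confirmed anywhere) that the inode counts logged at the abort are roughly what the dump's Phase I had scanned in the 8 hours before it was killed:

```shell
#!/bin/sh
# Inode counts from the dmp log at the abort:
# reg 15870475 + other 831105 + dirs 3850221 + nt dirs 1364 + nt inodes 1324
scanned=20554489
# Phase I started 09:58:57, Tape_close at 17:59:25 -> 8h 00m 28s
elapsed=28828
# Total used inodes on the volume, from 'df -i home'
total=30047941

# Extrapolate: at the observed scan rate, how long would all inodes take?
est=$(awk -v n="$scanned" -v t="$elapsed" -v total="$total" \
    'BEGIN { printf "%.1f", total / (n / t) / 3600 }')
echo "estimated time to scan all inodes: ${est} hours"
```

That works out to roughly 700 inodes per second, i.e. well over 8 hours for the whole volume, which would explain an abort after about 8 hours if a fixed timeout is in play.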
Best regards,
Bernd