cancel
Showing results for 
Search instead for 
Did you mean: 

NDMP Job take long time to write the data

Sagar_K
Level 3
Partner

Guys,

Need urgent help.

I have NDMP local backup, backup job comes to writing state but won't write any data.

It take very long time to write the data, i have attached the details logs from job monitor.

auperdrna03 is the NDMP host.

8/26/2013 11:15:11 AM - Info nbjm(pid=9828) starting backup job (jobid=45332) for client auperdrna03, policy AU-PER-DR-NA3-PI_TEST, schedule Month 
8/26/2013 11:15:11 AM - Info nbjm(pid=9828) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=45332, request id:{4689FD06-55C6-444A-9BEC-FC0FB242B2CA}) 
8/26/2013 11:15:11 AM - requesting resource auperbu6-hcart3-robot-tld-18-auperdrna03
8/26/2013 11:15:11 AM - requesting resource auperbu1.corp.riotinto.org.NBU_CLIENT.MAXJOBS.auperdrna03
8/26/2013 11:15:11 AM - requesting resource auperbu1.corp.riotinto.org.NBU_POLICY.MAXJOBS.AU-PER-DR-NA3-PI_TEST
8/26/2013 11:15:14 AM - granted resource auperbu1.corp.riotinto.org.NBU_CLIENT.MAXJOBS.auperdrna03
8/26/2013 11:15:14 AM - granted resource auperbu1.corp.riotinto.org.NBU_POLICY.MAXJOBS.AU-PER-DR-NA3-PI_TEST
8/26/2013 11:15:14 AM - granted resource DR5523
8/26/2013 11:15:14 AM - granted resource IBM.ULT3580-TD5.017
8/26/2013 11:15:14 AM - granted resource auperbu6-hcart3-robot-tld-18-auperdrna03
8/26/2013 11:15:15 AM - estimated 0 Kbytes needed
8/26/2013 11:15:15 AM - Info nbjm(pid=9828) started backup job for client auperdrna03, policy AU-PER-DR-NA3-PI_TEST, schedule Month on storage unit auperbu6-hcart3-robot-tld-18-auperdrna03
8/26/2013 11:15:17 AM - started process bpbrm (15016)
8/26/2013 11:15:18 AM - Info bpbrm(pid=15016) auperdrna03 is the host to backup data from    
8/26/2013 11:15:18 AM - Info bpbrm(pid=15016) reading file list from client       
8/26/2013 11:15:18 AM - Info bpbrm(pid=15016) starting ndmpagent on client        
8/26/2013 11:15:18 AM - Info ndmpagent(pid=11880) Backup started          
8/26/2013 11:15:18 AM - connecting
8/26/2013 11:15:18 AM - connected; connect time: 00:00:00
8/26/2013 11:15:19 AM - Info bptm(pid=5884) start           
8/26/2013 11:15:19 AM - Info bptm(pid=5884) using 256 data buffers        
8/26/2013 11:15:19 AM - Info bptm(pid=5884) using 262144 data buffer size       
8/26/2013 11:15:19 AM - Info bptm(pid=5884) start backup          
8/26/2013 11:15:19 AM - Info bptm(pid=5884) Waiting for mount of media id DR5523 (copy 1) on server auperbu6.corp.riotinto.org.
8/26/2013 11:15:19 AM - mounting DR5523
8/26/2013 11:15:50 AM - Info bptm(pid=5884) media id DR5523 mounted on drive index 36, drivepath nrst40a, drivename IBM.ULT3580-TD5.017, copy 1
8/26/2013 11:15:50 AM - mounted; mount time: 00:00:31
8/26/2013 11:15:51 AM - positioning DR5523 to file 3
8/26/2013 11:16:22 AM - positioned DR5523; position time: 00:00:31
8/26/2013 11:16:22 AM - begin writing
8/26/2013 11:21:44 AM - Error ndmpagent(pid=11880) terminated by parent process        
8/26/2013 11:21:45 AM - Info ndmpagent(pid=0) done           
8/26/2013 11:21:45 AM - Error ndmpagent(pid=11880) NDMP backup failed, path = /vol/TP_MYDOCS      
8/26/2013 11:21:48 AM - Error ndmpagent(pid=11880) NDMP_LOG_ERROR 10 DATA: Backup terminated (for /vol/TP_MYDOCS).     
8/26/2013 11:22:02 AM - end writing; write time: 00:05:40
termination requested by administrator(150)

NBU Version : 7.1.0.3

I have cancelled the job after waiting for sometime, this is a new volume and never got backup before, this volume have around 13TB of data.

Please let me know if you guys need any more details and help to check the possible causes.

 

 

 

7 REPLIES 7

Nicolai
Moderator
Moderator
Partner    VIP   

You need to view the NDMP filer syslog.

If this box is a EMC device you may find this tech note usefull : http://www.symantec.com/docs/TECH150646

Sagar_K
Level 3
Partner

It is a netapp device running ONTAP 8 OS.

Sagar_K
Level 3
Partner

I appriciate if someone help to understand how backup works when you give the backup selection as a SnapShot file name, like below.

It is a Snapshot image file which has 60GB of size but backup writes more than TB.

 

 

 

watsons
Level 6

You're saying the volume has about 13TB of data, and then you said it's just a snapshot file of 60GB, so which one is it?

A Netapp snapshot of weekly.0 , basically just a snapshot the system has taken at point-in-time in weekly basis, it's a group of pointers that keep track of changes to the original volume (baseline copy). Sometimes sysadmin can keep a copy of those snapshots by backing it up but usually it's much smaller than a full backup.

The first job details you provided, took only < 10 min, but did you just cancel it  (since it's error 150)? Can you wait a little bit longer before cancelling it?

Sagar_K
Level 3
Partner

Watson,

 

Thanks for your help,

Yes, the volume has 13TB in production and i am trying to backup from DR from its point-in-time snapshot.

I cancelled the job and as it was in writing state but actually not writing any data, will take long time to start writing on tapes?

Do you guys have any link to understand how point-in-time works, there is are no much details available in NDMP admin guide.

Appreciate all your help.

This is one of job which wrote some data but failed.

 

8/27/2013 6:53:11 PM - Info nbjm(pid=9828) starting backup job (jobid=47511) for client auperdrna03, policy RIOPERAP26_GEEFUSION-NDMP, schedule Weekly 
8/27/2013 6:53:11 PM - Info nbjm(pid=9828) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=47511, request id:{D552CBA5-A048-4F4C-AF7E-47FE98921E9A}) 
8/27/2013 6:53:11 PM - requesting resource NA3-NDMP-IBM-TS3500
8/27/2013 6:53:11 PM - requesting resource auperbu1.corp.riotinto.org.NBU_CLIENT.MAXJOBS.auperdrna03
8/27/2013 6:53:11 PM - requesting resource auperbu1.corp.riotinto.org.NBU_POLICY.MAXJOBS.RIOPERAP26_GEEFUSION-NDMP
8/27/2013 6:53:12 PM - granted resource auperbu1.corp.riotinto.org.NBU_CLIENT.MAXJOBS.auperdrna03
8/27/2013 6:53:12 PM - granted resource auperbu1.corp.riotinto.org.NBU_POLICY.MAXJOBS.RIOPERAP26_GEEFUSION-NDMP
8/27/2013 6:53:12 PM - granted resource DR5359
8/27/2013 6:53:12 PM - granted resource IBM.ULT3580-TD5.021
8/27/2013 6:53:12 PM - granted resource auperbu9-hcart3-robot-tld-18-auperdrna03
8/27/2013 6:53:12 PM - estimated 0 Kbytes needed
8/27/2013 6:53:12 PM - Info nbjm(pid=9828) started backup job for client auperdrna03, policy RIOPERAP26_GEEFUSION-NDMP, schedule Weekly on storage unit auperbu9-hcart3-robot-tld-18-auperdrna03
8/27/2013 6:53:14 PM - Info bpbrm(pid=6348) auperdrna03 is the host to backup data from    
8/27/2013 6:53:14 PM - Info bpbrm(pid=6348) reading file list from client       
8/27/2013 6:53:14 PM - started process bpbrm (6348)
8/27/2013 6:53:15 PM - Info bpbrm(pid=6348) starting ndmpagent on client        
8/27/2013 6:53:15 PM - Info ndmpagent(pid=3048) Backup started          
8/27/2013 6:53:15 PM - Info bptm(pid=8336) start           
8/27/2013 6:53:15 PM - Info bptm(pid=8336) using 30 data buffers        
8/27/2013 6:53:15 PM - Info bptm(pid=8336) using 65536 data buffer size       
8/27/2013 6:53:15 PM - connecting
8/27/2013 6:53:15 PM - connected; connect time: 00:00:00
8/27/2013 6:53:16 PM - Info bptm(pid=8336) start backup          
8/27/2013 6:53:16 PM - Info bptm(pid=8336) Waiting for mount of media id DR5359 (copy 1) on server auperbu9.
8/27/2013 6:53:16 PM - mounting DR5359
8/27/2013 6:53:48 PM - Info bptm(pid=8336) media id DR5359 mounted on drive index 30, drivepath nrst43a, drivename IBM.ULT3580-TD5.021, copy 1
8/27/2013 6:53:48 PM - mounted; mount time: 00:00:32
8/27/2013 6:53:49 PM - positioning DR5359 to file 9
8/27/2013 6:55:13 PM - positioned DR5359; position time: 00:01:24
8/27/2013 6:55:13 PM - begin writing
8/27/2013 8:57:26 PM - current media DR5359 complete, requesting next resource Any
8/27/2013 8:57:27 PM - Info bptm(pid=8336) Waiting for mount of media id DR6036 (copy 1) on server auperbu9.
8/27/2013 8:57:27 PM - granted resource DR6036
8/27/2013 8:57:27 PM - granted resource IBM.ULT3580-TD5.013
8/27/2013 8:57:27 PM - granted resource auperbu9-hcart3-robot-tld-18-auperdrna03
8/27/2013 8:57:27 PM - mounting DR6036
8/27/2013 8:57:59 PM - Info bptm(pid=8336) media id DR6036 mounted on drive index 32, drivepath nrst25a, drivename IBM.ULT3580-TD5.013, copy 1
8/27/2013 8:57:59 PM - mounted; mount time: 00:00:32
8/27/2013 8:58:00 PM - positioning DR6036 to file 1
8/27/2013 8:58:05 PM - positioned DR6036; position time: 00:00:05
8/27/2013 8:58:05 PM - begin writing
8/28/2013 4:57:22 AM - Info bptm(pid=8336) Waiting for mount of media id DR6192 (copy 1) on server auperbu9.
8/28/2013 4:57:22 AM - current media DR6036 complete, requesting next resource Any
8/28/2013 4:57:22 AM - granted resource DR6192
8/28/2013 4:57:22 AM - granted resource IBM.ULT3580-TD5.016
8/28/2013 4:57:22 AM - granted resource auperbu9-hcart3-robot-tld-18-auperdrna03
8/28/2013 4:57:22 AM - mounting DR6192
8/28/2013 4:58:07 AM - Info bptm(pid=8336) media id DR6192 mounted on drive index 35, drivepath nrst41a, drivename IBM.ULT3580-TD5.016, copy 1
8/28/2013 4:58:07 AM - mounted; mount time: 00:00:45
8/28/2013 4:58:07 AM - positioning DR6192 to file 1
8/28/2013 4:58:12 AM - positioned DR6192; position time: 00:00:05
8/28/2013 4:58:12 AM - begin writing
8/28/2013 9:03:24 AM - current media DR6192 complete, requesting next resource Any
8/28/2013 9:03:25 AM - Info bptm(pid=8336) Waiting for mount of media id DR6204 (copy 1) on server auperbu9.
8/28/2013 9:03:25 AM - granted resource DR6204
8/28/2013 9:03:25 AM - granted resource IBM.ULT3580-TD5.021
8/28/2013 9:03:25 AM - granted resource auperbu9-hcart3-robot-tld-18-auperdrna03
8/28/2013 9:03:25 AM - mounting DR6204
8/28/2013 9:04:10 AM - Info bptm(pid=8336) media id DR6204 mounted on drive index 30, drivepath nrst43a, drivename IBM.ULT3580-TD5.021, copy 1
8/28/2013 9:04:10 AM - mounted; mount time: 00:00:45
8/28/2013 9:04:10 AM - positioning DR6204 to file 1
8/28/2013 9:04:14 AM - positioned DR6204; position time: 00:00:04
8/28/2013 9:04:14 AM - begin writing
8/28/2013 1:58:04 PM - current media DR6204 complete, requesting next resource Any
8/28/2013 1:58:05 PM - Info bptm(pid=8336) Waiting for mount of media id DR6118 (copy 1) on server auperbu9.
8/28/2013 1:58:05 PM - granted resource DR6118
8/28/2013 1:58:05 PM - granted resource IBM.ULT3580-TD5.017
8/28/2013 1:58:05 PM - granted resource auperbu9-hcart3-robot-tld-18-auperdrna03
8/28/2013 1:58:05 PM - mounting DR6118
8/28/2013 1:58:49 PM - Info bptm(pid=8336) media id DR6118 mounted on drive index 36, drivepath nrst40a, drivename IBM.ULT3580-TD5.017, copy 1
8/28/2013 1:58:49 PM - mounted; mount time: 00:00:44
8/28/2013 1:58:49 PM - positioning DR6118 to file 1
8/28/2013 1:58:54 PM - positioned DR6118; position time: 00:00:05
8/28/2013 1:58:54 PM - begin writing
8/28/2013 5:47:01 PM - Error ndmpagent(pid=3048) NDMP_LOG_ERROR 275 DATA: Backup terminated: EVENT: I/O ERROR (for /vol/RIOPERAP26_GEEFUSION/.snapshot/weekly.0)  
8/28/2013 5:47:02 PM - Error ndmpagent(pid=3048) NDMP backup failed, path = /vol/RIOPERAP26_GEEFUSION/.snapshot/weekly.0      
8/28/2013 5:47:02 PM - Error bptm(pid=8336) none of the NDMP backups for client auperdrna03 completed successfully  

8/28/2013 5:51:44 PM - Info bptm(pid=8336) WriteFile() failed err = 0       
8/28/2013 5:51:44 PM - Error bptm(pid=8336) write error on media id DR6118, drive index 36, writing header block, 0
8/28/2013 5:51:45 PM - Error bptm(pid=8336) SUSPENDED media id DR6118, could not terminate correctly    
8/28/2013 5:51:45 PM - end writing; write time: 03:52:51
NDMP backup failure(99)

watsons
Level 6

Huge data indeed...

I will try with increasing the NDMP_PROGRESS_TIMEOUT to 2880 (2days) to see if it can run pass the point it fails. Detail of how to use the timeout value is in http://www.symantec.com/docs/TECH21246

If it still fails, you will need to check the error (of the exact time it fails) from the Netapp system & dump log. For your question below:

Do you guys have any link to understand how point-in-time works, there is are no much details available in NDMP admin guide.

You should consult this from Netapp, not Netbackup. The point-in-time snapshot created in Netapp volumes are different mechanism, find it from Netapp forum/support site. You won't find this from Netbackup docs.

Sagar_K
Level 3
Partner

Hi Watsons

NDMP_PROGRESS_TIMEOUT to 2880 - This is already in place, i have opened a case with symantec and they have requested sysstat output from the filer, i have intiated the job and capaturing the sysstat.

I will get symantec and netapp on the call :)

Thanks for your help.