NDMP backup failure
Need help with an NDMP backup failure with status code 84
All CIFS and NFS streams are running successfully. Only the job for imagenow/osm_01.00001 failed. What could be the reason, given that similar paths from the same policy completed successfully?
05/04/2015 08:48:34 - Info nbjm (pid=28528) starting backup job (jobid=5930722) for client isilonbkp, policy PROD_NFS_IMAGENOW_osm_1, schedule DLY_INCR
05/04/2015 08:48:34 - Info nbjm (pid=28528) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5930722, request id:{41A92AE4-F264-11E4-B86F-CF01C23B3EE4})
05/04/2015 08:48:34 - requesting resource STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - requesting resource nbutx2.NBU_CLIENT.MAXJOBS.isilonbkp
05/04/2015 08:48:34 - requesting resource nbutx2.NBU_POLICY.MAXJOBS.PROD_NFS_IMAGENOW_osm_1
05/04/2015 08:48:34 - granted resource nbutx2.NBU_CLIENT.MAXJOBS.isilonbkp
05/04/2015 08:48:34 - granted resource nbutx2.NBU_POLICY.MAXJOBS.PROD_NFS_IMAGENOW_osm_1
05/04/2015 08:48:34 - granted resource MediaID=@aaae7;DiskVolume=ddr1_vw;DiskPool=DDR1_DP;Path=ddr1_vw;StorageServer=ddr1;MediaServer=dalmedia3
05/04/2015 08:48:34 - granted resource STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - estimated 37417771 kbytes needed
05/04/2015 08:48:34 - Info nbjm (pid=28528) started backup (backupid=isilonbkp_1430747314) job for client isilonbkp, policy PROD_NFS_IMAGENOW_osm_1, schedule DLY_INCR on storage unit STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - started process bpbrm (pid=12509)
05/04/2015 08:48:34 - connecting
05/04/2015 08:48:35 - connected; connect time: 0:00:00
05/04/2015 08:48:36 - Info bptm (pid=12560) start backup
05/04/2015 08:48:37 - begin writing
05/04/2015 13:25:36 - Critical bptm (pid=12560) image write failed: error 2060046: plugin error
05/04/2015 13:25:36 - Critical bptm (pid=12560) sts_get_image_prop failed: error 2060046: plugin error
05/04/2015 13:25:38 - Error bptm (pid=12560) cannot write image to disk, Invalid argument
05/04/2015 13:25:38 - Error ndmpagent (pid=12559) NDMP backup failed, path = /ifs/HMS/imagenowapp/imagenow/ImageNow6/osm_01.00001
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: Filetransfer: Transferred 12658688 bytes in 16622.546 seconds throughput of 0.744 KB/s
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: Filetransfer: Transferred 12658688 total bytes
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: CPU user=51.328257 sys=4683.290557 ft=16622.538847 cdb=0.000000
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: maxrss=114340 in=710112 out=57 vol=5126083 inv=3738523
05/04/2015 13:25:41 - Error ndmpagent (pid=12559) isilonbkp: Failed to write data at offset 8462336
05/04/2015 13:25:41 - Info bptm (pid=12560) EXITING with status 84 <----------
05/04/2015 13:25:41 - Info ndmpagent (pid=0) done. status: 84: media write error
05/04/2015 13:25:41 - end writing; write time: 4:37:04
media write error (84)
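A quick check of the ndmpagent figures shows how little data actually moved: about 12 MiB in roughly 4.6 hours, against an estimate of roughly 36 GiB (assuming NetBackup's KB means 1024 bytes; the 0.744 KB/s in the log only works out with that assumption). The snippet below just redoes that arithmetic from the values copied out of the job log. Such a low rate suggests the stream was stalled for most of the job rather than failing at the end of a heavy write, but that is an inference from the numbers, not something the log states.

```python
def throughput_kb_per_sec(nbytes: int, seconds: float) -> float:
    """Bytes per second expressed in KB/s, taking 1 KB = 1024 bytes."""
    return nbytes / seconds / 1024


# Values copied from the ndmpagent "Filetransfer" lines of the failed job.
transferred = 12_658_688   # bytes
elapsed = 16_622.546       # seconds (close to the 4:37:04 write time)
estimated_kb = 37_417_771  # "estimated ... kbytes needed"

print(f"transferred: {transferred / 1024**2:.1f} MiB "
      f"of ~{estimated_kb / 1024**2:.1f} GiB estimated")
print(f"elapsed: {elapsed / 3600:.2f} h")
print(f"throughput: {throughput_kb_per_sec(transferred, elapsed):.3f} KB/s")
```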
It looks like you will need to increase the logging level.
The current level 0 logging gives no indication of what went wrong.
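On a UNIX media server, bptm verbosity is normally raised with a BPTM_VERBOSE entry (or the global VERBOSE entry) in /usr/openv/netbackup/bp.conf, and the legacy log directory /usr/openv/netbackup/logs/bptm has to exist. The sketch below assumes those names and the one-"KEY = value"-per-line bp.conf layout; check them against the Administrator's Guide for your release before using it. On Windows, raise logging through Host Properties in the Administration Console instead. The helper is hypothetical: it replaces an existing entry for the key or appends one, and keeps a .bak copy of the original file.

```python
import os
import re
import shutil


def set_conf_option(path: str, key: str, value: str) -> None:
    """Set 'KEY = value' in a bp.conf-style file (one KEY = value entry per line).

    An existing entry for KEY is replaced in place (later duplicates are dropped);
    otherwise the entry is appended. A copy of the original is kept as <path>.bak.
    """
    entry = re.compile(rf"^\s*{re.escape(key)}\s*=")
    lines = []
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")  # keep a backup before editing
        with open(path) as f:
            lines = f.read().splitlines()

    new_line = f"{key} = {value}"
    out, replaced = [], False
    for line in lines:
        if entry.match(line):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)

    with open(path, "w") as f:
        f.write("\n".join(out) + "\n")


# Example (assumed path and option name; run as root on the media server):
# set_conf_option("/usr/openv/netbackup/bp.conf", "BPTM_VERBOSE", "5")
```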
These are the only entries in bptm for this job (PID 12560):
08:48:35.401 [12560] <2> bptm: INITIATING (VERBOSE = 0): -w -c isilonbkp -dpath ddr1_vw -stunit STU_DDR1_DALMEDIA3 -cl PROD_NFS_IMAGENOW_osm_1 -bt 1430747314 -b isilonbkp_1430747314 -st 1 -cj 1 -reqid -1429538155 -jm -brm -hostname isilonbkp -ru root -rclnt isilonbkp -rclnthostname isilonbkp -rl 1 -rp 1209600 -sl DLY_INCR -ct 19 -maxfrag 524288 -eari 0 -mediasvr dalmedia3 -connect_options 0x01030202 -jobid 5930722 -jobgrpid 5930721 -masterversion 760000 -bpbrm_shm_id 136019987 -blks_per_buffer 512 -ndmpport 48116 -df 3
08:48:36.758 [12560] <4> report_client: VBRC 2 12560 1 isilonbkp_1430747314 19 PROD_NFS_IMAGENOW_osm_1 1 DLY_INCR 0 1 1
08:48:36.877 [12560] <2> construct_sts_isid: master_server nbutx2.hms.hmsy.com, client isilonbkp, backup_time 1430747314, copy_number 1, stream_number 1, fragment_number 0, resume_number 0, spl_name NULL
08:48:37.195 [12560] <2> construct_sts_isid: master_server nbutx2.hms.hmsy.com, client isilonbkp, backup_time 1430747314, copy_number 1, stream_number 1, fragment_number 1, resume_number 0, spl_name NULL
08:48:37.284 [12560] <2> io_open_disk: file isilonbkp_1430747314_C1_F1 successfully opened
08:48:37.346 [12560] <4> write_backup: begin writing backup id isilonbkp_1430747314, copy 1, fragment 1, destination path ddr1_vw
.....
13:25:36.321 [12560] <2> delete_image_disk_sts_impl: Deleting disk header for isilonbkp_1430747314_C1_HDR
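When a bptm log holds many concurrent jobs, pulling out one PID by hand is tedious. The sketch below filters legacy debug-log lines of the form "08:48:36.758 [12560] <4> report_client: ..." by process ID. It assumes that line layout, and also accepts a "[pid.tid]" form in case your platform adds a thread ID; the log file name in the usage comment (log.050415 for 4 May 2015) is only an example.

```python
import re
from typing import Iterable, Iterator, Tuple

# "HH:MM:SS.mmm [pid] <level> text", optionally "[pid.tid]".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)(?:\.\d+)?\] <(\d+)> (.*)$")


def lines_for_pid(lines: Iterable[str], pid: int) -> Iterator[Tuple[str, int, str]]:
    """Yield (timestamp, level, message) for every log line written by pid."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and int(m.group(2)) == pid:
            yield m.group(1), int(m.group(3)), m.group(4)


# Example usage (file name is illustrative):
# with open("/usr/openv/netbackup/logs/bptm/log.050415") as f:
#     for ts, level, msg in lines_for_pid(f, 12560):
#         print(ts, level, msg)
```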
Strangely, none of these entries for PID 12560 can be found in the bptm log:
05/04/2015 13:25:36 - Critical bptm (pid=12560) image write failed: error 2060046: plugin error
05/04/2015 13:25:36 - Critical bptm (pid=12560) sts_get_image_prop failed: error 2060046: plugin error
05/04/2015 13:25:38 - Error bptm (pid=12560) cannot write image to disk, Invalid argument
It also seems that multiple other jobs ran successfully to DD and tape for the same NDMP client at the same time:
11:33:14.758 [13640] <4> write_backup: successfully wrote backup id isilonbkp_1430549662
11:55:25.479 [25889] <4> write_backup: successfully wrote backup id isilonbkp_1430549356
12:03:24.661 [1624] <4> write_backup: successfully wrote backup id isilonbkp_1430548954
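NetBackup backup IDs have the form <client>_<ctime>, where ctime is the backup time in seconds since the Unix epoch (the bptm command line above carries the same value as -bt 1430747314). Decoding them is a quick way to see how these jobs line up in time. The failed job decodes to 2015-05-04 13:48:34 UTC, which matches the 08:48:34 start in the job log if the master server runs at UTC-5. The three successful IDs decode to between 06:42:34 and 06:54:22 UTC on 2015-05-02. An old timestamp does not by itself mean those jobs ran for two days: duplicated copies keep the original image's backup ID, so these may be duplications (for example, to tape) of images taken on 2 May. A minimal sketch:

```python
from datetime import datetime, timezone


def backup_time(backup_id: str) -> datetime:
    """Return the UTC timestamp embedded in a NetBackup backup ID (<client>_<ctime>)."""
    # rpartition splits on the last underscore, so client names containing "_" still work.
    _client, _, ctime = backup_id.rpartition("_")
    return datetime.fromtimestamp(int(ctime), tz=timezone.utc)


for bid in ("isilonbkp_1430747314",  # the failed job
            "isilonbkp_1430549662",  # the three successful writes listed above
            "isilonbkp_1430549356",
            "isilonbkp_1430548954"):
    print(bid, backup_time(bid).isoformat())
```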
How many backups are running simultaneously on the media server, to both tape and DD?
This may be an indication of media server or DD overload.
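One way to answer the concurrency question is to collect the start and end times of every job that used this media server (for example, from the Activity Monitor) and compute the peak overlap. The sketch below is a plain sweep line over (start, end) pairs; the input list is something you would build yourself, and its format here is hypothetical. Ends are processed before starts at the same instant, so back-to-back jobs are not counted as overlapping.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def max_concurrency(jobs: List[Tuple[datetime, datetime]]) -> Tuple[int, Optional[datetime]]:
    """Return (peak number of running jobs, first time that peak was reached)."""
    events = []
    for start, end in jobs:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal times -1 (end) sorts before +1 (start).
    events.sort()
    running = peak = 0
    when = None
    for t, delta in events:
        running += delta
        if running > peak:
            peak, when = running, t
    return peak, when


# Example: the failed job's window from the job log plus hypothetical other jobs.
# jobs = [(datetime(2015, 5, 4, 8, 48, 34), datetime(2015, 5, 4, 13, 25, 41)), ...]
# print(max_concurrency(jobs))
```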
Which OS and NBU version are running on the media server?
These touch files normally help with process overload on an MSDP media server.
See if this helps (extract from http://www.symantec.com/docs/TECH156490 ):

On Windows media server:
--------------------------
Create this empty file:
<install path>\netbackup\db\config\DPS_PROXYNOEXPIRE
Be certain that there are no extensions on the file. For example, no .txt extension.

Create these two files, each containing only the value 300:
<install path>\netbackup\db\config\DPS_PROXYDEFAULTSENDTMO
<install path>\netbackup\db\config\DPS_PROXYDEFAULTRECVTMO

Restart nbrmms (NetBackup Remote Manager and Monitor Service) on the media server, or just stop and restart all services:
<install path>\netbackup\bin\bpdown -f -v
<install path>\netbackup\bin\bpup -f -v

On UNIX or Linux:
--------------------------
# touch /usr/openv/netbackup/db/config/DPS_PROXYNOEXPIRE
# echo "300" > /usr/openv/netbackup/db/config/DPS_PROXYDEFAULTSENDTMO
# echo "300" > /usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO

Restart nbrmms (NetBackup Remote Manager and Monitor Service) on the media server:
# pkill nbrmms
# /usr/openv/netbackup/bin/nbrmms

Or stop and restart all services on the MSDP media server:
# /usr/openv/netbackup/bin/goodies/netbackup stop
# /usr/openv/netbackup/bin/goodies/netbackup start

Another TN (TECH159707) suggests these values:
DPS_PROXYDEFAULTSENDTMO (value of 1800 inside)
DPS_PROXYDEFAULTRECVTMO (value of 1800 inside)
I have also seen suggestions of 800 in DPS_PROXYDEFAULTRECVTMO.
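If you apply these touch files on several media servers, a small script keeps the file names and contents consistent across Windows and UNIX. This is a sketch, not part of either TN: the helper name is hypothetical, the timeout defaults to the 300 from TECH156490 and can be set to 1800 per TECH159707, and each timeout file is written with the number only (no trailing newline), following the "only thing inside each file" wording. It only creates the files; nbrmms (or all NetBackup services) still has to be restarted afterwards.

```python
import os
from typing import List


def write_dps_proxy_files(config_dir: str, timeout_seconds: int = 300) -> List[str]:
    r"""Create the DPS_PROXY* touch files in config_dir and return the paths written.

    config_dir is /usr/openv/netbackup/db/config on UNIX/Linux or
    <install path>\netbackup\db\config on Windows.
    """
    written = []
    # DPS_PROXYNOEXPIRE must exist and be empty, with no .txt or other extension.
    noexpire = os.path.join(config_dir, "DPS_PROXYNOEXPIRE")
    open(noexpire, "w").close()
    written.append(noexpire)
    # Each timeout file holds only the number of seconds.
    for name in ("DPS_PROXYDEFAULTSENDTMO", "DPS_PROXYDEFAULTRECVTMO"):
        path = os.path.join(config_dir, name)
        with open(path, "w") as f:
            f.write(str(timeout_seconds))
        written.append(path)
    return written


# Example (UNIX/Linux media server, run as root):
# write_dps_proxy_files("/usr/openv/netbackup/db/config", 300)
```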
See if the job succeeds when you run fewer jobs to this media server.