NDMP backup failure

ankur1809
Level 5

Need help on NDMP failure with error code 84

All CIFS and NFS streams are running successfully, but one job, for the path imagenow/osm_01.00001, failed. What could be the reason for this, given that similar paths from the same policy completed successfully?

05/04/2015 08:48:34 - Info nbjm (pid=28528) starting backup job (jobid=5930722) for client isilonbkp, policy PROD_NFS_IMAGENOW_osm_1, schedule DLY_INCR
05/04/2015 08:48:34 - Info nbjm (pid=28528) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=5930722, request id:{41A92AE4-F264-11E4-B86F-CF01C23B3EE4})
05/04/2015 08:48:34 - requesting resource STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - requesting resource nbutx2.NBU_CLIENT.MAXJOBS.isilonbkp
05/04/2015 08:48:34 - requesting resource nbutx2.NBU_POLICY.MAXJOBS.PROD_NFS_IMAGENOW_osm_1
05/04/2015 08:48:34 - granted resource  nbutx2.NBU_CLIENT.MAXJOBS.isilonbkp
05/04/2015 08:48:34 - granted resource  nbutx2.NBU_POLICY.MAXJOBS.PROD_NFS_IMAGENOW_osm_1
05/04/2015 08:48:34 - granted resource  MediaID=@aaae7;DiskVolume=ddr1_vw;DiskPool=DDR1_DP;Path=ddr1_vw;StorageServer=ddr1;MediaServer=dalmedia3
05/04/2015 08:48:34 - granted resource  STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - estimated 37417771 kbytes needed
05/04/2015 08:48:34 - Info nbjm (pid=28528) started backup (backupid=isilonbkp_1430747314) job for client isilonbkp, policy PROD_NFS_IMAGENOW_osm_1, schedule DLY_INCR on storage unit STU_DDR1_DALMEDIA3
05/04/2015 08:48:34 - started process bpbrm (pid=12509)
05/04/2015 08:48:34 - connecting
05/04/2015 08:48:35 - connected; connect time: 0:00:00
05/04/2015 08:48:36 - Info bptm (pid=12560) start backup
05/04/2015 08:48:37 - begin writing
05/04/2015 13:25:36 - Critical bptm (pid=12560) image write failed: error 2060046: plugin error
05/04/2015 13:25:36 - Critical bptm (pid=12560) sts_get_image_prop failed: error 2060046: plugin error
05/04/2015 13:25:38 - Error bptm (pid=12560) cannot write image to disk, Invalid argument
05/04/2015 13:25:38 - Error ndmpagent (pid=12559) NDMP backup failed, path = /ifs/HMS/imagenowapp/imagenow/ImageNow6/osm_01.00001
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: Filetransfer: Transferred 12658688 bytes in 16622.546 seconds throughput of 0.744 KB/s
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: Filetransfer: Transferred 12658688 total bytes
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: CPU  user=51.328257  sys=4683.290557  ft=16622.538847  cdb=0.000000
05/04/2015 13:25:41 - Info ndmpagent (pid=12559) isilonbkp: maxrss=114340  in=710112  out=57  vol=5126083  inv=3738523
05/04/2015 13:25:41 - Error ndmpagent (pid=12559) isilonbkp: Failed to write data at offset 8462336
05/04/2015 13:25:41 - Info bptm (pid=12560) EXITING with status 84 <----------
05/04/2015 13:25:41 - Info ndmpagent (pid=0) done. status: 84: media write error
05/04/2015 13:25:41 - end writing; write time: 4:37:04
media write error  (84)
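
For a sense of scale, the ndmpagent figures in the job details above can be re-derived with a couple of lines of awk. This is only a sketch: `kb_per_sec` and `pct_of_estimate` are made-up helper names, not NetBackup commands, and it assumes 1 KB = 1024 bytes and that the "estimated ... kbytes needed" line is a fair yardstick for this incremental.

```shell
# Recompute throughput in KB/s (assuming 1 KB = 1024 bytes) from a byte
# count and an elapsed time in seconds.
# kb_per_sec is a hypothetical helper name, not part of NetBackup.
kb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.3f\n", bytes / secs / 1024 }'
}

# Percentage of the estimated backup size (given in KB) that was transferred.
# pct_of_estimate is likewise a hypothetical helper name.
pct_of_estimate() {
    awk -v bytes="$1" -v est_kb="$2" 'BEGIN { printf "%.3f\n", bytes / 1024 / est_kb * 100 }'
}

# Values taken from the job details: bytes transferred, seconds, estimated KB.
kb_per_sec 12658688 16622.546
pct_of_estimate 12658688 37417771
```

The first call prints 0.744, which matches the throughput ndmpagent reported. The second prints 0.033, so the job had moved well under a tenth of a percent of the estimated size when the write failed after more than four and a half hours.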
Accepted Solutions

Marianne
Level 6
Partner    VIP    Accredited Certified

Looks like you will need to increase logging level.

There is no indication of what went wrong with current level 0 logging. 
These are the only entries in bptm for this job (PID 12560):

08:48:35.401 [12560] <2> bptm: INITIATING (VERBOSE = 0): -w -c isilonbkp -dpath ddr1_vw -stunit STU_DDR1_DALMEDIA3 -cl PROD_NFS_IMAGENOW_osm_1 -bt 1430747314 -b isilonbkp_1430747314 -st 1 -cj 1 -reqid -1429538155 -jm -brm -hostname isilonbkp -ru root -rclnt isilonbkp -rclnthostname isilonbkp -rl 1 -rp 1209600 -sl DLY_INCR -ct 19 -maxfrag 524288 -eari 0 -mediasvr dalmedia3 -connect_options 0x01030202 -jobid 5930722 -jobgrpid 5930721 -masterversion 760000 -bpbrm_shm_id 136019987 -blks_per_buffer 512 -ndmpport 48116 -df 3
08:48:36.758 [12560] <4> report_client: VBRC 2 12560 1 isilonbkp_1430747314 19 PROD_NFS_IMAGENOW_osm_1 1 DLY_INCR 0 1 1
08:48:36.877 [12560] <2> construct_sts_isid: master_server nbutx2.hms.hmsy.com, client isilonbkp, backup_time 1430747314, copy_number 1, stream_number 1, fragment_number 0, resume_number 0, spl_name NULL
08:48:37.195 [12560] <2> construct_sts_isid: master_server nbutx2.hms.hmsy.com, client isilonbkp, backup_time 1430747314, copy_number 1, stream_number 1, fragment_number 1, resume_number 0, spl_name NULL
08:48:37.284 [12560] <2> io_open_disk: file isilonbkp_1430747314_C1_F1 successfully opened
08:48:37.346 [12560] <4> write_backup: begin writing backup id isilonbkp_1430747314, copy 1, fragment 1, destination path ddr1_vw
.....
13:25:36.321 [12560] <2> delete_image_disk_sts_impl: Deleting disk header for isilonbkp_1430747314_C1_HDR

Strangely, none of these entries for PID 12560 can be found in the bptm log:

05/04/2015 13:25:36 - Critical bptm (pid=12560) image write failed: error 2060046: plugin error 
05/04/2015 13:25:36 - Critical bptm (pid=12560) sts_get_image_prop failed: error 2060046: plugin error 
05/04/2015 13:25:38 - Error bptm (pid=12560) cannot write image to disk, Invalid argument 

It also seems that multiple other jobs ran successfully to DD and tape for the same NDMP client at the same time:

11:33:14.758 [13640] <4> write_backup: successfully wrote backup id isilonbkp_1430549662 
11:55:25.479 [25889] <4> write_backup: successfully wrote backup id isilonbkp_1430549356 
12:03:24.661 [1624] <4> write_backup: successfully wrote backup id isilonbkp_1430548954
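
Pulling one job's lines out of a busy bptm log comes down to filtering on the bracketed PID. A minimal sketch, assuming the legacy log layout shown above (`HH:MM:SS.mmm [PID] <level> ...`); `lines_for_pid` is a hypothetical helper name, not a NetBackup command.

```shell
# Print only the lines of a legacy-format log that belong to one PID.
# Usage: lines_for_pid PID LOGFILE
# lines_for_pid is a hypothetical helper, not a NetBackup command.
lines_for_pid() {
    # -F makes grep treat "[12560]" as a literal string; without it the
    # square brackets would be read as a regex bracket expression.
    grep -F "[$1]" "$2"
}
```

For example, `lines_for_pid 12560 bptm.txt` would list the entries for the failed job, and `lines_for_pid 13640 bptm.txt` those of one of the jobs that completed.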

 

How many backups are running simultaneously on the media server to tape and DD?

This may be an indication of media server or DD overload.

Which OS and NBU version on the media server?

These touch files normally help with process overload on an MSDP media server.
See if this helps (extract from http://www.symantec.com/docs/TECH156490):

On Windows media server. 
-------------------------- 
Create this empty file. 
<install path>\netbackup\db\config\DPS_PROXYNOEXPIRE 
Be certain that there are no extensions on the file. For example, no .txt extension. 
 
Create these two files and put inside it the value 300. 
The 300 value is the only thing inside each file 
<install path>\netbackup\db\config\DPS_PROXYDEFAULTSENDTMO 
<install path>\netbackup\db\config\DPS_PROXYDEFAULTRECVTMO 
 
Restart nbrmms (NetBackup Remote Manager and Monitor Service) on the media server, 
    or just stop and restart all services. 
<install path>\netbackup\bin\bpdown -f -v 
<install path>\netbackup\bin\bpup -f -v 
 
On UNIX or Linux: 
-------------------------- 
# touch /usr/openv/netbackup/db/config/DPS_PROXYNOEXPIRE 
# echo "300" > /usr/openv/netbackup/db/config/DPS_PROXYDEFAULTSENDTMO 
# echo "300" > /usr/openv/netbackup/db/config/DPS_PROXYDEFAULTRECVTMO 
 
Restart nbrmms (NetBackup Remote Manager and Monitor Service) on the media server. 
# pkill nbrmms 
# /usr/openv/netbackup/bin/nbrmms 
    Or, stop and restart all services on the MSDP media server. 
# /usr/openv/netbackup/bin/goodies/netbackup stop 
# /usr/openv/netbackup/bin/goodies/netbackup start 
 

 Another TN (TECH159707) suggests these values:

DPS_PROXYDEFAULTSENDTMO (value of 1800 inside)
DPS_PROXYDEFAULTRECVTMO (value of 1800 inside)

I have also seen suggestions saying 800 in DPS_PROXYDEFAULTRECVTMO.
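
The UNIX steps above can be wrapped in a small function so that the same timeout goes into both files. This is only a sketch: `write_dps_touchfiles` is a hypothetical name (the TechNote only lists the individual touch/echo commands), the directory is passed in so the function can be tried against a scratch directory first, and it does not restart nbrmms.

```shell
# Create the three DPS_PROXY* touch files in the given config directory.
# Usage: write_dps_touchfiles CONFIG_DIR [TIMEOUT_SECONDS]
# write_dps_touchfiles is a hypothetical helper, not a NetBackup command.
# The default timeout is the 300 quoted from TECH156490.
write_dps_touchfiles() {
    dir=$1
    tmo=${2:-300}
    # Refuse to run against a directory that does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Marker file with no extension; touch leaves an existing file untouched.
    touch "$dir/DPS_PROXYNOEXPIRE"
    # Each timeout file contains nothing but the number.
    echo "$tmo" > "$dir/DPS_PROXYDEFAULTSENDTMO"
    echo "$tmo" > "$dir/DPS_PROXYDEFAULTRECVTMO"
}
```

On the media server this would be called as `write_dps_touchfiles /usr/openv/netbackup/db/config 300` (or with 1800 to follow TECH159707), followed by the nbrmms restart described above.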

See if the job is successful when you run fewer jobs to this media server.

 


REPLIES

Marianne
Level 6
Partner    VIP    Accredited Certified

All we can see is a write and plugin error.
The problem seems to be with the backup destination, not the source.

What type of storage is STU_DDR1_DALMEDIA3?

Do you have log folders on MediaServer dalmedia3?
You will need bptm and bpdm log folders.

Please copy logs to .txt files (e.g. bptm.txt) and upload them as File attachments.

ankur1809
Level 5

STU_DDR1_DALMEDIA3 is a Data Domain storage unit.

 

ankur1809
Level 5

bpbrm logs as well

sdo
Moderator
Partner    VIP    Certified

1) What version of OneFS are you running on the Isilon devices?

2) Does your model and version of Isilon appear in this list?

https://symwisedownload.symantec.com/resources/sites/SYMWISE/content/live/SOLUTIONS/76000/TECH76495/en_US/nbu_76_hcl.html?__gda__=1430935507_7668372624cef105a9bb22001e3d0229#ndmp_devices-ndmp_devices_-_vendor_compatibility-isilon

 

watsons
Level 6

Most likely you need to engage DataDomain to troubleshoot as well.

Enable verbose=5 bptm logs and DebugLevel=6 ndmpagent (OID=134) logs. That should give you more detail, including the exact error code returned by the plugin.

Similar issue: http://www.symantec.com/docs/TECH216451

Sometimes, if the backup data is too large, the OST plugin will time out, so on the DataDomain side you need to increase the timeout value in OST_ABANDON_TIMEOUT. (In any case, check with DataDomain first.)