cancel
Showing results for 
Search instead for 
Did you mean: 

NDMP issue. error status: 86: media position error

plsl
Level 2

We have been using NDMP to backup NetApp snapshot for about 2 years now. Since yesterday a new issue as arise. All ndmp backup are failing, either on status 86, media position or on status 24, socket write failed(24).

 

20/01/2015 11:43:19 AM - Info nbjm(pid=5220) starting backup job (jobid=175381) for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_01, schedule incr_daily 

20/01/2015 11:43:19 AM - Info nbjm(pid=5220) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=175381, request id:{2E765ED3-9D2A-4DDD-8082-799E8E1C4CF0}) 

20/01/2015 11:43:19 AM - requesting resource smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 11:43:19 AM - requesting resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01

20/01/2015 11:43:19 AM - requesting resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_01

20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) dick-san01 is the host to backup data from    

20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) reading file list for client       

20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) starting ndmpagent on client        

20/01/2015 11:43:19 AM - Info ndmpagent(pid=6756) Backup started          

20/01/2015 11:43:19 AM - Info bptm(pid=7096) start           

20/01/2015 11:43:19 AM - granted resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01.viarail.intra

20/01/2015 11:43:19 AM - granted resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_01

20/01/2015 11:43:19 AM - granted resource 015257

20/01/2015 11:43:19 AM - granted resource ddc-tld0-lto5-003

20/01/2015 11:43:19 AM - granted resource smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 11:43:19 AM - estimated 3222944486 Kbytes needed

20/01/2015 11:43:19 AM - Info nbjm(pid=5220) started backup (backupid=dick-san01_1421772199) job for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_01, schedule incr_daily on storage unit smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 11:43:19 AM - started process bpbrm (9432)

20/01/2015 11:43:19 AM - connecting

20/01/2015 11:43:19 AM - connected; connect time: 0:00:00

20/01/2015 11:43:20 AM - Info bptm(pid=7096) using 64 data buffers        

20/01/2015 11:43:20 AM - Info bptm(pid=7096) using 65536 data buffer size       

20/01/2015 11:43:20 AM - Info bptm(pid=7096) start backup           

20/01/2015 11:43:20 AM - Info bptm(pid=7096) Waiting for mount of media id 015257 (copy 1) on server smtlhq143.

20/01/2015 11:43:20 AM - mounting 015257

20/01/2015 11:44:15 AM - Info bptm(pid=7096) media id 015257 mounted on drive index 0, drivepath nrst1a, drivename ddc-tld0-lto5-003, copy 1

20/01/2015 11:44:15 AM - mounted; mount time: 0:00:55

20/01/2015 11:44:19 AM - Info ndmpagent(pid=6756) dick-san01: SCSI: TAPE READ: short read for nrst1a    

20/01/2015 11:44:19 AM - positioning 015257 to file 73

20/01/2015 11:45:19 AM - Error bptm(pid=7096) io_ioctl_ndmp (MTFSF) failed on media id 015257, drive index 0, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.7157)

20/01/2015 11:45:19 AM - Error ndmpagent(pid=6756) eof is set, connection 000000000101AF80       

20/01/2015 11:45:19 AM - Error ndmpagent(pid=6756) NDMP backup failed, path = UNKNOWN      

20/01/2015 11:45:20 AM - Error ndmpagent(pid=6756) connection 0x101b360 ndmp_message_process_one_failed, status = NDMP_ILLEGAL_STATE_ERR       

20/01/2015 11:45:20 AM - Error ndmpagent(pid=6756) eof is set - connection 0x101b360      

20/01/2015 11:45:20 AM - Info bptm(pid=7096) EXITING with status 86 <----------       

20/01/2015 11:45:20 AM - Info ndmpagent(pid=0) done. status: 86: media position error      

20/01/2015 11:45:20 AM - end writing

media position error(86)

 

 

Error 2 (start writing then fail)

 

20/01/2015 9:58:47 AM - Info nbjm(pid=5220) starting backup job (jobid=175367) for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_05, schedule incr_daily 

20/01/2015 9:58:47 AM - Info nbjm(pid=5220) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=175367, request id:{A7DF3C66-7600-4CE0-B086-CD43C24F91D5}) 

20/01/2015 9:58:47 AM - requesting resource smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 9:58:47 AM - requesting resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01

20/01/2015 9:58:47 AM - requesting resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_05

20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) dick-san01 is the host to backup data from    

20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) reading file list for client       

20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) starting ndmpagent on client        

20/01/2015 9:58:48 AM - Info ndmpagent(pid=9764) Backup started          

20/01/2015 9:58:48 AM - Info bptm(pid=1392) start           

20/01/2015 9:58:48 AM - Info bptm(pid=1392) using 64 data buffers        

20/01/2015 9:58:48 AM - Info bptm(pid=1392) using 65536 data buffer size       

20/01/2015 9:58:48 AM - granted resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01

20/01/2015 9:58:48 AM - granted resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_05

20/01/2015 9:58:48 AM - granted resource 015176

20/01/2015 9:58:48 AM - granted resource ddc-tld0-lto5-002

20/01/2015 9:58:48 AM - granted resource smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 9:58:48 AM - estimated 1340055948 Kbytes needed

20/01/2015 9:58:48 AM - Info nbjm(pid=5220) started backup (backupid=dick-san01_1421765928) job for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_05, schedule incr_daily on storage unit smtlhq143-hcart2-robot-tld-0-dick-san01

20/01/2015 9:58:48 AM - started process bpbrm (7036)

20/01/2015 9:58:48 AM - connecting

20/01/2015 9:58:48 AM - connected; connect time: 0:00:00

20/01/2015 9:58:49 AM - Info bptm(pid=1392) start backup          

20/01/2015 9:58:49 AM - Info bptm(pid=1392) Waiting for mount of media id 015176 (copy 1) on server smtlhq143.

20/01/2015 9:58:49 AM - mounting 015176

20/01/2015 9:59:45 AM - Info bptm(pid=1392) media id 015176 mounted on drive index 2, drivepath nrst0a, drivename ddc-tld0-lto5-002, copy 1

20/01/2015 9:59:45 AM - mounted; mount time: 0:00:56

20/01/2015 9:59:49 AM - Info ndmpagent(pid=9764) dick-san01: SCSI: TAPE READ: short read for nrst0a    

20/01/2015 9:59:49 AM - positioning 015176 to file 1

20/01/2015 9:59:53 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Using inowalk incremental dump algorithm for qtree   

20/01/2015 9:59:53 AM - positioned 015176; position time: 0:00:04

20/01/2015 9:59:53 AM - begin writing

20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Date of this level 2 dump: Tue Jan 20 02:39:03 2015.

20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Date of last level 1 dump: Mon Jan 19 02:40:09 2015.

20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Dumping /vol/v_nfsds_prd_05/q_nfsds_prd_05/.snapshot/smvi_backup_datastore_prod_05_mfm_only_novmsnap-daily_recent to NDMP connection     

20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: mapping (Pass I)[regular files]      

20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: mapping (Pass II)[directories]       

20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: estimated 848756807 KB.       

20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: dumping (Pass III) [directories]      

20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: dumping (Pass IV) [regular files]     

20/01/2015 10:01:40 AM - Info ndmpagent(pid=9764) dick-san01: MOVER: signaling backup EOW at 10485760000 (160000 records)   

20/01/2015 10:02:49 AM - Info ndmpagent(pid=9764) dick-san01: MOVER: signaling backup EOW at 20971520000 (320000 records)   

20/01/2015 10:02:50 AM - Error ndmpagent(pid=9764) terminated by parent process        

20/01/2015 10:02:50 AM - Info ndmpagent(pid=0) done           

20/01/2015 10:03:03 AM - Info bptm(pid=1392) EXITING with status 24 <----------       

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - Info ndmpagent(pid=0) done. status: 24: socket write failed      

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - end writing; write time: 0:03:10

socket write failed(24)

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) MoverGetState called with no session       

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) NDMP backup failed, path = /vol/v_nfsds_prd_05/q_nfsds_prd_05/.snapshot/smvi_backup_datastore_prod_05_mfm_only_novmsnap-daily_recent      

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.   

 

 

I was able to write a dump 0f of Netapp snapshot to tape using NepApp command line.

Robot has been restarted.

still no luck. any help will be appreciated. Thank you.

 

4 REPLIES 4

SymTerry
Level 6
Employee Accredited

Status 86 means The system's device driver returned an I/O error while NetBackup was positioning the tape media. (HOWTO104014). Please check your media and drive:

Check for the following:

  • A defective or a dirty drive. Clean it or have it repaired (see the tpclean command for cleaning).

  • Incorrect drive configuration. Verify the Media and Device Management and system configuration for the drive.

    For example, on UNIX, the drive may be configured for fixed mode when it must be variable mode.

    See the NetBackup Device Configuration Guide for more information.

  • Defective media. In this case, some data may be lost. Use the bpmedia command to set the volume to the FROZEN state so it is not used for future backups.

    See "Frozen media troubleshooting considerations" in the Troubleshooting Guide.

  • The wrong media type. Verify that the media matches the drive type you use.

 

 

jim_dalton
Level 6

Ive had a few of these, not found the root cause, suspect ndmpd on filer. Interested to know the answer! Jim

plsl
Level 2

I did suspect also the ndmpd daemon. we did a stop/start but it didn't fix the issue.

But, we found out that someone from my team decrease the client read timeout on the master server to 60 seconds to debug an other problem a few hours before the issue start. Setting back the timeout to 300 seconds fix the issue.

 

if after 60 seconds the tape wasn't positionned, the timeout kill the process.

case close.

thank you

jim_dalton
Level 6

Interesting! Jim