Forum Discussion

backupguy2014's avatar
10 years ago

NBU 7.6.0.2, NDMP, NetApp to LT04s

A couple of times a night, NBU marks a path used by a NetApp DOWN:

# grep -n DOWN /var/log/messages
7291:Oct  2 16:14:31 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive018 (device 12)
7607:Oct  3 00:11:32 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive009 (device 9)
7746:Oct  3 02:20:54 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive010 (device 11)
8096:Oct  3 05:35:47 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive007 (device 8)
8198:Oct  3 06:49:52 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive014 (device 5)
8204:Oct  3 06:50:09 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive013 (device 1)

# grep -n 'Actual status' /var/log/messages | grep -v SUCCESS
7276:Oct  2 16:13:22 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 2, Actual status: Robotic dismount failure
7592:Oct  3 00:10:23 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 1, Actual status: Robotic dismount failure
7728:Oct  3 02:19:37 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 5, Actual status: Robotic dismount failure
7740:Oct  3 02:20:17 fdsup00002 tldd[20805]: DecodeDismount: TLD(0) drive 5, Actual status: Robotic dismount failure
8078:Oct  3 05:34:37 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 7, Actual status: Robotic dismount failure
8167:Oct  3 06:48:42 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 15, Actual status: Robotic dismount failure
8187:Oct  3 06:49:31 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 23, Actual status: Robotic dismount failure


A closer look in the messages log:

# view +8167 /var/log/messages

Oct  3 02:17:26 fdsup00002 tldd[20805]: TLD(0) DismountTape MW7392 from drive 5
Oct  3 02:17:26 fdsup00002 tldd[20026]: TLD(0) io_open: using ndmp
Oct  3 02:17:26 fdsup00002 tldd[20026]: TLD(0) io_open: using ndmp
Oct  3 02:19:16 fdsup00002 tldd[20805]: DecodeDismount: TLD(0) drive 5, Actual status: STATUS_SUCCESS
Oct  3 02:19:30 fdsup00002 tldd[20805]: TLD(0) MountTape MW7392 on drive 5, from slot 476
Oct  3 02:19:30 fdsup00002 tldd[22024]: TLD(0) io_open: using ndmp
Oct  3 02:19:30 fdsup00002 tldd[22024]: TLD(0) io_open: using ndmp
Oct  3 02:19:33 fdsup00002 tldd[22024]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
Oct  3 02:19:33 fdsup00002 tldd[22024]: TLD(0) unload==TRUE, but no unload, drive 5 (device 11)
Oct  3 02:19:37 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 5, Actual status: Robotic dismount failure

Oct  3 02:19:46 fdsup00002 tldd[20805]: TLD(0) DismountTape ****** from drive 5
Oct  3 02:19:46 fdsup00002 tldd[22190]: TLD(0) can't lock drive 5 (device 11), retry...
Oct  3 02:19:51 fdsup00002 tldd[22190]: TLD(0) io_open: using ndmp
Oct  3 02:19:51 fdsup00002 tldd[22190]: TLD(0) io_open: using ndmp
Oct  3 02:19:54 fdsup00002 tldd[22190]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
Oct  3 02:20:13 fdsup00002 tldd[22190]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
Oct  3 02:20:13 fdsup00002 tldd[22190]: TLD(0) unload==TRUE, but no unload, drive 5 (device 11)
Oct  3 02:20:17 fdsup00002 tldd[20805]: DecodeDismount: TLD(0) drive 5, Actual status: Robotic dismount failure
Oct  3 02:20:54 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive010 (device 11)

In bptm:

02:20:18.685 [22109] <2> send_MDS_msg: OP_STATUS 0 2845546 fdsup00001nbu 6 2 0 0 0 0 0 0 *NULL* 0
02:20:18.702 [22109] <2> send_operation_error: Decoded status = 6 from 2
02:20:18.703 [22109] <16> really_tpunmount: error unloading media, TpErrno = Robot operation failed


The debug/robots log:

02:19:30.147 [20805] <5> MountTape: TLD(0) MountTape MW7392 on drive 5, from slot 476
02:19:30.147 [20805] <2> SpawnChild: TLD(0) Creating Child Process
02:19:37.543 [20805] <5> GetResponseStatus: DecodeMount: TLD(0) drive 5, Actual status: Robotic dismount failure
02:19:46.150 [20805] <4> AddTldLtiReqEntry: Processing ROBOT_DISMOUNT request...
02:19:46.150 [20805] <2> ProcessTable: found active request id -1. Child pid = 0 flags = 0 request type 3
02:19:46.150 [20805] <5> DismountTape: TLD(0) DismountTape ****** from drive 5
02:19:46.150 [20805] <2> SpawnChild: TLD(0) Creating Child Process
02:20:17.199 [20805] <5> GetResponseStatus: DecodeDismount: TLD(0) drive 5, Actual status: Robotic dismount failure

The debug/ltid log:

02:17:26.971 [20725] <4> tpunmount_withMntPtr: drive has been assigned
02:17:26.971 [20725] <4> add_unload_to_queue: Unload for media id MW7392 added to queue (PID 19917, OP 2)
02:17:26.972 [20725] <2> SendRobotReq: LTID - Sent ROBOTIC request, Type=3, Param2=11
02:19:16.995 [20725] <4> LtidProcCmd: Pid=20805, Data.Pid=20805, Type=56, Param1=11, Param2=0, LongParam=0
02:19:16.995 [20725] <6> WriteEntry: Updating drive Drive010 at path nrst7a on attach host fdwonetapp02
02:19:17.145 [20725] <4> LtidProcCmd: Sending UNLOAD DONE msg to BPTM (dev 11, PID 19917)
02:19:17.145 [20725] <4> send_request_answer: Pid=19917, Type=0, ReturnType=12, DataType=0, MsgRes.SC=0
02:19:30.146 [20725] <4> LtidProcCmd: Pid=21985, Data.Pid=21985, Type=12, Param1=0, Param2=0, LongParam=0
02:19:30.147 [20725] <4> tpreq: User tape request: RVSN:MW7392 EVSN:MW7392 Drive:Drive010 Path:nrst7a AttachHost:fdwonetapp02
02:19:30.147 [20725] <4> IssueMount: Index = 3, pid =0, removemount = 0, signalopr = 1
02:19:30.147 [20725] <4> IssueMount: mounting on drive Drive010 path nrst7a
02:19:30.147 [20725] <4> IssueMount: Found drive Drive010 at index 11.  Override path with nrst7a, fdwonetapp02
02:19:30.147 [20725] <4> IssueMount: Sending ROBOT_MOUNT request
02:19:30.147 [20725] <4> SendRobotReq: ROBOT_MOUNT request found
02:19:30.147 [20725] <4> SendRobotReq: Found drive name and path specified Drive010: nrst7a fdwonetapp02
02:19:30.147 [20725] <2> SendRobotReq: LTID - Sent ROBOTIC request, Type=1, Param2=3
02:19:38.148 [20725] <4> LtidProcCmd: Pid=20805, Data.Pid=20805, Type=54, Param1=11, Param2=3, LongParam=0
02:19:38.148 [20725] <2> ProcessRobotMsg: LTID - received ROBOT MESSAGE, Type=54, LongParam=0, Param1=11, Param2=3
02:19:38.148 [20725] <4> ProcessRobotMsg: Sending robotic error 6 to BPTM
02:19:38.148 [20725] <2> CancelMount: Mount request canceled for media id MW7392
02:19:38.148 [20725] <4> send_request_answer: Pid=21985, Type=330, ReturnType=12, DataType=2, MsgRes.SC=6
02:19:38.148 [20725] <4> tpunmount: Optical_spec, remove mount_request, leave mounted
02:19:46.149 [20725] <4> LtidProcCmd: Pid=22109, Data.Pid=22109, Type=12, Param1=0, Param2=0, LongParam=0
02:19:46.149 [20725] <4> tpreq: ENOFILE3: cannot find pending mount_req
02:19:46.150 [20725] <4> tpreq: unlink failed (/usr/openv/netbackup/db/media/tpreq/drive_Drive010), errno = 2 (No such file or directory)
02:19:46.150 [20725] <4> add_unload_to_queue: Unload for media id MW7392 added to queue (PID 22109, OP 2)
02:19:46.150 [20725] <2> SendRobotReq: LTID - Sent ROBOTIC request, Type=3, Param2=11
02:20:18.154 [20725] <4> LtidProcCmd: Pid=20805, Data.Pid=20805, Type=56, Param1=11, Param2=0, LongParam=0
02:20:18.154 [20725] <6> WriteEntry: Updating drive Drive010 at path nrst7a on attach host fdwonetapp02
02:20:18.215 [20725] <4> LtidProcCmd: Sending UNLOAD DONE msg to BPTM (dev 11, PID 22109)
02:20:18.215 [20725] <4> send_request_answer: Pid=22109, Type=330, ReturnType=12, DataType=2, MsgRes.SC=6
02:20:53.220 [20725] <4> LtidProcCmd: Pid=22830, Data.Pid=22830, Type=100, Param1=0, Param2=0, LongParam=0
02:20:54.220 [20725] <4> LtidProcCmd: Pid=22830, Data.Pid=22830, Type=126, Param1=11, Param2=1, LongParam=54
02:20:54.221 [20725] <6> DownDrivePath: DOWNing path nrst7a for Drive010 on host fdsup00002.fda.gov
02:20:54.221 [20725] <6> WriteEntry: Updating drive Drive010 at path nrst7a on attach host fdwonetapp02


It looks like something happened between 2:19:30 and 2:29:38 -- like maybe the tape was not finished mounting and NBU interpreted the delay as an error?

I don't know what to make of this.  I can address the symptoms, but I would like to understand the cause (and fix it).

 

 

7 Replies

  • So the error to pay attention to is:

    really_tpunmount: error unloading media, TpErrno = Robot operation failed

    The request is to unmount a tape. MW7392 is the tape I see. The response to NetBackup is an error and we see that as a Robotic dismount failure and log as such. 

    I would try to mount and unload tape MW7392 with robtest to see if you get any further errors, but there is something going on with the drive or that tape. 

  • Did that.  The drives that are DOWNed and the media they are using are completely random.  The only common element is the Netapps.  We have EMC h/w that uses NDMP, and those paths are never DOWNed.

     

    My understanding is, when mounting media to a tape drive, a unmount command is sent to ensure the drive is empty.  When NDMP is involved, NetBackup sends a command to the NDMP server, and asks it to send SCSI commands to the drive, then relay the response codes back to NBU.

     

    I am also lead to believe that the NBU Master allows for 3 'errors' in a 24 hour period before marking a path as DOWN.

     

    I've combed through the NDMP logs generated by the NetApps several times.  I keep coming back with a vague sense that NBU just isn't waiting long enough for the mount to complete, or, NBU isn't waiting long enough for the NetApp to tell NBU that the mount is complete.  In other words, the drives are being DOWNed out of an over-abundance of caution.

     

    I don't know how to address that directly, but I wonder if there is a way to deal with the symptoms?  Is there a way to tell NBU to wait a little longer before deciding something is wrong?  Or to increase the number of 'errors' per 24 hour period?

  • I should add we are running NetApp Release 8.1.2 7-Mode: Tue Oct 30 19:56:51 PDT 2012 on our NetApps.

  • Here's one from last night.  Tape MW8133 is to be mounted in drive 1.

    Oct  7 23:46:24 fdsup00002 tldd[20805]: TLD(0) MountTape MW8133 on drive 1, from slot 1
    Oct  7 23:46:24 fdsup00002 tldd[6469]: TLD(0) io_open: using ndmp
    Oct  7 23:46:24 fdsup00002 tldd[6469]: TLD(0) io_open: using ndmp
    Oct  7 23:46:28 fdsup00002 tldd[6469]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
    Oct  7 23:46:28 fdsup00002 tldd[6469]: TLD(0) unload==TRUE, but no unload, drive 1 (device 9)

    As NBU does, it first sends an unload command to a tape drive to ensure a tape is not present.

    Oct  7 23:46:32 fdsup00002 tldd[20805]: DecodeMount: TLD(0) drive 1, Actual status: Robotic dismount failure

    Something is wrong.

    Oct  7 23:46:32 fdsup00002 ltid[20725]: ltid received request type 54
    Oct  7 23:46:32 fdsup00002 ltid[20725]: LTID - received ROBOT MESSAGE, Type=54, LongParam=0, Param1=9, Param2=0

    Another dismount request is sent.

    Oct  7 23:46:39 fdsup00002 ltid[20725]: ltid received request type 12
    Oct  7 23:46:39 fdsup00002 ltid[20725]: LTID - Sent ROBOTIC request, Type=3, Param2=9
    Oct  7 23:46:39 fdsup00002 tldd[20805]: TLD(0) DismountTape ****** from drive 1
    Oct  7 23:46:39 fdsup00002 tldd[6879]: TLD(0) io_open: using ndmp
    Oct  7 23:46:39 fdsup00002 tldd[6879]: TLD(0) io_open: using ndmp
    Oct  7 23:46:43 fdsup00002 tldd[6879]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
    Oct  7 23:47:02 fdsup00002 tldd[6879]: TLD(0) ndmp_tape_open_func() failed with -10 in io_open()
    Oct  7 23:47:02 fdsup00002 tldd[6879]: TLD(0) unload==TRUE, but no unload, drive 1 (device 9)
    Oct  7 23:47:04 fdsup00002 tldd[20805]: DecodeDismount: TLD(0) drive 1, Actual status: STATUS_SUCCESS

    This one reports success.

    Oct  7 23:47:04 fdsup00002 ltid[20725]: ltid received request type 56
    Oct  7 23:47:39 fdsup00002 ltid[20725]: ltid received request type 100
    Oct  7 23:47:40 fdsup00002 ltid[20725]: ltid received request type 126

    And DOWNs the drive.

    Oct  7 23:47:40 fdsup00002 ltid[20725]: Operator/EMM server has DOWN'ed drive Drive009 (device 9)

    In the bptm log:

    23:46:24.152 [6356] <4> expandpath: /usr/openv/netbackup/db/media/tpreq/drive_Drive009
    23:46:24.821 [6356] <2> tapelib: wait_for_ltid, Mount, timeout 0
    23:46:32.829 [6356] <2> send_MDS_msg: OP_STATUS 0 2941088 fdsup00001nbu 6 1 0 0 0 0 0 0 *NULL* 0
    23:46:32.849 [6356] <2> send_operation_error: Decoded status = 6 from 1
    23:46:32.849 [6356] <2> set_job_details: Tfile (808377): LOG 1412739992 16 bptm 6356 error requesting media, TpErrno = Robot operation failed

    23:46:32.849 [6356] <2> send_job_file: job ID 808377, ftype = 3 msg len = 85, msg = LOG 1412739992 16 bptm 6356 error requesting media, TpErrno = Roobot operation failed

    23:46:32.850 [6356] <16> mount_open_media: error requesting media, TpErrno = Robot operation failed
    ...

    23:46:37.342 [6792] <2> drivename_open: Called with Create 1, file Drive009
    23:46:37.342 [6792] <2> drivename_lock: lock established
    ...

    23:46:39.654 [6792] <2> really_tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/drive_Drive009
    23:46:39.824 [6792] <2> tapelib: wait_for_ltid, UnMount, timeout 2400
    23:47:05.350 [6792] <4> report_resource_done: VBRD 1 6792 0 Drive009 MW8133

    I interpret the above as a mount failure, possibly due to timing (maybe NBU didn't wait long enough).  The second dismount request succeeded because there was nothing in the drive.

     

  • Here's excerpts from the extremely verbose ndmpd logs from the NetApp at the same times:

    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_open_mode_sense: no problems
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI: OPEN: nrst10a READY, closing and reopening the tape
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::client_send: using timeout 30
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C>TD (scsi_client) TD_CLOSE_TAPE
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    tape handle 0x7f20006 PID 41395
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C<TD (scsi_client) TD_CLOSE_TAPE
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    Error TD_ERROR_SUCCESS
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_reopen_closed: called
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_reopen_closed: closed device nrst10a
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI reopen_tape: Trying to open tape nrst10a
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::client_send: using timeout 30
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C>TD (scsi_client) TD_OPEN_TAPE
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    Open tape nrst10a PID 41395
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C<TD (scsi_client) TD_OPEN_TAPE
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    Error TD_ERROR_SUCCESS
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_reopen_device: called
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: REOPEN: tape_handle=0x7f30006(133365766) pid 0xa1b3(41395)
    Tue Oct 07 2014 23:46:21 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI: OPEN: delay 3 nrst10a
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI: OPEN: (null) READY
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: cb_ndmp4_tape_open: here

    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C>TD (scsi_client) TD_SCSI_CMD
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    Device type Tape, handle 0x7f30006
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    timeout 0x3d (61)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    cdb 0: operation code 0x12 (group 0, command 0x12) SCSI_OP_

    ...
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C>TD (scsi_client) TD_SCSI_CMD
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    Device type Tape, handle 0x7f30006
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    timeout 0x3d (61)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    cdb 0: operation code 0x1a (group 0, command 0x1a) SCSI_OP_

    ...
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI close: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI load_tape_stats: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C>TD (scsi_client) TD_GET_TAPE_STATS
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    tape handle 0x7f30006
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: C<TD (scsi_client) TD_GET_TAPE_STATS
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    error TD_ERROR_SUCCESS
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    record_num 4294967295
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    file_num 4294967295
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG:    written_to FALSE
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_load_tape_stats: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_load_tape_stats: Tape written to? 0
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_close_stats: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI close_release: called 0 (0)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::client_send: using timeout 30

    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_close_release: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI cb_close_release: closed device nrst10a
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::client_close: 0x800865340 called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::qrelease: 0x800865340 start (scsi_client)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::buf_stream_close: called on 0x800865340
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: BUF_Stream::start_close: up 0x800865340 close 0x0 arg 0x0
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: BUF_Stream::start_close: fd=8 found 0x800865340 (scsi_client)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::qrelease: pending_releases 0x0
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::qrelease: pending_releases 0x800865340 next 0x0
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: MGMT_RPC::qrelease: 0x800865340 done (scsi_client)
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI release: called
    Tue Oct 07 2014 23:46:24 -04:00 [kern_ndmpd:info:1903] [41395]  DEBUG: SCSI: released nrst10a

     

    No obvious errors (to my eyes).

  • Maybe a good idea to delete and rerun the device config wizard for Netapp filer.

    'Robotic dismount failure' can happen when there is a mismatch between NBU device config and actual device path at OS level.

    The dismount command is send to the OS device name in the form of 'rewind, offline'.
    If the device mapping is wrong, the command is obviously sent to the wrong drive.
    Backups will work because NBU will use any drive where it sees a tape being mounted.

  • We thought of that too, and traced out the naming and pathing - it was all correct.  Also, after having UP'd the path, NDMP would make use of it and not subsequently DOWN it.

    At this point I am regarding the DOWNing events as cosmetic and not meaningful.