cancel
Showing results for 
Search instead for 
Did you mean: 

DOWN'ing drive index

kakaluru
Level 4

 

Hi All

After running 2 minutes automatically drive is downing and getting below error. Please suggest me on this.

3/7/2016 11:34:47 AM - job 3463 was restarted as job 3467
3/7/2016 11:34:50 AM - started process bpbrm (6060)
3/7/2016 11:34:50 AM - connecting
3/7/2016 11:34:51 AM - connected; connect time: 00:00:01
3/7/2016 11:35:12 AM - started process bptm (6064)
3/7/2016 11:35:12 AM - started process bptm (2972)
3/7/2016 11:35:12 AM - mounting 0045L3
3/7/2016 11:36:01 AM - mounted; mount time: 00:00:49
3/7/2016 11:36:01 AM - positioning 0045L3 to file 1
3/7/2016 11:36:05 AM - Error bptm(pid=6064) cannot write image to media id 0045L3, drive index 0, The request was aborted.
3/7/2016 11:36:05 AM - positioned 0045L3; position time: 00:00:04
3/7/2016 11:36:05 AM - begin writing
3/7/2016 11:36:06 AM - Warning bptm(pid=6064) DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
3/7/2016 11:36:13 AM - end writing; write time: 00:00:08
media write error(84)
3/7/2016 11:36:14 AM - Error bpsched(pid=5848) backup of client 10.174.0.12 exited with status 84 (media write error)
the required storage unit is unavailable(219)

1 ACCEPTED SOLUTION

Accepted Solutions

Nicolai
Moderator
Moderator
Partner    VIP   

Or if multiple post helped you use the "Request Split Solution".

View solution in original post

15 REPLIES 15

Marianne
Level 6
Partner    VIP    Accredited Certified
"DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)" So, at least 3 status 84 errors in 12 hours using this tape drive . Either device or media error. You need to enable logs on this media server. Add VERBOSE entry to vm.conf on the media server and restart NBU Device Manager service. Create bptm log folder under netbackup\logs. UP the drive and run another backup. NBU errors will be logged in bptm. Device errors will be logged in Event Viewer Application and System logs.

kakaluru
Level 4

Hi Marianne

Can you please guide me how to add VERBOSE entry in windows?

Marianne
Level 6
Partner    VIP    Accredited Certified
Open the file with text editor such as Notepad. Add VERBOSE on a new line. Save and close the file.

kakaluru
Level 4

Hi Marianne

1 . Verbose entries are :

       DEVICE_HOST = <hostname>

       VERBOSE

2. bptm log file already existing

3. NBU manager also restarted.

Again same error. Please find bptm log as below

12:40:56.418 [4204.5292] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -den 20 -mt 2 -masterversion 510000
12:40:56.434 [4204.5292] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:40:56.434 [4204.5292] <2> bptm: EXITING with status 0 <----------
12:40:56.481 [5272.3332] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit storageunit -den 20 -mt 2 -masterversion 510000
12:40:56.481 [5272.3332] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:40:56.481 [5272.3332] <2> bptm: EXITING with status 0 <----------
12:40:56.684 [1008.5640] <2> bptm: INITIATING (VERBOSE = 0): -U
12:40:56.684 [1008.5640] <2> bptm: EXITING with status 0 <----------
12:40:58.809 [4608.5456] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -den 20 -mt 2 -masterversion 510000
12:40:58.809 [4608.5456] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:40:58.809 [4608.5456] <2> bptm: EXITING with status 0 <----------
12:40:58.871 [2016.3232] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit storageunit -den 20 -mt 2 -masterversion 510000
12:40:58.871 [2016.3232] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:40:58.871 [2016.3232] <2> bptm: EXITING with status 0 <----------
12:41:23.481 [5764.5792] <2> bptm: INITIATING (VERBOSE = 0): -w -pid 5748 -c 10.174.0.12 -den 20 -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -cl Friday_Full -bt 1457334658 -b 10.174.0.12_1457334658 -st 0 -cj 1 -p Friday_Full -hostname 10.174.0.12 -ru root -rclnt 10.174.0.12 -rclnthostname 10.174.0.12 -rl 5 -rp 8035200 -sl Friday_Full -ct 13 -maxfrag 1048576 -mediasvr ssltuifil1 -jobid 3468 -jobgrpid 3468 -masterversion 510000
12:41:23.481 [5764.5792] <2> logconnections: bpjobd CONNECT FROM 10.174.0.14.2039 TO 10.174.0.14.13723
12:41:23.481 [5764.5792] <2> logconnections: BPJOBD CONNECT FROM 10.174.0.14.2039 TO 10.174.0.14.13723
12:41:23.481 [5764.5792] <2> ParseConfigExA: Unknown configuration option on line 49: RenameIfExists = 0
12:41:23.497 [5764.5792] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:41:23.497 [5764.5792] <2> job_connect: Connected to the host ssltuifil1 contype 10 jobid <3468> socket <1744>
12:41:23.497 [5764.5792] <2> job_connect: Connected on port 2039
12:41:23.497 [5764.5792] <2> set_job_details: Done
12:41:23.497 [5764.5792] <2> get_stunit: Buffer1 "STUNIT ssltuifil1-hcart3 2 ssltuifil1 8 0 20 1 0 *NULL*", Buffer2 "*NULL*", Buffer3 "0 1 1048576 *NULL*"
12:41:23.497 [5764.5792] <2> LOCAL CLASS_ATT_DEFS: Product ID = 7
12:41:23.497 [5764.5792] <2> get_stunit: Buffer1 "STUNIT storageunit 2 ssltuifil1 8 0 20 1 0 *NULL*", Buffer2 "*NULL*", Buffer3 "0 1 1048576 *NULL*"
12:41:23.497 [5764.5792] <2> io_init: using 65536 data buffer size
12:41:23.497 [5764.5792] <2> io_init: CINDEX 0, sched Kbytes for monitoring = 40000
12:41:23.497 [5764.5792] <2> io_set_recvbuf: setting receive network buffer to 263168 bytes
12:41:23.497 [5764.5792] <2> io_init: using 16 data buffers
12:41:23.497 [5764.5792] <2> io_init: child delay = 20, parent delay = 30 (milliseconds)
12:41:23.497 [5764.5792] <2> io_init: shm_size = 1048964, buffer address = 0xb70000, buf control = 0xc70000, ready ptr = 0xc70180
12:41:23.512 [5764.5792] <4> report_client: VBRC 2 5764 1 10.174.0.12_1457334658 13 Friday_Full 0 Friday_Full 0 1 1
12:41:23.512 [5764.5792] <2> logconnections: bpdbm CONNECT FROM 10.174.0.14.2041 TO 10.174.0.14.13721
12:41:23.512 [5764.5792] <2> logconnections: BPDBM CONNECT FROM 10.174.0.14.2041 TO 10.174.0.14.13721
12:41:23.512 [5764.5792] <2> ParseConfigExA: Unknown configuration option on line 49: RenameIfExists = 0
12:41:23.559 [5764.5792] <2> add_to_vmhost_list: added ssltuifil1 to vmhost list
12:41:23.559 [5764.5792] <2> logconnections: vmd CONNECT FROM 10.174.0.14.2043 TO 10.174.0.14.13701
12:41:23.606 [5764.5792] <2> select_media: getting new media id for retention level 5
12:41:23.606 [5764.5792] <2> logconnections: vmd CONNECT FROM 10.174.0.14.2045 TO 10.174.0.14.13701
12:41:23.622 [5764.5792] <2> vmdb_query_scratch_bypool2: server returned:  1 0045L3 ------ 24 000045L3 -------- 8 0 ssltuifil1 00_000_TLD - 27 0 0 0 0 root root 5 Friday_Full - 1455109347 1457334658 1455109465 1457330760 0 0 27 0 0 - 0 0 50 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 - Added by Media Manager

12:41:23.622 [5764.5792] <2> db_byid: search for media id 0045L3
12:41:23.622 [5764.5792] <2> db_put: write media id 0045L3, offset = 0
12:41:23.622 [5764.5792] <2> db_unlock_media: offset 0 was already unlocked
12:41:23.622 [5764.5792] <2> check_available_drives: checking drives, about to request media id 0045L3
12:41:23.622 [5764.5792] <2> bptm media_id_to_monitor: job_id = 3468
12:41:23.622 [5764.5792] <2> bptm media_id_to_monitor: pDstMediaId = 0045L3
12:41:23.622 [5764.5792] <2> set_job_details: Done
12:41:23.622 [5764.5792] <2> select_media: selected media id 0045L3 for backup[0], 10.174.0.12(rl = 5) <----------
12:41:23.622 [5764.5792] <2> write_backup: backup child process is pid 5228.5788
12:41:23.622 [5764.5792] <2> mount_open_media: Waiting for mount of media id 0045L3 (copy 1) on server ssltuifil1.
12:41:23.653 [5228.5788] <2> bptm: INITIATING (VERBOSE = 0): -w -pid 5748 -c 10.174.0.12 -den 20 -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -cl Friday_Full -bt 1457334658 -b 10.174.0.12_1457334658 -st 0 -cj 1 -p Friday_Full -hostname 10.174.0.12 -ru root -rclnt 10.174.0.12 -rclnthostname 10.174.0.12 -rl 5 -rp 8035200 -sl Friday_Full -ct 13 -maxfrag 1048576 -mediasvr ssltuifil1 -jobid 3468 -jobgrpid 3468 -masterversion 510000 -mmfill 5764 0 65536 16 20 5166109 0
12:41:23.653 [5228.5788] <2> logconnections: bpjobd CONNECT FROM 10.174.0.14.2047 TO 10.174.0.14.13723
12:41:23.653 [5228.5788] <2> logconnections: BPJOBD CONNECT FROM 10.174.0.14.2047 TO 10.174.0.14.13723
12:41:23.653 [5228.5788] <2> ParseConfigExA: Unknown configuration option on line 49: RenameIfExists = 0
12:41:23.653 [5228.5788] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:41:23.653 [5228.5788] <2> job_connect: Connected to the host ssltuifil1 contype 10 jobid <3468> socket <1720>
12:41:23.653 [5228.5788] <2> job_connect: Connected on port 2047
12:41:23.653 [5228.5788] <2> set_job_details: Done
12:41:23.653 [5228.5788] <2> setup_mm_child: [5764] child using 16 data buffers
12:41:23.653 [5228.5788] <2> setup_mm_child: [5764] child buffer size is 65536
12:41:23.653 [5228.5788] <2> setup_mm_child: [5764] buffer address = 0xb60000, buf control = 0xc60000
12:42:05.044 [5764.5792] <2> get_tape_path: drive index 0, DOS name: \\.\Tape0, PnP name: \\?\scsi#sequential&ven_ibm&prod_ult3580-td3&rev_93gp#4&1e646d7e&0&040#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
12:42:05.044 [5764.5792] <2> check_serial_no: serial number match for drive \\.\Tape0, drive index 0, drive serial number 9210814607, expected serial number 9210814607
12:42:05.060 [5764.5792] <2> init_tape: \\.\Tape0 configured with blocksize 0
12:42:05.060 [5764.5792] <2> init_tape: \\.\Tape0 has compression enabled
12:42:05.060 [5764.5792] <2> io_open: SCSI RESERVE
12:42:05.076 [5764.5792] <2> io_open: file C:\Program Files\VERITAS\NetBackup\db\media\tpreq/0045L3 successfully opened (mode 2)
12:42:05.076 [5764.5792] <2> get_tape_path: drive index 0, DOS name: \\.\Tape0, PnP name: \\?\scsi#sequential&ven_ibm&prod_ult3580-td3&rev_93gp#4&1e646d7e&0&040#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
12:42:05.076 [5764.5792] <2> write_backup: media id 0045L3 mounted on drive index 0, drivepath \\.\Tape0, drivename Drive1, copy 1
12:42:05.091 [5764.5792] <4> report_throughput: VBRT 1 5764 1 1 Drive1 0045L3 0 1 0 0  0 (bptm.c.16742)
12:42:05.091 [5764.5792] <2> io_read_media_header: drive index 0, reading media header, buflen = 65536, buff = 0xa6c010, copy 1
12:42:05.091 [5764.5792] <2> io_ioctl: command (5)MTREW 1 from (bptm.c.7522) on drive index 0
12:42:05.091 [5764.5792] <2> io_write_media_header: drive index 0, writing media header
12:42:05.091 [5764.5792] <2> io_ioctl: command (5)MTREW 1 from (bptm.c.8696) on drive index 0
12:42:05.169 [5764.5792] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.8736) on drive index 0
12:42:09.263 [5764.5792] <4> write_backup: begin writing backup id 10.174.0.12_1457334658, copy 1, fragment 1, to media id 0045L3 on drive index 0
12:42:09.263 [5764.5792] <2> signal_parent: set bpbrm media ready event (pid = 5748)
12:42:09.263 [5764.5792] <2> write_data: ndmp_dup_max_frag is set to 0
12:42:09.263 [5764.5792] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: 0 twin_is_disk 0 delay_brm: 0
12:42:09.279 [5764.5792] <2> write_data: absolute block position prior to writing backup header(s) is 2, copy 1
12:42:09.279 [5764.5792] <2> io_write_back_header: drive index 0, 10.174.0.12_1457334658, file num = 1, mpx_headers = 0, copy 1
12:42:09.279 [5764.5792] <2> write_data: completed writing backup header, start writing data when first buffer is available, copy 1
12:42:09.279 [5764.5792] <2> write_data: first write, twin_index: 0 cindex: 0 dont_process: 1 wrote_backup_hdr: 1 finished_buff: 0
12:42:09.279 [5764.5792] <2> write_data: received first buffer (65536 bytes), begin writing data
12:42:10.013 [5764.5792] <2> write_data: write of 65536 bytes indicated only 0 bytes were written, err = 1235
12:42:10.013 [5764.5792] <4> write_data: WriteFile failed with: The request was aborted. (1235); bytes written = 65536; size = 0
12:42:10.013 [5764.5792] <2> is_possible_recoverable_error: not attempting error recovery, errno = 1235
12:42:10.013 [5764.5792] <2> set_job_details: Done
12:42:10.029 [5764.5792] <16> write_data: cannot write image to media id 0045L3, drive index 0, The request was aborted.
12:42:10.029 [5764.5792] <2> log_media_error: successfully wrote to error file - 03/07/16 12:42:10 0045L3 0 WRITE_ERROR
12:42:10.138 [5764.5792] <2> check_error_history: called from bptm line 17746, EXIT_Status = 84
12:42:10.138 [5764.5792] <2> check_error_history: drive index = 0, media id = 0045L3, time = 03/07/16 12:42:10, both_match = 1, media_match = 0, drive_match = 2
12:42:10.138 [5764.5792] <2> io_close: closing C:\Program Files\VERITAS\NetBackup\db\media\tpreq/0045L3, from bptm.c.15839
12:42:10.154 [5764.5792] <2> tpunmount: tpunmount'ing C:\Program Files\VERITAS\NetBackup\db\media\tpreq/0045L3
12:42:10.170 [5764.5792] <2> get_tape_path: drive index 0, DOS name: \\.\Tape0, PnP name: \\?\scsi#sequential&ven_ibm&prod_ult3580-td3&rev_93gp#4&1e646d7e&0&040#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
12:42:10.170 [5764.5792] <2> check_serial_no: serial number match for drive \\.\Tape0, drive index 0, drive serial number 9210814607, expected serial number 9210814607
12:42:10.201 [5764.5792] <2> init_tape: \\.\Tape0 configured with blocksize 0
12:42:10.201 [5764.5792] <2> init_tape: \\.\Tape0 has compression enabled
12:42:10.201 [5764.5792] <2> TpUnmountWrapper: SCSI RELEASE
12:42:10.232 [5764.5792] <2> get_tape_path: drive index 0, DOS name: \\.\Tape0, PnP name: \\?\scsi#sequential&ven_ibm&prod_ult3580-td3&rev_93gp#4&1e646d7e&0&040#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
12:42:10.435 [5764.5792] <4> report_resource_done: VBRD 1 5764 0 Drive1 0045L3
12:42:10.435 [5764.5792] <2> set_job_details: Done
12:42:10.451 [5764.5792] <8> check_error_history: DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
12:42:10.638 [5764.5792] <2> bptm: EXITING with status 84 <----------
12:42:16.482 [5732.5712] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -den 20 -mt 2 -masterversion 510000
12:42:16.498 [5732.5712] <4> bptm: all drives are down for the specified robot number = 0, robot type = TLD and density = hcart3
12:42:16.498 [5732.5712] <2> bptm: NUM UP 0 0 0 0 0 0
12:42:16.498 [5732.5712] <2> bptm: EXITING with status 0 <----------
12:42:16.826 [3548.2236] <2> bptm: INITIATING (VERBOSE = 0): -U
12:42:16.826 [3548.2236] <2> bptm: EXITING with status 0 <----------
12:42:18.732 [5836.5236] <2> bptm: INITIATING (VERBOSE = 0): -U
12:42:18.732 [5836.5236] <2> bptm: EXITING with status 0 <----------
12:42:19.029 [5180.6052] <2> bptm: INITIATING (VERBOSE = 0): -delete_expired
12:42:19.029 [5180.6052] <2> delete_expired_media: media id 0045L3 is expired, removing it
12:42:19.029 [5180.6052] <2> add_to_vmhost_list: added ssltuifil1 to vmhost list
12:42:19.029 [5180.6052] <2> logconnections: vmd CONNECT FROM 10.174.0.14.2081 TO 10.174.0.14.13701
12:42:19.029 [5180.6052] <2> ParseConfigExA: Unknown configuration option on line 49: RenameIfExists = 0
12:42:19.045 [5180.6052] <2> ParseConfigExA: Unknown configuration option on line 49: RenameIfExists = 0
12:42:19.045 [5180.6052] <2> query_one_volume: server returned:  1 0045L3 ------ 24 000045L3 -------- 8 0 ssltuifil1 00_000_TLD - 27 0 0 0 0 root root 5 Friday_Full - 1455109347 0 1455109465 1457334724 0 0 28 0 0 - 0 0 50 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 - Added by Media Manager

12:42:19.045 [5180.6052] <4> deassign_media: media id 0045L3 removed from media manager database (expired)
12:42:19.045 [5180.6052] <2> bptm: EXITING with status 0 <----------
12:45:58.344 [5524.2628] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit ssltuifil1-hcart3 -den 20 -mt 2 -masterversion 510000
12:45:58.360 [5524.2628] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:45:58.360 [5524.2628] <2> bptm: EXITING with status 0 <----------
12:45:58.406 [4808.5864] <2> bptm: INITIATING (VERBOSE = 0): -count -cmd -rt 8 -rn 0 -stunit storageunit -den 20 -mt 2 -masterversion 510000
12:45:58.406 [4808.5864] <2> bptm: NUM UP 1 0 0 0 1 0 a-Drive1
12:45:58.406 [4808.5864] <2> bptm: EXITING with status 0 <----------
12:45:58.485 [496.5992] <2> bptm: INITIATING (VERBOSE = 0): -delete_expired
12:45:58.485 [496.5992] <2> bptm: EXITING with status 0 <----------

 

 

 

kakaluru
Level 4

Hi Marianne

Please let me know you need any other info......

Nicolai
Moderator
Moderator
Partner    VIP   

Logging still at level 0

12:41:23.481 [5764.5792] <2> bptm: INITIATING (VERBOSE = 0):

This the error:

<16> write_data: cannot write image to media id 0045L3, drive index 0, The request was aborted.

Please take a look in the event viewer and look for SCSI sense key, as this does not look like at "real" write error.  There are multiple causes that will result in a Netbackup write error - 90% are the generic write error, but what different here is the "The request was aborted".

What version of Netbackup are you running ? Are you still on NBU 5.1 ?

 -masterversion 510000

Marianne
Level 6
Partner    VIP    Accredited Certified
Logs do not fix anything. It only helps with troubleshooting. Please check previous failures. Is it same or different media-id? If same tape, please eject that tape and try with new tape. Please also check error file under ...netbackup\db\media. Have a look at the entries for the last couple of days. Are errors seen on same tape drive with different media-ids? This will point to faulty tape drive. If your NBU version is still at 5.1, then your server, OS and devices are probably just as old and unsupported? PS : Please copy log files to .txt (e.g. bptm.txt ) and upload as File attachment.

kakaluru
Level 4

Hi Marianne

Even different medias and different polocies are also giving same error. This issue we are facing for more than a month. I have attached error.txt file.

Under event viewer i was unable to find out " SCSI sense key"?????

Nicolai
Moderator
Moderator
Partner    VIP   

Call the hardware vendor and let them fix the issue  - from the error file its index 0 all the time that cuases the "write" errors.

Simply down the drive. 

 

kakaluru
Level 4

Let me know how do i need to proceed further to resolve this

Marianne
Level 6
Partner    VIP    Accredited Certified
SCSI sense key is a hardware error. I agree with Nicolai. Call the hardware vendor. Give them the SCSI sense key code that you see in the logs.

Nicolai
Moderator
Moderator
Partner    VIP   

Take a look at the Wikipedia page : 

https://en.wikipedia.org/wiki/Key_Code_Qualifier

Scroll down to "Aborted Command". I would expect that the event log (or somwhere else) has recorded Sense key 0A, then use ASC and ASCQ to determine what fault happened.

 

kakaluru
Level 4

Hi All

How to close this now?

Will_Restore
Level 6

Find the response which helped you most, and select

Actions > Mark as Solution

 

Nicolai
Moderator
Moderator
Partner    VIP   

Or if multiple post helped you use the "Request Split Solution".