06-16-2015 01:21 AM
Good morning,
As part of my continued testing on the duplication of data from LTO2 to LTO6 tapes, the next phase of my plan is to fill a new LTO6 tape with X number of backup sessions from an LTO2 tape. Thus far I have successfully imported a test backup from a single LTO2 tape and duplicated this onto LTO6. Brand new tapes have been used for testing.
The next phase of testing is to ensure that I can duplicate additional LTO2 tape data onto the same LTO6 tape that I used for the initial duplication. However, when I attempt this I get the following error message.
Clearly the error is stating there is a positioning problem, however, the robots and drives have been operating absolutely fine.
16/06/2015 08:29:38 - begin Duplicate
16/06/2015 08:29:41 - started process bptm (1452)
16/06/2015 08:29:38 - requesting resource epany-archive-hcart3-robot-tld-1
16/06/2015 08:29:38 - requesting resource A001L2
16/06/2015 08:29:38 - reserving resource A001L2
16/06/2015 08:29:39 - reserved resource A001L2
16/06/2015 08:29:39 - granted resource A001L2
16/06/2015 08:29:39 - granted resource LTO3-1
16/06/2015 08:29:39 - granted resource A00010
16/06/2015 08:29:39 - granted resource HP.ULTRIUM6-SCSI.000
16/06/2015 08:29:39 - granted resource epany-archive-hcart3-robot-tld-1
16/06/2015 08:29:44 - started process bptm (2416)
16/06/2015 08:29:46 - started process bptm (2416)
16/06/2015 08:29:46 - mounting A001L2
16/06/2015 08:29:47 - started process bptm (1452)
16/06/2015 08:29:47 - mounting A00010
16/06/2015 08:30:47 - mounted; mount time: 00:01:01
16/06/2015 08:30:47 - positioning A001L2 to file 1
16/06/2015 08:30:47 - positioned A001L2; position time: 00:00:00
16/06/2015 08:30:47 - begin reading
16/06/2015 08:31:54 - Warning bptm(pid=1452) cannot locate on drive index 2, No more data is on the tape.
16/06/2015 08:32:23 - Error bptm(pid=1452) cannot position media id A00010 for write
16/06/2015 08:32:24 - Error bpduplicate(pid=900) host epany-archive backupid epany-archive_1434382187 write failed, media position error (86).
16/06/2015 08:32:24 - Error bpduplicate(pid=900) Duplicate of backup id epany-archive_1434382187 failed, media position error (86).
16/06/2015 08:32:24 - Error bpduplicate(pid=900) Status = no images were successfully processed.
16/06/2015 08:32:24 - end Duplicate; elapsed time: 00:02:46
no images were successfully processed(191)
Solved! Go to Solution.
06-16-2015 11:18 AM
Ok - now expire the media A00010, or expire the two images on it and bpexpdate -deassignempty, and then try two back to back (consecutive) duplications again.
{...clarified the process...}
If it works then
----who knows what was wrong before.
else if it fails then
----try back-to-back (i.e. two consecutive) duplications again but to a another different new LTO6 media
----if these work then
----------suspect bad media A00010
----else if these also fail then
----------it's looking likely to be a firmware/driver/compatibility type issue. (and probably not a buffer size issue).
----endif
endif
06-16-2015 01:39 AM
Need to see bptm logs. Please post as an attachment.
06-16-2015 01:43 AM
Load the 'problem' tape into a drive with the following command, it will take a few moments for the command prompt to return In this example I have used media id E03002, please use the media ID for your tape.
Example:
tpreq -m E03002 -f E03002
Run vmoprcmd command (in volmgr\bin dir), this will show which drive the tape is in.
IBM.ULT3580-TD4.001 Yes Yes E03002 E03002 Yes hcart
nbmaster1 /dev/nst4 SCAN-TLD
nbmedia1 {4,0,1,3} ACTIVE
Here we see the tape is loaded in the drive with the path {4,0,1,3}
Next, run scsi_command -map -d {4,0,1,3} command (substitute the correct drive path in from vmoprcmd output) and output this to a file
scsi_command -map {drive_path} >scsi_map_output.txt
When finished, unload :
tpunmount -f E03002
When complete, please attach scsi_map_output.txt file to this post.
06-16-2015 02:03 AM
Hi, not being too familiar with Netbackup, I'm not entirely sure which logs these are, location etc?
Is this the correct location... C:\Program Files\VERITAS\NetBackup\logs
06-16-2015 02:09 AM
Yes bptm logs should be under that folder path. If the 'bptm' folder does not exist, then create it with 'mkdir bptm'. Then adjust bptm logging level for the NetBackup Server(s) performing the work, i.e. the source media server, and the target media server, and then retry the duplication.
But do mph999's steps first. Let's see what that shows, before retrying the duplication to capture bptm logs.
06-16-2015 02:46 AM
did you power-off the tape drive while A001L2 was mounted ?
Looks like the tape wasn't dismounted properly. LTO tapes has a index tabel over data (somthing like a FAT tabel for disks). The index get updated upon dismount of the tape. If tape drive experince a power-cut the index doesn't get updated and you may get a "cannot locate" error.
Any SCSI sense key in event log ?
06-16-2015 02:53 AM
Not sure why but the system is failing to reconize the scsi command. I'm typing in the following:
scsi_command -map -d {7,0,0,0} >scsi_map_output.txt
06-16-2015 02:58 AM
It's possible that this has happened, not sure if the tape was definitely in the drive however..... I had powered the TLD off at the end of the day. It's a Tandberg Magnum that doesn't appear to have a soft powerdown option.
06-16-2015 02:59 AM
If the PATHs system variable doesn't have the NBU paths, yuo need to run the command from the volmgr \bin dir:
<install path>\veritas\volmgr\bin\scsi_command ...
06-16-2015 03:08 AM
That is indeed where I'm running it from.
DRIVE STATUS
Drive Name Label Ready RecMID ExtMID Wr.Enbl. Type
Host DrivePath Status
=============================================================================
HP.ULTRIUM6-SCSI.000 Yes Yes A00010 A00010 Yes hcart3
epany-archive {7,0,0,0} ACTIVE
LTO3-0 No No No hcart3
epany-archive {6,0,0,0} TLD
LTO3-1 No No No hcart3
epany-archive {6,0,1,0} TLD
C:\Program Files\VERITAS\Volmgr\bin>scsi_command -map -d {7,0,0,0} >scsi_map_out
put.txt
'scsi_command' is not recognized as an internal or external command,
operable program or batch file.
C:\Program Files\VERITAS\Volmgr\bin>
06-16-2015 05:10 AM
Waht version of NBU are you running ?
06-16-2015 05:13 AM
It's only version 6.0. Unfortunately I'm unable to change that situation.
06-16-2015 05:22 AM
Just x2 checked, it should be in volmgr\bin but I have a feeling it wasn't available in all versions, it's a more recent addition (to windows).
That is going to make lfe difficult - as it would show what is on the tape in an understandable format.
I suspect, but cannot show (without scsi_command) that the empty header is missing from the end of the tape. Without this, we cannot position to append more data at a later time, as we cannot determine where the 'logical' end of the data is. We look for the empty header, and this can be seen in the bptm log.
I've seen this before, and the cause was a faulty tape drive, which for some reason didn't write the empty header, but didn't error in any way, and thus affected many tapes.
Replacing the drive alone, resolved the issue.
06-16-2015 05:30 AM
I've enabled Robust Logging and created a folder called BPTM. I'll re-run the Duplication and see if any log data is produced.
06-16-2015 05:47 AM
I've attached the BPTM log files.
06-16-2015 05:47 AM
Do you have this file:
?:\Program Files*\Veritas\NetBackup\db\config\SIZE_DATA_BUFFERS
...if so, what value is within this file?
.
Also, would like to see bptm logs, because the original LTO2 media were probably written with 64KB buffers, but LTO6 typically likes to use 256KB buffers, so I'm wondering whether your issue of a missing empty tape mark has something to do with buffer sizes (i.e. the actual block sizes used for tape IO) - and maybe the LTO6 drive is simply too new and possibly incompatible with NetBackup v6.0 - which may also mean that your NetBackup v6.0 "device mapping file" maybe incompatible with the LTO6 tape drive.
06-16-2015 06:06 AM
SIZE_DATA_BUFFERS is not present in the specified folder location.
I zipped the BPTM logs and attached in the previous posting.
06-16-2015 06:27 AM
Filtering the log:
$ egrep -i "<4>|<8>|<16>|<32>|error|status|critical|fatal|sever|fail|cannot|unable" 061615_00001.log | grep -iv "status 0" 13:32:21.656 [1712.2216] <4> create_tpreq_file: {7,0,0,0} 13:33:57.640 [1192.744] <4> db_getSTUNIT: emmserver_name = epany-archive 13:33:57.640 [1192.744] <4> db_getSTUNIT: emmserver_port = 1556 13:35:02.859 [2652.2688] <2> db_error_add_to_file: dberrorq.c:midnite = 1434409200 13:35:02.859 [2652.2688] <4> read_backup: begin reading backup id epany-archive_1434382187 (duplicate), copy 1, fragment 1 from media id A001L2 on drive index 1 13:36:00.187 [1192.744] <2> db_error_add_to_file: dberrorq.c:midnite = 1434409200 13:36:00.187 [1192.744] <8> io_position_for_write: cannot locate on drive index 2, No more data is on the tape. 13:36:24.015 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:24.203 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:24.406 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:24.609 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:24.828 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:25.062 [1192.744] <2> io_read_block: read error on media id A00010, drive index 2 reading header block, len = 0; No more data is on the tape. (1104) 13:36:25.062 [1192.744] <2> db_error_add_to_file: dberrorq.c:midnite = 1434409200 13:36:25.062 [1192.744] <16> io_position_for_write: cannot position media id A00010 for write 13:36:25.062 [1192.744] <2> send_MDS_msg: DEVICE_STATUS 1 113 epany-archive A00010 4000020 HP.ULTRIUM6-SCSI.000 2000006 POSITION_ERROR 0 0 13:36:25.062 [1192.744] <2> log_media_error: successfully wrote to error file - 06/16/15 13:36:25 A00010 2 POSITION_ERROR HP.ULTRIUM6-SCSI.000 13:36:25.062 [1192.744] <2> check_error_history: just tpunmount: called from bptm line 20233, EXIT_Status = 86 13:36:25.062 [1192.744] <2> bptm: EXITING with status 86 <---------- 13:36:25.062 [2652.2688] <2> read_data: received exit status 86 from write side of duplicate 13:36:25.093 [2652.2688] <2> check_error_history: just tpunmount: called from bptm line 12149, EXIT_Status = 86 13:36:27.750 [2944.492] <4> create_tpreq_file: {7,0,0,0} 13:39:27.875 [1904.1964] <4> create_tpreq_file: {6,0,0,0}
.
It's definitely failing to write to media A00010.
.
Perhaps try a backup to it?
Can you expire what is already on A00010, then bpexpdate -deassignempty, and then create a temporary pool, and then assign this media to that pool, and then create a dummy backup job to write to that pool, so that it selects A00010, and try a big backup perhaps. Or try two small backups of the same job/policy, one after another, and see if the second backup job also fails to append to the same media?
06-16-2015 07:08 AM
I've expired the tape and moved it to a seperate pool, I then created a new policy to backup some local data... The following message appears in the log when trying to execute the policy. When performing an inventory on the TLD it does see the A00010 tape.
16/06/2015 15:04:03 - requesting resource LTO6-LIB
16/06/2015 15:04:03 - requesting resource epany-archive.NBU_CLIENT.MAXJOBS.epany-archive
16/06/2015 15:04:03 - requesting resource epany-archive.NBU_POLICY.MAXJOBS.LTO6DuplicationPool-BackupTest
16/06/2015 15:04:03 - Error nbjm(pid=2228) NBU status: 96, EMM status: No media is available
unable to allocate new media for backup, storage unit has none available(96)
06-16-2015 07:16 AM
Have you ensured the policy is using the correct pool? Also that the tape is of the same type as the tape drives in the storage unit?