08-28-2017 06:54 AM
I have a unusual problem with Netbackip 8 and tape HPE StoreEver MSL6480
Configuration:
Master server-01 (VM)
Media server-02, MS server 2016 (physical)
HPE StoreEver MSL6480 , connected (FC) to Media server-02
Situation:
MSL6480 detected and added on Master server-01, duplication-on-tape policy start task, MSL6480 move tape in drive and after that ... nothing happens, no any write process
Error at the end -
Error bpduplicate (pid=10492) Status = no images were successfully processed.
C:\Program Files\Veritas\NetBackup\logs\bptm - log -
-----------
11:34:34.193 [8144.7252] <2> JobInst::sendIrmMsg: starting
11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 743099 0 AFNNNNN 4000043 180 {94E334AC-DDF0-47F1-B48D-9121EF862263})
11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: keyword MEDIA_DONE version 0 jobid 743099 copyNum 0 mediaId AFNNNNN mediaKey 4000043 unloadDelay 180 allocId {94E334AC-DDF0-47F1-B48D-9121EF862263}
11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: returns 0
11:34:34.209 [8144.7252] <2> JobInst::sendIrmMsg: returning
11:34:34.240 [8144.7252] <2> main: Got bpduplicate acknowledgement status: 0, err: 0
11:34:34.240 [8144.7252] <2> bptm: EXITING with status 84 <----------
---
What could be first point to troubleshoot, where problem is?
08-28-2017 12:02 PM
when you say "MSL6480 detected and added on Master server-01" i would assume you have added the tape drives to media server and configured storate unit for media server.
basic question:- are you able to run the backup fine on tapes ? ( i would suggest to test this first)
if backup also failing we would like to see the below details
1)vmoprcmd -d output from media server
2)tpconfig -d output from media server
3) failed job details status
4)bptm log for failed job process
08-29-2017 12:05 AM
We need a bit more info please.
Firstly we need to be sure that the MSL6480 was detected on server-02 and added on Master server-01 for server-02.
Sorry, we cannot just assume...
If all is well and devices configured for media server, then please get us output of all the commands that @RamNagalla asked for.
I also agree with Ram - first get a small backup going before you try duplication.
For additional troubleshooting, add VERBOSE entry in media server's vm.conf.
Additional hardware-related info and errors will be logged in Event Viewer System and Application logs.
Next, increase bptm logging level to 3.
After next failure, copy bptm log to bptm.txt and upload here. (We need to see everything that happened from the start of bptm process - not just the error at the end.)
Text export of System and Appliacation logs will also help.
08-29-2017 01:15 AM
>> i would assume you have added the tape drives to media server and configured storate unit for media server.
-
Yes. I see MSL6480 as robot and 2 drives on Media server in windows device manager, and as robot-tld-** in "storage units" on Master. I see tapes in slots and robot stat in NBU device monitor., and it's same tapes in slots as from MSL6480 web console.
Test policy for duplication - small (40Gb) Vm, not all disk/storage.
vmoprcmd -d
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart TLD - No - 0
0 hcart TLD - No - 0
1 hcart TLD - No - 0
1 hcart TLD - No - 0
ADDITIONAL DRIVE STATUS
Drv DriveName Shared Assigned Comment
0 HP.ULTRIUM7-SCSI.000 No -
0 HP.ULTRIUM7-SCSI.000 No -
1 HP.ULTRIUM7-SCSI.001 No -
1 HP.ULTRIUM7-SCSI.001 No -
----------------------------------------------
tpconfig –d
Id DriveName Type Residence
SCSI coordinates/Path Status
****************************************************************************
0 HP.ULTRIUM7-SCSI.000 hcart TLD(1) DRIVE=1
{2,0,7,0} UP
{1,0,6,0} UP
1 HP.ULTRIUM7-SCSI.001 hcart TLD(1) DRIVE=2
{2,0,6,0} UP
{1,0,7,0} UP
Currently defined robotics are:
TLD(1) SCSI coordinates = {1,0,6,1}
EMM Server =My_servername
08-29-2017 01:43 AM
3) failed job details status
--
1: (196) client backup was not attempted because backup window closed-
4)bptm log for failed job process
- attached
08-29-2017 01:47 AM
>>Text export of System and Appliacation logs will also help.
-
I can't find any record from NBU or from MSL6480 in this log.
08-29-2017 02:42 AM - edited 08-29-2017 02:45 AM
The log show a backup to disk not tape :
write_backup: begin writing backup id vm01_1503954160, copy 1, fragment 1, destination path F:\NBU_07\
Are you using the basic disk storage unit with stagening to tape ?
08-29-2017 02:47 AM
You need to add VERBOSE to vm.conf on media server and restart NBU Device Management service.
After this, hardware info will be logged to Event Viewer logs.
Please run a small backup to the tape STU.
If that fails, then collect bptm log and post here.
08-29-2017 02:49 AM
>>Are you using the basic disk storage unit with stagening to tape ?
-
Well, perhaps I made mistake and took another part from 2+Mb logfile for examlpe.
Sorry.
08-29-2017 03:08 AM
>>You need to add VERBOSE to vm.conf on media server and restart NBU Device Management service.
After this, hardware info will be logged to Event Viewer logs.
Please run a small backup to the tape STU.
-
ok. I add VERBOSE in C:\Program Files\Veritas\Volmgr - to vm.conf on media server , restart NetBackup Device Manager / "C:\Program Files\Veritas\Volmgr\bin\ltid.exe" , change policy storage in test policy for testVM to robot-tld, and change log-level in "host properties - media server - properties - logging - BPTM logging level" - from 5 to 3.
And I just wait 15 minutes, while previous task will be canceled\ended. It's strange, but NBU task did't see that I reboot media server
08-29-2017 03:09 AM
If a backup to tape fails, you will see bptm PID in Activity Monitor. Probably a child bptm PID as well.
Extract all entries for the particular PID(s) and paste in bptm.txt.
08-29-2017 04:07 AM
Win app log:
29.08.2017 13:20:21 starting tldcd
29.08.2017 13:20:23 TLD(1) Creating Process for INITIALIZE: "C:\Program Files\Veritas\Volmgr\bin\tldd.exe" -v -child -ro 9 -rn 1
../tldcd.c.3034, process_request(), received command=9, from peername=myMEDIA.mydomain.dom, version 50
TLD(1) [4656] opening robotic path {2,0,7,1} (bus -1, target -1, lun -1)
inquiry() function processing library HP MSL6480 5.20:
TLD(1) NumDrives = 2, FirstDriveAddr = 1
TLD(1) NumSlots = 80, FirstSlotAddr = 1001
TLD(1) NumIE = 0, InportAddr = 101, OutportAddr = 101
DecodeQuery() Actual status: STATUS_SUCCESS
TLD(1) going to UP state
TLD(1) DismountTape ****** from drive 1
TLD(1) Creating Process for DISMOUNT: "C:\Program Files\Veritas\Volmgr\bin\tldd.exe" -v -child -ro 3 -ltidev 0 -ltix 1 -rn 1 -slot -1 -lf 2 -flgs 0 -rs 65536 -pd 180 -sd 15 -path *NULL* -nh *NULL* -vsn ******
...
29.08.2017 13:22:37 TLD(1) [7248] opening robotic path {2,0,7,1} (bus -1, target -1, lun -1)
29.08.2017 13:22:55 TLD(1) initiating MOVE_MEDIUM from addr 1 to addr 1028
29.08.2017 13:23:27 TLD(1) closing/unlocking robotic path
29.08.2017 13:25:12 TLD(1) MountTape NNN6 on drive 2, from slot 29
TLD(1) Creating Process for MOUNT: "C:\Program Files\Veritas\Volmgr\bin\tldd.exe" -v -child -ro 1 -ltidev 1 -ltix 1 -rn 1 -slot 29 -lf 0 -flgs 0 -rs 65536 -pd 180 -sd 15 -path {1,0,7,0} -vsn NNN6 -b NNN6
29.08.2017 13:25:22 TLD(1) [372] opening robotic path {2,0,7,1} (bus -1, target -1, lun -1)
C:\Program Files\Veritas\NetBackup\logs\bptm - ALL_ADMINS.082917_00001 - attached
08-29-2017 04:30 AM - edited 08-29-2017 04:49 AM
After all black magic and reboot - direct write to tape started.
But it's so slooow (for FC 8G LTO7) - see attach
08-29-2017 07:14 AM
set NUMBER_DATA_BUFFERS and NUMBER_DATA_BUFFERS_RESTORE to 512.
https://www.veritas.com/support/en_US/article.TECH18422
08-30-2017 01:57 AM - edited 08-30-2017 02:19 AM
ok. I made C:\Program Files\Veritas\NetBackup\db\config dir and 2 files (without extension) -
- NUMBER_DATA_BUFFERS
- NUMBER_DATA_BUFFERS_RESTORE
with clear text 512 inside
Result: no difference -
Detailed status - begin writing and .. nothing more.
30.08.2017 11:39:26 - Info bptm (pid=5864) start
30.08.2017 11:39:36 - Info bptm (pid=5864) using 65536 data buffer size
30.08.2017 11:39:36 - Info bptm (pid=5864) setting receive network buffer to 263168 bytes
30.08.2017 11:39:36 - Info bptm (pid=5864) using 512 data buffers
30.08.2017 11:39:37 - Info bptm (pid=5864) start backup
30.08.2017 11:39:37 - Info bptm (pid=5864) Waiting for mount of media id NNN (copy 1) on server ***
30.08.2017 11:39:37 - mounting NNN
30.08.2017 11:43:34 - Info bptm (pid=5864) media id NNN mounted on drive index 1, drivepath {1,0,7,0}, drivename HP.ULTRIUM7-SCSI.001, copy 1
30.08.2017 11:43:35 - mounted NNN ; mount time: 0:03:58
30.08.2017 11:43:40 - positioning NNN to file 1
30.08.2017 11:44:05 - positioned NNN ; position time: 0:00:25
30.08.2017 11:44:05 - begin writing
08-30-2017 04:53 AM
Is this a VMware backup?
If so, are you using VMware Intelligent Policies with Resource Limits?
Please show us the last 10-15 lines for PID 5864 in bptm file to see the effect of buffer size and number.
Oh - the screenshot and your post above show different values for number of data buffers.
Why is that?
08-30-2017 05:34 AM
>>Is this a VMware backup?
yes, but today I made new task for file (windows) backup.
>>show different values for number of data buffers.
I change this value from def to 512.
Result now: process stop on line:
15:05:06.282 [5580.6928] <2> write_data: first write, twin_index: 0 cindex: 0 dont_process: 1 wrote_backup_hdr: 1 finished_buff: 0
15:05:06.282 [5580.6928] <2> write_data: received first buffer (65536 bytes), begin writing data
(and after 5 minutes I stop task) -
15:09:13.665 [1436.6960] <2> initialize: fd values STDOUTSOCK=520 STDERRSOCK=120
15:09:13.665 [1436.6960] <2> bptm: instance - 1398296
15:09:13.681 [1436.6960] <2> bptm: INITIATING (VERBOSE = 5): -rptdrv -jobid -1504093765 -jm
15:09:13.681 [1436.6960] <2> bptm: Event Global\NetBackup Terminate Event, pid: 1436 created.
15:09:13.681 [1436.6960] <2> bptm: PORT_STATUS = 0x00000000
08-30-2017 06:23 AM
There is no way that we can comment on performance or tell you where to look if we cannot see the end of a successful job.
08-30-2017 06:24 AM - edited 08-30-2017 06:25 AM
Set SIZE_DATA_BUFFERS to 262144, you are using a 64K block size (using 65536 data buffer size)
If data comming in too slow, buffer tuning won't help. But setting the buffers is always step number 1.
09-04-2017 04:18 AM
>>if we cannot see the end of a successful job.
I don't have any successul job, even if I try write small 2-Gb file from local drive.
Perhaps, best way is try another solution - veeam or bareos