
NetBackup 8 and HPE StoreEver MSL6480 tape library

Dmitry_K
Level 4

I have an unusual problem with NetBackup 8 and an HPE StoreEver MSL6480 tape library.

Configuration:

Master server-01 (VM)

Media server-02, Windows Server 2016 (physical)

HPE StoreEver MSL6480, connected via FC to Media server-02

Situation:

The MSL6480 is detected and added on Master server-01. When the duplication-to-tape policy starts a job, the MSL6480 moves a tape into the drive, and after that nothing happens: there is no write activity at all.

Error at the end:

Error bpduplicate (pid=10492) Status = no images were successfully processed.

Log from C:\Program Files\Veritas\NetBackup\logs\bptm:

-----------

11:34:34.193 [8144.7252] <2> JobInst::sendIrmMsg: starting

11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: msg (MEDIA_DONE 0 743099 0 AFNNNNN 4000043 180 {94E334AC-DDF0-47F1-B48D-9121EF862263})

11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: keyword MEDIA_DONE version 0 jobid 743099 copyNum 0 mediaId AFNNNNN  mediaKey 4000043 unloadDelay 180 allocId {94E334AC-DDF0-47F1-B48D-9121EF862263}

11:34:34.193 [8144.7252] <2> packageBptmResourceDoneMsg: returns 0

11:34:34.209 [8144.7252] <2> JobInst::sendIrmMsg: returning

11:34:34.240 [8144.7252] <2> main: Got bpduplicate acknowledgement status: 0, err: 0

11:34:34.240 [8144.7252] <2> bptm: EXITING with status 84 <----------

---
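To isolate one process in a bptm log like the excerpt above, the lines can be filtered on the bracketed prefix. The sketch below is a minimal example and not a NetBackup tool. It assumes the first number in `[8144.7252]` is the process ID and the second is the thread ID, and the function names are made up for illustration. Status 84 is the NetBackup code for a media write error.

```python
import re

# Matches legacy-format log lines such as:
# 11:34:34.240 [8144.7252] <2> bptm: EXITING with status 84 <----------
# Assumption: the bracketed pair is [pid.tid].
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\.(\d+)\] <(\d+)> (.*)$")
EXIT_RE = re.compile(r"EXITING with status (\d+)")


def entries_for_pid(lines, pid):
    """Return (timestamp, message) pairs logged by the given process ID."""
    out = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and int(m.group(2)) == pid:
            out.append((m.group(1), m.group(5)))
    return out


def exit_status(entries):
    """Return the last 'EXITING with status N' value, or None if absent."""
    status = None
    for _, msg in entries:
        m = EXIT_RE.search(msg)
        if m:
            status = int(m.group(1))
    return status


# Three lines copied from the excerpt in this thread.
SAMPLE = [
    "11:34:34.193 [8144.7252] <2> JobInst::sendIrmMsg: starting",
    "11:34:34.240 [8144.7252] <2> main: Got bpduplicate acknowledgement status: 0, err: 0",
    "11:34:34.240 [8144.7252] <2> bptm: EXITING with status 84 <----------",
]

if __name__ == "__main__":
    found = entries_for_pid(SAMPLE, 8144)
    for stamp, msg in found:
        print(stamp, msg)
    print("exit status:", exit_status(found))
```

For a real log, read the file into a list of lines and pass the bptm PID shown in Activity Monitor.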

What should be the first thing to troubleshoot to find where the problem is?

27 REPLIES 27

Marianne
Level 6
Partner    VIP    Accredited Certified

We have been trying to help you for some time now, but you still have not given us sufficient info to even try and assist.

I have asked a couple of days ago:

"If a backup to tape fails, you will see bptm PID in Activity Monitor. Probably a child bptm PID as well.

Extract all entries for the particular PID(s) and paste in bptm.txt."

You have not given us any evidence to show that NBU is at fault.
NBU has been working fine writing to tape for over 20 years in thousands upon thousands of installations across the world.
This says to me that there is something wrong in your environment.

So, if you really think that another product is going to magically work better and solve all your problems, then please go ahead.

If you need our assistance, then please stop threatening fellow NBU users with other products and help us to help you.

>>You have not given us any evidence to show that NBU is at fault.

-

I still don't have any troubleshooting method, such as:

1 - check the configuration in file.conf and try changing the block / buffer size

2 - run robtest and do debug dm / m s1 d1 / inload d1

3 - enable advanced SCSI logging

4 - etc.

The problem is that there are no errors in C:\Program Files\Veritas\NetBackup\logs\bptm ALL_ADMINS.092917_00001.log; the process looks like it is waiting for something. The tape moves from the slot to the drive, and there is no activity after that.

 

>>So, if you really think that another product is going to magically work better and solve all your problems

-

Well, I don't need any magic here. I have already tested Veeam, but I still hope to find a solution for NBU. Perhaps 8.1 will help.

Well, I have a first positive result.

After reading lots of threads about the same problem:

https://vox.veritas.com/t5/NetBackup/Netbackup-job-hung-on-begin-writing-after-92h/td-p/717990

and https://www.veritas.com/support/en_US/article.000007259

and https://www.veritas.com/support/zh_CN/article.000012662

and https://vox.veritas.com/t5/NetBackup/Backup-is-downing-the-drive/td-p/648177

 and https://vox.veritas.com/t5/NetBackup/Netbackup-drive-going-down-randomly/td-p/603769

and http://www.tek-tips.com/viewthread.cfm?qid=1560455

and http://forum.ixbt.com/topic.cgi?id=66:8013

and http://blog.sina.com.cn/s/blog_4d22b9720100r0fj.html

and https://vox.veritas.com/t5/NetBackup/Issues-in-Tapelibrary-using-robtest/td-p/714709

and testing the library from the command line (<install_path>veritas\volmgr\bin\robtest.exe)

https://www.veritas.com/support/en_US/article.000084292

and Robtest commands that can be used to test the SCSI functionality of a robot

https://www.veritas.com/support/en_US/article.TECH83129

and reading C:\ProgramData\Microsoft\Windows\WER\ReportQueue

After all that, I had 3 completed jobs (0.8 MB, 1.2 MB, 2 MB):

03.10.2017 14:37:10 - positioning NNN123 to file 3
03.10.2017 14:37:26 - positioned NNN123; position time: 0:00:16
03.10.2017 14:37:26 - begin writing
03.10.2017 14:39:17 - Info bptm (pid=4820) waited for full buffer 0 times, delayed 0 times
03.10.2017 14:40:32 - Info bptm (pid=4820) EXITING with status 0 <----------
03.10.2017 14:40:32 - Info bpbrm (pid=6804) validating image for client srv.contoso.com
03.10.2017 14:40:33 - Info bpbkar32 (pid=1748) done. status: 0: the requested operation was successfully completed
03.10.2017 14:40:33 - end writing; write time: 0:03:07
the requested operation was successfully completed  (0)

 

Stat:

Default settings: 21 KB/sec, about 1 MB/min buffer speed, e.g. 20 MB in 18 min

Log:

03.10.2017 16:42:57 - begin writing

03.10.2017 17:00:27 - Info bptm (pid=5472) waited for full buffer 0 times, delayed 0 times

03.10.2017 17:01:42 - Info bptm (pid=5472) EXITING with status 0 <----------

New settings:

C:\Program Files\Veritas\NetBackup\db\config

NUMBER_DATA_BUFFERS = 32 (default: using 30 data buffers)

SIZE_DATA_BUFFERS = 524288 (default: using 65536 data buffer size)

80 MB file in 10 min, 170 KB/sec, i.e. 8 to 10 MB/min
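For a sense of what the two settings change, the memory reserved for data buffers is the product of the two values. The sketch below is a rough calculation and not a NetBackup utility. It assumes one buffer set per active tape drive, and that multiplexing or additional drives would multiply the result. The function name is made up for illustration.

```python
def buffer_memory_bytes(number_data_buffers, size_data_buffers):
    """Bytes used by one set of data buffers (count times size)."""
    return number_data_buffers * size_data_buffers


# Values taken from the thread: defaults reported in the log vs. new settings.
DEFAULT_BYTES = buffer_memory_bytes(30, 65536)
TUNED_BYTES = buffer_memory_bytes(32, 524288)

if __name__ == "__main__":
    for label, value in (("default", DEFAULT_BYTES), ("tuned", TUNED_BYTES)):
        print(f"{label}: {value} bytes = {value / 2**20:.3f} MiB")
```

With these numbers the buffer set grows from just under 2 MiB to 16 MiB per drive, which is small on a physical media server.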

03.10.2017 17:29:14 - Info bptm (pid=1648) setting receive network buffer to 2098176 bytes

03.10.2017 17:34:54 - begin writing
03.10.2017 17:41:39 - Info bpbkar32 (pid=5936) bpbkar waited 125 times for empty buffer, delayed 12266 times.
03.10.2017 17:42:55 - Info bptm (pid=5828) waited for full buffer 0 times, delayed 0 times
03.10.2017 17:44:12 - Info bptm (pid=5828) EXITING with status 0 <----------

Well, it's not as slow as a turtle, but I think a 16G SAN and LTO7 drives could be faster.
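To put the two measurements above on one scale, the sketch below converts "size in MB over minutes" into KB/sec and compares it with the drive. The 300 MB/sec figure is an assumption for the LTO-7 native (uncompressed) rate and should be checked against the drive's datasheet. The function name and constant are made up for illustration.

```python
# Assumed native LTO-7 rate in MB/sec; check the drive datasheet.
LTO7_NATIVE_MB_PER_SEC = 300


def rate_kb_per_sec(megabytes, minutes):
    """Average rate in KB/sec for `megabytes` written in `minutes`."""
    return megabytes * 1024 / (minutes * 60)


def percent_of_native(kb_per_sec):
    """Share of the assumed native drive rate, in percent."""
    return 100 * kb_per_sec / (LTO7_NATIVE_MB_PER_SEC * 1024)


if __name__ == "__main__":
    # Job sizes and durations quoted in the thread.
    for label, mb, minutes in (("default buffers", 20, 18), ("tuned buffers", 80, 10)):
        rate = rate_kb_per_sec(mb, minutes)
        print(f"{label}: {rate:.1f} KB/sec, {percent_of_native(rate):.3f}% of native")
```

Both results are a small fraction of one percent of the assumed native rate, so buffer tuning alone does not account for the gap.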

Local speed test

https://www.veritas.com/support/en_US/article.TECH17541

How to benchmark the performance of the bpbkar32 process on a Windows client

test 01

12:10:10.843 [7124.7032] <2> tar_base::backup_finish: TAR - backup:          file data:  686747648 bytes  2 gigabytes

12:10:10.843 [7124.7032] <2> tar_base::backup_finish: TAR - backup:         image data:  686752768 bytes  2 gigabytes

12:10:10.843 [7124.7032] <2> tar_base::backup_finish: TAR - backup:       elapsed time:         28 secs    101222729 bps

test 02

12:18:51.153 [7336.7928] <2> tar_base::backup_finish: TAR - backup:          file data:  746261587 bytes  7 gigabytes

12:18:51.153 [7336.7928] <2> tar_base::backup_finish: TAR - backup:         image data:  746266624 bytes  7 gigabytes

12:18:51.153 [7336.7928] <2> tar_base::backup_finish: TAR - backup:       elapsed time:         17 secs    486027023 bps
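The two summaries above can be turned into comparable rates with a small parser. This sketch assumes that "N gigabytes" counts whole units of 2^30 bytes on top of the printed byte remainder, and that "bps" means bytes per second. Under that reading the image data figures from both tests reproduce the logged rates exactly. The function and key names are made up for illustration.

```python
import re

DATA_RE = re.compile(r"(file data|image data):\s+(\d+) bytes\s+(\d+) gigabytes")
TIME_RE = re.compile(r"elapsed time:\s+(\d+) secs\s+(\d+) bps")


def summarize(lines):
    """Rebuild total image bytes and bytes/sec from a backup_finish summary."""
    sizes = {}
    secs = logged = None
    for line in lines:
        m = DATA_RE.search(line)
        if m:
            # Assumption: total = gigabytes * 2**30 + remaining bytes.
            sizes[m.group(1)] = int(m.group(3)) * 2**30 + int(m.group(2))
        m = TIME_RE.search(line)
        if m:
            secs, logged = int(m.group(1)), int(m.group(2))
    total = sizes["image data"]
    return {"bytes": total, "secs": secs, "computed": total // secs, "logged": logged}


# Messages copied from the two tests in this thread (prefixes trimmed).
TEST_01 = [
    "TAR - backup:          file data:  686747648 bytes  2 gigabytes",
    "TAR - backup:         image data:  686752768 bytes  2 gigabytes",
    "TAR - backup:       elapsed time:         28 secs    101222729 bps",
]
TEST_02 = [
    "TAR - backup:          file data:  746261587 bytes  7 gigabytes",
    "TAR - backup:         image data:  746266624 bytes  7 gigabytes",
    "TAR - backup:       elapsed time:         17 secs    486027023 bps",
]

if __name__ == "__main__":
    for name, lines in (("test 01", TEST_01), ("test 02", TEST_02)):
        s = summarize(lines)
        print(name, s, f"{s['computed'] / 2**20:.1f} MiB/sec")
```

That is roughly 96.5 MiB/sec for test 01 and 463.5 MiB/sec for test 02, so reading data on the client is far faster than the tape writes seen earlier.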

 

Bptm log

14:40:21.634 [6832.5584] <2> write_data: received first buffer (524288 bytes), begin writing data

14:45:03.647 [1648.1840] <2> fill_buffer: [6832] socket is closed, waited for empty buffer 93 times, delayed 17950 times, read 82023424 bytes

***

14:48:12.645 [6832.5584] <2> set_job_details: Tfile (760566): LOG 1507204092 4 bptm 6832 waited for full buffer 0 times, delayed 0 times

14:48:12.645 [6832.5584] <2> send_job_file: job ID 760566, ftype = 3 msg len = 75, msg = LOG 1507204092 4 bptm 6832 waited for full buffer 0 times, delayed 0 times

***

It looks like the problem is in the FC path between the media server and the FC tape drive: when I changed the job destination to HDD, the speed matched HDD speed, and there was no difference in backup speed between nbd, san, agent over the network, and agent from a local SSD drive. But there are no settings or diagnostic utilities to find where the problem really is.
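The buffer counters in the logs above point the same way. The sketch below applies the usual reading of these messages, stated here as an assumption: waits for an empty buffer mean the process filling the buffers is held up by the side that drains them (the tape write), while waits for a full buffer mean the tape side is starved for data. The function names and the returned labels are made up for illustration.

```python
import re

# "waited for full buffer 0 times, delayed 0 times"
AFTER_RE = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")
# "bpbkar waited 125 times for empty buffer, delayed 12266 times"
BEFORE_RE = re.compile(r"waited (\d+) times for (full|empty) buffer, delayed (\d+) times")


def wait_counters(line):
    """Return (kind, waits, delays) for a buffer-wait message, else None."""
    m = AFTER_RE.search(line)
    if m:
        return m.group(1), int(m.group(2)), int(m.group(3))
    m = BEFORE_RE.search(line)
    if m:
        return m.group(2), int(m.group(1)), int(m.group(3))
    return None


def slow_side(lines):
    """Guess which side of the buffers is slower from the summed delays."""
    delays = {"full": 0, "empty": 0}
    for line in lines:
        counters = wait_counters(line)
        if counters:
            delays[counters[0]] += counters[2]
    if delays["empty"] > delays["full"]:
        return "consumer"  # the side draining the buffers, here the tape write
    if delays["full"] > delays["empty"]:
        return "producer"  # the side filling the buffers, here client/network
    return "undetermined"


# Messages copied from this thread.
SAMPLE = [
    "Info bpbkar32 (pid=5936) bpbkar waited 125 times for empty buffer, delayed 12266 times.",
    "Info bptm (pid=5828) waited for full buffer 0 times, delayed 0 times",
    "fill_buffer: [6832] socket is closed, waited for empty buffer 93 times, delayed 17950 times, read 82023424 bytes",
]

if __name__ == "__main__":
    print(slow_side(SAMPLE))
```

For these lines the result is "consumer", which matches the observation that the same jobs run at disk speed when the destination is HDD.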

 

Well, I think our team has found the source of this problem.

Now NetBackup shows speeds near 60 MB/sec: 40 GB in 11 minutes.

I will write up the troubleshooting, perhaps next week.

 

Short notes on troubleshooting Veritas NetBackup with HPE StoreEver MSL2024/4048/6480 tape libraries.

 First.

You need to check and recheck your configuration. For example, nowadays all LTO drives have a second port. Most of the time you cannot use the second port for MPIO; this port is only for failover. Besides, you need something like a “High Availability Control Path Failover license”.

In addition, the best way to troubleshoot a complicated situation is to use a separate HBA for the link to the library.

 

Second.

There are no integrated methods or utilities in Veritas NetBackup for full troubleshooting.

At the same time, there are two useful utilities:

 

- bpbkar32

How to benchmark the performance of the bpbkar32 process on a Windows client

https://www.veritas.com/support/en_US/article.TECH17541

 

- robtest

Troubleshooting Robot or Drive Issues in NetBackup

https://www.veritas.com/support/en_US/article.TECH169477

 

robtest doesn't work in debug mode on Windows Server 2016, but it is a very helpful utility for checking connectivity between the library and the host. Apart from that, the utility can only move tapes from the library to the drive and back; if you see this movement in the library's web interface, the connection works.

 

Third.

The Veritas NetBackup logs are very useful if the problem is somewhere inside NetBackup, and partially helpful if the problem is somewhere between NetBackup and the VMware API, e.g. if the NetBackup media server locks a snapshot file. In other cases these logs are mostly useless, which means that if you do not see any problem in the log, you must go deeper.

 

Fourth.

I could not find a utility that goes as deep as possible into SCSI over FC troubleshooting. Of course, you can try to follow a guide like “How to Capture Fiber Channel traffic” (with Xgig - Xgig 8G Fibre Channel Analyzer -   http://community.brocade.com/t5/Fibre-Channel-SAN/Sniffing-Fibre-Channel-Packets-from-Brocade-Switch... ). However, you can check errors, firmware, BIOS, etc. on your HBA using QLogic SANsurfer (for Win 2008/2012) or QLogic QConvergeConsole (for Win 2016).

Do not forget to install the Windows SuperInstaller (x64), which installs the FC-FCoE, iSCSI, and Ethernet Networking Management Agents. For HPE, this utility is called “QConvergeConsole Management Utility GUI for HP Branded QLogic based Fibre Channel, Converged Network and Intelligent Ethernet Adapters.”

 

 

Fifth.

Try Linear Tape File System (LTFS) on the media server: LTFS Configuration (LTFS for Windows).

Don't confuse it with HPE StoreOpen Standalone and HPE StoreOpen Automation.

 

Sixth.

I found only one utility for testing tape: LTT by HPE.

HPE Library and Tape Tools is a free, downloadable, robust diagnostic tool for all of HP's tape storage and magneto-optical storage products.

LTT 4.24 is a new architecture with web based GUI and monitoring capability of TapeAssure.

https://www.hpe.com/ru/ru/product-catalog/storage/storage-software/pip.hpe-library-and-tape-tools.40...

 

Seventh.

Last but not least, you should update all drivers and firmware on all devices. In my case that meant not only applying the Service Pack for ProLiant (SPP), but also reading the HPE Data Availability, Protection and Retention Compatibility Matrix https://support.hpe.com/hpsc/doc/public/display?sp4ts.oid=412183&docId=emr_na-c04616269&docLocale=en...

And update:

- HBA firmware (in my case from 8.05.60 / 12 Jul 2017 to 8.05.61 / 25 Sep 2017)

- HBA drivers for Windows (in my case from 9.2.2.20 / 29 July 2016 to 9.2.5.20 / 25 Sep 2017)

Even though the old firmware and driver worked correctly with SAN storage and under Veeam, it looks like something was broken in the firmware, in the drivers, or in some Windows Server 2016 subsystem, because the LTT speed test did not work until I installed the new firmware and drivers. Once they were installed, after some time everything worked fine.

 

- Tape library firmware and drive firmware

- Windows drivers for the robot (library) and drives.

- Perhaps you should update the SAN switch firmware too, and you should check the error counters on the interfaces on both sides: the server's ports and the LTO ports. If you see any errors on the ports, check the physical parts: patch cords, ports, connections between racks, etc.
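When comparing installed versions against a compatibility matrix, dotted version strings should be compared numerically, not as text. The sketch below is a minimal helper using the HBA versions quoted above. The function names are made up for illustration, and the "required" value would come from the vendor's matrix.

```python
def parse_version(text):
    """Turn '8.05.60' into (8, 5, 60) so that parts compare as numbers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, required):
    """True if the installed version is lower than the required one."""
    return parse_version(installed) < parse_version(required)


if __name__ == "__main__":
    # Versions quoted in this thread: HBA firmware and Windows driver.
    print(is_older("8.05.60", "8.05.61"))    # firmware before the update
    print(is_older("9.2.2.20", "9.2.5.20"))  # driver before the update
    print(is_older("9.2.5.20", "9.2.5.20"))  # same version, nothing to do
```

Comparing tuples of integers avoids mistakes such as a plain string comparison ranking "9.2.10" below "9.2.5".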

 

All this means that the shortest way, before starting to read endless logs and traces, is:

- Check the error counters on the switch

- Install the latest firmware on every device in the full path, from the server's HBA and drivers to the library and the library and drive drivers. My main mistake was thinking that if the HBA works with a LUN, it will work with LTO.

- Make this path and the whole configuration as simple as possible, e.g. remove any other LUNs, software/services, etc.

- Look in QLogic QConvergeConsole for errors

- Try to check the LTO drives with HPE Library and Tape Tools.