Solved: some drives can't be used to write data.

Frank_Xiao · ‎08-18-2014

Dear all

We currently have a problem with NBU 6.5: we have 10 drives, but we can only use 6 of them.

The other 4 drives are never used.

I tried running nbrbutil.exe -dump:

MDS allocations in EMM:

MdsAllocation: allocationKey=2371023 jobType=1 mediaKey=4002804 mediaId=0407L4 driveKey=2000406 driveName=IBM.ULT3580-TD4.015 drivePath={2,0,3,0} stuName=sv5353-LTO4 masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374586 jobType=11 mediaKey=4002600 mediaId=0496L4 driveKey=2000412 driveName=IBM.ULT3580-TD4.017 drivePath={2,0,7,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374589 jobType=11 mediaKey=4002473 mediaId=0892L4 driveKey=2000404 driveName=IBM.ULT3580-TD4.014 drivePath={2,0,4,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374642 jobType=11 mediaKey=4003480 mediaId=0870L4 driveKey=2000375 driveName=IBM.ULT3580-TD4.001 drivePath={2,0,6,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374646 jobType=11 mediaKey=4003575 mediaId=1528L4 driveKey=2000414 driveName=IBM.ULT3580-TD4.002 drivePath={2,0,8,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374676 jobType=2 mediaKey=4003504 mediaId=0716L4 driveKey=2000377 driveName=IBM.ULT3580-TD4.000 drivePath={2,0,5,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=1
MdsAllocation: allocationKey=2374685 jobType=25 mediaKey=0 mediaId= driveKey=2000419 driveName=IBM.ULT3580-TD4.005 drivePath={6,0,11,0} stuName= masterServerName=sv5353 mediaServerName=sv53610 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=3
MdsAllocation: allocationKey=2374686 jobType=25 mediaKey=0 mediaId= driveKey=2000424 driveName=IBM.ULT3580-TD4.003 drivePath={2,0,9,0} stuName= masterServerName=sv5353 mediaServerName=sv5353 ndmpTapeServerName= diskVolumeKey=0 mountKey=0 linkKey=0 fatPipeKey=0 scsiResType=1 serverStateFlags=3

I also tried nbrbutil -releaseall, but I still can't use drives IBM.ULT3580-TD4.003 and IBM.ULT3580-TD4.005. I used robtest, and it can move tapes to the drives.

I have no idea how to fix this problem.

RonCaplinger · ‎08-18-2014

Since your profile indicates you are new here, I don't know what your level of expertise is with NetBackup. So I need to make these statements first:

NetBackup 6.5 has not been supported for 2 years; you seriously need to upgrade to a supported version, preferably 7.5.0.7 or 7.6.0.3, ASAP.
You seem to have 12 LTO4 tape drives hooked up to one media server and are trying to push data to all of them at the same time. I know of no single server that can push that much data at once to keep the tape drives streaming at even their lowest speed. The max I would recommend (just my opinion, for a new person dealing with NetBackup) would be 3 LTO4 tape drives per media server (4 drives if you have a really fast x64-based server; 2 drives if you are using servers more than 3 years old).
Can you confirm that Windows sees all 12 tape drives correctly in Device Manager? No yellow exclamation points next to them?

SymTerry · ‎08-18-2014

Like Ron said, please upgrade to 7.x.

Some of those drives show as down. NetBackup will down tape drives for many different reasons. TECH88076 gives a few of them. Some things to look at doing are:

Reconfiguration of the device within NetBackup
Update of the NetBackup Device Mappings file
Update of the device firmware
Update of the device driver or configuration files
Tape drive cleaning

You mentioned robtest worked fine, but that's just moving the tape around. Write errors can also down a drive.

For testing I would create a policy that just uses the scratch pool with that drive.

Marianne · ‎08-18-2014

Please show us output of STU config and SSO device allocation.

On master:

...\Veritas\volmgr\bin\vmdareq
...\Veritas\Netbackup\bin\admincmd\bpstulist -U

You can see that 3 drives are in MIXED state. This means that they are DOWN on one or more media servers.
You need to find the reason for drives being DOWN'ed.
On ALL media servers, add VERBOSE entry to ...\Veritas\volmgr\vm.conf and restart NBU Device Manager service.
Create ...\Veritas\Netbackup\logs\bptm folder on all media servers.
I/O errors will be logged in bptm on media servers and the reason for drives being DOWN'ed will be logged in Event Viewer Application log.
Hardware errors will be logged in Event Viewer System log.
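
The two logging steps above can be scripted. What follows is only a minimal sketch: `ensure_verbose` and `ensure_bptm_log_dir` are hypothetical helper names, and `install_root` stands in for the `...\Veritas` directory, whose real location depends on the installation. The demo runs against a temporary directory rather than a real media server, and the NBU Device Manager service still has to be restarted separately.

```python
import tempfile
from pathlib import Path


def ensure_verbose(vm_conf: Path) -> bool:
    """Append a VERBOSE line to vm.conf unless one is already present.

    Returns True if the file was changed.
    """
    lines = vm_conf.read_text().splitlines() if vm_conf.exists() else []
    if any(line.strip() == "VERBOSE" for line in lines):
        return False
    lines.append("VERBOSE")
    vm_conf.write_text("\n".join(lines) + "\n")
    return True


def ensure_bptm_log_dir(install_root: Path) -> Path:
    """Create <install_root>/Netbackup/logs/bptm if it does not exist yet."""
    log_dir = install_root / "Netbackup" / "logs" / "bptm"
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir


# Demo against a throwaway directory (an assumption standing in for ...\Veritas).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "volmgr").mkdir()
    vm_conf = root / "volmgr" / "vm.conf"
    first = ensure_verbose(vm_conf)    # file is created with a VERBOSE line
    second = ensure_verbose(vm_conf)   # second call leaves the file alone
    bptm_dir = ensure_bptm_log_dir(root)
    print(first, second, bptm_dir.is_dir())
```

Running it on every media server keeps the setup consistent, since calling either helper a second time changes nothing.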

Oh, and the option to 'release all' is actually 'nbrbutil -resetAll'
http://www.symantec.com/docs/TECH46807

Or else:
nbrbutil -releaseMDS 2374686
nbrbutil -releaseMDS 2374685
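
To pick out such allocations from a larger dump, a small parser can help. This is a rough sketch, not an official tool: `parse_allocations` and `release_commands` are hypothetical names, and treating "drive held but mediaKey=0" as the sign of a stale allocation is an assumption based only on the two entries in this thread. The script prints candidate commands and runs nothing. The sample lines are abridged from the dump above and assume each MdsAllocation entry is on a single line.

```python
import re

# Matches key=value pairs; the value may be empty ("mediaId=") or a braced path.
_PAIR = re.compile(r"(\w+)=(\{[^}]*\}|\S*)")


def parse_allocations(dump_text):
    """Return one dict of fields per 'MdsAllocation:' line in the dump text."""
    allocations = []
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith("MdsAllocation:"):
            allocations.append(dict(_PAIR.findall(line)))
    return allocations


def release_commands(allocations):
    """Build a release command for each allocation holding a drive but no media.

    Assumption: mediaKey=0 together with a drive name marks a suspect allocation.
    """
    return [
        f"nbrbutil -releaseMDS {a['allocationKey']}"
        for a in allocations
        if a.get("mediaKey") == "0" and a.get("driveName")
    ]


# Abridged sample lines taken from the dump in this thread.
SAMPLE = """\
MdsAllocation: allocationKey=2374676 jobType=2 mediaKey=4003504 mediaId=0716L4 driveKey=2000377 driveName=IBM.ULT3580-TD4.000 drivePath={2,0,5,0} stuName= masterServerName=sv5353 mediaServerName=sv5353
MdsAllocation: allocationKey=2374685 jobType=25 mediaKey=0 mediaId= driveKey=2000419 driveName=IBM.ULT3580-TD4.005 drivePath={6,0,11,0} stuName= masterServerName=sv5353 mediaServerName=sv53610
MdsAllocation: allocationKey=2374686 jobType=25 mediaKey=0 mediaId= driveKey=2000424 driveName=IBM.ULT3580-TD4.003 drivePath={2,0,9,0} stuName= masterServerName=sv5353 mediaServerName=sv5353
"""

commands = release_commands(parse_allocations(SAMPLE))
print("\n".join(commands))
```

Review the printed commands by hand before running any of them, and only release allocations when no job is actually using the drive.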

I am curious to know how many media servers you have in your environment and how jobs are balanced across media servers.
From the output above, it seems that all but one drive is allocated to the master server.
As others said, that is really too many tape drives for one server to handle.


Frank_Xiao · ‎08-19-2014

Hi All

Thanks for your suggestions.

We are currently on 6.5 and preparing the migration to 7.5, but right now I need to fix the drive problem.

To confirm: we only have 10 drives. 6 drives are usable and 4 are never used. As for why sv5353 uses more than 3 drives: it is the master server and usually runs the vault jobs.

I already reconfigured the drives, but that does not seem to help. There seem to be ghost jobs running on drives 003 and 005: even when I release the jobs on those drives, they show up again the next time.

D:\Veritas\Volmgr\bin>vmdareq.exe
IBM.ULT3580-TD4.000 - AVAILABLE
sv5353 UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 DOWN
sv53610 SCAN_HOST UP
ing_uhhod01_pip DOWN
prt_uhhod05_pip DOWN
prt_uhhod02_pip DOWN
IBM.ULT3580-TD4.001 - AVAILABLE
sv5353 UP
sv5327 DOWN
sv5326 DOWN
sv5310 UP
sv5311 UP
sv53610 SCAN_HOST UP
ing_uhhod01_pip DOWN
prt_uhhod05_pip UP
prt_uhhod02_pip UP
IBM.ULT3580-TD4.002 - AVAILABLE
sv5353 UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 UP
sv53610 SCAN_HOST UP
ing_uhhod01_pip UP
prt_uhhod05_pip UP
prt_uhhod02_pip UP
IBM.ULT3580-TD4.003 - RESERVED on 8/20/2014 2:47:24 PM
sv5353 RESERVED SCAN_HOST UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 UP
ing_uhhod01_pip UP
prt_uhhod05_pip UP
prt_uhhod02_pip UP
IBM.ULT3580-TD4.004 - AVAILABLE
sv5353 UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 UP
ing_uhhod01_pip UP
prt_uhhod05_pip UP
prt_uhhod02_pip UP
IBM.ULT3580-TD4.005 - RESERVED on 8/20/2014 2:47:24 PM
sv5353 DOWN
sv5327 DOWN
sv5326 DOWN
sv5310 DOWN
sv5311 DOWN
sv53610 RESERVED SCAN_HOST UP
ing_uhhod01_pip DOWN
prt_uhhod05_pip DOWN
prt_uhhod02_pip UP
IBM.ULT3580-TD4.007 - AVAILABLE
sv5353 DOWN
sv5327 DOWN
sv5326 DOWN
sv5310 DOWN
sv5311 DOWN
sv53610 DOWN
ing_uhhod01_pip DOWN
prt_uhhod05_pip DOWN
prt_uhhod02_pip SCAN_HOST UP
IBM.ULT3580-TD4.014 - RESERVED on 8/20/2014 1:42:23 PM
sv5353 RESERVED SCAN_HOST UP
sv5327 DOWN
sv5326 UP
sv5310 UP
sv5311 UP
sv53610 DOWN
ing_uhhod01_pip UP
prt_uhhod05_pip DOWN
prt_uhhod02_pip UP
IBM.ULT3580-TD4.015 - AVAILABLE
sv5353 SCAN_HOST UP
sv5327 DOWN
sv5326 UP
sv5310 UP
sv5311 UP
sv53610 UP
ing_uhhod01_pip DOWN
prt_uhhod05_pip DOWN
prt_uhhod02_pip UP
IBM.ULT3580-TD4.017 - RESERVED on 8/20/2014 1:55:41 PM
sv5353 RESERVED UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 UP
sv53610 SCAN_HOST UP
ing_uhhod01_pip UP
prt_uhhod05_pip UP

From this picture we can see the drives are in use, but I can't find any tape in the library from the GUI.
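
A listing like the vmdareq output above is easier to read when condensed per drive. The sketch below is a hypothetical helper (`parse_vmdareq` and `summarize` are not NetBackup commands) that assumes the layout seen in this thread: a drive header line containing " - ", followed by one line per host with its flags. The sample is the two problem drives copied from the output above.

```python
def parse_vmdareq(text):
    """Map each drive name to {'state': ..., 'hosts': {host: [flags, ...]}}."""
    drives = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " - " in line:
            # Drive header, e.g. "IBM.ULT3580-TD4.003 - RESERVED on 8/20/2014 ..."
            name, state = line.split(" - ", 1)
            current = {"state": state.split()[0], "hosts": {}}
            drives[name] = current
        elif current is not None:
            # Host line, e.g. "sv5353 RESERVED SCAN_HOST UP"
            host, *flags = line.split()
            current["hosts"][host] = flags
    return drives


def summarize(drives):
    """For each drive, list the hosts that reserve it and the hosts where it is DOWN."""
    return {
        name: {
            "state": info["state"],
            "reserved_by": [h for h, f in info["hosts"].items() if "RESERVED" in f],
            "down_on": [h for h, f in info["hosts"].items() if "DOWN" in f],
        }
        for name, info in drives.items()
    }


SAMPLE = """\
IBM.ULT3580-TD4.003 - RESERVED on 8/20/2014 2:47:24 PM
sv5353 RESERVED SCAN_HOST UP
sv5327 UP
sv5326 UP
sv5310 UP
sv5311 UP
ing_uhhod01_pip UP
prt_uhhod05_pip UP
prt_uhhod02_pip UP
IBM.ULT3580-TD4.005 - RESERVED on 8/20/2014 2:47:24 PM
sv5353 DOWN
sv5327 DOWN
sv5326 DOWN
sv5310 DOWN
sv5311 DOWN
sv53610 RESERVED SCAN_HOST UP
ing_uhhod01_pip DOWN
prt_uhhod05_pip DOWN
prt_uhhod02_pip UP
"""

summary = summarize(parse_vmdareq(SAMPLE))
for drive, info in summary.items():
    print(drive, info["state"], "reserved by", info["reserved_by"],
          "down on", len(info["down_on"]), "host(s)")
```

On this sample, the summary shows drive 003 reserved by sv5353 and drive 005 reserved by sv53610 while DOWN on seven other hosts, which points at the servers where troubleshooting should start.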

Marianne · ‎09-11-2014

Apologies for missing your last post.

Obviously too late now, but it seems that this tape drive was reserved by this media server:

IBM.ULT3580-TD4.003 - RESERVED on 8/20/2014 2:47:24 PM
sv5353 RESERVED SCAN_HOST UP

So, in this case all troubleshooting must be done on this media server.

Troubleshooting steps will depend on OS on this media server.
It could be as easy as using the vmoprcmd command with the 'crawlrelease' or 'reset' options.

It is important that the VERBOSE entry is added on all media servers for device-level troubleshooting, and that the bptm log folder is created as well.


mph999 · ‎09-11-2014

If crawlrelease by name doesn't release the drives, power cycle the actual drive (not via any library GUI, an actual proper power cycle).

It could be worth selecting SCSI persistent reserve in Host Properties > Master server > Media. You need to test backup and restore afterwards to be sure there are no issues, but this should certainly be fine for physical drives. Some VTLs don't like persistent reserve very much and, slightly strangely, I've recently seen a few issues with physical drives, but only on Solaris.

If none of the above suggestions make a difference, I'd personally just delete the drives using nbemmcmd -deletealldevices -allrecords and reconfigure via the wizard.

revarooo · ‎09-11-2014

- Ensure your zoning is correct

- Release all allocations using nbrbutil -resetall (ONLY when no jobs are running)

- Power cycle the drives

- Finally, do as mph999 suggested and remove the devices using: nbemmcmd -deletealldevices -allrecords. This will remove ALL drives from ALL media servers. Then reconfigure them again.
