Solved: Always worth running three

Iwan_Tamimi · ‎03-25-2013

We are using NetBackup version 7.5.0.4 running on RedHat 6.1, with media servers running on multiple platforms. The tape library is an HP ESL322. At first the only media servers were on HP-UX and Windows. Then we added several LTO5 drives to the same tape library, with a media server running on Solaris (5.10 SPARC).

We added the 3 LTO5 drives to this Solaris server and ran a test backup. The test backup looked fine, but after some time the tape drive status went DOWN. We can bring the drives up again and they run several backups, then they go down again. It happened to all 3 new LTO5 drives.

What went wrong? I don't think the tape drives themselves are faulty, since it happened to all 3 new LTO5 drives. Is there some parameter that needs to be added on the Solaris side?

Thank you,

Iwan Tamimi

Yasuhisa_Ishika · ‎03-25-2013

You should check whether you have enabled MPxIO for the HBAs that connect the tape drives. MPxIO on Solaris 10 does not support tape devices, and enabling it can make tape devices unstable.
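
As a rough illustration of that check, here is a minimal Python sketch that lists the `mpxio-disable` settings found in fp.conf-style text. It assumes the usual Solaris 10 layout, where `/kernel/drv/fp.conf` holds a global `mpxio-disable` line and optional per-port overrides. The sample text, the parent device path and the function name are hypothetical and not taken from this thread.

```python
import re

# Assumption: Solaris 10 fp.conf syntax, for example
#   mpxio-disable="no";
#   name="fp" parent="<device path>" port=0 mpxio-disable="yes";
SETTING = re.compile(r'mpxio-disable\s*=\s*"(yes|no)"')
PARENT = re.compile(r'parent\s*=\s*"([^"]+)"')
PORT = re.compile(r'port\s*=\s*(\d+)')


def mpxio_settings(conf_text):
    """Return (scope, disabled) tuples for every mpxio-disable line."""
    results = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        setting = SETTING.search(line)
        if not setting:
            continue
        parent = PARENT.search(line)
        port = PORT.search(line)
        if parent and port:
            scope = f"{parent.group(1)} port {port.group(1)}"
        else:
            scope = "global"
        results.append((scope, setting.group(1) == "yes"))
    return results


# Hypothetical sample text, not copied from the poster's server.
sample_conf = '''
# global setting
mpxio-disable="no";
name="fp" parent="/pci@8,600000/SUNW,qlc@2" port=0 mpxio-disable="yes";
'''

for scope, disabled in mpxio_settings(sample_conf):
    print(scope, "-> MPxIO disabled" if disabled else "-> MPxIO enabled")
```

To use it on a real server you would pass in the contents of the actual fp.conf file instead of `sample_conf`. The HBA ports that carry the tape drives are the ones to look at.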

Marianne · ‎03-25-2013

Please enable logging on the Solaris media server as follows:

Create bptm folder in /usr/openv/netbackup/logs

Add VERBOSE entry to /usr/openv/volmgr/vm.conf file and restart NBU.

NBU tape manager errors will be logged to bptm and device errors will be logged to /var/adm/messages (along with reason for drives to be DOWN'ed).
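
The two logging steps above can also be scripted. This is only a sketch, assuming Python 3 is available on the media server and that the script runs as root. The function name is made up for illustration, and the paths are the ones given in this post. It does not restart NBU, so that step is still manual.

```python
import os


def enable_nbu_logging(log_root="/usr/openv/netbackup/logs",
                       vm_conf="/usr/openv/volmgr/vm.conf"):
    """Create the bptm log folder and add VERBOSE to vm.conf if missing.

    Returns the bptm folder path and whether VERBOSE had to be added.
    """
    bptm_dir = os.path.join(log_root, "bptm")
    os.makedirs(bptm_dir, exist_ok=True)

    lines = []
    if os.path.exists(vm_conf):
        with open(vm_conf) as fh:
            lines = fh.read().splitlines()

    added = False
    if not any(line.strip() == "VERBOSE" for line in lines):
        lines.append("VERBOSE")
        with open(vm_conf, "w") as fh:
            fh.write("\n".join(lines) + "\n")
        added = True
    return bptm_dir, added


# Example call (commented out because it touches the real NetBackup paths):
# enable_nbu_logging()
```

Running it twice is harmless, since the VERBOSE line is only appended when it is not already present.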

In the meantime, the following will help us to get an idea:

Post contents of /usr/openv/netbackup/db/media/error file on media server

Run 'Tape logs' report in the GUI, select Solaris media server, specify date range during which errors were seen and run the report.
If lots of info is displayed, please filter report to exclude 'Info'. Export the report to a .txt file and post as File attachment.

Mark_Solutions · ‎03-25-2013

Always worth running three cleaning cycles on new tape drives. The manufacturers don't tend to pre-clean them any more, and after three read/write errors a drive goes down. The media logs should tell you if a drive keeps needing to be cleaned. Hope this helps.

Iwan_Tamimi · ‎03-26-2013

Hi All,

Thanks for the support.

Marianne, I will try as you suggested and post the logs and error report.

Yasuhisa, I will check with my colleague who knows Solaris better than I do.

Regards,

Iwan Tamimi

Marianne · ‎03-26-2013

As per my previous post, you can in the meantime do the following:

Post contents of /usr/openv/netbackup/db/media/error file on media server

Run 'Tape logs' report in the GUI, select Solaris media server, specify date range during which errors were seen and run the report.
If lots of info is displayed, please filter report to exclude 'Info'. Export the report to a .txt file and post as File attachment.

Iwan_Tamimi · ‎03-27-2013

Marianne,

Thank you.

This is the content of /usr/openv/netbackup/db/media/errors

root@ebs12-bck # cat errors
03/14/13 12:06:56 600118 0 WRITE_ERROR ESL125_Drive4
03/25/13 16:19:11 600064 3 WRITE_ERROR ESL125_Drive1
03/26/13 00:06:00 600072 3 WRITE_ERROR ESL125_Drive1
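
For a longer errors file, a small script can tally the entries per drive. This is a minimal sketch that assumes each line has the fields date, time, media ID, drive index, error type and drive name, which is how the three lines above appear to be laid out. The function name is made up for illustration.

```python
from collections import Counter


def tally_media_errors(text):
    """Count entries per (drive name, error type) in media errors text."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: date, time, media ID, drive index, error type, drive name
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        _date, _time, _media_id, _drive_index, error_type, drive_name = fields[:6]
        counts[(drive_name, error_type)] += 1
    return counts


# The three lines posted above.
errors_text = """\
03/14/13 12:06:56 600118 0 WRITE_ERROR ESL125_Drive4
03/25/13 16:19:11 600064 3 WRITE_ERROR ESL125_Drive1
03/26/13 00:06:00 600072 3 WRITE_ERROR ESL125_Drive1
"""

for (drive, error_type), n in sorted(tally_media_errors(errors_text).items()):
    print(drive, error_type, n)
```

Errors spread across several drives and several media IDs, as here, point away from a single bad tape or a single bad drive.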

I have also attached the Tape Logs report.

Below I have put the errors from one backup policy job in the Java GUI.

BTW, some facts:

o The tape library is an HP ESL322E containing both LTO4 and LTO5 drives.

o One HP-UX media server connected to the same tape library (meaning it shares the same robot) has been running fine for years.

o The Solaris server is a new addition.

Thank you.

Iwan Tamimi

ps:

error from one failed policy on Java GUI:

03/26/2013 02:33:44 - Info nbjm (pid=8143) starting backup job (jobid=70316) for client ebs12-bck, policy EBS12_FS_OS, schedule Daily_Incre
03/26/2013 02:33:44 - Info nbjm (pid=8143) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=70316, request id:{8623779A-957A-11E2-8449-9411033F729C})
03/26/2013 02:33:44 - requesting resource ESLM5_EBS12_MPX
03/26/2013 02:33:44 - requesting resource ebsbck.NBU_CLIENT.MAXJOBS.ebs12-bck
03/26/2013 02:33:44 - requesting resource ebsbck.NBU_POLICY.MAXJOBS.EBS12_FS_OS
03/26/2013 02:33:44 - granted resource ebsbck.NBU_CLIENT.MAXJOBS.ebs12-bck
03/26/2013 02:33:44 - granted resource ebsbck.NBU_POLICY.MAXJOBS.EBS12_FS_OS
03/26/2013 02:33:44 - granted resource 600192
03/26/2013 02:33:44 - granted resource ESL125_Drive1
03/26/2013 02:33:44 - granted resource ESLM5_EBS12_MPX
03/26/2013 02:33:44 - estimated 636801 kbytes needed
03/26/2013 02:33:44 - Info nbjm (pid=8143) started backup (backupid=ebs12-bck_1364236424) job for client ebs12-bck, policy EBS12_FS_OS, schedule Daily_Incre on storage unit ESLM5_EBS12_MPX
03/26/2013 02:33:46 - started process bpbrm (pid=20211)
03/26/2013 02:33:51 - Info bpbrm (pid=20211) starting bptm
03/26/2013 02:33:52 - Info bpbrm (pid=20211) Started media manager using bpcd successfully
03/26/2013 02:34:00 - Info bpbrm (pid=20211) ebs12-bck is the host to backup data from
03/26/2013 02:34:00 - Info bpbrm (pid=20211) telling media manager to start backup on client
03/26/2013 02:34:00 - Info bptm (pid=20214) using 65536 data buffer size
03/26/2013 02:34:00 - connecting
03/26/2013 02:34:00 - connected; connect time: 0:00:00
03/26/2013 02:34:01 - Info bptm (pid=20214) using 12 data buffers
03/26/2013 02:34:02 - mounting 600192
03/26/2013 02:34:03 - Info bpbrm (pid=20211) spawning a brm child process
03/26/2013 02:34:03 - Info bpbrm (pid=20211) child pid: 20282
03/26/2013 02:34:04 - Info bpbrm (pid=20211) sending bpsched msg: CONNECTING TO CLIENT FOR ebs12-bck_1364236424
03/26/2013 02:34:05 - Info bpbrm (pid=20211) start bpbkar on client
03/26/2013 02:34:05 - Info bpbkar (pid=20287) Backup started
03/26/2013 02:34:05 - Info bpbrm (pid=20211) Sending the file list to the client
03/26/2013 02:34:05 - Info bptm (pid=20214) start backup
03/26/2013 02:34:06 - Info bptm (pid=20214) Waiting for mount of media id 600192 (copy 1) on server ebs12-bck.
03/26/2013 02:34:07 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/ors008/oraarch] is in a different file system from [/]. Skipping
03/26/2013 02:34:07 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/osm009/oraarch] is in a different file system from [/]. Skipping
03/26/2013 02:34:08 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/appspool] is in a different file system from [/]. Skipping
03/26/2013 02:34:12 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/bpm008/oraarch] is in a different file system from [/]. Skipping
03/26/2013 02:34:13 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/tmp] is in a different file system from [/]. Skipping
03/26/2013 02:34:14 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/bpmadm] is in a different file system from [/]. Skipping
03/26/2013 02:34:15 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/orsadm] is in a different file system from [/]. Skipping
03/26/2013 02:34:16 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/proc] is on file system type PROC. Skipping
03/26/2013 02:34:17 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/devices] is in a different file system from [/]. Skipping
03/26/2013 02:34:17 - Info bpbrm (pid=20282) from client ebs12-bck: TRV - [/backup-after] is in a different file system from [/]. Skipping
03/26/2013 02:49:27 - current media 600192 complete, requesting next media Any
03/26/2013 02:49:30 - Error bptm (pid=20214) error requesting media, TpErrno = Robot operation failed
03/26/2013 02:49:30 - Warning bptm (pid=20214) media id 600192 load operation reported an error
03/26/2013 02:49:57 - end writing
03/26/2013 02:50:01 - Error bptm (pid=20214) NBJM returned an extended error status: All compatible drive paths are down but media is available (2009)
03/26/2013 02:50:01 - Info bpbrm (pid=20211) got ERROR 252 from media manager
03/26/2013 02:50:01 - Info bpbrm (pid=20211) terminating bpbrm child 20282 jobid=70316
An extended error status has been encountered, check detailed status (252)
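
Most of the job details above are Info lines. As a sketch, a few lines of Python can pull out only the Error and Warning entries. The regular expression assumes the `MM/DD/YYYY HH:MM:SS - Severity process (pid=N) message` layout seen in this log, and the names used are made up for illustration.

```python
import re

# Matches lines such as:
# 03/26/2013 02:49:30 - Error bptm (pid=20214) error requesting media, ...
JOB_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(Error|Warning) (\w+) \(pid=(\d+)\) (.*)$"
)


def job_problems(job_details):
    """Return (timestamp, severity, process, pid, message) for problem lines."""
    found = []
    for line in job_details.splitlines():
        match = JOB_LINE.match(line.strip())
        if match:
            ts, severity, process, pid, message = match.groups()
            found.append((ts, severity, process, int(pid), message))
    return found


# A few lines copied from the job details above.
job_text = """\
03/26/2013 02:49:27 - current media 600192 complete, requesting next media Any
03/26/2013 02:49:30 - Error bptm (pid=20214) error requesting media, TpErrno = Robot operation failed
03/26/2013 02:49:30 - Warning bptm (pid=20214) media id 600192 load operation reported an error
03/26/2013 02:49:57 - end writing
03/26/2013 02:50:01 - Error bptm (pid=20214) NBJM returned an extended error status: All compatible drive paths are down but media is available (2009)
03/26/2013 02:50:01 - Info bpbrm (pid=20211) got ERROR 252 from media manager
"""

for ts, severity, process, pid, message in job_problems(job_text):
    print(ts, severity, process, message)
```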

Iwan_Tamimi · ‎03-28-2013

I disabled MPxIO as Yasuhisa Ishikawa suggested, and it looks promising: the errors have gone so far, but I still need to do some further tests. I will post the results later. Thank you. Iwan

Marianne · ‎03-28-2013

Which media server is configured as robot control host?

Are any robot load operation errors logged on the control host?

We see this in job details:

03/26/2013 02:49:30 - Error bptm (pid=20214) error requesting media, TpErrno = Robot operation failed
03/26/2013 02:49:30 - Warning bptm (pid=20214) media id 600192 load operation reported an error

Then there is this error in Media logs report:

incorrect media found in drive index 3, expected 600073, found 600054, FREEZING 600073

This points to incorrect device mapping - either incorrectly configured initially or no Persistent Binding in place, which causes device names to change when server is rebooted.

If the problem is only with Solaris media server, start troubleshooting there.

Find out the make/model of the HBA used for tapes in this media server, then go to the HBA manufacturer's web site and look for Persistent Binding info. There are normally HBA tools that can be downloaded and used for this purpose.

Once Persistent Binding is correctly configured, delete all devices for this media server in NBU and OS.
Recreate devices at OS level with 'devfsadm'.
Ensure all devices are correctly seen by OS and NBU with sgscan command, then run Device Config Wizard again. Select robot control host and Solaris media server. Allow the wizard to restart NBU on the media server.

Let us know how it goes.

Iwan_Tamimi · ‎04-07-2013

Thanks Marianne for the explanation.

After I disabled MPxIO the problem seems to have gone away. (I am also new to Solaris, but you can read this for reference: http://saifulaziz.com/2009/12/10/enabling-or-disabling-mpxio-multipathing-per-port/ )

We also have new AIX media servers. Similar things happened there, and after we disabled multipathing the problem also went away.

Thanks all for the support.

Regards,

Iwan Tamimi

Netbackup Media Server on Solaris