Solved: Solaris Master/Media Server

hanskniep · ‎02-05-2015

Hello,

I inherited a Solaris master/media server with 4 CPUs (1500 MHz) and 32 GB of RAM, running NBU 6.5. I am only able to run 45 streams at a time. When I try to raise this limit, all the jobs fail with resource errors. I want to have the O/S team monitor the server, but I am unsure what specifically to have them look at.

I know NBU 6.5 is EOSL, but we can't upgrade until we are sure the hardware is sufficient. So I need help working out how many clients this should support and how to have the OS team help me.

Thanks,

Genericus · ‎02-06-2015

Review the links Marianne has provided. Each child process uses resources. You need to make sure you have configured your server and NetBackup properly.

Check your default Project, or your NetBackup Project - what are the settings?

Check /etc/system - you may have settings there as well.
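For the OS team, the two checks above could be sketched roughly as follows. This is only a sketch: it assumes Solaris 10 or later (where `prctl` and resource-control projects exist), the project name `default` is an assumption to be replaced with whatever project NetBackup actually runs under, and `ipc_tunables` is a hypothetical helper name.

```shell
#!/bin/ksh
# Sketch: show the IPC limits that apply to the NetBackup processes.
# PROJECT is an assumption; substitute the project NetBackup runs under.
PROJECT=${PROJECT:-default}

# Hypothetical helper: print the active (non-comment) shared-memory,
# semaphore and message-queue lines from /etc/system-style input.
# Comment lines in /etc/system start with '*'.
ipc_tunables() {
    awk '!/^\*/ && /shm|sem|msg/'
}

# Resource controls only exist on Solaris 10 and later.
if [ "`uname -s`" = "SunOS" ] && command -v prctl >/dev/null 2>&1; then
    prctl -n project.max-shm-memory -i project "$PROJECT"
    prctl -n project.max-shm-ids -i project "$PROJECT"
    prctl -n project.max-sem-ids -i project "$PROJECT"
fi

# Legacy tunables, if any, are set in /etc/system.
if [ -r /etc/system ]; then
    ipc_tunables < /etc/system
fi
```

The values it prints are what to compare against the shared memory recommendations in the NetBackup tuning documentation for your release.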

I have a Solaris master, and I have well over 200 jobs active with no issues.

But I have more memory and I am more up to date. I also write to a 990 - and that data flies!

NetBackup 9.1.0.1 on Solaris 11, writing to Data Domain 9800 7.7.4.0
duplicating via SLP to LTO5 & LTO8 in SL8500 via ACSLS

Yasuhisa_Ishika · ‎02-05-2015

The maximum number of concurrent jobs that the master server can handle depends on various factors. But in my experience, SPARC Solaris servers with fewer cores and less memory can handle more jobs than that. You should first identify which type of resource is running short by looking into the job details, bperror, and the debug logs.

By the way, you should upgrade your NetBackup to a currently supported version even if the hardware is sufficient. Upgrading NetBackup does not necessarily require a hardware refresh or upgrade.

Marianne · ‎02-05-2015

45 streams backing up to what kind of devices, and how many? And what kind of throughput per device? Are all of these streams coming across the network? It really sounds to me as if you need at least one more media server... A proper sizing exercise is needed here: number of clients, data size per client, backup window, network bandwidth, available devices, etc. And yes, you should upgrade as a matter of urgency. NBU 6.5 ran out of support in 2012.

Handy NetBackup Links

Andy_Welburn · ‎02-05-2015

A SPARC 6.5 Master/Media with 2x 1280 MHz CPUs & 6 GB memory - I'm sure we managed more than 45 streams at once over the many years that this system was actually in *full* production.

"When I try to raise this limit, all the jobs fail with resource errors"

Be interested to see these errors ....

Marianne · ‎02-06-2015

My understanding was that a single master/media server is writing 45 streams simultaneously...

Now, think of the data path (assuming that most master/media servers still only have a single 1Gb NIC).
How many client connections can this 1Gb link accommodate if they are all trying to transfer data at roughly 10 MB/sec?
Well, if 45 clients are transferring data at 2 MB/sec each, then 45 streams will not be a problem.
The 1Gb NIC will be just about saturated and can stream a single LTO3 at 90 MB/sec.

A master server with those specs and a 10Gb NIC can handle a lot more incoming data from clients.
It all comes down to network infrastructure and the amount and speed of data transfer from the clients.
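The arithmetic behind this can be sketched as a quick back-of-envelope check. It assumes the per-client rates are in megabytes per second and ignores protocol overhead, so the 125 MB/sec ceiling for a 1Gb NIC is optimistic; `aggregate_mb` is a hypothetical helper name.

```shell
#!/bin/ksh
# Back-of-envelope check: aggregate client throughput vs. NIC capacity.
# Assumption: rates are megabytes per second; overhead is ignored.
STREAMS=45
NIC_GBIT=1
NIC_MBYTES=$(( NIC_GBIT * 1000 / 8 ))   # 1 Gbit/sec is about 125 MB/sec

# Hypothetical helper: streams * per-stream rate, in MB/sec.
aggregate_mb() {
    echo $(( $1 * $2 ))
}

for RATE in 2 10; do
    TOTAL=`aggregate_mb $STREAMS $RATE`
    if [ "$TOTAL" -le "$NIC_MBYTES" ]; then
        echo "$STREAMS streams at $RATE MB/sec = $TOTAL MB/sec: fits in $NIC_MBYTES MB/sec"
    else
        echo "$STREAMS streams at $RATE MB/sec = $TOTAL MB/sec: exceeds $NIC_MBYTES MB/sec"
    fi
done
```

At 2 MB/sec per client the 45 streams add up to 90 MB/sec, which fits; at 10 MB/sec they add up to 450 MB/sec, well beyond what a single 1Gb NIC can carry.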

As per my previous post - a proper sizing exercise is needed to see where the bottleneck/resource issue is.
It could be a simple matter of insufficient shared memory configured at OS level.

The Planning and Performance Tuning Guides will help you:

Overview of NetBackup performance testing.
http://www.symantec.com/docs/TECH147296

Symantec NetBackup™ 7.0 - 7.1 Backup Planning and Performance Tuning Guide
http://www.symantec.com/docs/DOC4483

Key performance considerations for NetBackup 7.5 master servers http://www.symantec.com/docs/TECH202840

Performance issues observed when running UltraSparc-T series architecture on a NetBackup 7.x Master Server
http://www.symantec.com/docs/TECH204332

Designing your backup system
http://www.symantec.com/docs/HOWTO56078

Improving NetBackup backup performance
http://www.symantec.com/docs/HOWTO44836

NetBackup™ Architecture Overview
https://www-secure.symantec.com/connect/sites/default/files/b-whitepaper_nbu_architecture_overview_1...

NetBackup 7.6 Best Practices: Optimizing Performance

Updated NetBackup Backup Planning and Performance Tuning Guide for Release 7.5 and Release 7.6

(Not sure if all the HOWTO links are still working - some have been expired and SYMwise seems to be offline at the moment...)

hanskniep · ‎02-06-2015

More info: all the clients are workstations with a 1 Gb link at most, most likely 100 Mb. The master/media server is writing to a newer Data Domain DD990. No tape. Upgrading to a newer version is not possible at this time.

Genericus · ‎02-06-2015

Make sure you are counting all child jobs.

I set up a script that runs from crontab. It essentially uses bpdbjobs to collect the job information:

#!/bin/ksh
# this runs the bpdbjobs command, assigns field values
# stores the output in the OUTFILE

OUTFILE=/usr/openv/tmp/job.counts.`date "+%Y%m"`
DATE=`date "+%m/%d/%Y"`

HOUR=`date "+%H%M"`

## information about output and what each field is...

#/usr/openv/netbackup/bin/admincmd/bpdbjobs -ignore_parent_jobs -most_columns | sed "s/,/ /g" |while read F01 F02 F03 F04 F05 F06 F07 F08 F09 F10 F11 F12 F13 F14 F15 F16 F17 F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32 F33 F34 F35 F36 F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50
#jobid,jobtype,state,status,policy,schedule,client,server,started,elapsed,ended,stunit,try,operation,kbytes,files,pathlastwritten,percent,jobpid,owner,subtype,classtype,schedule_type,priority,group,masterserver,retentionunits,retentionperiod,compression,kbyteslastwritten,fileslastwritten,filelistcount,[files]...,trycount,[trypid,trystunit,tryserver,trystarted,tryelapsed,tryended,trystatus,trystatusdescription,trystatuscount,[trystatuslines]...,trybyteswritten,tryfileswritten]...parentjob,kbpersec,copy,robot,vault,profile,session,ejecttapes,srcstunit,srcserver,srcmedia,dstmedia,stream,suspendable,resumable,restartable,datamovement,snapshot,backupid,killable,controllinghost
### Field 02 = jobtype
# 0=backup, 1=archive, 2=restore, 3=verify, 4=duplicate, 5=import, 6=catalog backup, 7=vault,
# 8=label, 9=erase, 10=tape request, 11=clean, 12=format tape, 13=physical inventory, 14=qualification
### Field 03 = state # 0=queued, 1=active, 2=wait for retry, 3=done
### Field 04 = status code
##

I use SLP so I want to see how many are active or queued

SLPREPORT=`/usr/openv/netbackup/bin/admincmd/nbstlutil report | grep "Total: "| cut -c60-80`
## find queued duplications
QUEUED=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep -c "4,0,,SLP"`
EDLAQ=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep "4,0,,SLP" | grep -c med04np`
EDLBQ=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep "4,0,,SLP" | grep -c med03np`
## find active duplications # change 4,1 to 4,,
ACTIVE=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep -c "4,1,,SLP"`
EDLAA=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep "4,,,SLP" | grep -c med04np`
EDLBA=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep "4,,,SLP" | grep -c med03np`
## find queued and active backups
BACKUPQ=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep -c "0,0,,"`
BACKUPA=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -most_columns | cut -d, -f 2,3,4,5,6,12,24,57 | grep -c "0,1,,"`
###
## SLPSUM was never assigned above, so it is left out of the output line
## echo DATE\ HOUR\ QUEUED\ EDLAQ\ EDLBQ\ ACTIVE\ EDLAA\ EDLBA BACKUPQ\ BACKUPA\ SLPREPORT >> $OUTFILE
echo $DATE\ $HOUR\ $QUEUED\ $EDLAQ\ $EDLBQ\ $ACTIVE\ $EDLAA\ $EDLBA\ $BACKUPQ\ $BACKUPA\ $SLPREPORT >> $OUTFILE

I run this every 15 minutes to a file, then I can tail -f that file and see the job counts.
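Besides tailing the file, the peak can be pulled out of it afterwards. The following is a minimal sketch, not part of the original script: it assumes the field order of the echo line above (field 1 is the date, field 2 the time, field 10 the active backup count), and `peak_active` is a hypothetical helper name.

```shell
#!/bin/ksh
# Sketch: report the sample with the most active backups in the log.
# Assumed fields, from the script's echo line:
#   1=date  2=time  10=active backups (BACKUPA)
peak_active() {
    awk 'BEGIN { max = -1 }
         $10 + 0 > max { max = $10 + 0; when = $1 " " $2 }
         END { if (NR > 0) print max, when }'
}

# Same monthly file name the collection script writes to.
LOG=/usr/openv/tmp/job.counts.`date "+%Y%m"`
if [ -r "$LOG" ]; then
    peak_active < "$LOG"
fi
```

It prints the highest active backup count followed by the date and time of that sample, which gives a concrete number to compare against the 45-stream limit.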

Andrew_Madsen · ‎02-09-2015

Also validate that your memory tuning is sufficient.

Look at this guide. I realize it is for 7.5, but there is some good memory information in it, especially if you are using Solaris 10.

http://www.symantec.com/business/support/index?page=content&id=doc7449

Andrew_Madsen · ‎02-09-2015

Here is the NBU 6.5 guide:

http://www.symantec.com/business/support/index?page=content&id=TECH62317

hanskniep · ‎02-09-2015

I found this technote:

http://www.symantec.com/business/support/index?page=content&id=TECH62633

Going to have the O/S team verify this is set.

Thanks all for your responses!

Marianne · ‎02-10-2015

As per my post above, right?

"It could be a simple matter of insufficient shared memory configured at OS level."
