06-08-2015 05:24 AM
Hello,
We have the following configuration in our NetBackup environment:
Master:
eux390{root}# bpgetconfig -s eux390 -L|egrep 'Platform|Protocol|Version|Release'
NetBackup Client Platform = HP-UX-IA64, HP-UX11.31
NetBackup Client Protocol Level = 7.5.0.0.0.4
Version Name = 7.5
Version Number = 750000
Client OS/Release = HP-UX B.11.31
eux390{root}#
Client:
eux290{root}# /usr/openv/netbackup/bin/admincmd/bpgetconfig -s eux380 -L|egrep 'Platform|Protocol|Version|Release'
NetBackup Client Platform = HP-UX-IA64, HP-UX11.31
NetBackup Client Protocol Level = 7.0.0
Version Name = 7.0
Version Number = 700000
Client OS/Release = HP-UX B.11.31
eux290{root}#
We are taking raw backups of our clients, but the backup of this particular client takes very long to complete; in fact, it does not finish before the next backup job starts. The client's policy is shown below:
eux390{root}# bppllist bck_raw_vg380_gpx -L
Policy Name: bck_raw_vg380_gpx
Options: 0x0
template: FALSE
audit_reason: ?
Names: (none)
Policy Type: Standard (0)
Active: yes
Effective date: 08/04/2009 11:43:32
Client Compress: no
Follow NFS Mnts: no
Cross Mnt Points: no
Collect TIR info: yes, with move detection
Block Incremental: no
Mult. Data Stream: yes
Perform Snapshot Backup: no
Snapshot Method: (none)
Snapshot Method Arguments: (none)
Perform Offhost Backup: no
Backup Copy: 0
Use Data Mover: no
Data Mover Type: -1
Use Alternate Client: no
Alternate Client Name: (none)
Use Virtual Machine: 0
Hyper-V Server Name: (none)
Enable Instant Recovery: no
Policy Priority: 200
Max Jobs/Policy: Unlimited
Disaster Recovery: 0
Collect BMR Info: yes
Keyword: vg380 eux380
Data Classification: -
Residence is Storage Lifecycle Policy: no
Client Encrypt: no
Checkpoint: no
Residence: eux390-hcart-robot-tld-0
Volume Pool: HPUX_RETENTION_1W
Server Group: *ANY*
Granular Restore Info: no
Exchange Source attributes: no
Exchange 2010 Preferred Server: (none defined)
Application Discovery: no
Discovery Lifetime: 0 seconds
ASC Application and attributes: (none defined)
Generation: 188
Ignore Client Direct: no
Enable Metadata Indexing: no
Index server name: NULL
Use Accelerator: no
Client/HW/OS/Pri/DMI: eux390.sgp.st.com HP9000-800 HP-UX11.11 0 0 0 0 ?
Include: NEW_STREAM
Include: /dev/rdisk/disk1252
Include: /dev/rdisk/disk1586
Include: /dev/rdisk/disk1207
Include: /dev/rdisk/disk1323
Include: /dev/rdisk/disk1311
Include: /dev/rdisk/disk1587
Include: NEW_STREAM
Include: /dev/rdisk/disk1208
Include: /dev/rdisk/disk1332
Include: /dev/rdisk/disk1322
Include: /dev/rdisk/disk1682
Include: /dev/rdisk/disk1209
Include: /dev/rdisk/disk1384
Include: NEW_STREAM
Include: /dev/rdisk/disk1226
Include: /dev/rdisk/disk2447
Include: /dev/rdisk/disk2448
Include: /dev/rdisk/disk2459
Include: /dev/rdisk/disk2460
Include: /dev/rdisk/disk2857
Include: NEW_STREAM
Include: /dev/rdisk/disk2868
Schedule: monthly
Type: FULL (0)
Frequency: 28 day(s) (2419200 seconds)
Maximum MPX: 3
Synthetic: 0
Checksum Change Detection: 0
PFI Recovery: 0
Retention Level: 5 (3 months)
u-wind/o/d: 0 0
Incr Type: DELTA (0)
Alt Read Host: (none defined)
Max Frag Size: 0 MB
Number Copies: 1
Fail on Error: 0
Residence: eux390-hcart-robot-tld-0
Volume Pool: HPUX_RETENTION_3M
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Schedule indexing: 0
Daily Windows:
Day Open Close W-Open W-Close
Sunday 000:00:00 000:00:00
Monday 000:00:00 000:00:00
Tuesday 000:00:00 000:00:00
Wednesday 000:00:00 000:00:00
Thursday 000:00:00 000:00:00
Friday 000:00:00 000:00:00
Saturday 000:00:00 000:00:00
Schedule: weekly
Type: FULL (0)
Frequency: 7 day(s) (604800 seconds)
Maximum MPX: 3
Synthetic: 0
Checksum Change Detection: 0
PFI Recovery: 0
Retention Level: 3 (1 month)
u-wind/o/d: 0 0
Incr Type: DELTA (0)
Alt Read Host: (none defined)
Max Frag Size: 0 MB
Number Copies: 1
Fail on Error: 0
Residence: eux390-hcart-robot-tld-0
Volume Pool: HPUX_RETENTION_1M
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Schedule indexing: 0
Daily Windows:
Day Open Close W-Open W-Close
Sunday 000:00:00 000:00:00
Monday 000:00:00 000:00:00
Tuesday 000:00:00 000:00:00
Wednesday 000:00:00 000:00:00
Thursday 000:00:00 000:00:00
Friday 000:00:00 000:00:00
Saturday 000:00:00 000:00:00
Schedule: daily
Type: FULL (0)
Frequency: 1 day(s) (86400 seconds)
Maximum MPX: 3
Synthetic: 0
Checksum Change Detection: 0
PFI Recovery: 0
Retention Level: 0 (1 week)
u-wind/o/d: 0 0
Incr Type: DELTA (0)
Alt Read Host: (none defined)
Max Frag Size: 0 MB
Number Copies: 1
Fail on Error: 0
Residence: eux390-hcart-robot-tld-0
Volume Pool: (same as policy volume pool)
Server Group: (same as specified for policy)
Residence is Storage Lifecycle Policy: 0
Schedule indexing: 0
Daily Windows:
Day Open Close W-Open W-Close
Sunday 000:00:00 000:00:00
Monday 000:00:00 000:00:00
Tuesday 000:00:00 000:00:00
Wednesday 000:00:00 000:00:00
Thursday 000:00:00 000:00:00
Friday 000:00:00 000:00:00
Saturday 000:00:00 000:00:00
eux390{root}#
Let me know what can be done!
regards,
06-08-2015 05:36 AM
Actual sizes of data-set to be backed-up? Multiple backup jobs? Multiple file-systems? Actual throughput observed? What networking is configured on the backup client? What tape type? How many tape drives in use? Is multi-plexing used?
What do you mean by 'raw backup'? Flash backup? Or plain file system backup?
Have you checked underlying storage/volume/array/parity-groups/disks for saturation/busy-ness and/or latency?
Have you done a bpbkar test to null device - to test raw disk read speed?
Have you checked your buffer wait counts (client side and media server side)?
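A minimal sketch of such a null-device read test, assuming a default NBU client install path and borrowing one raw device from the policy above as an example (flags as documented in Veritas HOWTO99824; verify locally before relying on them):

```shell
# Read the raw device with bpbkar and discard the data - this measures pure
# disk read speed with no network or tape in the path (paths are assumptions).
# time /usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo -nokeepalives \
#     /dev/rdisk/disk1252 > /dev/null 2> /tmp/bpbkar_speed.log

# Convert the result to MB/s, e.g. if 2560 MB were read in 18.2 s of real time:
awk 'BEGIN { printf "%.0f MB/s\n", 2560 / 18.2 }'
```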
06-08-2015 05:51 AM
In addition to answers to above questions - any particular reason why you need to take raw backups?
It looks like the client is also the media server, right?
Client/HW/OS/Pri/DMI: eux390....
Residence: eux390....
Client name is same as media server - not eux380....
(And policy is not configured to perform offhost snapshot backup for client eux380.)
How is data mapped to the media server?
Are you trying to read data off raw devices while still mounted on the client?
And tape drives in the library?
Are disks and tape zoned to different HBAs?
How many tape drives?
Media server resources?
What throughput is achieved per stream?
06-08-2015 07:23 AM
How can this be a raw backup with policy type 0 (Standard) and move detection enabled?
Policy Type: Standard (0)
Active: yes
Effective date: 08/04/2009 11:43:32
Client Compress: no
Follow NFS Mnts: no
Cross Mnt Points: no
Collect TIR info: yes, with move detection
Eh, what's up, doc?
06-09-2015 12:35 AM
Hello All,
Clarifying the configuration once again:
Master and media server: eux390
Client: eux380
Answering the questions of sdo first:
Actual sizes of data-set to be backed-up? 3790GB
Multiple backup jobs? single backup job
Multiple file-systems? It is a raw backup, so 19 disks are involved.
Actual throughput observed? Up to 16 MB/s
What networking is configured on the backup client? disks are assigned through SAN
What tape type? LTO4,5,6 tapes and corresponding tape drives are configured in library
How many tape drives in use? in total 31 drives are used
Is multi-plexing used? yes
What do you mean by 'raw backup'? Raw backup means direct disk backup: the STDs are synced with BCVs, and the backup is taken from the BCVs onto tape using NetBackup.
Flash backup? NA
Or plain file system backup? NA
Have you checked underlying storage/volume/array/parity-groups/disks for saturation/busy-ness and/or latency? The disks seem to be fine.
Have you done a bpbkar test to null device - to test raw disk read speed? As we are using BCVs here, it is difficult to test the raw speed!
Have you checked your buffer wait counts (client side and media server side)? How do I check this?
I have also enabled the bpbkar logs on the master server to dig into this further.
@Marianne: I hope I have answered your queries; if you need any more information, please let me know.
Regarding the need for raw backups, it is a management requirement; I can't say anything more than that!
@Nicolai: I have disabled that option in this policy (see below). Regarding the policy type, we use Standard for all raw backups; let me know if another option should be chosen!
Policy Type: Standard (0)
Active: yes
Effective date: 08/04/2009 11:43:32
Client Compress: no
Follow NFS Mnts: no
Cross Mnt Points: no
Collect TIR info: no
Let me know if any other information is required by my side.
regards,
06-09-2015 02:37 AM
No. The client name is NOT eux380. Look at Client name in the policy:
Client/HW/OS/Pri/DMI: eux390....
So, it seems you are taking BCV snapshots that are mounted on the media server, right?
Check SAN configuration -
are all LUNs mapped to a single HBA?
Are tapes on media server mapped to same or different HBAs?
(Hopefully NOT 31 tape drives added to single media server!)
Good thing you have disabled TIR backup for raw filesystem backup.
Oh - about multiple backup jobs - the policy is configured for 4 streams with MPX in the schedule allowing 3 simultaneous jobs.
Buffer size and 'waits' can be seen in bptm log on the media server.
Log folders do not exist by default - create them under /usr/openv/netbackup/logs.
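As a sketch (standard NBU log location; the exact wording of the wait counters can differ by version, so treat the grep pattern as an assumption to verify against your own bptm log):

```shell
# Create the bptm log folder on the media server, then re-run a backup:
# mkdir -p /usr/openv/netbackup/logs/bptm
# Afterwards, look for the buffer wait/delay counters at the end of the job:
# grep -h "waited for" /usr/openv/netbackup/logs/bptm/log.*

# Interpreting a typical counter line: a high "waited for full buffer" count in
# bptm means the tape side sat idle waiting for data, i.e. the read/client side
# is the bottleneck (field 5 of the sample line below is the wait count).
printf 'waited for full buffer 12345 times, delayed 67890 times\n' |
  awk '{ if ($5 > 1000) print "read side looks slow"; else print "buffers look ok" }'
```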
06-09-2015 04:17 AM
Hello Marianne,
Client/HW/OS/Pri/DMI: eux390...: The client name mentioned in the policy is eux390 (which is both the media server and the master). In our environment, the backup is taken from the BCVs configured on eux390, so we have to use that name in the policy.
So, it seems you are taking BCV snapshots that are mounted on the media server, right?: yes.
Check SAN configuration -
are all LUNs mapped to a single HBA?: No, they are distributed.
Are tapes on media server mapped to same or different HBAs? Different HBAs.
(Hopefully NOT 31 tape drives added to single media server!): there are 20 drives in this backup server.
Regards,
06-09-2015 04:33 AM
I have never seen any media server that can stream more than 3 or 4 tape drives at 100Mbytes/sec or more simultaneously.
Can you try to use dd to read data off the disk and output to /dev/null in order to test read speed for raw device?
Note that this has nothing to do with NBU and will simply test read speed at hardware level.
The next test is to mount a scratch tape in a tape drive and use cpio or tar (not sure if tar can backup raw device) to backup directly to tape.
Once again - bypassing NBU to test at OS and hardware level.
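For the tape-side half of that test, a rough sketch (the device path /dev/rmt/0mn is a placeholder - substitute your own no-rewind drive; note that /dev/zero compresses extremely well on LTO, so this overstates real-world throughput):

```shell
# Raw write test straight to a tape drive, bypassing NBU entirely.
# Load a scratch tape first ('mn' = no-rewind device on HP-UX; path is a guess).
# time dd if=/dev/zero of=/dev/rmt/0mn bs=262144 count=40960

# Sanity-check the amount written: 262144 bytes x 40960 blocks = 10 GiB
echo $((262144 * 40960 / 1073741824)) GiB
```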
What tier disk is used for the BCV snapshots?
PS:
Have you checked if bptm log folder exists?
And if buffer settings have been configured?
06-09-2015 07:04 AM
Marianne - how did you get a clue about the BCVs?
As this is a normal file system backup, I still don't understand how this works when raw devices are specified.
Testing the raw read speed can be done by:
time dd if=/dev/rdsk/c33t0d5 bs=256k of=/dev/null count=10000
Output:
10000+0 records in
10000+0 records out
real 18.2
user 0.0
sys 0.1
# echo "256*10000/18" | bc
142222 KB/sec
On the media server, please tell us whether the NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS files exist in /usr/openv/netbackup/db/config.
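If they do not exist, they can be created there. A sketch only - the values below are common LTO starting points, not a recommendation for this environment, and throughput should be measured before and after any change:

```shell
# NBU buffer tuning files live in the config dir on the media server.
CONF=/usr/openv/netbackup/db/config
# echo 262144 > $CONF/SIZE_DATA_BUFFERS     # 256 KB per buffer
# echo 256    > $CONF/NUMBER_DATA_BUFFERS   # buffers per tape stream

# Shared memory consumed per drive stream = buffer size x buffer count:
echo $((262144 * 256 / 1048576)) MB
```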
06-09-2015 07:40 AM
Hi Nicolai
There was a post earlier today that told us about BCV snapshots.
Backing up raw devices in Standard Policy is supported.
... backing up is the easy part - restoring them is the tricky part! (No file-level restore, the size of the LUN being restored to, etc.) Lots of things to look out for, all documented in NBU Admin Guide I.
06-12-2015 02:29 AM
Hello All,
Here I have observed on my master server (which is also the media server) that the raw backup speed is good at the moment, after a reboot, whereas the application backup is still very slow.
Raw backup: by raw backup I mean the BCV snapshots backed up to tape. The speed improved after we rebooted the system, but it seems to be slowly drifting back to the old levels: immediately after the reboot the speed increased significantly, but it is now showing the same trends again.
App backup: this is the scenario where the backup is taken from disks on the client node to a disk-pool storage unit on the master server, or client data is sent directly to tape. Ideally, backing up to the disk-pool STU should be faster, but it is extremely slow, and the same is true when data is backed up directly from the client disks to tape.
The stu list is mentioned below:
##############
Label: hcart-robot-tld-0
Storage Unit Type: Media Manager
Number of Drives: 29
On Demand Only: no
Density: hcart2 (14)
Robot Type/Number: TLD (8) / 0
Max Fragment Size: 1048575
Max MPX/drive: 2
Label: lto6-hcart-robot-tld-2
Storage Unit Type: Media Manager
Number of Drives: 4
On Demand Only: no
Density: hcart2 (14)
Robot Type/Number: TLD (8) / 2
Max Fragment Size: 1048575
Max MPX/drive: 2
Label: DISKPOOL
Storage Unit Type: Disk
Media Subtype: Basic (1)
Concurrent Jobs: 80
On Demand Only: yes
Path: "/nbdiskpool"
Robot Type: (not robotic)
Max Fragment Size: 524288
Max MPX: 1
Stage data: yes
Block Sharing: no
File System Export: no
High Water Mark: 50
Low Water Mark: 10
Ok On Root: no
Label: NAS_DISKPOOL
Storage Unit Type: Disk
Media Subtype: Basic (1)
Concurrent Jobs: 80
On Demand Only: yes
Path: "/NAS_DISKPOOL"
Robot Type: (not robotic)
Max Fragment Size: 524288
Max MPX: 1
Stage data: yes
Block Sharing: no
File System Export: no
High Water Mark: 60
Low Water Mark: 10
Ok On Root: no
Label: DISKPOOL_1
Storage Unit Type: Disk
Media Subtype: Basic (1)
Concurrent Jobs: 80
On Demand Only: yes
Path: "/nbdiskpool_new"
Robot Type: (not robotic)
Max Fragment Size: 524288
Max MPX: 1
Stage data: yes
Block Sharing: no
File System Export: no
High Water Mark: 60
Low Water Mark: 20
Ok On Root: no
#################
SIZE_DATA_BUFFERS=262144
NUMBER_DATA_BUFFERS=256
Kindly suggest!
06-17-2015 04:48 AM
Have you talked to your local SAN/storage admin about this? Software can't override the limits of the hardware.
For testing tape drives or disk pools, the NetBackup GEN_DATA directive is an excellent tool for finding bottlenecks. GEN_DATA generates backup data in memory, and on restore the data is written back to memory, where it is then discarded.
Run backup and restore tests to tape and to the disk pool, and let's get figures for what the infrastructure is able to do. For read speed from the BCV volumes, use the dd example in my previous post.
http://www.symantec.com/docs/TECH75213
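As a sketch, a GEN_DATA test policy puts directives in the backup selections list instead of real paths (directive names as I recall them from the tech note above; verify the exact spelling and values there before use):

```
NEW_STREAM
GEN_DATA
GEN_KBSIZE=1048576
GEN_MAXFILES=1000
```

Backups of such a policy read nothing from disk, so the sustained speed to the storage unit isolates the media-server/tape (or disk-pool) side of the data path.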
06-17-2015 05:05 AM
This is a totally different issue and should ideally be posted as a separate discussion:
App backup: this is the scenario where the backup is taken from disks on the client node to a disk-pool storage unit on the master server, or client data is sent directly to tape. Ideally, backing up to the disk-pool STU should be faster, but it is extremely slow, and the same is true when data is backed up directly from the client disks to tape.
First step is to use bpbkar to see how fast data can be read from client disk.
See:
Measuring disk performance with bpbkar
http://www.symantec.com/docs/HOWTO99824
Overview of NetBackup performance testing.
http://www.symantec.com/docs/TECH147296
About the NetBackup data transfer path
http://www.symantec.com/docs/HOWTO99831
The NetBackup Backup Planning and Performance Tuning Guide, Release 7.5 and Release 7.6
http://www.symantec.com/docs/DOC7449