Raw backup taking too much time to complete

Harry_NBAD
Level 4
Certified

Hello,

We have the following configuration in our NetBackup environment:

Master:

eux390{root}# bpgetconfig -s eux390 -L|egrep 'Platform|Protocol|Version|Release'
NetBackup Client Platform = HP-UX-IA64, HP-UX11.31
NetBackup Client Protocol Level = 7.5.0.0.0.4
Version Name = 7.5
Version Number = 750000
Client OS/Release = HP-UX B.11.31 
eux390{root}# 

Client:

eux290{root}# /usr/openv/netbackup/bin/admincmd/bpgetconfig -s eux380 -L|egrep 'Platform|Protocol|Version|Release'
NetBackup Client Platform = HP-UX-IA64, HP-UX11.31
NetBackup Client Protocol Level = 7.0.0
Version Name = 7.0
Version Number = 700000
Client OS/Release = HP-UX B.11.31 
eux290{root}# 

We take raw backups of our clients, but the backup of the above-mentioned client in particular takes so long to complete that it is often still running when the next backup job starts. The client's policy is listed below:

eux390{root}# bppllist bck_raw_vg380_gpx -L

Policy Name:       bck_raw_vg380_gpx
Options:           0x0
template:          FALSE
audit_reason:         ?
Names:             (none)
Policy Type:       Standard (0)
Active:            yes
Effective date:    08/04/2009 11:43:32
Client Compress:   no
Follow NFS Mnts:   no
Cross Mnt Points:  no
Collect TIR info:      yes, with move detection
Block Incremental: no
Mult. Data Stream: yes
Perform Snapshot Backup:   no
Snapshot Method:           (none)
Snapshot Method Arguments: (none)
Perform Offhost Backup:    no
Backup Copy:               0
Use Data Mover:            no
Data Mover Type:           -1
Use Alternate Client:      no
Alternate Client Name:     (none)
Use Virtual Machine:      0
Hyper-V Server Name:     (none)
Enable Instant Recovery:   no
Policy Priority:   200
Max Jobs/Policy:   Unlimited
Disaster Recovery: 0
Collect BMR Info:  yes
Keyword:           vg380 eux380
Data Classification:       -
Residence is Storage Lifecycle Policy:    no
Client Encrypt:    no
Checkpoint:        no
Residence:         eux390-hcart-robot-tld-0
Volume Pool:       HPUX_RETENTION_1W
Server Group:      *ANY*
Granular Restore Info:  no
Exchange Source attributes:              no
Exchange 2010 Preferred Server: (none defined)
Application Discovery:      no
Discovery Lifetime:      0 seconds
ASC Application and attributes: (none defined)
Generation:      188
Ignore Client Direct:  no
Enable Metadata Indexing:  no
Index server name:  NULL
Use Accelerator:  no
Client/HW/OS/Pri/DMI:  eux390.sgp.st.com HP9000-800 HP-UX11.11 0 0 0 0 ?
Include:           NEW_STREAM
Include:           /dev/rdisk/disk1252
Include:           /dev/rdisk/disk1586
Include:           /dev/rdisk/disk1207
Include:           /dev/rdisk/disk1323
Include:           /dev/rdisk/disk1311
Include:           /dev/rdisk/disk1587
Include:           NEW_STREAM
Include:           /dev/rdisk/disk1208
Include:           /dev/rdisk/disk1332
Include:           /dev/rdisk/disk1322
Include:           /dev/rdisk/disk1682
Include:           /dev/rdisk/disk1209
Include:           /dev/rdisk/disk1384
Include:           NEW_STREAM
Include:           /dev/rdisk/disk1226
Include:           /dev/rdisk/disk2447
Include:           /dev/rdisk/disk2448
Include:           /dev/rdisk/disk2459
Include:           /dev/rdisk/disk2460
Include:           /dev/rdisk/disk2857
Include:           NEW_STREAM
Include:           /dev/rdisk/disk2868
Schedule:              monthly
  Type:                FULL (0)
  Frequency:           28 day(s) (2419200 seconds)
  Maximum MPX:         3
  Synthetic:           0
  Checksum Change Detection: 0
  PFI Recovery:        0
  Retention Level:     5 (3 months)
  u-wind/o/d:          0 0
  Incr Type:           DELTA (0)
  Alt Read Host:       (none defined)
  Max Frag Size:       0 MB
  Number Copies:       1
  Fail on Error:       0
  Residence:           eux390-hcart-robot-tld-0
  Volume Pool:         HPUX_RETENTION_3M
  Server Group:        (same as specified for policy)
  Residence is Storage Lifecycle Policy:         0
  Schedule indexing:     0
  Daily Windows:
   Day         Open       Close       W-Open     W-Close
   Sunday      000:00:00  000:00:00
   Monday      000:00:00  000:00:00
   Tuesday     000:00:00  000:00:00
   Wednesday   000:00:00  000:00:00
   Thursday    000:00:00  000:00:00
   Friday      000:00:00  000:00:00
   Saturday    000:00:00  000:00:00
Schedule:              weekly
  Type:                FULL (0)
  Frequency:           7 day(s) (604800 seconds)
  Maximum MPX:         3
  Synthetic:           0
  Checksum Change Detection: 0
  PFI Recovery:        0
  Retention Level:     3 (1 month)
  u-wind/o/d:          0 0
  Incr Type:           DELTA (0)
  Alt Read Host:       (none defined)
  Max Frag Size:       0 MB
  Number Copies:       1
  Fail on Error:       0
  Residence:           eux390-hcart-robot-tld-0
  Volume Pool:         HPUX_RETENTION_1M
  Server Group:        (same as specified for policy)
  Residence is Storage Lifecycle Policy:         0
  Schedule indexing:     0
  Daily Windows:
   Day         Open       Close       W-Open     W-Close
   Sunday      000:00:00  000:00:00
   Monday      000:00:00  000:00:00
   Tuesday     000:00:00  000:00:00
   Wednesday   000:00:00  000:00:00
   Thursday    000:00:00  000:00:00
   Friday      000:00:00  000:00:00
   Saturday    000:00:00  000:00:00
Schedule:              daily
  Type:                FULL (0)
  Frequency:           1 day(s) (86400 seconds)
  Maximum MPX:         3
  Synthetic:           0
  Checksum Change Detection: 0
  PFI Recovery:        0
  Retention Level:     0 (1 week)
  u-wind/o/d:          0 0
  Incr Type:           DELTA (0)
  Alt Read Host:       (none defined)
  Max Frag Size:       0 MB
  Number Copies:       1
  Fail on Error:       0
  Residence:           eux390-hcart-robot-tld-0
  Volume Pool:         (same as policy volume pool)
  Server Group:        (same as specified for policy)
  Residence is Storage Lifecycle Policy:         0
  Schedule indexing:     0
  Daily Windows:
   Day         Open       Close       W-Open     W-Close
   Sunday      000:00:00  000:00:00
   Monday      000:00:00  000:00:00
   Tuesday     000:00:00  000:00:00
   Wednesday   000:00:00  000:00:00
   Thursday    000:00:00  000:00:00
   Friday      000:00:00  000:00:00
   Saturday    000:00:00  000:00:00
eux390{root}# 

Let me know what can be done!

regards,


sdo
Moderator
Moderator
Partner    VIP    Certified

Actual size of the data-set to be backed up?
Multiple backup jobs?
Multiple file-systems?
Actual throughput observed?
What networking is configured on the backup client?
What tape type?
How many tape drives in use?
Is multiplexing used?

What do you mean by 'raw backup'?  Flash backup?  Or plain file system backup?

Have you checked underlying storage/volume/array/parity-groups/disks for saturation/busy-ness and/or latency?

Have you done a bpbkar test to null device - to test raw disk read speed?

Have you checked your buffer wait counts (client side and media server side)?
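
For reference, a rough sketch of that bpbkar null test, using the command form from the NetBackup performance tuning documentation (the device path is just an example taken from the policy above; whether bpbkar streams a raw device's contents exactly like a file system is worth confirming in your environment):

time /usr/openv/netbackup/bin/bpbkar -nocont -dt 0 -nofileinfo -nokeepalives /dev/rdisk/disk1252 > /dev/null 2> /tmp/bpbkar_test.log

The elapsed time against the data volume gives the read rate with NetBackup's reader in the path but no network or tape involved.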

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

In addition to answers to above questions - any particular reason why you need to take raw backups?

It looks like the client is also the media server, right? 
Client/HW/OS/Pri/DMI:  eux390....
Residence:         eux390....

The client name is the same as the media server's - not eux380....
(And policy is not configured to perform offhost snapshot backup for client eux380.)

How is data mapped to the media server?
Are you trying to read data off raw devices while still mounted on the client?

And tape drives in the library? 

Are disks and tape zoned to different HBAs?

How many tape drives?

Media server resources? 

What throughput is achieved per stream?

Nicolai
Moderator
Moderator
Partner    VIP   

How can this be a raw backup with policy type 0 (Standard) and move detection enabled?

Policy Type:       Standard (0)
Active:            yes
Effective date:    08/04/2009 11:43:32
Client Compress:   no
Follow NFS Mnts:   no
Cross Mnt Points:  no
Collect TIR info:      yes, with move detection

Eh, what's up, doc?

 

Harry_NBAD
Level 4
Certified

Hello All,

Clarifying the configuration once again:

Master and media server: eux390

Client: eux380

Answering the questions of sdo first:

Actual size of the data-set to be backed up? 3790 GB

Multiple backup jobs? A single backup job.

Multiple file-systems? It is a raw backup, so 19 disks are involved.

Actual throughput observed? Up to 16 MB/s.

What networking is configured on the backup client? The disks are presented through the SAN.

What tape type? LTO4, LTO5 and LTO6 tapes, with the corresponding tape drives configured in the library.

How many tape drives in use? In total, 31 drives are used.

Is multiplexing used? Yes.

What do you mean by 'raw backup'? Raw backup means a direct disk backup: the STDs are synced with BCVs, and the backup is taken from the BCVs onto tape using NetBackup.

Flash backup? N/A

Or plain file system backup? N/A

Have you checked underlying storage/volume/array/parity-groups/disks for saturation/busy-ness and/or latency? The disks seem to be fine.

Have you done a bpbkar test to null device - to test raw disk read speed? As we are using BCVs here, it is difficult to test the raw speed.

Have you checked your buffer wait counts (client side and media server side)? How do I check these?

I have also enabled the bpbkar logs on the master server to dig further into this.

@Marianne: I hope I have answered your queries; if you need any more information, please let me know.

Regarding the need for raw backups: it is a management requirement, so I can't say anything more than that.

@Nicolai: I have disabled the TIR/move-detection option in this policy (see below). Regarding the policy type, we use Standard for all our raw backups; let me know if a different type should be chosen.

Policy Type:       Standard (0)
Active:            yes
Effective date:    08/04/2009 11:43:32
Client Compress:   no
Follow NFS Mnts:   no
Cross Mnt Points:  no
Collect TIR info:      no

Let me know if any other information is required from my side.

regards,

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

No. The client name is NOT eux380. Look at Client name in the policy:

Client/HW/OS/Pri/DMI:  eux390....

So, it seems you are taking BCV snapshots that are mounted on the media server, right?

Check SAN configuration -
are all LUNs mapped to a single HBA?
Are tapes on media server mapped to same or different HBAs?
(Hopefully NOT 31 tape drives added to a single media server!)

Good thing you have disabled TIR for the raw backup.

Oh - about multiple backup jobs - the policy is configured for 4 streams with MPX in the schedule allowing 3 simultaneous jobs.

Buffer size and 'waits' can be seen in bptm log on the media server.
Log folders do not exist by default - create them under /usr/openv/netbackup/logs.
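
A quick sketch of both steps; the grep phrases match the usual bptm wait/delay messages, but check the exact wording in your own log:

mkdir /usr/openv/netbackup/logs/bptm
# after the next backup run:
grep "waited for full buffer" /usr/openv/netbackup/logs/bptm/log.*
grep "waited for empty buffer" /usr/openv/netbackup/logs/bptm/log.*

High "waited for full buffer" counts mean the tape writer is starved (the data source is slow); high "waited for empty buffer" counts mean the producer is waiting for the tape side to drain buffers.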

 

Harry_NBAD
Level 4
Certified

Hello Marianne,

Client/HW/OS/Pri/DMI: eux390....: The client name mentioned in the policy is eux390 (which is both the media server and the master). In our environment, the backup is taken from the BCVs, which are configured on eux390, so we have to use this name in the policy.

So, it seems you are taking BCV snapshots that are mounted on the media server, right? Yes.

Check SAN configuration -
Are all LUNs mapped to a single HBA? No, they are distributed.
Are tapes on the media server mapped to the same or different HBAs? Different HBAs.
(Hopefully NOT 31 tape drives added to a single media server!) There are 20 drives on this backup server.

Regards,

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

I have never seen a media server that can stream more than 3 or 4 tape drives at 100 MB/sec or more simultaneously.

Can you try to use dd to read data off the disk and output to /dev/null in order to test read speed for raw device?
Note that this has nothing to do with NBU and will simply test read speed at hardware level.

The next test is to mount a scratch tape in a tape drive and use cpio or tar (not sure if tar can back up a raw device) to write directly to tape.
Once again - bypassing NBU to test at OS and hardware level.
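
One caveat on that test: tar and cpio archive a device file as a node rather than reading its contents, so dd is the simpler tool for a raw-device-to-tape test. A minimal sketch, with the disk and no-rewind tape device paths as assumptions to adjust for your setup:

time dd if=/dev/rdisk/disk1252 bs=256k count=10000 of=/dev/rmt/0mn

Comparing the resulting rate with the dd-to-/dev/null figure separates disk read speed from tape write speed.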

What tier disk is used for the BCV snapshots?

PS:
Have you checked if bptm log folder exists?
And if buffer settings have been configured?

Nicolai
Moderator
Moderator
Partner    VIP   

Marianne - how did you get a clue about the BCVs?

As this is a normal file-system backup, I still don't understand how this works when raw devices are specified.

Testing the raw read speed can be done by:

time dd if=/dev/rdsk/c33t0d5 bs=256k of=/dev/null count=10000

Output:

10000+0 records in
10000+0 records out

real       18.2
user        0.0
sys         0.1

#  echo "256*10000/18" | bc

142222 KB/sec

On the media server, please check whether the NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS files exist in /usr/openv/netbackup/db/config.
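
For example (both paths as given above; ls simply reports "not found" for any file that has not been created):

ls -l /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
cat /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS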

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

Hi Nicolai

There was a post earlier today that told us about BCV snapshots.

Backing up raw devices in a Standard policy is supported.
... backing up is the easy part - restoring them is the tricky part! (No file-level restore, the size of the LUN being restored to, etc.) Lots of things to look out for, all documented in NBU Admin Guide I.

Harry_NBAD
Level 4
Certified

Hello All,

I have analysed this on my master server (which is also the media server): the raw backup speed is good for now after a reboot, whereas the application backup is too slow.

Raw backup: by raw backup I mean the BCV snapshots backed up to tape. The speed improved after we rebooted the system, but it seems to be slowly drifting back to its old levels. Immediately after the reboot the speed increased significantly, but it is now showing the same trend as before.

App backup: this is the scenario where the backup is taken from disks on the client node to a disk-pool storage unit on the master server, or data from the client disks is written directly to tape. Ideally, backing up to the disk-pool STU should be faster, but it is extremely slow, and the same is the case when data is backed up directly from the client disks onto tape.

The STU list is below:

##############

Label:                hcart-robot-tld-0
Storage Unit Type:    Media Manager
Number of Drives:     29
On Demand Only:       no
Density:              hcart2 (14)
Robot Type/Number:    TLD (8) / 0
Max Fragment Size:    1048575
Max MPX/drive:        2

Label:                lto6-hcart-robot-tld-2
Storage Unit Type:    Media Manager
Number of Drives:     4
On Demand Only:       no
Density:              hcart2 (14)
Robot Type/Number:    TLD (8) / 2
Max Fragment Size:    1048575
Max MPX/drive:        2

Label:                DISKPOOL
Storage Unit Type:    Disk
Media Subtype:        Basic (1)
Concurrent Jobs:      80
On Demand Only:       yes
Path:                 "/nbdiskpool"
Robot Type:           (not robotic)
Max Fragment Size:    524288
Max MPX:              1
Stage data:           yes
Block Sharing:        no
File System Export:   no
High Water Mark:      50
Low Water Mark:       10
Ok On Root:           no

Label:                NAS_DISKPOOL
Storage Unit Type:    Disk
Media Subtype:        Basic (1)
Concurrent Jobs:      80
On Demand Only:       yes
Path:                 "/NAS_DISKPOOL"
Robot Type:           (not robotic)
Max Fragment Size:    524288
Max MPX:              1
Stage data:           yes
Block Sharing:        no
File System Export:   no
High Water Mark:      60
Low Water Mark:       10
Ok On Root:           no

Label:                DISKPOOL_1
Storage Unit Type:    Disk
Media Subtype:        Basic (1)
Concurrent Jobs:      80
On Demand Only:       yes
Path:                 "/nbdiskpool_new"
Robot Type:           (not robotic)
Max Fragment Size:    524288
Max MPX:              1
Stage data:           yes
Block Sharing:        no
File System Export:   no
High Water Mark:      60
Low Water Mark:       20
Ok On Root:           no

#################

SIZE_DATA_BUFFERS=262144

NUMBER_DATA_BUFFERS=256
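
A back-of-the-envelope check of what those values imply, assuming the usual model of one shared-memory segment of NUMBER_DATA_BUFFERS x SIZE_DATA_BUFFERS per busy tape drive:

echo "262144*256/1048576" | bc
# = 64 MB of shared memory per drive
echo "64*20" | bc
# = 1280 MB if all 20 drives on this media server stream at once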

Kindly suggest!

Nicolai
Moderator
Moderator
Partner    VIP   

Have you talked to your local SAN/storage admin about this? Software can't exceed the capabilities of the hardware.

For testing tape drives or the disk pool, the NetBackup GEN_DATA directive is an excellent tool for finding bottlenecks. GEN_DATA generates the backup data in memory, and on restore the data goes back to memory, where it is discarded - so client disk I/O drops out of the picture.

Run backup and restore tests to tape and to the disk pool, and let's get figures for what the infrastructure is able to do. For read speed from a BCV volume, use the dd example in my previous post.
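
A rough sketch of what the policy's backup selections could look like for such a test; the directive names are as I recall them from the tech note below, so verify the exact parameters there rather than taking this sketch as gospel:

GEN_DATA
GEN_KBSIZE=1048576
GEN_MAXFILES=10

If GEN_KBSIZE is the per-file size in KB, this would generate roughly 10 GB of in-memory data - enough to see steady-state drive throughput.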

Documentation: How to use the GEN_DATA file list directives with NetBackup for UNIX/Linux Clients for Performance Tuning

http://www.symantec.com/docs/TECH75213

 

Marianne
Moderator
Moderator
Partner    VIP    Accredited Certified

This is a totally different issue and should ideally be posted as a separate discussion:

App backup: this is the scenario where the backup is taken from disks on the client node to a disk-pool storage unit on the master server, or data from the client disks is written directly to tape. Ideally, backing up to the disk-pool STU should be faster, but it is extremely slow, and the same is the case when data is backed up directly from the client disks onto tape.

The first step is to use bpbkar to see how fast data can be read from the client disk.

See: 

Measuring disk performance with bpbkar 
http://www.symantec.com/docs/HOWTO99824

Overview of NetBackup performance testing. 
http://www.symantec.com/docs/TECH147296

About the NetBackup data transfer path 
http://www.symantec.com/docs/HOWTO99831 

The NetBackup Backup Planning and Performance Tuning Guide, Release 7.5 and Release 7.6 
 http://www.symantec.com/docs/DOC7449