Forum Discussion

BackupGuy2015
10 years ago

NetBackup duplication from disk to tape using SLP and destaging

Hello all, this is my first post, but I have been reading this forum for some time now. Basically, here is my environment:

3 sites (NetBackup 7.6.1)

A (remote): media server + MSDP

B (HQ): master server; media server (B1) + MSDP, 4 LTO5 tape drives, basic disk for destaging to tape

             media server (B2), basic disk for destaging to tape on B1

C (remote): media server + basic disk

-Backups for remote site A go to MSDP and are then duplicated to B using a scheduled SLP.

-Some backups use B1 as the destination, and some use B2.

-All backups to disk are eventually destaged to tape; this runs hourly.

-All SLPs from the remote site replicate to the B1 MSDP and then duplicate to tape. I have a scheduled SLP set up for this; it only runs off-peak (outside business hours).

As you can see, all backups eventually get written to tape. Because destaging to tape AND duplication from MSDP to tape run in serial, the tape drives get very busy and all jobs end up competing for them.

The problems:

1. As more destaging jobs and SLP jobs compete for tape drive resources, we have to keep increasing the disk size for backups. Backup images are not dumped to tape fast enough, causing backups to fail with "disk storage is full".

2. We are already using job priority on backup jobs. The problem is that the high-priority jobs take the tape drives and keep the low-priority jobs queued; when the next round of backups comes in and needs to write to tape, it goes into the queue as well. This has a snowball effect: eventually more and more duplication and destaging jobs pile up in the queue, but little gets completed.

We have set mail, databases, etc. as high priority, but those backups are also large, so they take a long time to finish writing to tape (hogging the tape drives).

My questions:

1. From my observation, we need more tape drives, but how many more? Is there some sort of formula to calculate this (a best practice of some sort)?

2. Does anyone have a similar environment and setup? What would be your solution or workaround for the tape drive bottleneck?

Thank you everyone for your assistance.

  • True - it is best practice NOT to zone a disk WWPN target and a tape WWPN target to the same server/appliance initiator WWPN.

    It also used to be best practice - for some older HBAs - not to mix tape and disk on the same dual-port HBA, but this is less and less true these days with more modern HBAs. So, most sites do now mix tape and disk on the same server-side dual-port HBA card, just NOT on the same HBA port.

    If your server has two dual-port HBAs, and thus four initiator ports, and you have a dual (resilient) fabric SAN design, then perhaps you can do this:

    server HBA1 port 1 initiator - disk target - fabric A

    server HBA1 port 2 initiator - tape target - fabric B

    server HBA2 port 1 initiator - tape target - fabric A

    server HBA2 port 2 initiator - disk target - fabric B

17 Replies

  • Thanks for the replies, guys. Sorry, it has been a while since I last checked this thread.

    Anyway, as nicolai said, if the disk is slow to read, then sure enough the data is not being sent to tape fast enough, so surely there is a way to tune for the tape drive. As I understand it, there are two configs that need adjusting:

    Tape drive:

    NUMBER_DATA_BUFFERS  = 32

    SIZE_DATA_BUFFERS = 262144

    Disk:

    NUMBER_DATA_BUFFERS  = 32

    SIZE_DATA_BUFFERS = 262144
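    For anyone else reading: these settings are extension-less touch files containing just the number. A sketch of setting the tape pair from an elevated command prompt, assuming the default Windows install path (adjust for your environment):

    rem Tape buffer touch files - new values are picked up by newly started bptm processes
    echo 32 > "C:\Program Files\Veritas\NetBackup\db\config\NUMBER_DATA_BUFFERS"
    echo 262144 > "C:\Program Files\Veritas\NetBackup\db\config\SIZE_DATA_BUFFERS"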

    After changing both NUMBER_DATA_BUFFERS settings, here are the results:

    7/07/2015 2:57:18 AM - begin Duplicate
    7/07/2015 2:57:22 AM - requesting resource per1nbu04-hcart2-robot-tld-0
    7/07/2015 2:57:22 AM - awaiting resource per1nbu04-hcart2-robot-tld-0 - No drives are available
    7/07/2015 4:52:53 AM - granted resource 7490L5
    7/07/2015 4:52:53 AM - granted resource HP.ULTRIUM5-SCSI.001
    7/07/2015 4:52:53 AM - granted resource per1nbu04-hcart2-robot-tld-0
    7/07/2015 4:52:55 AM - Info bpdbm(pid=6044) catalogued 1232 entries          
    7/07/2015 4:52:55 AM - Info bptm(pid=9688) start            
    7/07/2015 4:52:55 AM - started process bptm (9688)
    7/07/2015 4:52:55 AM - Info bptm(pid=9688) start backup           
    7/07/2015 4:52:56 AM - Info bpdm(pid=5552) started            
    7/07/2015 4:52:56 AM - started process bpdm (5552)
    7/07/2015 4:52:56 AM - Info bpdm(pid=5552) reading backup image          
    7/07/2015 4:52:56 AM - Info bpdm(pid=5552) using 32 data buffers         
    7/07/2015 4:52:57 AM - Info bptm(pid=9688) media id 7490L5 mounted on drive index 1, drivepath {4,0,5,0}, drivename HP.ULTRIUM5-SCSI.001, copy 2
    7/07/2015 4:52:57 AM - begin reading
    7/07/2015 4:52:57 AM - Info bptm(pid=9688) INF - Waiting for positioning of media id 7490L5 on server per1nbu04 for writing.
    7/07/2015 5:19:31 AM - Info bptm(pid=9688) waited for full buffer 43488 times, delayed 97032 times    
    7/07/2015 5:19:39 AM - end reading; read time: 0:26:42
    7/07/2015 5:19:39 AM - Info bptm(pid=9688) EXITING with status 0 <----------        
    7/07/2015 5:19:39 AM - Info bpdm(pid=5552) completed reading backup image         
    7/07/2015 5:19:43 AM - end Duplicate; elapsed time: 2:22:25
    the requested operation was successfully completed (0)

    @genericus, we are doing disk staging, which doesn't show throughput... where do you see it?

  • During duplications from the DSSU to tape, open Task Manager, click the Performance tab, click Open Resource Monitor, maximize it, click the Disk tab, and expand the lower "Storage" panel...

    ...if you see Active Time (%) near 100% and a Disk Queue Length > 1.0, then this implies that the disk storage underneath the Windows NTFS drive/volume letter is not able to respond quickly enough to the reading/writing process(es).

    If you see the above, then use "PerfMon.msc" to view the disk queues in more detail, i.e. check for a long (i.e. > 1) read disk queue and/or a long (i.e. > 1) write disk queue.
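    If you prefer a command line check, typeperf can sample the same disk queue counters. A sketch - the _Total instance is just an assumption, so pick your DSSU volume's instance from the list that "typeperf -q PhysicalDisk" prints:

    rem Sample the read and write disk queues every 5 seconds, 12 samples
    typeperf "\PhysicalDisk(_Total)\Avg. Disk Read Queue Length" "\PhysicalDisk(_Total)\Avg. Disk Write Queue Length" -si 5 -sc 12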


    There are some other tips related to checking NTFS volume characteristics here:

    https://www-secure.symantec.com/connect/forums/msdp-msdp-slp-duplications-slow-what-check

  • Sorry I meant:

    Disk:

    NUMBER_DATA_BUFFERS_DISK  = 32

    SIZE_DATA_BUFFERS_DISK = 262144
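    These work the same way as the tape touch files; a sketch for the disk pair, again assuming the default Windows install path:

    rem The _DISK suffix makes these buffer settings apply to disk I/O rather than tape
    echo 32 > "C:\Program Files\Veritas\NetBackup\db\config\NUMBER_DATA_BUFFERS_DISK"
    echo 262144 > "C:\Program Files\Veritas\NetBackup\db\config\SIZE_DATA_BUFFERS_DISK"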

  • When this is seen:

    ...waited for full buffer 43488 times, delayed 97032 times  

    ...this means that of the 43,000 waits, bptm was delayed, on average, about twice per wait. The delay count of 97,000 * 15ms is just over 24 minutes of bptm waiting for data. Reducing the buffer count probably won't improve the situation. The issue is that the data producer (i.e. the media server acting as the DSSU) is unable to move data quickly enough to bptm (the data consumer). There is another layer to this, in that there are usually bptm parent and bptm child processes on the server, but there's probably no need to dig into that layer, because the data just isn't getting to bptm quickly enough.
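    As a rough check of that arithmetic from a command prompt (the 15ms per delay comes from the figures above):

    rem 97,032 delays * 15 ms each, converted to minutes
    set /a 97032 * 15 / 1000 / 60
    rem prints 24 - so roughly 24 of the 26-odd minutes of read time were spent waiting for data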

  • Regardless of what feeds and speeds you get, NUMBER_DATA_BUFFERS should be at least 128 for tape.
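    For example, on a Windows media server (default install path assumed; mind the memory math in the next reply before going higher):

    rem Raise the tape buffer count from 32 to 128
    echo 128 > "C:\Program Files\Veritas\NetBackup\db\config\NUMBER_DATA_BUFFERS"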

  • To determine throughput speed, I have to do the math: 1TB processed in 3 hours, for example, is just under 100MB/sec.
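    For example, from a command prompt, using 1 TB = 1,048,576 MB:

    rem 1 TB moved in 3 hours, expressed in MB/s
    set /a 1048576 / (3 * 3600)
    rem prints 97, i.e. just under 100MB/sec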

    Some devices have limits on how many "restore" jobs can run at once.

    Some vendors categorize output from disk as a restore, even if you are writing to tape.

    Check the media server for memory, since number of buffers * size of buffers * number of jobs = real memory used.
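    A quick worked example from a command prompt (128 buffers and 10 concurrent jobs are purely illustrative numbers):

    rem buffers * buffer size, in MB, then * concurrent jobs
    set /a 128 * 262144 / 1048576 * 10
    rem prints 320 - about 320MB of real memory if 10 such jobs run at once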

     

    Here are my tape media server values (note: buffers can have NDMP, DISK and RESTORE options).

    NDMP must match the filer for best results!

    Now checking NUMBER_DATA_BUFFERS:  128
    Now checking SIZE_DATA_BUFFERS:  262144
    Now checking NUMBER_DATA_BUFFERS_NDMP:   64
    Now checking SIZE_DATA_BUFFERS_NDMP:  262144
    Now checking NET_BUFFER_SZ:  262144
     

  • Hi sdo,

    Thanks for that info. I checked Performance Monitor as per your link, and here are the results while duplication is running:

    Avg. disk queue length: 7

    Avg. read disk queue length: 6

    Avg. write disk queue length: 0.018

    Disk active time: avg > 95%

    I would conclude that the disk is definitely underperforming. The last time I ran the camel tool from NetBackup, it was not even reaching the recommended 130MB/s, only up to 90MB/s.

    Also, how much of an effect is there if the dedupe database and data reside on the same drive letter? And if we want to split the MSDP database and data onto different drive letters, does that only involve changing the configuration, or is there a technote out there?

    Thank you.