
Netbackup DFSR backup very slow

rshehab
Level 3

Hello, we recently deployed NetBackup 7.7.3 on Windows 2012. We have Data Domain storage. Everything is configured following the Veritas and EMC guides.

Everything is superb except when it comes to DFSR folders. After some reading, we understood that "Use Journal" is not supported with DFSR.

We understood that we should add the backup selections like this:

Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\LosAngeles
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\NewYork
Shadow Copy Components:\User Data\Distributed File System Replication\DfsrReplicatedFolders\Denver

We have 2 TB of data, and the durations of the full and incremental backups are not acceptable. I read a lot of posts from people complaining about the same issue when backing up the Shadow Copy Components, but none was helpful.

I think I missed something. Your help will be much appreciated.

Let me know if I should provide more info.

38 REPLIES 38

Michael_G_Ander
Level 6
Certified

As DFSR now functions with a VSS snapshot, you will be limited by how fast you can read the snapshot(s), which in turn depends on how much activity there is on the original volume(s). One thing to be aware of is that VSS is pretty heavy on virtual memory, so a bigger pagefile can help.

Playing with the buffer and raw buffer size might give some improvement.

There might also be a setting in Windows which tells it not to use VSS for backup/restore of DFSR.

You can also consider stopping the DFSR service before the backup and starting it again afterwards, as was done in the earlier days.

The standard questions: Have you checked: 1) What has changed. 2) The manual 3) If there are any tech notes or VOX posts regarding the issue

Marianne
Moderator
Partner    VIP    Accredited Certified

Please tell us what 'not acceptable' amounts to.
What kind of transfer rates are you seeing?

Are you running backups as a single stream or multiple simultaneous streams?

What type of network connectivity do you have between Fileserver -> media server -> DD?

When was fragmentation last checked on the filesystems?

Do you have Accelerator selected in the policy? 
Although Change Journal is not supported, Accelerator is supported.

sdo
Moderator
Partner    VIP    Certified

Some simple calcs show that (based on size only) to take a 2 TB full backup in, say, 12 hours (from, say, 20:00 Saturday night to 08:00 Sunday), we need to achieve an average minimum sustained throughput of 49 MB/s:

size 2 TB
size 2,048 GB
size 2,097,152 MB
time 12 hours
time 43,200 seconds
speed 49 MB/s
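
As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes binary units (1 TB = 1,024 GB = 1,048,576 MB), as the figures above do; the function name `required_throughput_mb_s` is made up for illustration.

```python
import math


def required_throughput_mb_s(size_tb: float, window_hours: float) -> float:
    """Average MB/s needed to move size_tb within window_hours (binary units)."""
    size_mb = size_tb * 1024 * 1024      # TB -> GB -> MB
    window_seconds = window_hours * 3600
    return size_mb / window_seconds


rate = required_throughput_mb_s(2, 12)
# Round up, because anything below this rate overruns the backup window.
print(f"{rate:.1f} MB/s, so at least {math.ceil(rate)} MB/s sustained")
```

The same function can be reused for other window lengths, for example an 8-hour weeknight window instead of 12 hours.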

 

I'd be interested to know how many total leaf objects across the namespaces that make up that 2TB?  i.e. how many folders, plus how many files?

Nicolai
Moderator
Partner    VIP   

What we found helpful for DFS-R backup:

  • No more than two concurrent backup streams in the DFS-R areas
  • Ensure backup of local disks via VSS doesn't happen while the DFS-R backup runs.

Best Regards

Nicolai

Hello, thank you for replying.

Check the image attached. I selected a few folders, around 100 GB only. See the time taken to complete. I am sure it is not a communication issue, as I have other non-DFSR jobs running smoothly.

In addition, we still have the old backup software from Arcserve running. Its backup jobs of the DFSR folders were taking less time, but without "Shadow Copy".

Total Directories............ 549,174
Total File(s)................ 13,425,612
Total Skip(s)................ 0
Total Size (Disk)............ 2.16 TB
Total Size (Media)........... 2.21 TB
Elapsed Time................. 21h 1m 34s
Average Throughput........... 1.79 GB/min

 

Please help, as we need to shut down the old backup software and continue with NetBackup.

 

thanks. 

sdo
Moderator
Partner    VIP    Certified

Ok - we're definitely not comparing apples with apples here.

You are comparing:

a) ArcServe plain file backup, not SCC: based, unknown whether VSS based, reading unconfirmed client, using unconfirmed media server, sending to unconfirmed storage, selecting unconfirmed client source paths

...comparing with:

b) NetBackup SCC: based DFSR backup, will be using VSS, reading unconfirmed storage, using a different media server, sending to DD storage, selecting different source paths?

...so I think we are going to drop any notion of debugging a slow backup by comparing unknown situations between two different applications and configurations.

.

Now then, let's get back to the png screen shot that you posted.  Yes, this looks really slow:

elapsed 17:09:24 time
elapsed 61,764.0 seconds
total 106,744,880,128.0 bytes
sent 54,760,727,552.0 bytes
names 835,916.0 names
sent 53,477,273.0 KB
sent 52,223.9 MB
sent 51.0 GB
total 104,243,047.0 KB
total 101,799.9 MB
total 99.4 GB
total - disk read 1.6 MB/s
sent - LAN speed 0.8 MB/s
names throughput 13.5 names/second
waited for full buffer 44,537.0 times
delayed count 3,861,463.0 times
parent delay 1.5 ms
parent delay 0.0015 seconds
delayed seconds 5792.1945 seconds
delayed time 01:36:32 hh:mm:ss
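
For anyone wanting to redo these sums for their own job, the derived rows above can be reproduced from the raw job details with a short Python sketch. The input numbers are the ones listed above (including the 1.5 ms parent delay, taken as given); the variable names are only illustrative, and MB here means 1,048,576 bytes.

```python
from datetime import timedelta

# Raw figures copied from the job details listed above.
elapsed = timedelta(hours=17, minutes=9, seconds=24)
total_bytes = 106_744_880_128   # data read from disk
sent_bytes = 54_760_727_552     # data actually sent over the LAN
names = 835_916                 # files and folders processed
delayed_count = 3_861_463       # bptm "delayed" counter
parent_delay_s = 0.0015         # 1.5 ms per delay, as listed above

MB = 1024 * 1024
elapsed_s = elapsed.total_seconds()

disk_read_mb_s = total_bytes / MB / elapsed_s
lan_send_mb_s = sent_bytes / MB / elapsed_s
names_per_s = names / elapsed_s
delayed_s = delayed_count * parent_delay_s

print(f"elapsed      {elapsed_s:,.0f} seconds")
print(f"disk read    {disk_read_mb_s:.1f} MB/s")
print(f"LAN send     {lan_send_mb_s:.1f} MB/s")
print(f"names        {names_per_s:.1f} names/second")
print(f"delayed      {delayed_s:.4f} seconds = {timedelta(seconds=round(delayed_s))}")
```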

.

What really strikes me about this is not only the very slow disk read of 1.6 MB/s (and FYI, Accelerator reduces this to a 0.8 MB/s LAN send speed), but also the "bptm delayed count", which sums to a total delay of about 1.5 hours in a 17-hour backup job, i.e. the NetBackup media server has only lost 1.5 hours out of the 17 hours, which means that the NetBackup media server is probably struggling.

The backup job ID number 1401 is quite a low number which is indicative of either a new build NetBackup environment, or a test environment.

So, is this a new build production environment, or a virtualised test environment?

Was this backup job being sent over a WAN link?

Marianne
Moderator
Partner    VIP    Accredited Certified

Could you please answer the rest of the questions that I asked on Thursday?

Please also respond to the suggestions and questions from Nicolai and sdo.

Thanks!

Hello,

thank you all for your replies.

sdo, this is a new installation. I followed the guide and hopefully I didn't miss anything.
This is a physical environment. The backup server and the DFS server are both on the local network. For testing I will be using only one data source.
Yes, I am not comparing the products, and I believe NetBackup should give us better results. Hopefully we will achieve that with your help.


Marianne, I am currently trying both single stream and multiple streams, but both are taking a long time. The backup server, media server and DD are all connected on a 10 Gb network.
Defragmentation is run on a weekly basis. Yes, Accelerator is selected.

Nicolai, yes, I checked again and I am sure there was no backup of local disks via VSS and no more than two concurrent backup streams.

sdo
Moderator
Partner    VIP    Certified

Is the Data Domain backup storage device on the same network or on same LAN infrastructure within the same room, building, site... or is the Data Domain accessed via a WAN link?

Marianne
Moderator
Partner    VIP    Accredited Certified

Do you have bpbkar log enabled for this client? Level 3 or higher.

This TN describes a problem in older versions of NBU:
http://www.veritas.com/docs/000019459

The TN is good in the sense that it explains the 3 different phases of DFSR backup and what to look for in bpbkar log.

 

Thank you all for your replies.

I ran some tests and am still struggling to understand the reason behind this issue. I am currently looking into the buffer and waiting times:

"01/17/2017 18:50:41 - Info bptm (pid=6300) waited for full buffer 142918 times, delayed 808260 times"

 

01/17/2017 15:19:24 - Info nbjm (pid=4776) starting backup job (jobid=1633) for client dfs01, policy test, schedule Manual
01/17/2017 15:19:24 - Info nbjm (pid=4776) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1633, request id:{5AF2DE5C-613F-45CC-8D66-E2FFC51A54B3})
01/17/2017 15:19:24 - requesting resource HP_P2000_stu
01/17/2017 15:19:24 - requesting resource nbu01 .NBU_CLIENT.MAXJOBS. dfs01
01/17/2017 15:19:24 - requesting resource nbu01 .NBU_POLICY.MAXJOBS.test
01/17/2017 15:19:24 - granted resource nbu01 .NBU_CLIENT.MAXJOBS. dfs01
01/17/2017 15:19:24 - granted resource nbu01 .NBU_POLICY.MAXJOBS.test
01/17/2017 15:19:24 - granted resource MediaID=@aaaab;DiskVolume=PureDiskVolume;DiskPool=HP_P2000_Disk_Pool;Path=PureDiskVolume;StorageServer= nbu01 ;MediaServer= nbu01
01/17/2017 15:19:24 - granted resource HP_P2000_stu
01/17/2017 15:19:24 - estimated 0 kbytes needed
01/17/2017 15:19:24 - Info nbjm (pid=4776) started backup (backupid= dfs01_1484659164) job for client dfs01, policy test, schedule Manual on storage unit HP_P2000_stu
01/17/2017 15:19:25 - started process bpbrm (pid=8304)
01/17/2017 15:19:26 - Info bpbrm (pid=8304) dfs01 is the host to backup data from
01/17/2017 15:19:26 - Info bpbrm (pid=8304) reading file list for client
01/17/2017 15:19:26 - connecting
01/17/2017 15:19:27 - Info bpbrm (pid=8304) starting bpbkar32 on client
01/17/2017 15:19:27 - connected; connect time: 0:00:00
01/17/2017 15:19:29 - Info bpbkar32 (pid=17472) Backup started
01/17/2017 15:19:29 - Info bpbkar32 (pid=17472) change time comparison:<disabled>
01/17/2017 15:19:29 - Info bpbkar32 (pid=17472) archive bit processing:<enabled>
01/17/2017 15:19:29 - Info bptm (pid=6300) start
01/17/2017 15:19:39 - Info bptm (pid=6300) using 262144 data buffer size
01/17/2017 15:19:39 - Info bptm (pid=6300) setting receive network buffer to 1049600 bytes
01/17/2017 15:19:39 - Info bptm (pid=6300) using 30 data buffers
01/17/2017 15:19:42 - Info bptm (pid=6300) start backup
01/17/2017 15:19:45 - Info bptm (pid=6300) backup child process is pid 8052.9108
01/17/2017 15:19:45 - Info bptm (pid=8052) start
01/17/2017 15:19:45 - begin writing
01/17/2017 18:50:41 - Info bptm (pid=6300) waited for full buffer 142918 times, delayed 808260 times
01/17/2017 18:50:45 - Info bptm (pid=6300) EXITING with status 0 <----------
01/17/2017 18:50:45 - Info nbu01 (pid=6300) StorageServer=PureDisk: nbu01 ; Report=PDDO Stats (multi-threaded stream used) for ( nbu01 :( scanned: 65674673 KB, CR sent: 443258 KB, CR sent over FC: 0 KB, dedup: 99.3%, cache hits: 0 (0.0%)
01/17/2017 18:50:45 - Info bpbrm (pid=8304) validating image for client dfs01
01/17/2017 18:50:48 - Info bpbkar32 (pid=17472) done. status: 0: the requested operation was successfully completed
01/17/2017 18:50:48 - end writing; write time: 3:31:03
the requested operation was successfully completed (0)
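
When comparing several test runs, it can help to pull the buffer counters and the write time out of the job details automatically. Below is a minimal Python sketch that does this for three lines from the job above. The regular expressions are written against the line format shown in this post only, and the helper name `summarize` is hypothetical.

```python
import re

# Patterns match the job-details lines as they appear in this thread.
BUFFER_RE = re.compile(r"waited for full buffer (\d+) times, delayed (\d+) times")
WRITE_TIME_RE = re.compile(r"write time: (\d+):(\d{2}):(\d{2})")
SCANNED_RE = re.compile(r"scanned: (\d+) KB")


def summarize(job_details: str) -> dict:
    """Extract buffer counters, write time and scanned volume from job details."""
    waited, delayed = map(int, BUFFER_RE.search(job_details).groups())
    hours, minutes, seconds = map(int, WRITE_TIME_RE.search(job_details).groups())
    write_seconds = hours * 3600 + minutes * 60 + seconds
    scanned_kb = int(SCANNED_RE.search(job_details).group(1))
    return {
        "waited_for_full_buffer": waited,
        "delayed": delayed,
        "write_seconds": write_seconds,
        # Scanned volume divided by write time, in MB/s (1 MB = 1024 KB).
        "scanned_mb_per_s": scanned_kb / 1024 / write_seconds,
    }


sample = """
01/17/2017 18:50:41 - Info bptm (pid=6300) waited for full buffer 142918 times, delayed 808260 times
01/17/2017 18:50:45 - Info nbu01 (pid=6300) StorageServer=PureDisk: nbu01 ; Report=PDDO Stats (multi-threaded stream used) for ( nbu01 :( scanned: 65674673 KB, CR sent: 443258 KB, CR sent over FC: 0 KB, dedup: 99.3%, cache hits: 0 (0.0%)
01/17/2017 18:50:48 - end writing; write time: 3:31:03
"""

stats = summarize(sample)
for key, value in stats.items():
    print(f"{key}: {value}")
```

For this job, the scanned volume divided by the write time works out to roughly 5 MB/s.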

I will enable the bpbkar log and run the job again.

sdo
Moderator
Partner    VIP    Certified

You could also try a source client disk read speed test:

How to benchmark the performance of the bpbkar32 process on a Windows client
http://www.veritas.com/docs/000013839

...to determine how quickly the source file system can be walked and read.

Of course we need to remember that an actual DFSR backup has a fair bit of other functionality going on, like VSS and DFSR engagement, but the read speed test might reveal poor underlying storage performance on the client file system. Obviously, choose a time when no backups and minimal user-based IO are occurring, and/or disable backups of this client whilst your speed test is running.

Hello, can you help me understand the log?

11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Files: 1079795
11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Folders: 6216
11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Bytes Data: 416473664
11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Gigabytes Data: 58
11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Bytes Image: 0
11:14:28.217 [15736.548] <4> tar_backup_tfi::determineEstimate: INF - Gigabytes Image: 0

 

sdo
Moderator
Partner    VIP    Certified

That appears to be an incomplete log of a backup job, i.e. it shows the estimates (based on previous backups).

1) Did the backup finish?

2) Is there any reason why you are not doing a bpbkar32 to null disk volume read speed test?

Marianne
Moderator
Partner    VIP    Accredited Certified

As per TN http://www.veritas.com/docs/000019459, you need a full bpbkar log to see which phase of DFSR backup is causing the bottleneck. 

Please try to upload files in .txt format.
Anything else cannot be viewed online and must be downloaded first.

manatee
Level 6

right. so i didn't read the rest of the thread. how have you configured your backup policy? do you use a dedicated NIC for backup? do you use multiple streams? where do you do duplication (server or client)? how long do you keep the backup image on disk?

some questions may not be relevant but those are the questions i used as guidelines to help me achieve acceptable backup time (2.48 hours for 1.8TB).

hth.

sdo
Moderator
Partner    VIP    Certified

You seem to have the NetBackup Client Job Tracker running. This will definitely slow down your backups. I can see 1568 seconds lost just due to initial tracking, which is about 26 minutes. You, or another logged-on user, need to close the tracker and exit it from the system tray. The tracker is a bit of a toy, and no-one really uses it during real production backups.

I can't say how much time it will save on a large backup - but definitely worth turning it off, and then see if your backups speed-up.

Marianne
Moderator
Partner    VIP    Accredited Certified
This TN explains how to disable the client job tracker in the registry:
http://www.veritas.com/docs/000026977

sdo
Moderator
Partner    VIP    Certified

@rshehab any news?