
backup to disk write issues

mad_jock
Level 3

Hello, I'm new to NetBackup after 5 years with HP Data Protector, and I have some real concerns about write-to-disk performance.

It appears that we have queues waiting for the block to empty so we can write to disk.

We don't use SQL agents; our DBAs want to do flat-file backups, which was a problem for HP. I was wondering if this could be the same issue, where NetBackup doesn't like millions of 10 KB files or 400 GB+ flat-file backups.

Could someone give me an idea of where I should look to find out if this is the case?

Many Thanks in advance

12 REPLIES

Jaime_Vazquez
Level 6
Employee

What version of NetBackup is the client, the Master, and the Media Server? Same goes for OS versions.

 

How and where did you see that "It appears that we have queues waiting for the block to empty so we can write to disk"?

If you are doing the backup of a database "hot", the resulting files will not be consistent and the recovery will be problematic, at best. This could explain your issues as there may be file locking going on with the running database engine.

Is the disk storage unit being used a straight up disk or is deduplication being used?

Debug logs are bpbkar on the client, bpbrm/bptm on the Media Server/Manager.

 

mad_jock
Level 3

Hello Jaime,

 

Thanks for the reply. We don't use SQL-integrated backups, so the SQL DBAs back up to a separate drive as a flat file (txt), and we then back up these flat files.

 

Marianne
Level 6
Partner    VIP    Accredited Certified

Still don't understand what this means:

It appears that we have queues waiting for the block to empty so we can write to disk.

Where we have seen other customers dumping to disk followed by an NBU backup, these 'disk backups' are fairly big and definitely not millions of 10 KB files.

400gb+ flatfile backups should not be an issue.

We recommend using the NBU for SQL agent for the following reasons:

  1. 1-phase backup and 1-phase restore 
  2. Scheduling is much easier - no need to carefully coordinate with the SQL DBAs to ensure the SQL dump is finished by the time the NBU backup starts
  3. No need to maintain disk space for SQL dumps

mad_jock
Level 3

Hello Marianne,

Thanks for your reply,

I had Pat (Veritas RPC) dial in and run a command that showed disk latency during backups.

The write to disk was really slow (NetBackup 5220). He put this down to the block either not emptying or not filling quickly enough (it's one or the other; I can't remember which).

Write to tape was great and had no issues.

Our DBAs had a bad experience with the NetBackup SQL-integrated backup agent, so now they do a SQL backup only and dump the files to a separate drive, which we back up as flat files.

The database resides on D:\ but the backup files reside on H:\

I can understand why they do this: if a table within the DB is corrupt, they don't need a full restore of the DB; they just recover the individual table from the flat-file backup.

Is there a command you could give me that shows disk usage while data is being written to the NetBackup 5220 system?

Many Thanks.

Marianne
Level 6
Partner    VIP    Accredited Certified

The bptm log on the appliance will show throughput. 
At the end of the backup it will show '... waited for full buffers xxx times, delayed xxx times....'

Tape media server will have the same log and similar stats at the end of the backup.

mad_jock
Level 3

Hello Marianne

 

Here are just two entries from the logs:

write_data: waited for full buffer 310 times, delayed 1361 times
write_data: waited for full buffer 383 times, delayed 1534 times

Marianne
Level 6
Partner    VIP    Accredited Certified

You need to compare stats between disk and tape backups - similar entries in bptm logs for tape backup.

At the beginning of the backup, the buffer sizes and number of buffers being used for the backup can be seen. At the end, the 'waits' and the transfer rate (Kbytes/sec).
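For anyone who wants to pull those end-of-backup figures out of a large bptm log, here is a minimal Python sketch. It assumes the line layout seen in the excerpts posted in this thread (timestamp, PID in square brackets, `write_data:` message); the function name `summarize` and the idea of timing from "received first buffer" to the final "Total Kbytes transferred" line are illustrative assumptions, not anything NetBackup provides.

```python
import re
from datetime import datetime

# Layout assumed from the bptm excerpts in this thread:
#   HH:MM:SS.mmm [PID] <level> write_data: message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <\d+> write_data: (.*)$")
WAIT_RE = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")

SAMPLE = [
    "00:29:43.392 [14462] <2> write_data: Total Kbytes transferred 0",
    "00:29:43.392 [14462] <2> write_data: received first buffer (1048576 bytes), begin writing data",
    "00:30:10.668 [14462] <2> write_data: waited for full buffer 388 times, delayed 1457 times",
    "00:30:10.668 [14462] <2> write_data: Total Kbytes transferred 499968",
]

def summarize(lines):
    """Collect waits, Kbytes and a rough KB/s figure for one write_data run.

    Hypothetical helper: it does not handle a backup that crosses midnight,
    because the log timestamps shown here carry no date.
    """
    start = end = kbytes = waits = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        msg = m.group(3)
        if msg.startswith("received first buffer"):
            start = stamp
        elif msg.startswith("Total Kbytes transferred"):
            kbytes = int(msg.split()[-1])
            end = stamp
        else:
            w = WAIT_RE.search(msg)
            if w:
                waits = (w.group(1), int(w.group(2)), int(w.group(3)))
    seconds = (end - start).total_seconds() if start and end else None
    rate = kbytes / seconds if seconds else None
    return {"kbytes": kbytes, "seconds": seconds, "kb_per_sec": rate, "waits": waits}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same function over the matching lines from a tape job's bptm log gives a like-for-like pair of numbers to compare, which is the comparison suggested above.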

mad_jock
Level 3

Hi Marianne

 

Here are some more details. Please note the complete log is massive.

 

00:29:43.392 [14462] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: -1 twin_is_disk 1 delay_brm: 0
00:29:43.392 [14462] <2> write_data: Total Kbytes transferred 0
00:29:43.392 [14462] <2> write_data: first write, twin_index: 0 cindex: 0 dont_process: 1 wrote_backup_hdr: 0 finished_buff: 0
00:29:43.392 [14462] <2> write_data: received first buffer (1048576 bytes), begin writing data
00:29:43.464 [14462] <2> send_media_kbytes_written_establish_threshold: CINDEX 0, RB Kbytes for monitoring = 3000000
00:30:10.666 [14462] <2> write_data: writing block shorter than BUFF_SIZE, 262144 bytes
00:30:10.668 [14462] <2> write_data: writing short block, 262144 bytes, remainder 0
00:30:10.668 [14462] <2> write_data: waited for full buffer 388 times, delayed 1457 times
00:30:10.668 [14462] <2> write_data: Total Kbytes transferred 499968
00:30:10.668 [14462] <2> write_backup: write_data() returned, exit_status = 0, CINDEX = 0, TWIN_INDEX = 0, backup_status = 0
 

We have been experiencing a lot of performance issues, and I think the disk on the 5220 can't cope with what we are backing up.

I need to find out if extra media servers would help with the backlog of overrunning jobs.

mad_jock
Level 3

What command do I need to use to compare tape and disk?

 

Sorry, I'm really new to NetBackup and I'm still learning the ropes.

 

watsons
Level 6

write_data: waited for full buffer 388 times, delayed 1457 times

From my experience, the above, if it is a backup job, does not look too bad. If anything, the bottleneck would be client read I/O or the network between the media server and the client.

If writing to disk were slower than reading from the client, you would see a higher delay count for "waited for empty buffer", not "full buffer".

Anyway, you said the media server is a NetBackup Appliance 5220, but what version (2.5.x or 2.6.x)? It would be better to start by generating a DataCollect for the NetBackup Support team to review, and have them point out whether there are any hardware problems that could degrade performance.

The NetBackup Appliance does come with a tool to check disk I/O, nbperfchk:

http://www.symantec.com/docs/HOWTO101014

The above technote is for your reference; do not change any settings unless you have confirmation from the Support team.
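The full-buffer versus empty-buffer rule of thumb described in this reply can be sketched in a few lines of Python. This is only an illustration of that reasoning: the function name `likely_bottleneck` is hypothetical, and the code assumes the "waited for empty buffer" line has the same "N times, delayed M times" shape as the "waited for full buffer" line quoted in this thread.

```python
import re

# Assumption: both wait messages share this shape in the debug logs.
WAIT_RE = re.compile(r"waited for (full|empty) buffer (\d+) times, delayed (\d+) times")

def likely_bottleneck(lines):
    """Sum the 'delayed' counts per buffer type and report the larger side.

    Many delays on full buffers mean the writer was waiting for data to
    arrive (client read I/O or network); many delays on empty buffers mean
    the incoming data was waiting for the writer (disk or tape).
    """
    delays = {"full": 0, "empty": 0}
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            delays[m.group(1)] += int(m.group(3))
    if delays["full"] == 0 and delays["empty"] == 0:
        return "no wait statistics found"
    if delays["full"] >= delays["empty"]:
        return "upstream (client read I/O or network)"
    return "downstream (write to storage)"

if __name__ == "__main__":
    print(likely_bottleneck(
        ["write_data: waited for full buffer 388 times, delayed 1457 times"]
    ))
```

Treat the result as a hint about where to look first, not a diagnosis; the absolute counts also depend on the buffer size and number of buffers configured for the job.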

Marianne
Level 6
Partner    VIP    Accredited Certified

"What command do I need to use to compare tape and disk?"

The entries in bptm look the same. 

The only way to trace the different destination is by looking at the bptm PID in Activity Monitor job details for a job that you want to trace.

This PID can be found in square brackets [ ] after the timestamp.
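Since the complete bptm log is large, one way to follow a single job is to keep only the lines whose bracketed PID matches the one shown in the job details. A minimal Python sketch, assuming the "timestamp [PID]" prefix seen in the excerpts in this thread; the function name `lines_for_pid` is hypothetical.

```python
import re

# Prefix assumed from the excerpts in this thread: HH:MM:SS.mmm [PID]
PID_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} \[(\d+)\]")

def lines_for_pid(lines, pid):
    """Return only the log lines written by the given bptm process ID."""
    selected = []
    for line in lines:
        m = PID_RE.match(line)
        if m and int(m.group(1)) == pid:
            selected.append(line)
    return selected

if __name__ == "__main__":
    log = [
        "00:30:10.668 [14462] <2> write_data: Total Kbytes transferred 499968",
        "00:30:10.700 [20001] <2> write_data: Total Kbytes transferred 12",
    ]
    for entry in lines_for_pid(log, 14462):
        print(entry)
```

Doing this once for a disk job's PID and once for a tape job's PID yields two short extracts whose buffer settings, waits and Kbytes figures can be read side by side.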

mad_jock
Level 3

Hello, the versions are 7.5 and 2.5.

I was looking through the attached web link, and we use both iSCSI and thin-provisioned drives for all our servers, most of which are VMs.