NetBackup 9.1.0.1 VMware backups - question regarding VMDKs

bc1410
Level 5

NetBackup 9.1.0.1 (soon to upgrade to 10.x), Windows Server 2019 environment (physical server). One master/media server writing backups to tape.

We recently got a new storage SAN, and the storage admins have been migrating data; the VMs that originally had RDM drives are now being given VMDKs instead. For example, we have a VM with two drives: one is 100GB (the C:\ drive) and the second is 5TB. Only about 200GB of the 5TB drive is actually in use right now.

I assumed that the initial FULL-level VMware job would back everything (all blocks) up, which it did: 5+ TB.

We do a FULL level weekly and an incremental daily. But I'm not sure why our FULL level is still backing up 5+ TB on each run. What are we doing wrong?

In the VMware admin guide, on page 151 (https://www.veritas.com/content/support/en_US/doc/21902280-151824030-1), I see that when you use the "Enable block-level incremental backup" option in the policy, NetBackup uses VMware's Changed Block Tracking (CBT) feature to reduce the backup size. I have confirmed with our VMware admin that CBT is definitely enabled. The NetBackup VMware guide goes on to say that a backup of the entire VM with a FULL schedule "Backs up only the blocks that have changed since the .vmdk was created. Note that the blocks that are not initialized are excluded from the backup."

So on our VMware policy we have the following settings (screenshot attached: bc1410_0-1717099616430.png).

 

Any help with this would be appreciated.

Additional questions:

If we were to just do a VMware incremental/differential without a FULL-level schedule in place, is it accurate to say that NetBackup would perform a FULL level for the initial backup?

Would it be safe to do a FULL level just once a month on the VMs that have large VMDKs (like 5TB) and then just do incrementals daily? Would we still be able to restore using an incremental, since we would have the dependent FULL level out there?

 

Thanks BC


10 Replies

bc1410
Level 5

Thanks again for any help.

Hi @bc1410 

First, I trust you have Accelerator enabled for this backup. This will greatly reduce the time to complete subsequent full (and incremental) backups, as only the changed blocks are captured. This of course assumes that you are writing to a storage device capable of supporting Accelerator (such as MSDP).

The reporting of VMware backups in the GUI is not entirely helpful, and you need to delve into the detailed messages for the backup to see what is happening. Although the backup is reporting 5TB, only the used blocks (200GB) will be captured and sent to storage, and then with deduplication the data transferred will be less again. Review the last 10 or so lines of the detailed status for the job. For instance:

30/05/2024 7:03:44 PM - Info bpbkar (pid=2499385) accelerator sent 494618624 bytes out of 27686256640 bytes to server, optimization 98.2%
30/05/2024 7:03:44 PM - Info bpbkar (pid=2499385) bpbkar waited 120 times for empty buffer, delayed 114984 times, each delay is 1000 micro-seconds
30/05/2024 7:03:48 PM - Info XXXXX (pid=2499418) StorageServer=PureDisk:XXXXXX; Report=PDDO Stats (multi-threaded stream used) for (XXXXXX): scanned: 27040342 KB, CR sent: 256359 KB, CR sent over FC: 0 KB, dedup: 99.1%, cache disabled, where dedup space saving:98.1%, compression space saving:1.0%
30/05/2024 7:03:49 PM - Info XXXXX (pid=2499418) Using the media server to write NetBackup data for backup XXXXX_1717059614 to babmd1.bable.ad
30/05/2024 7:03:51 PM - Info bptm (pid=2499418) EXITING with status 0 <----------
30/05/2024 7:03:51 PM - Info XXXXX (pid=2499418) StorageServer=PureDisk:XXXXX; Report=PDDO Stats for (XXXXX): scanned: 179 KB, CR sent: 60 KB, CR sent over FC: 0 KB, dedup: 66.5%, cache disabled, where dedup space saving:3.4%, compression space saving:63.1%
30/05/2024 7:03:51 PM - Info bpbrm (pid=2499357) validating image for client XXXXX
30/05/2024 7:03:51 PM - Info bpbkar (pid=2499385) done. status: 0: the requested operation was successfully completed
30/05/2024 7:03:51 PM - end writing; write time: 0:03:33

The backup only had to transfer about 500MB of 27GB to complete. The GUI still reports this as a 17GB backup (this is a differential incremental).
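
If digging through the GUI is awkward, the same detailed status can usually be pulled from the command line. A rough sketch, assuming a Linux master and placeholder names (on Windows the commands live under install_path\NetBackup\bin\admincmd):

# Find the job ID for the VM's backup in the jobs database (placeholder client name)
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report -most_columns | grep <vm_client_name>

# Dump the detailed status for that job and look at the last few lines for the
# "accelerator sent ... bytes" and "PDDO Stats" entries
/usr/openv/netbackup/bin/admincmd/bpdbjobs -jobid <job_id> -all_columns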

There is a setting to change how the backup size is reported (but I can't find it quickly). 

As for the second question - you are correct: if there is no full backup, then an incremental will effectively be a full backup (this goes for any backup type). Using Accelerator, the full backup of your large volume will take about the same amount of time as an incremental, so there is no benefit to reducing the frequency of fulls.
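
One way to sanity-check that behaviour is to list the images for the client and compare the size recorded against each schedule type. A rough sketch, with the client name and time window as placeholders (output columns vary a little by release):

# List recent images for the VM's client; the -U output shows the schedule type
# and the kilobytes written, so a "first incremental" that effectively ran as a
# full will show up with a full-sized image
/usr/openv/netbackup/bin/admincmd/bpimagelist -client <vm_client_name> -hoursago 168 -U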

Cheers
David

Hi @davidmoline - thanks for the reply!!

 

We don't have Accelerator enabled since we don't use MSDP. I'm not too familiar with the Accelerator option, but I read that you need MSDP (which we don't use) and that the storage should not be tape, which ours is - our backups are written to tape. Is this accurate to say regarding the Accelerator option?

I'm thinking we may need to scale back the FULL-level jobs for the VMs that have very large VMDKs. This whole topic of very large VMDKs only became active recently when, as I mentioned, the storage/VMware admins started migrating the RDM disks to VMDKs, not knowing that NetBackup would take a long time to back up these new large VMDKs.

I mean, if we retain our FULL-level jobs for 6 months, is it safe to just run differentials on these VMs with large VMDKs? Maybe run a FULL level once a month instead of once a week. How would a restore play out? Would we still be able to perform a full VM restore?

Thanks

BC


Hi @bc1410 

Gawd - tape. Okay, then mostly ignore what I mentioned above (unless you can convince your management that some local MSDP to stage the backups before offloading them to tape would help speed up the backups).

Anyway, I still believe even though the GUI shows the backup as the 5TB vmdk size, the actual tape used will only be 200GB - or the used space within the vmdk.

Certainly to answer your question, running a single monthly full backup may speed up the backup side, but the restore side could become a nightmare (especially if you lost one of the required incremental backups from tape). I would suggest not going down this path if you can avoid it.  

If you can convince management to let you create MSDP, you may be able to utilise the old storage being migrated off to host an MSDP pool - it would not be a great idea to use the new SAN storage for this (hosting live data and backup data on the one array is a disaster waiting to happen).

Cheers
David

bc1410
Level 5

hi @davidmoline 

regarding - "Anyway, I still believe even though the GUI shows the backup as the 5TB vmdk size, the actual tape used will only be 200GB - or the used space within the vmdk."

So the job details show that the FULL job takes about 24 hours to run, whereas the differential incremental backup takes a few minutes. In OpsCenter I see that the FULL job reports 5+ TB. How is that possible if you're thinking it's only backing up 200GB? I guess you're saying it will actually write only 200GB to tape, but it runs as though it's backing up 5TB, since it takes 24 hours.

I see what you mean about just doing a full once a month. I was thinking I could go that route, but I would need all the differential incremental backups to restore this VM, and there is always the possibility of an issue with a tape, etc.

So with MSDP, which I'm not too familiar with - would we need another physical server with storage, set up as a second media server alongside our current single master/media physical server?

 

Thanks for all the great feedback / knowledge

 

bc1410
Level 5

@davidmoline so I should have stated that this particular VM is a Red Hat Linux server and not Windows. For some reason I was thinking it was a Windows box... too many VMs. Not sure if it matters with regard to the VMDKs getting all their blocks backed up regardless of whether they are being utilized or not.

Also - would the way the VMDKs are provisioned make a difference: thin vs. thick?

tcsimerson
Level 3
Employee

Since you are sending the backup directly to tape, the behavior you are seeing is expected. Regardless of the CBT activation status at the VMware level, a full backup will read all blocks in the VMDK. The VMDK provisioning status (thin vs. thick) doesn't really matter. NetBackup makes an API call to VMware basically stating "give me all the blocks of the source VMDK". This is why it takes 24 hours for the full backup to stream (VMware reads all 5TB of VMDK storage regardless of block allocation status). At the tape level, all the empty VMDK blocks will compress tremendously, and you will see an effective tape storage volume roughly equivalent to what the VM operating system reports as used space.

What I used to do when I was a backup admin was to implement a full backup once a month, weekly cumulative incremental backups, and daily differential incremental backups. If you are not familiar with the difference between a cumulative and a differential incremental backup, here it is:

  • A differential incremental backup backs up all changed blocks since the last successful backup of any type (full or incremental).
  • A cumulative incremental backup backs up all changed blocks since the last successful full backup.

Using cumulative incremental backups can reduce the number of backup images needed significantly. Say you implement full backups every 30 days and daily differential backups: if you lost the VM at day 29, you would need 29 backup images to get the system back to its current state (the full backup and then every differential since). With a monthly full, weekly cumulative, and daily differential schedule, you would need at most 8 backup images (the full, the most recent cumulative, and up to 6 differentials taken since that cumulative).
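
If it helps to see what a given restore would depend on, the image chain for a client can be listed across the cycle. A rough sketch, with the client name and dates as placeholders (NetBackup works out the required chain automatically at restore time):

# List every image (full, cumulative, differential) recorded for the client
# during the month; a point-in-time restore needs the last full, the most recent
# cumulative after it (if any), and the differentials taken after that cumulative
/usr/openv/netbackup/bin/admincmd/bpimagelist -client <vm_client_name> -d 05/01/2024 -e 05/31/2024 -U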

Tom Simerson

bc1410
Level 5

Sorry for the late reply. I've been trying a few tests and wanted to report my findings, which make me scratch my head even more.

So we do have two NetBackup environments. The one I have been talking about is our Windows NetBackup 9.1.0.1 environment that goes to tape (Quantum i500, LTO-6 tapes).

Our other environment is Red Hat Linux, also NetBackup 9.1.0.1, and it has an Overland NEOXL80 LTO-7 tape library connected to it.

Both environments are connected to the same virtual machine server (vCenter). Each NetBackup environment's VMware policies use queries on the VMware networks to select the various VMs for backup.

So, as you know, we have this particular Linux VM that contains a 5TB VMDK; it takes 24 hours to back up and reports that it backed up just over 5TB, which you said is expected (all the blocks). I tend to agree with that logic.

But a co-worker pointed out that we have a similar VM - also a Linux VM with a 5TB VMDK - that gets backed up in our Linux Red Hat NetBackup 9.1.0.1 environment to the LTO-7 library. That full backup is not backing up all the blocks: it reports only 54GB for the FULL, which is the actual amount of data in use.

So I decided to adjust the VMware policy query on the Linux NetBackup environment to capture the VM I originally opened this thread about, which normally gets backed up in our Windows NetBackup environment. The full job that ran this weekend didn't back up the entire 5TB; instead it backed up only the amount of data in use.

So now I'm really confused. The only differences I can see between the two environments (Windows and Linux NetBackup) are:

  • The Linux environment's VMware policy has multiplexing set to 4 and does not have multiplexing enabled on the storage unit. The Windows environment has multiplexing set to 1 on the VMware policy, and multiplexing enabled on the storage unit with "maximum streams per drive" set to 32.
  • In the VMware policy, under the VMware tab's advanced options, Windows has the "snapshot creation interval" set to 10, while the Linux NetBackup environment has it set to 120.
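
For reference, the policy and storage unit definitions on each master can be dumped and compared side by side. A rough sketch, assuming the standard admincmd location, with the policy and storage unit names as placeholders (exact output fields vary by release):

# Dump the VMware policy attributes and schedules on each master server
/usr/openv/netbackup/bin/admincmd/bppllist <vmware_policy_name> -U

# Dump the storage unit definition (includes multiplexing and max streams per drive)
/usr/openv/netbackup/bin/admincmd/bpstulist -label <storage_unit_name> -U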

Thanks

BC

 

So I guess my main question is: why is one NetBackup environment backing up all the blocks on a full-level VM backup when the other one isn't? How do I work out why one environment is making that decision and the other isn't?

Sorry for the multiple replies - I meant to add this all to one.

If I were to increase verbose logging, is there a particular log that I can view to see what is going on with regard to the VMware backups?
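
For reference, a rough sketch of where VMware backup logging usually lives - treat the settings and values below as assumptions to check against the NetBackup for VMware admin guide, and note that on a Windows backup host the equivalents sit under install_path\NetBackup:

# Create the legacy log directories on the VMware backup host if they don't exist;
# bpbkar (data movement), bpfis (snapshot creation) and vxms (virtual disk reads)
# are the usual places to look for VMware backup activity
/usr/openv/netbackup/logs/mklogdir

# Raise general and VxMS logging verbosity in bp.conf on the backup host
# (VERBOSE and VXMS_VERBOSE values are release-dependent assumptions)
echo "VERBOSE = 5" >> /usr/openv/netbackup/bp.conf
echo "VXMS_VERBOSE = 5" >> /usr/openv/netbackup/bp.conf

# After re-running the backup, review:
#   /usr/openv/netbackup/logs/bpbkar/
#   /usr/openv/netbackup/logs/bpfis/
#   /usr/openv/netbackup/logs/vxms/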