cancel
Showing results for 
Search instead for 
Did you mean: 

NetBackup not fully honoring exclude list on Windows Media server

DK12
Level 3

Good day, well afternoon where I am,

I've got a bit of a challenge - and posting details will require I sanitize some things, so if you ask for data please be patient with me!

The environment I'm managing is running NB 8.0. The DBA's don't want to use the NB SQL agent, so all backups are dumped to a volume by the application. That volume is then snapped and presented across the SAN to the media server. This actually works quite well for several other volumes from the same client, and ensures decent backup speeds.

The policy in question is MS Windows, is using multi streaming, and has 3 schedules - daily diff. incr., weekly full, monthly full.  Recently the policy started running one job out of the weekly streams as a Monthly, and returning a SC 90.

I resolved the messed up schedule by re-creating the policy from scratch, but this does not resolve the SC 90 issue.

The problem is that the job fires up, and tries to backup X:\pagefile.sys.  The incrementals correctly handle this, and shut down. The Weekly full does not, it complains about the job "exited with status 90: media manager received no data for backup image".

I've examined the FilesNotToBackup Reg. key, and \Pagefile.sys is listed, and the BackupRestore reg. key does indeed list the FilesNotToBackup key.

Looking at the exclude list I can see that previous admin's have added entries to specifically exclude {volume letter}:\pagefile.sys.  I tried adding this for the specific volume - no success. I cleand up the exclude list to simply have three entries for pagefile.sys with no volume entry, and stuck that under the All policies, all schedules portion of the exclude list, in case this was one of those times where NB on windows is getting case sensitive. (Pagefile.sys, pagefile.sys, PageFile.sys), again, no success.

When I examine the logs, I see that the file list is passed to the client, and I can see that eventually, NB acknolwedges that the file is in the FilesNotToBackup list, but, it still generates the SC 90, which is creating effectively false information in the reporting, because of course no one cares about pagefile.sys, yet here we are.

Anonymized logs will show up ... eventually. :)

DK

 

 

 

 

 

15 REPLIES 15

DK12
Level 3

A bit more info pulled from the client side bpbkar log:

14:57:19.066 [7160.8920] <2> tar_backup_tfi::setupFileDirectives: TAR - backup filename = X:\pagefile.sys
...
14:57:19.081 [7160.8920] <4> tar_backup_tfi::create: INF - after m_pBackupObject->V_Initialize(1)
14:57:19.081 [7160.8920] <4> tar_backup_tfi::create: INF - after m_pBackupObject->V_Initialize(1)
14:57:19.081 [7160.8920] <4> tar_backup_tfi::create: INF - after m_vxbsa_object->initialize
14:57:19.081 [7160.8920] <2> tar_backup_vxbsa::add: INF - called with 'X:\pagefile.sys'
14:57:19.081 [7160.8920] <2> _expand_root: INF - checking for root expansion:<X:\pagefile.sys>
14:57:19.081 [7160.8920] <2> _fix_registry: INF - checking for registry expansion:<X:\pagefile.sys>
14:57:19.081 [7160.8920] <4> dos_backup::V_VerifyFileSystem: INF - corrected name:<X:\pagefile.sys>
14:57:19.081 [7160.8920] <4> dos_backup::V_VerifyFileList: INF - UBS Local Type for 'X:\pagefile.sys' --> 10020002
14:57:19.097 [7160.8920] <2> VxCJUtilGetParams: INF - GetVolumePathName for <X:\pagefile.sys> returned <X:\>
14:57:19.097 [7160.8920] <2> VxCJUtilGetParams: INF - GetVolumeNameForVolumeMountPoint for <X:\> returned <\\?\Volume{8a554feb-9105-4aed-8342-d66aebc4b42e}\>
14:57:19.097 [7160.8920] <2> tar_base::V_vTarMsgW: JBD - not using change journal data for <X:\pagefile.sys>: not enabled
14:57:19.097 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 71 of 71 bytes
14:57:19.097 [7160.8920] <4> dos_backup::V_Initialize: INF - did not create change journal collection, not needed

.... we go off into the world of Shadow Copy Components, with no apparent errors, then move on to the actual processing

14:57:26.316 [7160.8920] <2> ov_log::V_GlobalLog: INF - v_beds::V_FindFirst() EXIT Name:Shadow Copy Components: bRC:true LastError:0x0:
14:57:26.316 [7160.8920] <2> TransporterLegacyLocal::winClientShm(): DBG - Mapping for shared memory is Global\NetBackup Media Manager SHM Info Path MEDIASERVER_1574193396 (../TransporterLegacyLocal.cpp:601)
14:57:27.332 [7160.8920] <2> TransporterLegacyLocal::winClientShm(): DBG - Multiplex tape backup is initiated. (../TransporterLegacyLocal.cpp:666)
14:57:27.332 [7160.8920] <2> TransporterLegacyLocal::winClientShm(): DBG - Shared memory detail media_shm_info = 1024 524288 967819078 12584 1 1 1 967819078 1 (../TransporterLegacyLocal.cpp:677)
14:57:27.332 [7160.8920] <2> TransporterLegacyLocal::winClientShm(): DBG - backup time in mpx headers is 1574193396 (../TransporterLegacyLocal.cpp:751)
14:57:27.332 [7160.8920] <2> TransporterLegacyLocal::winClientShm(): DBG - SHMID = 967819078, BUF_PTR = 0x000000009fff0000 , BUF_CONTROL = 0x0000000002556008 , readyptr = 0x000000000255c008 (../TransporterLegacyLocal.cpp:785)
14:57:27.519 [7160.8920] <2> Packer::open(): DBG - Started Backup... (../Packer.cpp:272)
14:57:27.519 [7160.8920] <4> backup_create: INF - after backup_class->create.
14:57:27.519 [7160.8920] <4> backup_create: INF - NetBackup Temp Directory: 'C:\Program Files\Veritas\\NetBackup\Temp'
14:57:27.519 [7160.8920] <4> backup_create: INF - before iINI->GetFileList(OV_EXCLUDE, &dwListCt)
14:57:27.519 [7160.8920] <4> backup_create: INF - after iINI->GetFileList(OV_EXCLUDE, &dwListCt)
14:57:27.519 [7160.8920] <4> backup_create: INF - before pOVMatchPList[i]->OVAdd loop dwListCt = 23
14:57:27.519 [7160.8920] <4> backup_create: INF - after pOVMatchPList[i]->OVAdd loop
14:57:27.519 [7160.8920] <4> backup_create: INF - after iINI->GetFileList
14:57:27.535 [7160.8920] <2> vnet_pcache_init_table: [vnet_private.c:214] starting cache size 200 0xc8
14:57:27.535 [7160.8920] <4> backup_create: INF - after backup_class->SetExcludeFileList
14:57:27.535 [7160.8920] <4> backup_create: INF - after iINI->GetFileList
14:57:27.535 [7160.8920] <4> tar_backup::SetIncludeFileList: INF - Starting VDS thread
14:57:27.535 [7160.8920] <4> tar_backup_tfi::UpdateExcludeListWithVHD: INF - UpdateExludeListWithVHD begin
14:57:27.535 [7160.8920] <4> backup_create: INF - after backup_class->SetIncludeFileList
14:57:27.535 [7160.8920] <2> tar_base::V_vTarMsgW: INF - Estimate:-1 -1
14:57:27.535 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 21 of 21 bytes
14:57:27.535 [7160.8920] <4> tar_backup_tfi::backup_arg_state: INF - V_MODE_BACKUP:Mark transition from estimation state
14:57:27.535 [7160.8920] <4> tar_backup_tfi::backup_arg_state: INF - remember start error count: 0
14:57:27.535 [7160.8920] <4> tar_backup_tfi::backup_arg_state: INF - remember start fatal count: 0
14:57:27.535 [7160.8920] <2> tar_backup_tfi::backup_arg_state: Is ALL_LOCAL_DRIVE Backup = 0
14:57:27.535 [7160.8920] <2> tar_backup_tfi::backup_arg_state: Backup selections for accurate licensing ---- backup_selections = X:\pagefile.sys, backup_size = 0
14:57:27.535 [7160.8920] <4> dos_backup::tfs_scanstart: INF - starting scan for file directive:<X:\pagefile.sys>
14:57:27.535 [7160.8920] <4> dos_backup::tfs_scanstart: INF - starting scan using name:<X:\pagefile.sys>
14:57:27.535 [7160.8920] <4> dos_backup::tfs_scanstart: INF - in 'backup' mode
14:57:27.550 [7160.8920] <4> dos_backup::tfs_startdir: INF - volume guid:<\\?\Volume{8a554feb-9105-4aed-8342-d66aebc4b42e}\>
14:57:27.550 [7160.8920] <4> dos_backup::tfs_startdir: INF - volume mount point:<X:\>
14:57:27.550 [7160.8920] <4> dos_backup::tfs_network_drive_check: DAT - GetDriveType(X:\) returned 3
14:57:27.550 [7160.8920] <4> tar_base::V_vTarMsgW: INF - tar message received from dos_backup::_check_filters
14:57:27.550 [7160.8920] <2> tar_base::V_vTarMsgW: INF - Filtered 'FilesNotToBackup' Object: X:\pagefile.sys
14:57:27.550 [7160.8920] <4> dos_backup::tfs_scannext: INF - no more path list entries
14:57:27.550 [7160.8920] <4> tar_backup::backup_endarg_state: INF - finish error count: 0
14:57:27.550 [7160.8920] <4> tar_backup::backup_endarg_state: INF - finish fatal count: 0
14:57:27.550 [7160.8920] <4> tar_backup_tfi::backup_arg_state: INF - V_MODE_BACKUP:No need to do the estimate. Just continue the backup with the next file
14:57:27.550 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 65 of 65 bytes
14:57:27.550 [7160.8920] <4> tar_backup::backup_add_last_entry_state: INF - catalog message: Dfi - 0 0 0 -1 23 292 user other 0 0 0 0 /VRTS_IMAGE_SIZE_RECORD
14:57:27.550 [7160.8920] <2> tar_base::backup_finish: TAR - backup: 0 files
14:57:27.550 [7160.8920] <2> tar_base::backup_finish: TAR - backup: file data: 0 bytes
14:57:27.550 [7160.8920] <2> tar_base::backup_finish: TAR - backup: image data: 0 bytes
14:57:27.550 [7160.8920] <2> tar_base::backup_finish: TAR - backup: elapsed time: 9 secs 0 bps
14:57:27.550 [7160.8920] <2> tar_base::V_vTarMsgW: INF - Total Size:0
14:57:27.550 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 19 of 19 bytes
14:57:27.550 [7160.8920] <4> tar_backup::backup_done_state: INF - V_MODE_BACKUP_POST:
14:57:27.550 [7160.8920] <4> tar_backup::backup_done_state: INF - number of file directives not found: 0
14:57:27.550 [7160.8920] <4> tar_backup::backup_done_state: INF - number of file directives found: 1
14:57:27.550 [7160.8920] <2> tar_base::V_vTarMsgW: INF - Client completed sending data for backup
14:57:27.550 [7160.8920] <2> ov_log::V_GlobalLogEx: KeepAliveThread::stopKeepAlive Waiting for keepalive thread to finish
14:57:27.550 [7160.8920] <4> tar_base::stopKeepaliveThread: INF - keepalive thread stopped
14:57:27.550 [7160.8920] <8> tar_backup_tfi::cleanupTemp: WRN - will not cleanup 'temp' directory, debug level(s) are too high
14:57:27.582 [7160.8920] <4> send_msg_to_monitor: INF - in send_msg_to_monitor()...
14:57:27.582 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 63 of 63 bytes
14:57:27.582 [7160.8920] <2> tar_base::V_vTarMsgW: INF - EXIT STATUS 0: the requested operation was successfully completed
14:57:27.582 [7160.8920] <2> dtcp_write: TCP - success: send socket (516), 72 of 72 bytes
14:57:27.582 [7160.8920] <2> dtcp_shutdown: TCP - success: shutdown socket (1456)
14:57:27.582 [7160.8920] <2> dtcp_shutdown: TCP - success: shutdown socket (532)


Yet, this is the error in the Activity Monitor:

Nov 19, 2019 2:57:18 PM - Info bpbrm (pid=11088) start bpbkar32 on client
Nov 19, 2019 2:57:18 PM - connected; connect time: 0:00:00
Nov 19, 2019 2:57:19 PM - Info bpbkar32 (pid=7160) Backup started
Nov 19, 2019 2:57:19 PM - Info bpbrm (pid=11088) Sending the file list to the client
Nov 19, 2019 2:57:19 PM - Info bpbkar32 (pid=7160) change time comparison:<disabled>
Nov 19, 2019 2:57:19 PM - Info bpbkar32 (pid=7160) archive bit processing:<disabled>
Nov 19, 2019 2:57:19 PM - Info bpbkar32 (pid=7160) not using change journal data for <X:\pagefile.sys>: not enabled
Nov 19, 2019 2:57:27 PM - Info bpbkar32 (pid=7160) bpbkar waited 0 times for empty buffer, delayed 0 times.
Nov 19, 2019 2:58:21 PM - mounted H06228; mount time: 0:01:14
Nov 19, 2019 2:58:27 PM - positioning H06228 to file 1
Nov 19, 2019 2:58:41 PM - positioned H06228; position time: 0:00:14
Nov 19, 2019 2:58:41 PM - begin writing
Nov 19, 2019 2:58:54 PM - Info bpbrm (pid=11036) media manager for backup id MEDIASERVER_1574193396 exited with status 90: media manager received no data for backup image
Nov 19, 2019 2:58:57 PM - Info bpbrm (pid=11036) child done, status 90
Nov 19, 2019 2:58:57 PM - end writing; write time: 0:00:16


As I understand things, we should have looked at the file list, looked at the FilesNotToBackup + the exclude list, and gone "Ah ha! we don't need to backup X:\pagefile.sys", and simply stopped processing the job and shut it down cleanly, not mount a tape in a drive, then proceed to complain we got no data. This appears to work perfectly well for the Diff. Inc. schedule as there are never any SC 90's for this.

 

Seems I've hit on one that no one has any feedback on. I do seem to get those. 

I'll be punting this to Veritas early next week - PTO starts in about an hour and a half, and I'd rather not be late for that!

I did do the same trace on the diff. backup and yup. it does not generate a SC 90 at any point.  To be honest, It would be better if the job simply didn't fire up, but I do get how product works most of it's logic.

 

davidmoline
Level 6
Employee

I'm a little confused - what exactly is in the backup selection for the policy - can you share please? 
From the detailed job status it appears that the client is being told to backup X:\pagefile.sys

If it is included then the behaviour is expected.

@davidmoline ,

Whups - meant to put that in, but forgot.  The file list is X:\*, with multi streaming enabled, so the parent is passing the sub job the path of X:\pagefile.sys.  As I said, the full backup generates the error, but the incremental doesn't for it's child job. 

So, the environment is now at 8.2 for all but two clients, and as this is the media server, it's not a concern. The problem still persists.  Looking back in history, it started about 2 months ago. Prior to that, the pagefile.sys is not getting listed as a problem child job. The Monthly job runs all child jobs with the correct schedule, and the Weekly job has gone back to running the pagefile.sys job as a Monthly schedule, when all the others are Weekly (I'm talking about the User friendly schedule name, both are full's - just having different retention levels).  The incremental job starts a child for pagefile.sys, sees it's on the exclude lists, and exits gracefully with SC 0. 

So @davidmoline, I don't think this is a "by design" element. If the incremental can cleanly detect it should not backup an item in the exclude list, and exit gracefully, a full backup should as well. Clearly something changed 2 months or so ago, but I'll be darned if I can tell you what. There were the usual OS patches applied but that would have been 2+ weeks prior. The data itself changes on a regular basis - it is MS SQL backups (dump and sweep style vs. using the agent), no patches to the NB binaries occured prior to Dec. 3. This is a very calm, conservative NB environment - nothing wild or cutting edge is being done.  

Since the problem exists in NB 8.2, I'm taking it over to Support this afternoon.

sdo
Moderator
Moderator
Partner    VIP    Certified

I assume that the X: volume is otherwise empty, so what happens if you create a placeholder file, e.g. named:   do-not-delete-me.txt          Does the full backup still fail with status 90 for that volume ?

No, the volume is not empty. $RECYCLE.BIN, a temp file and three folders, and the System Volume Information construct.  

The bulk of the data resides in the folders. the temp file is about 1kb in size. I cannot add data to this volume - it's a snapshot presented over the SAN to the media server daily.

 

sdo
Moderator
Moderator
Partner    VIP    Certified

I did a test NetBackup v8.1.1 on Windows 2016.  I created an empty D: volume, created nothing except for a dummy text file named "pagefile.sys" and created a backup policy to backup just D:\   (no wildcard star)   full backup runs fine and excludes the pagefile.sys file.  I changed the policy to multi-streaming and the selection to D:\*  and I get the same status 90 for the stream that is "D:\pagefile.sys"      So I conclude that exclusion filtering is taking place not at the stream creation stage but only at the file-system walking stage (by bpbkar32) inside each stream.

As for why your job was ok and then suddenly not... maybe the pagefile never used to be on the volume that is snapped.  Or maybe the backup client is configured to "let windows decide" where and how big the pagefile should be, and maybe the C: volume was low on space in the true source machine, and so Windows elected to (or someone had to manually) place the pagefile on the volume that is snapped - or even maybe this, that the Windows OS was configured to allow pagefile usage on C: and also on another volume, but until recently Windows had no virtual memory pressure need to create and use additional pagefile space on another volume and so until recently a pagefile never existed on the volume that is snapped.

@sdo, Thanks for that test - I don't have that luxury in this environment.

Are you getting that same result for a diff. incr. schedule?  In my case, the diff. incr. sees pagefile.sys, spawns a job, declares it's not part of the stuff to backup, and then exits with SC 0.  It's the weekly full, and the monthly full that are not behaving and generating SC 90.  As I mentioned before, the pagefile.sys is in Window's list of don't bother to backup reg. keys, and it's also in the exclude lists wrt. NB. 

If an incremental can exit with SC 0, a full should as well.

The other part of this that is bugging me is that the parent job and all other jobs spawned as schedule Weekly, but the pagefile.sys schedule is set to Monthly.  That problem was fixed, briefly by recreating the policy. It's not working now, and we're back to the wrong info being displayed.  This should NOT be happening - I've worked with NB for far to long at this point in my career, and that smacks of a bug.

The case is open, and I've sent in my data from the 8.0 install.  Just waiting for the current (and must run) incremental to complete so I can get the same data set under NB 8.2. 

Note - couple of edits above to fix typo's that obfuscated clarity.

 

sdo
Moderator
Moderator
Partner    VIP    Certified

If I run a diff schedule, it still creates a stream for my dummy D:\pagefile.sys but the steam completes with status 0.  Incidentally, both the full stream and the diff stream both report a file count of "blank" i.e. not even shown as zero, but I guess that's a cosmetic thing.

I think the reason that your pagefile stream gets a monthly is because NetBackup cannot locate a prior completed backup for X:\pagefile.sys and so the NetBackup Master must be electing to run either:   the schedule the longest retention, or the schedule that has accelerator forced-rescan.  This behaviour is, IMO, normal / expected.

I ran another test, where I renamed my pagefile.sys to no-backup.txt and added no-backup.txt to the client exclusion list and copied the policy to another policy name (so that NetBackup won't try to be clever and cache the resultant list of objects) and same thing, it created a stream for "D:\no-backup.txt" but the stream failed witht status 90 because bpbkar32 didn't send any data.  And, you're not going to like this, but this scenario feels familiar to me, whereby a wilcard was selecting objects that were in my exclusion list and so I was getting a status 90 - this feels oddly familiar to me, I must have experienced this before at some time.  And again the diff completes status 0 but no files returned in that stream.

The next conclusion I guess then is that incremental backup streams tolerate ending with nothing transfered, whereas full backup streams do not.

@sdo, very interesting results. 

What this suggests to me is we've an edge case that has not been caught before - We should not generate an error when we're not supposed to do work based on Windows Reg. Key and exclude lists. At this maturity level of the product, the wild cards should be cross checked against exclude lists both OS provided, and customized by us NB Admins to meet operational needs, and not generate errors.

For those still using OpsCenter for reporting, this is going to create false reporting on failures (esp. the Advanced Success Rate report or Success Rate - All Jobs - one of those, if memory serves me correctly, considers any child job failure sufficient to mark the entire client's backups under that policy as failed, even if all of the other sub jobs were 100% successful. My team spent many a month on that issue with Veritas for one customer who was required to submit this info for compliance audits, and they had fiscal impacts on the company and staff)

The wind mill tilting side of my nature says full backups should A) bear the correct schedule name, B) correctly exit with SC 0 in this situation, just like the Diff. Incr. does. 

Just because that's how it is, doesn't make it right. ;)

sdo
Moderator
Moderator
Partner    VIP    Certified

I couldn't agree more with you @DK12 on all points.

And I just performed a similar test with an Appliance 5230 NetBackup Master/Media v2.7.3 and a Windows 2016 NetBackup Client v7.7.3 and same result of full:status=90 diff:status=0 for items generated by multi-streaming yet present within exclusion list.

FYI - my test was on an E: volume, and client uses change journal tracking.

Interestingly I also got status 90 for the streams for each of the three change journal tracking "E:\VxCJ*.dat" files, yet these file names are not in the Windows registry key "FilesNotToBackup" nor in my client exclusion list, and so NetBackup Client v7.7.3 must have at least one other additional hidden list of items that it itself excludes.

.

Which links me to...

...back to my original v8.1.1 test client, I can see that NetBackup Client v8.1.1 does maintain an additional third private list :

$ nbgetconfig -private_exld_list

...yet the change journal tracking files are not listed / matched within this private list... and so there must be yet another additional fourth hardcoded list somewhere (either in NetBackup or in Windows) (or perhaps some other type or level of filtering) which contains at least the VxCJ*.dat files / pattern.

Oh dear. I can only think of Ogres now... y'know, they're like onions...

 

Sorry - watched Shrek too many times with Thing one and Thing two as they grew up.

We'll see if Veritas will accept this as a flaw, and put it in for correction.  Thanks for testing things out!

davidmoline
Level 6
Employee

Hey @DK12 - not withstanding that this appears to be wrong, what is the time stamp on the pagefile.sys (is it still being used at that location)?
If it is not being used (i.e. it is unmodified), the incrmental backup will skip it as it hasn't changed while the full backup will look to back it up (as one of the strams created by your backup selection), but the file is skipped due to the exclude list, so nothing is left - hence the status 90.
I'd still argue that if you direct NBU to backup something (in this case one thing) that is then excluded, a status 90 is the correct result - but that's my opinion only.

sdo
Moderator
Moderator
Partner    VIP    Certified

Irrespective of dates (any) then bpbkar32 clearly honours FilesNotToBackup.  What we want / need is for "selection and stream creation" to apply a pre-filter of *all* (well at least the same four that I have identified - but I guess there still may be more) of the same exclusoion and de-selection, with appropriate logging / description of course (i.e. not embedded at level 8+ etc) of what is being chosen, and what is being submitted (aka streams) for backup.