NDMP restore "niggle"

Andy_Welburn
Level 6

Solaris 9 master/media

NetBackup 6.5.6

NetApp ONTAP 7.3.3P3

 

Have just recently come across an annoying little niggle that I have not previously encountered in several years of NB admin!

Backups via NDMP are fine, altho' somewhat time consuming since we lost the ability to back up to SSO drives strung directly off the back of the filer!

However, when restoring folders we have an issue where the restore job just queues at “begin Restore”. If we select individual files from within those same folders then the restore progresses normally to a satisfactory conclusion. But any selection of any folder results in a queued restore.

Any ideas on what could be going on? This isn’t too much of an issue if we’re dealing with small amounts of data, but obviously this is going to be a right royal pain if we ever have to do a major restore on any of our filers.

11 REPLIES

Andy_Welburn
Level 6

(sorry for shouting!)

17/03/2011 11:39:32 - begin Restore
17/03/2011 14:11:17 - media 300156 required

it's finally decided to move on. All I can think is it's working something out behind the scenes, but ....

This is from a backup that totals around 4 TB, but normally I would expect the job to start & then take 2 & a half hours before it did anything. Now I'm expecting a similar wait & then some whilst it restores just a few gig!

Will keep you updated........

Marianne
Moderator
Partner    VIP    Accredited Certified

I have logged a call with Symantec some time ago about the same issue.

I was told that DAR was only supported for individual file restores, not directories.

*** EDIT ****

Directory restore is performed as non-DAR.

The engineer sent me a couple of TN's w.r.t. directory restore performance:

(this is the only one that still seems relevant):

http://www.symantec.com/docs/TECH21246

Some Network Data Management Protocol (NDMP) operations (such as a non-Direct Access Recovery (DAR) restore) may take a long time. The default Veritas NetBackup (tm) behavior is an 8-hour timeout value when waiting for NDMP operations to complete. It is possible to modify this timeout value by creating the NDMP_PROGRESS_TIMEOUT file on the NetBackup media server. The file is created in the /usr/openv/netbackup/db/config/ directory on a UNIX/Linux media server, and in the <install_path>\veritas\netbackup\db\config directory on a Windows media server.
The file must only contain a single number, which is the desired timeout value, in minutes.
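
A minimal sketch of creating that file, assuming a UNIX/Linux media server and Python being available there. The function name and the `config_dir` parameter are made up for illustration; only the directory, the file name and the "single number, in minutes" rule come from the technote. It assumes a trailing newline after the number is harmless.

```python
from pathlib import Path

# Directory quoted in the technote for a UNIX/Linux media server.
DEFAULT_CONFIG_DIR = Path("/usr/openv/netbackup/db/config")


def write_ndmp_progress_timeout(minutes, config_dir=DEFAULT_CONFIG_DIR):
    """Write the timeout (in minutes) as the only content of NDMP_PROGRESS_TIMEOUT."""
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("timeout must be a positive whole number of minutes")
    target = Path(config_dir) / "NDMP_PROGRESS_TIMEOUT"
    # The technote says the file must contain a single number and nothing else.
    # Assumption: a trailing newline after the number is acceptable.
    target.write_text(f"{minutes}\n")
    return target


if __name__ == "__main__":
    import tempfile

    # Demo against a scratch directory rather than the real config path.
    with tempfile.TemporaryDirectory() as scratch:
        path = write_ndmp_progress_timeout(24 * 60, config_dir=scratch)
        print(path.name, path.read_text().strip())
```

On a real media server you would call it without `config_dir` (as a user allowed to write there); 24 * 60 is just an example value, not a recommendation.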

 

Seems you're 'lucky' to have it started within 2 hours!!!!


 

Andy_Welburn
Level 6

"...
For restore of directories, by default DAR is always used when restoring a subdirectory but never used when restoring the directory containing an entire image. For example, if /vol/vol0 contains the entire image, and /vol/vol0/dir1 is a subdirectory, DAR is used by default when restoring /vol/vol0/dir1. But it is not used when restoring /vol/vol0. For restore of subdirectories, NetBackup does not attempt to gauge the effectiveness of using DAR. Unless DAR is manually disabled, NetBackup always uses DAR when restoring subdirectories.
..."

http://www.symantec.com/business/support/index?page=content&id=TECH69909
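
The rule in that quote can be written down as a small check. This is only a sketch of the rule as worded in the technote, not NetBackup's actual logic; `dar_used_by_default` is a hypothetical helper and it only covers directory restores.

```python
import posixpath


def dar_used_by_default(restore_dir, image_root):
    """Apply the quoted rule: DAR for a subdirectory, no DAR for the image root itself."""
    target = posixpath.normpath(restore_dir)
    root = posixpath.normpath(image_root)
    if target == root:
        # Restoring the directory that contains the entire image: never DAR.
        return False
    if target.startswith(root.rstrip("/") + "/"):
        # Any subdirectory below the image root: DAR unless manually disabled.
        return True
    raise ValueError(f"{restore_dir!r} is not inside the image root {image_root!r}")


if __name__ == "__main__":
    print(dar_used_by_default("/vol/vol0", "/vol/vol0"))
    print(dar_used_by_default("/vol/vol0/dir1", "/vol/vol0"))
    print(dar_used_by_default("/vol/vol0/home/user/dir/subdir/", "/vol/vol0"))
```

Going by the wording alone, depth below the image root makes no difference: anything under it counts as a subdirectory.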

Essentially we were restoring /vol/vol0/home/user/dir/subdir/ .... so even further down the line! I would've also expected that the "issue" with DAR would come further down the line once it'd allocated resources etc. & not prior. Hmmmm.

Looking at "All Log Entries" report it certainly doesn't appear to've used DAR for this particular restore (at least not yet!). We carried out a file restore earlier & that mentioned that DAR was enabled, but after it allocated resources.

I'm sure we've done this before without any such issues, but then again maybe our oppos have been keeping things from me!

Will keep an eye on this one & see what it eventually does...

Marianne
Moderator
Partner    VIP    Accredited Certified

Check bprd and bpdbm logs -  all of this is still bprd & bpdbm processing:

17/03/2011 11:39:32 - begin Restore
17/03/2011 14:11:17 - media 300156 required
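
To put a number on that gap, a short sketch that reads the two job-detail lines above and prints the time between consecutive entries. The helper names are made up; the timestamp format (day/month/year) is taken from the lines as posted.

```python
from datetime import datetime

JOB_LOG = """\
17/03/2011 11:39:32 - begin Restore
17/03/2011 14:11:17 - media 300156 required
"""


def parse_job_line(line):
    """Split 'dd/mm/yyyy HH:MM:SS - message' into (datetime, message)."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp.strip(), "%d/%m/%Y %H:%M:%S"), message.strip()


def gaps(log_text):
    """Return (earlier message, later message, elapsed) for each consecutive pair."""
    events = [parse_job_line(line) for line in log_text.splitlines() if line.strip()]
    return [(a[1], b[1], b[0] - a[0]) for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    for before, after, delta in gaps(JOB_LOG):
        print(f"{before} -> {after}: {delta}")
    # begin Restore -> media 300156 required: 2:31:45
```

So just over two and a half hours pass before any media is requested.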

Andy_Welburn
Level 6

There was an awful lot of info in bpdbm just searching for the string that matched the NDMP hostname! Not sure how relevant most of it was, but there was a lot of "uncompress" in there (which I thought would've been done at the BAR GUI access initially due to catalog compression in place) - it almost seems as if it didn't know exactly where to look for the image it needed & so trawled through every one in the catalog:

12:16:05.322 [5257] <2> image_by_backupid: trying MYCLIENT for MYCLIENT_1293420636
12:16:05.323 [5257] <2> image_by_backupid: likely found backupid in /usr/openv/netbackup/db/images/MYCLIENT/1293000000/MYPOLICY_1293420636_FULL
12:16:05.325 [5257] <2> image_by_backupid: found MYCLIENT_1293420636
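
Rather than eyeballing the log, something like the sketch below could count how many distinct images bpdbm reports as "found", which would show whether it really is walking every image for the client. It assumes all lines of interest look like the three above; the function names are made up, and treating the numeric suffix of the backup ID as seconds since the epoch is an assumption.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines shaped like: "12:16:05.325 [5257] <2> image_by_backupid: found ..."
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<level>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)

SAMPLE = """\
12:16:05.322 [5257] <2> image_by_backupid: trying MYCLIENT for MYCLIENT_1293420636
12:16:05.323 [5257] <2> image_by_backupid: likely found backupid in /usr/openv/netbackup/db/images/MYCLIENT/1293000000/MYPOLICY_1293420636_FULL
12:16:05.325 [5257] <2> image_by_backupid: found MYCLIENT_1293420636
"""


def parse_lines(text):
    """Yield a dict of fields for every line that matches the expected shape."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()


def found_backup_ids(text):
    """Count 'found <backupid>' messages logged by image_by_backupid."""
    ids = Counter()
    for entry in parse_lines(text):
        if entry["func"] == "image_by_backupid" and entry["msg"].startswith("found "):
            ids[entry["msg"].split()[-1]] += 1
    return ids


def backup_time(backup_id):
    """Assumption: the suffix after the last '_' is the backup time in epoch seconds."""
    return datetime.fromtimestamp(int(backup_id.rpartition("_")[2]), tz=timezone.utc)


if __name__ == "__main__":
    for backup_id, hits in found_backup_ids(SAMPLE).items():
        print(backup_id, hits, backup_time(backup_id).isoformat())
```

Run over the whole bpdbm log for the restore window, a long list of different backup IDs would back up the "trawled through every one" theory; a single ID repeated would not.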

but then did the same again at 12:16, 12:20 and 14:12 (when it finally decided on the tapes it needed) [[EDIT: realised these other times were due to the "testing" I was carrying out - doh!]]

If I get the time I'll have to go through the logs more thoroughly!

Andy_Welburn
Level 6

& "All Log Entries Report" reports "DAR enabled" - so it must be something to do with finding the image(s) it needs in the catalog in the first instance - more investigation required (leave that 'til tomorrow methinks!)

J_H_Is_gone
Level 6

From what I understand (only because I recently read the NDMP admin guide), DAR goes right to the file on the tape, which is why it is faster.

So could you, in the BAR, find the dir you want to restore and, instead of checking the directory on the left, highlight all the files and sub-dirs on the right and check them? Would that make it use DAR for the restore?

(I am just getting ready to setup our first NAS and am trying to understand the implications of the backups and restores)

Will_Restore
Level 6

File restore says it's writing many thousands of Kb when the file is a fraction the size.  And the job continues to stay active even after the file has been restored. 

That said, the advantage of very fast backup outweighs these quirks, IMO.

Andy_Welburn
Level 6

Also tried checking a few files & a few folders on the RHS, but as long as folders were selected that's when I started to have "issues".

As previously mentioned, it does work eventually but it's that 2 hour pause & the fact that it's apparently uncompressing all the catalog images for that client searching for the right one to restore from that's got me stumped!

It's possibly filer specific - I'll have to do a few more tests to tie it down tho' - we recently (a few months ago) "upgraded" & expanded our filers, so maybe it lies with that.

It's just annoying as we never used to have these issues - thankfully though restores are few & far between.

 

PS: Good luck with your step into the NDMP world, you shouldn't really encounter many, if any, problems.

Andy_Welburn
Level 6

It says "DAR enabled" - what's so direct about it? Takes an age to start, then restores the file, then reads a few more tapes (just in case) before finishing hours later.

But then again, if it wasn't using DAR it'd have to read ALL the tapes for that backup set from start to finish & I don't think 3 days for a 1 gig restore'd go down too well!

Andy_Welburn
Level 6

Restore finished successfully after about 10 hours - done, can forget about that now!

As a test, tried restoring a folder from my home directory (this backup is essentially every user's home drive on our filer) - this completed successfully after 2 hours (5 gig). [[user: awelburn]]

I tried again the restore from yesterday [[user: r*****]] - this performed as yesterday, i.e. sat doing nothing, so I cancelled it.

Thinking maybe this was some spurious alpha issue, I tried for another restore [[user: z*****]] - same result.

And again [[user: aa*****]] - same result.

The really curious thing is tho' that for either user z****** (before I cancelled it) or aa***** NB decided, as yesterday, to uncompress ALL images in the catalog for this particular client in its search for the image to restore.

Addendum: I'm now thinking that this uncompression was for the z****** user's image as it ceased shortly after cancelling the job, leaving some images still compressed. & I'm of the impression that the aa***** restore will finish that if I let it go on for long enough!