Forum Discussion

schrammd
Level 5
8 years ago

Restores run forever, don't lay files down

Hi,

Here is the background: we back up about 80 NFS shares (off a VNX) through a single media server. The bulk of these get staged to an Isilon, then copied to tape. We also scrape up daily RMAN-created backups off a separate VNX and spool those to tape. Single library, six LTO-5 tape drives, all the same media type. Nothing fancy. We've been running this setup for years. No changes to buffers or firmware recently.

I recently ran a restore from some of the RMAN tapes, and it behaved oddly: the KB count written kept climbing well beyond the actual size of the source files as shown in the browse/restore file list. The job ran and ran and never laid down a single byte, so I killed it, thinking it was a fluke. I then tried to restore a different set of files from the same time period, with the same result: it runs forever and nothing gets written to the destination, whether original or redirected.

I then picked some other data at random from other NFS mounts, and those restores all worked fine. In fact, I cannot reproduce the problem except on the Oracle-created backups. The only difference I can see is that the RMAN data goes direct to tape, with no Isilon staging involved. We have restored this same RMAN data before using the same process, but it's been a few months since the last time. Anyhow, I've never encountered anything like this, where the restore job runs excessively long and appears to be "working" (there is a process chewing away at CPU and memory, and the tape drive is spinning), yet nothing gets restored. What could be going on?

thanks

  • I have seen behaviour like this with an unlimited fragment size. In that case, the problem was that the system had to read the whole tape before extracting the file(s).

    The solution/mitigation was to set a fixed fragment size. Unfortunately, this only solves/mitigates the problem for future backups, unless you duplicate the earlier/current backups after changing the fragment size.
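A rough way to picture the fragment-size point above: the sketch below is a simplified model (an assumption, not NetBackup's documented restore algorithm) in which a restore can position the tape to the start of the fragment that holds the wanted file and then has to read forward sequentially from there. The function name `gb_read_to_reach` is hypothetical, and `None` stands in for an unlimited fragment size.

```python
import math
from typing import Optional


def gb_read_to_reach(file_offset_gb: float, fragment_size_gb: Optional[float]) -> float:
    """Approximate GB read sequentially before reaching a file in a backup image.

    Simplified, hypothetical model: the restore can skip directly to the start
    of the fragment containing the file, then must read forward from there.
    fragment_size_gb=None means "unlimited", i.e. the whole image is one fragment.
    """
    if file_offset_gb < 0:
        raise ValueError("file offset must be non-negative")
    if fragment_size_gb is None:
        # No fragment boundaries to position to: read from the image start.
        return file_offset_gb
    if fragment_size_gb <= 0:
        raise ValueError("fragment size must be positive")
    # Start of the fragment that contains the file.
    fragment_start = math.floor(file_offset_gb / fragment_size_gb) * fragment_size_gb
    return file_offset_gb - fragment_start
```

For example, with a file about 480 GB into an image, an unlimited fragment size means reading roughly 480 GB before the file, 50 GB fragments cut that to roughly 30 GB, and a 512 GB maximum does not help at all for that file, because the whole stretch fits inside the first fragment.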

    • Michal_Mikulik1
      Moderator

      Hello,

      So, if I understand correctly, restores from tape run forever, while restores from the Isilon work?

      (The fact that the problematic files were RMAN-created is probably not relevant here.)

      Is the number of kilobytes in the restore job stuck, or slowly increasing?

      Regards

      • schrammd
        Level 5

        If I restore files back to the Isilon or locally to the media server disk, those that "work" work fine, and very fast indeed. The KB count matches the file size most of the time. If I try to restore those RMAN files to any target, the job just runs and the KB restored climbs and climbs. I never actually let it run to completion, as I got too annoyed that it should have taken only a few minutes for a 200 MB file or so. Are you saying that if the fragment is huge, the whole fragment the file or files are contained in would need to be "read" just to extract a few things out of it? If that were the case, I don't see why the other restores from similar media with the same max fragment size work. And no, nothing has changed AFAIK with regard to any STU settings, policies, etc. I just checked, and the fragments on the Oracle tapes range from 50 to 250 GB each (we probably have too large a max fragment size).

        thanks 

    • schrammd
      Level 5

      I was rather interested in nailing this down, especially before it became a fire drill, so I resubmitted yesterday's restore job and let it run. Sure enough, after 50 minutes it restored the handful of files and completed, but it does look like it had to chew through 500+ GB to find them, as they are multiplexed (mux'd) images (2 streams). I guess I answered my own question, although the other random restore tests I did also came from mux'd tapes and, for whatever reason, took a couple of minutes instead of nearly an hour. Those other articles and posts on the fragment topic helped a lot too. I am going to do some tests with fragments smaller than 512 GB on this data and see how it goes.

      thanks all
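The multiplexing effect from the last post can be put in rough numbers as well. This is a hypothetical sketch that assumes the streams are interleaved in equal chunks over the whole stretch being read, so reaching a point X GB into one stream means reading about X GB times the number of streams off tape. The function names are made up for illustration, and the read rate used below is an assumed figure, not one measured in this thread.

```python
def mpx_tape_gb_read(stream_offset_gb: float, streams: int) -> float:
    """Approximate tape GB read to reach a point stream_offset_gb into one stream.

    Hypothetical model: all streams are interleaved round-robin in equal-sized
    chunks, so each GB of the wanted stream sits among (streams - 1) GB of the
    other streams' data, which must be read and discarded.
    """
    if streams < 1:
        raise ValueError("need at least one stream")
    return stream_offset_gb * streams


def read_minutes(gb: float, mb_per_s: float) -> float:
    """Minutes to read gb at a sustained rate of mb_per_s (1 GB = 1000 MB)."""
    if mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return gb * 1000 / mb_per_s / 60
```

For example, for a 2-stream image, a file about 250 GB into its stream would need roughly `mpx_tape_gb_read(250, 2)`, i.e. 500 GB, read off tape; at an assumed sustained 170 MB/s, `read_minutes(500, 170)` comes to about 49 minutes, which is in the same range as the roughly 50 minutes observed above.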