Curiosity with OIP Policy 7601

jim_dalton · ‎03-09-2015

I've got several instances in an OIP policy that backs up redo logs every few hours.

It has worked flawlessly until now.

I notice that for one instance the backup failed.

Then, when the window opens next time around, NetBackup/OIP doesn't even bother to execute the backup for that instance. All the other ones in the policy continue as normal. And again for the next iteration: it's totally absent.

As an experiment I've removed the instance and re-added it to the policy to see if it runs next time around.

Thoughts or similar observations?

It's all Solaris, client and server.

Thanks in advance, Jim

Nicolai · ‎03-09-2015

The instance isn't deactivated, right?

If you take a look in \netbackup\logs\bpdbsbora on the client, are there any signs of the Oracle instance being scheduled (scripts being generated)?

http://www.mass.dk/netbackup/quick-hints/101-rman-input-a-output-using-intelligent-polices-in-netbackup-76.html

jim_dalton · ‎03-09-2015

Thanks for the reply, Nicolai. I will investigate your idea.

Update: removing the instance and adding it back into the policy makes no difference; it still doesn't initiate a backup. And yes, the instance is active. Jim

jim_dalton · ‎03-09-2015

On the client in the bpdbsbora log there is no sign of the backup being initiated, just the error that triggered earlier in the day:

08:00:27.160 [1410] <16> CFECallback::reportStatus: ERR - Unable to connect to <instance>. Error Code = 12,537: ORA-12537: TNS:connection closed
08:05:37.088 [7576] <16> CFECallback::reportStatus: ERR - Unable to connect to <instance>. Error Code = 12,537: ORA-12537: TNS:connection closed
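
For anyone wanting to check a larger bpdbsbora log for the same pattern, here is a minimal sketch that pulls out the error lines and their ORA codes. It assumes every line has the same shape as the two entries quoted above, and it treats the `<16>` field as a severity level; both are assumptions, and `parse_errors` is a made-up helper name, not part of NetBackup.

```python
import re

# Assumed line shape, taken from the two entries quoted above:
# HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"(?P<func>\S+): (?P<msg>.*)$"
)
ORA_RE = re.compile(r"ORA-\d{5}")


def parse_errors(lines, min_severity=16):
    """Return one dict per log line at or above min_severity (hypothetical helper)."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or int(match.group("sev")) < min_severity:
            continue
        ora = ORA_RE.search(match.group("msg"))
        found.append({
            "time": match.group("time"),
            "pid": match.group("pid"),
            "ora": ora.group(0) if ora else None,
        })
    return found


sample = [
    "08:00:27.160 [1410] <16> CFECallback::reportStatus: ERR - Unable to connect to <instance>. Error Code = 12,537: ORA-12537: TNS:connection closed",
    "08:05:37.088 [7576] <16> CFECallback::reportStatus: ERR - Unable to connect to <instance>. Error Code = 12,537: ORA-12537: TNS:connection closed",
]
for entry in parse_errors(sample):
    print(entry["time"], entry["pid"], entry["ora"])
```

Counting the entries per instance gives the number of failed connection attempts, which is worth comparing against any retry limits configured on the master server.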

Looking like a bug to me at this point. Jim

jim_dalton · ‎03-09-2015

Interesting... if I create a new policy to cover just the redo logs in my problem instance, the GUI throws an error:

"Cannot do manual backup of policy. The policy does not have a list of files to back up."

Most intriguing. Jim

Nicolai · ‎03-09-2015

Ask the DBA if he/she can connect to the database. I suspect this is an Oracle listener issue (but I may be wrong). If NetBackup is not able to talk to the instance, it won't be able to perform the backup.

There are plenty of hits on the ORA-12537 TNS error, e.g.:

http://www.oradev.com/ORA-12537_TNS_connection_closed.jsp

jim_dalton · ‎03-09-2015

Spot the error? I can't see it yet...

Policy Name:       adhoc_Oracle_redo
Options:           0x0
template:          FALSE
audit_reason:         ?
Names:             (none)
Policy Type:       Oracle (4)
Active:            yes
Effective date:    04/08/2008 11:37:41
Block Incremental: no
Mult. Data Stream: no
Perform Snapshot Backup:   no
Snapshot Method:           (none)
Snapshot Method Arguments: (none)
Perform Offhost Backup:    no
Backup Copy:               0
Use Data Mover:            no
Data Mover Type:           -1
Use Alternate Client:      no
Alternate Client Name:     (none)
Use Virtual Machine:      0
Hyper-V Server Name:     (none)
Enable Instant Recovery:   no
Policy Priority:   100
Max Jobs/Policy:   Unlimited
Disaster Recovery: 0
Collect BMR Info: no
Keyword:           To run OIP backup of redos.
Data Classification:       -
Residence is Storage Lifecycle Policy:    no
Client Encrypt:    no
Checkpoint:        no
Residence:         Master-SL500-LTO4
Volume Pool:       ENCR_Testing
Server Group:      *ANY*
Granular Restore Info: no
Exchange Source attributes:              no
Exchange DAG Preferred Server: (none defined)
Application Discovery:      no
Discovery Lifetime:      0 seconds
ASC Application and attributes: (none defined)
Generation:      19
Ignore Client Direct: no
Enable Metadata Indexing: no
Index server name: NULL
Use Accelerator: no
Client List Type: 1
Selection List Type: 2
Oracle Backup Data File Name Format: NULL
Oracle Backup Archived Redo Log File Name Format: NULL
Oracle Backup Control File Name Format: NULL
Oracle Backup Fast Recovery Area File Name Format: NULL
Oracle Backup Set ID: NULL
Oracle Backup Data File Arguments: SPECIFY_MAX_LIMITS=0,NUM_STREAMS=1,SKIP_READ_ONLY=0,SKIP_OFFLINE=0,OFFLINE=0
Oracle Backup Archived Redo Log Arguments: SPECIFY_MAX_LIMITS=0,NUM_STREAMS=1,INCLUDE_ARCH_LOGS=1
Instance Name/Client/Pri/DMI/CIT: PROPPROD oracle-propprod 0 0 1 0 ?
Include:           (none defined)
Schedule:              Redos_only
Type:                TLOG Archived Redo Log Backup (5)
Calendar sched: Enabled
   Included Dates-----------
       No days of week entered
   Excluded Dates----------
      No specific exclude dates entered
      No exclude days of week entered
Retention Level:     8 (1 day)
u-wind/o/d:          0 0
Incr Type:           DELTA (0)
Alt Read Host:       (none defined)
Max Frag Size:       0 MB
PFI Recovery:        0
Maximum MPX:         1
Number Copies:       1
Fail on Error:       0
Residence:           (specific storage unit not required)
Volume Pool:         (same as policy volume pool)
Server Group:        (same as specified for policy)
Residence is Storage Lifecycle Policy:         0
Schedule indexing:   0
Daily Windows:
   Day         Open       Close       W-Open     W-Close
   Sunday      000:00:00 000:00:00
   Monday      000:00:00 000:00:00
   Tuesday     000:00:00 000:00:00
   Wednesday   000:00:00 000:00:00
   Thursday    012:00:00 014:00:00   108:00:00 110:00:00
   Friday      000:00:00 000:00:00
   Saturday    000:00:00 000:00:00

Jim

jim_dalton · ‎03-09-2015

The db is up and running fine; I have connected to it with no problem. There would be mayhem if it were truly not running. Even if it were down, I would probably expect NetBackup to trigger a backup, even if it fails.

Jim.

jim_dalton · ‎03-09-2015

If I copy the functioning policy and remove all the instances apart from one which does work, the GUI throws the same error when I start it manually. Something amiss... if I leave it to run under the scheduler, it works perfectly. All very interesting.

Nicolai · ‎03-09-2015

I recall a previous Connect thread about this, but I can't find it.

jim_dalton · ‎03-09-2015

And so if I put my "problem" instance in a separate policy and run it via the scheduler, it also runs fine. That confirms the instance and everything associated with it is fine. And yet the official policy still simply ignores it. Jim

Nicolai · ‎03-09-2015

Hmm, could it be that the original policy is damaged?

Nicolai · ‎03-09-2015

Long shot, but could it be a clean-up issue?

http://www.symantec.com/docs/TECH215948

jim_dalton · ‎03-10-2015

Update: without any further intervention on the original policy, my ignored instance is no longer ignored and ran successfully at 0800.

Jim

Marianne · ‎03-11-2015

I still think Nicolai was right in his observation about a listener issue. The db can be up/active while the listener is not. Or the listener may be configured incorrectly in a cluster.

The original error should give a clue of what was wrong.

jim_dalton · ‎03-12-2015

Sorry, but it's nothing to do with the listener. Without the listener, we'd have our clients banging on the door, and we'd know, as it would show up on the cluster too. And besides, why did the instance backup spring back into life the following day? Jim

Marianne · ‎03-12-2015

Thanks for sharing your experience.

One possibility that comes to mind: the global setting on the master server for # tries per ## hours.
The default is 2 tries per 12 hours.
I have seen users changing it to 1 try per 24 hours.

For some reason, this plays havoc with backups scheduled multiple times per day when a failure occurs.
All runs well as per the schedule, but when there is a failure, this setting in NBU causes the backup to be skipped until the number of hours in the global setting has passed.
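
To make the effect concrete, here is a rough sketch of how a "tries per period" limit could suppress a frequency-based backup after failures. This is only a model of the behaviour described above, not NetBackup's actual scheduler logic: `is_attempt_allowed` is a hypothetical function, the 4-hour frequency is an assumption, and the failure times are rounded from the two errors in the log posted earlier in the thread.

```python
from datetime import datetime, timedelta


def is_attempt_allowed(failures, now, max_tries=2, period_hours=12):
    """Hypothetical model: allow a new attempt only if fewer than
    max_tries failures fall inside the look-back period."""
    cutoff = now - timedelta(hours=period_hours)
    recent = [t for t in failures if t > cutoff]
    return len(recent) < max_tries


# Two failed attempts, as in the client log (times rounded to the minute).
failures = [datetime(2015, 3, 9, 8, 0), datetime(2015, 3, 9, 8, 5)]

# Assumed schedule: the redo log backup comes due every 4 hours.
due_times = [datetime(2015, 3, 9, 12, 0) + timedelta(hours=4 * i) for i in range(4)]

for due in due_times:
    state = "runs" if is_attempt_allowed(failures, due) else "skipped"
    print(due.strftime("%d/%m %H:%M"), state)
```

In this model the instance is silently skipped while both failures are still inside the look-back period, and it starts running again with no intervention once they age out, which matches the general shape of what Jim saw. With 1 try per 24 hours, a single failure would block the instance for a full day.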

jim_dalton · ‎03-12-2015

Now that does sound like my issue, Marianne. Perhaps it needs some logic to sort out this notion of a day, as it doesn't apply to frequency-based backups that are supposed to run every few hours. My setting is the default. I'm pretty sure that's the cause. Awaiting confirmation from Symantec, but I reckon we are there. Jim
