
RMAN child jobs are not created

mahmoud_mohamed
Level 3

Hi,

I have a Windows Server client containing an Oracle database.

The backup was running successfully, but suddenly the parent job stopped generating child jobs.

These are the job details of the parent job:

 

Job Details:
Aug 31, 2017 5:10:00 PM - Info nbjm (pid=5608) starting backup job (jobid=6852875) for client CAISVGIS05, policy SV_GIS_DB_Weekly_New, schedule Weekly_Full
Aug 31, 2017 5:10:00 PM - Info nbjm (pid=5608) requesting MEDIA_SERVER_WITH_ATTRIBUTES resources from RB for backup job (jobid=6852875, request id:{7545E76E-8E5E-11E7-9618-F0921C18FD94})
Aug 31, 2017 5:10:00 PM - requesting resource SV_DD4500_Boost_caisvvmb01
Aug 31, 2017 5:10:00 PM - requesting resource bkpsrv1.NBU_CLIENT.MAXJOBS.CAISVGIS05
Aug 31, 2017 5:10:00 PM - requesting resource bkpsrv1.NBU_POLICY.MAXJOBS.SV_GIS_DB_Weekly_New
Aug 31, 2017 5:10:00 PM - granted resource  bkpsrv1.NBU_CLIENT.MAXJOBS.CAISVGIS05
Aug 31, 2017 5:10:00 PM - granted resource  bkpsrv1.NBU_POLICY.MAXJOBS.SV_GIS_DB_Weekly_New
Aug 31, 2017 5:10:00 PM - granted resource  SV_DD4500_Boost_caisvvmb01
Aug 31, 2017 5:10:02 PM - estimated 0 kbytes needed
Aug 31, 2017 5:10:02 PM - Info nbjm (pid=5608) started backup (backupid=CAISVGIS05_1504192202) job for client CAISVGIS05, policy SV_GIS_DB_Weekly_New, schedule Weekly_Full on storage unit SV_DD4500_Boost_caisvvmb01
Aug 31, 2017 5:10:03 PM - started process bpbrm (pid=8384)
Aug 31, 2017 5:10:59 PM - Info bpbrm (pid=8384) CAISVGIS05 is the host to backup data from
Aug 31, 2017 5:10:59 PM - Info bpbrm (pid=8384) reading file list for client
Aug 31, 2017 5:10:59 PM - connecting
Aug 31, 2017 5:11:07 PM - Info bpbrm (pid=8384) starting bphdb on client
Aug 31, 2017 5:11:07 PM - connected; connect time: 0:00:00
Aug 31, 2017 5:12:14 PM - Info bphdb (pid=2604) Backup started
Aug 31, 2017 5:40:25 PM - Error bpbrm (pid=8384) cannot send mail to root
Aug 31, 2017 5:40:26 PM - Info bphdb (pid=2604) done. status: 150: termination requested by administrator
Aug 31, 2017 5:40:26 PM - end writing
termination requested by administrator  (150)
 
 
 
The backup selection is: G:\RMAN_Net_Backup\Scripts\Weekly_Inc0.cmd
 
The attached files are the bpcd, bphdb and dbclient logs, the RMAN script, and the RMAN output logs.

Thiago_Ribeiro
Moderator
Partner    VIP    Accredited

Hi @mahmoud_mohamed,

I'm assuming this status code 150 is because you cancelled the job, right?

In the RMAN log, I looked at the SEND command in this script and didn't find NB_ORA_SCHED. Did you try adding it?

As far as I know, Oracle script backups use attributes like these:

Example:
NB_ORA_CLIENT=
NB_ORA_SERV=
NB_ORA_POLICY=
NB_ORA_SCHED=

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Aug 31 15:28:38 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: EMEGS (DBID=1140985314)
connected to recovery catalog database

RMAN> RUN {
2>
3> ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
4> ALLOCATE CHANNEL ch02 TYPE 'SBT_TAPE';
5> ALLOCATE CHANNEL ch03 TYPE 'SBT_TAPE';
6>

Your SEND

7> SEND 'NB_ORA_CLIENT=CAISVGIS05,
NB_ORA_SERV=bkpsrv1,
NB_ORA_POLICY=SV_GIS_DB_Weekly_New'
NB_ORA_SCHED=???
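To make a missing attribute easy to spot, here is a minimal Python sketch that parses the text inside a SEND parameter string and reports which of the four NB_ORA_* attributes listed above are absent or empty. It assumes the comma-separated KEY=value format used in the script above, with all values inside one quoted string. The helper names (parse_send, missing_keys, build_send) are hypothetical; the code only manipulates text and does not call RMAN or NetBackup.

```python
# Hypothetical helpers: parse, check and build the text of an RMAN SEND
# parameter string. Nothing here talks to RMAN or NetBackup.
REQUIRED_KEYS = ("NB_ORA_CLIENT", "NB_ORA_SERV", "NB_ORA_POLICY", "NB_ORA_SCHED")


def parse_send(send_text: str) -> dict[str, str]:
    """Parse 'KEY=value,KEY=value' (whitespace and newlines allowed) into a dict."""
    params = {}
    for item in send_text.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed item: {item!r}")
        params[key.strip()] = value.strip()
    return params


def missing_keys(send_text: str) -> list[str]:
    """Return the REQUIRED_KEYS that are absent or have an empty value."""
    params = parse_send(send_text)
    return [key for key in REQUIRED_KEYS if not params.get(key)]


def build_send(**params: str) -> str:
    """Render a SEND statement with every KEY=value inside one quoted string."""
    body = ",".join(f"{key}={params[key]}" for key in REQUIRED_KEYS if key in params)
    return f"SEND '{body}';"
```

For example, missing_keys applied to the text of the SEND in the script above (client, server and policy only) returns ['NB_ORA_SCHED'].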

 

Regards,

 

Thiago

Hi Thiago,

Thanks for your reply. Yes, you are right, I cancelled it.

I didn't provide NB_ORA_SCHED before, and the backup was working fine.

Do you think this is the problem?

Can you check the bphdb errors?

Thiago_Ribeiro
Moderator
Partner    VIP    Accredited

Hi,

Maybe not, since you said this backup was working well before. What changed in your environment?

I saw these errors in the bphdb log, and I would like to ask: how is the communication between the master, the media server and the client?

Can you test the communication using the bpclntcmd and bptestbpcd commands on the master, the media server and the client?

bphdb

17:01:55.438 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:55.438 [8960.7560] <16> bphdb: ERR - could not write keepalive to the NAME socket
17:01:56.454 [8960.7560] <16> bphdb: ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">
17:01:56.454 [8960.7560] <16> bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:56.454 [8960.7560] <16> bphdb: ERR - could not write ERR - failed executing command <"G:\RMAN_Net_Backup\Scripts\Daily_Inc1.cmd">
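When a legacy log like this one is long, a small filter helps pull out only the error lines. This is a minimal Python sketch, assuming the line layout shown above (time, [pid.tid], <severity>, process name, message) and assuming, as in these lines, that <16> marks an error. The function and type names are hypothetical, not part of NetBackup.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, taken from the bphdb lines above:
# 17:01:55.438 [8960.7560] <16> bphdb: ERR - send() to server failed: ...
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<process>[^:]+): (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    time: str
    pid: int
    tid: int
    severity: int
    process: str
    message: str


def error_entries(lines: Iterable[str], min_severity: int = 16) -> Iterator[LogEntry]:
    """Yield parsed entries with severity >= min_severity; skip unparsable lines."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue
        entry = LogEntry(
            m["time"], int(m["pid"]), int(m["tid"]),
            int(m["severity"]), m["process"], m["message"],
        )
        if entry.severity >= min_severity:
            yield entry
```

It accepts any iterable of lines, so an open log file can be passed directly; grouping the results by pid then shows which bphdb process hit the socket errors.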

 

Regards,

 

Thiago

What should I test using bpclntcmd?

I have tested telnet on port 13724 in all directions and it works fine.

 

Thiago_Ribeiro
Moderator
Partner    VIP    Accredited

Hi,

You can follow this TechNote for the tests:

Explanation of bpclntcmd options and recommended troubleshooting when the commands return errors - https://www.veritas.com/support/en_US/article.TECH50198

You can also test these ports:

Port  | Description                                  | Protocol | Direction of connection
13724 | VNETD (NetBackup Network Daemon)             | TCP      | Bidirectional
1556  | PBX (VxPBX Symantec Private Branch Exchange) | TCP      | Bidirectional
13782 | BPCD (NetBackup Client Daemon)               | TCP      | Bidirectional
13720 | BPRD (NetBackup Request Daemon)              | TCP      | Bidirectional
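The telnet check can be scripted for all four ports at once. This is a minimal Python sketch of a plain TCP connect test, assuming Python 3.9+ is available on the host you test from. It only shows that a port accepts a TCP connection; it does not prove the NetBackup daemon behind it answers correctly (bptestbpcd and bpclntcmd cover that). The function names are hypothetical.

```python
import socket

# Ports from the table above, with the daemon expected to listen on each.
NBU_PORTS = {13724: "vnetd", 1556: "pbx", 13782: "bpcd", 13720: "bprd"}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def check_host(host: str, ports=NBU_PORTS, timeout: float = 5.0) -> dict[int, bool]:
    """Try every port and report which ones accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, running check_host("bkpsrv1") on the client tests the master's ports from the client side; repeating it on the master and media server against the client covers the other direction. Keep in mind that bprd runs only on the master, so 13720 is not expected to answer on a client.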

 

Regards,

 

Thiago

I have tested all the connections and they are working, except port 13720 on the client, since the bprd process runs only on the master server.

I checked the routing, and everything is OK.

When the DBA runs the script manually, it should generate a job in the Activity Monitor, and that doesn't happen.

 

Marianne
Level 6
Partner    VIP    Accredited Certified
Please show us the output of this command on the client:
bpclntcmd -pn

Hi Marianne, nice to see you here, and thanks for the support.

"C:\Program Files\Veritas\NetBackup\bin\bpclntcmd.exe" -pn
expecting response from server bkpsrv1
caisvgis05 CAISVGIS05 10.195.4.225 56866

"C:\Program Files\Veritas\NetBackup\bin\bpclntcmd.exe" -ip 10.195.4.55
host 10.195.4.55: bkpsrv1 at 10.195.4.55

The DBA sent us this message:

Some RMAN sessions have been stuck on the wait event “Backup: MML create a backup piece” for more than 23 hours. In the Activity Monitor, this was the parent job that ran for 23 hours without generating child jobs.

I think the problem is in the database itself, as we didn't change anything. I also noticed that when he runs a manual job, it doesn't appear in the Activity Monitor, while the file system backup is running successfully.

Marianne
Level 6
Partner    VIP    Accredited Certified
It seems you have identified the problem.

"RMAN sessions stuck with event “Backup: MML create a backup piece”" means that something is wrong on the Oracle side.
There is nothing to do from the NBU side.

Hi Marianne,

But why does this error appear?

bphdb: ERR - send() to server failed: An existing connection was forcibly closed by the remote host.
17:01:55.438 [8960.7560] <16> bphdb: ERR - could not write keepalive to the NAME socket

And should I ask the DBA to run the RMAN backup without NetBackup?

For example, by trying to back up the control file to local disk?

The DBA was able to back up the control file to disk successfully.

Marianne
Level 6
Partner    VIP    Accredited Certified
A normal RMAN backup script starts with the database, then the archive logs, and only then the control file.
So a control file backup to disk does not prove anything.

Ask the DBA to do a control file backup to SBT_TAPE (NBU), and a database backup to disk.

The errors in bphdb seem to be caused by an OS-level keepalive timeout:

"could not write keepalive to the NAME socket"
There seem to be long timeouts on the master and media server, causing the parent job to sit and wait for jobs to be initiated from the client.

Your DBA should be able to increase the logging level within RMAN. This should help with troubleshooting.

PS:
I have not been able to look at your logs. I only look at forum posts on my phone over weekends, and I can only open .txt files.

Hi Marianne,

After checking with support, we found that the ALLOCATE command was not initiating a request to the master server, even though the client can reach the master server normally.

The support engineer thinks it might be a problem with the database or corrupted files in the NetBackup agent.

The policy is running successfully now. We uninstalled the NetBackup agent and rebooted the server.

It seems there were some corrupted files, or a problem with the Windows server.

So we took downtime and rebooted the server, and the child jobs were created successfully.

Thanks all for your support.